I've made progress with unpacking compressed ALICE_2 images. Here's what I've discovered about how they are encoded.
The ALICE.exe compressor reads the instructions from ALICE.bin, translates the addresses of BL/BLX instructions, and adds the instructions to a binary tree, which is then traversed to create a dictionary. The dictionary is a histogram of instruction frequencies sorted in descending order (it is still unclear how instructions of the same frequency are sorted). Using the histogram, range registers are calculated, and the instructions falling into these ranges are range encoded to reduce their size. Since the most frequent instructions are reduced the most, this is where we see the most compression. The least frequent instructions (those comprising the tail of the histogram) are in fact encoded to be larger than their original size.
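To make the dictionary step concrete, here is a minimal sketch of building a frequency-sorted dictionary in Python. It is not taken from ALICE.exe: the name `build_dictionary` is hypothetical, and ties are broken by first appearance purely as a placeholder, since the real tie-break order is exactly the part that is still unknown.

```python
from collections import Counter

def build_dictionary(instructions):
    """Return the distinct instructions sorted by descending frequency.

    ASSUMPTION: instructions with equal frequency are ordered by first
    appearance. The ordering used by the real compressor is unknown.
    """
    counts = Counter(instructions)
    first_seen = {}
    for index, instr in enumerate(instructions):
        first_seen.setdefault(instr, index)
    # Sort by frequency (highest first), then by the assumed tie-break.
    return sorted(counts, key=lambda instr: (-counts[instr], first_seen[instr]))

# Made-up 16-bit instruction words, for illustration only.
sample = [0x4770, 0xB510, 0x4770, 0xBD10, 0x4770, 0xB510]
dictionary = build_dictionary(sample)
```

Swapping the second element of the sort key is the only change needed to try a different tie-break rule against a known-good compressed image.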
The range encoded instructions are then bitpacked into the final compressed ALICE in blocks of a set size (e.g., 64 bytes, 32 instructions). Each block starts on a byte boundary and is referenced by a mapping table, so zero bits are appended as padding to the end of a block when necessary.
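The block packing described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual on-disk format: `pack_blocks` is a hypothetical name, the codes are packed most significant bit first, and the mapping table is modelled as a list of byte offsets relative to the start of the compressed body.

```python
def pack_blocks(codes, instructions_per_block=32):
    """Pack (value, bit_length) codes into byte-aligned blocks.

    ASSUMPTIONS: bits are written MSB-first, and each mapping table
    entry is the byte offset of a block within the compressed body.
    """
    body = bytearray()
    mapping_table = []
    for start in range(0, len(codes), instructions_per_block):
        mapping_table.append(len(body))  # byte address of this block
        bits = 0
        nbits = 0
        for value, length in codes[start:start + instructions_per_block]:
            bits = (bits << length) | (value & ((1 << length) - 1))
            nbits += length
        padding = -nbits % 8             # zero bits needed to reach a byte boundary
        bits <<= padding
        body += bits.to_bytes((nbits + padding) // 8, "big")
    return bytes(body), mapping_table

# Four made-up variable-length codes, two per block.
codes = [(0b1, 1), (0b101, 3), (0b11, 2), (0b1111, 4)]
body, mapping_table = pack_blocks(codes, instructions_per_block=2)
```

Because every block begins at a byte address, a decoder can jump straight to any block through the mapping table without decoding the blocks before it.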
Postprocessing involves prepending a 40-byte header, which includes the compressed offset, mapping table offset, and dictionary offset. The mapping table follows the compressed ALICE body, followed by the dictionary (truncated to the last range register).
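Given that section order, the three offsets can be derived from the section lengths. The sketch below assumes the offsets are measured from the start of the file and that there is no alignment padding between sections; neither is confirmed, and `section_offsets` is a hypothetical helper.

```python
HEADER_SIZE = 40  # header length in bytes, as described above

def section_offsets(body_len, mapping_table_len):
    """Return (compressed, mapping table, dictionary) offsets in bytes.

    ASSUMPTIONS: offsets are relative to the start of the file and the
    sections are laid out back to back with no padding between them.
    """
    compressed_offset = HEADER_SIZE
    mapping_offset = compressed_offset + body_len
    dictionary_offset = mapping_offset + mapping_table_len
    return compressed_offset, mapping_offset, dictionary_offset

offsets = section_offsets(body_len=128, mapping_table_len=8)
```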
The code (rewritten in Python) is available here under the GPL:
You will need Python 3 and the bitstring package.
The encoder does not work yet because I do not know how instructions of the same frequency are sorted in the histogram. Once that is resolved, the encoder should work.
I am interested in testing various compressed ALICE components, particularly if you have the corresponding original uncompressed ALICE.bin. If you find any problems, submit an issue to the GitHub project and attach a link to the files you are using.