Super Simple Compression (SCC)

File Specification

Description:

SCC is file compression program that utilizes the LZ-77 algorithm to take in input data and reduce its size by encoding redundant data into metadata. A test.txt file is read into the program; its contents are compressed using LZ-77 compression. The program then outputs a lz77.cmp file containing the compressed metadata.The encoded metadata is then decoded back into its original form. The comprssed file lz77.cmp is read back into the program and decoded into the original text. The decoded output is put into a decompressed.txt file.

Terms:

input stream
the sequence of data bytes to be compressed.
byte
basic data element of the input stream.
coding position
the current position of the byte int the input stream being encoded.
lookahead buffer
the byte sequences from the coding position to the end of the input stream.
search buffer
the byte sequence from the coding postion back to the first byte.

Compression Process:

The compression process begins with the coding position at the first byte of the input stream and finishes when the coding position has passed the end of the stream. With each iteration, the search buffer is searched (from the byte before the current location of the coding position back to the beginning of the stream). If a match is found, the look-ahead buffer (the bytes subsequent to the coding position) is searched for more matches. The number of bytes to where the match was found in the search buffer is recorded in the first output field, the number of bytes of more matching characters subsequent to the first match is recorded in the second output field, and the byte after the location of the last matching char in the look-ahead buffer is recorded as the third output field. If no match is found, the first two output fields are set to 0 and the char in the coding position is recorded in the third output field.

Example:

File content: 'abracadabra'
Resulting compression:
Final output: '00a00b00r31c21d74'

kcachegrind Compression Profiling:

The following is a snapshot of kcachegrind profiling that was performed during the compression process of the program.