Andrew Mccuan

CMPS 4350: Advanced Software Engineering

Project 1:

File Specifications

Name:

AMC - Andrew Mccuan Compression

Description:

The AMC format is a file format that applies the idea behind the LZ-77 algorithm to compress file data. The current version does not fully implement LZ-77 and does not yet achieve proper compression. The format also allows a file to be encoded and decoded without losing its original file name and extension.

When a file is converted to AMC, its data is stored in groups of 3 bytes, one character per byte. The first character stores the offset (how far back the repeated data starts), the second character stores the size of the repeated data (how many characters are repeated), and the last character stores the next character of data.

EX:

1st char byte: offset
2nd char byte: size of repeated data
3rd char byte: next character

A word like "book" would be translated into a string like:

'0' '0' 'b' = (00b)
'0' '0' 'o' = (00o)
'1' '1' 'k' = (11k)
00b00o11k

This string is then written to the file in binary, making it unreadable to the human eye.
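The "book" example above can be reproduced with a short Python sketch. This is not the AMC program's actual code; the function names (lz77_triples, lz77_expand, as_text) and the limit of 255 per field (assuming each field fits in one byte) are hypothetical and chosen only for illustration.

```python
# Hypothetical limit: assumes offset and size are each stored in one byte.
MAX_FIELD = 255


def lz77_triples(data):
    """Greedy LZ-77 style encoding into (offset, size, next character) triples."""
    triples = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - MAX_FIELD), i):
            length = 0
            # Stop one short of the end so a "next character" always exists.
            while (i + length < len(data) - 1
                   and length < MAX_FIELD
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples


def lz77_expand(triples):
    """Rebuild the original text from the triples."""
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])  # copy from `off` characters back
        out.append(ch)
    return "".join(out)


def as_text(triples):
    """Human-readable form used in the example, e.g. 00b00o11k."""
    return "".join(f"{off}{length}{ch}" for off, length, ch in triples)


print(as_text(lz77_triples("book")))      # 00b00o11k
print(lz77_expand(lz77_triples("book")))  # book
```

The readable form only works for illustration while offsets and sizes stay below 10; a real file would store each field as a raw byte.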

The AMC file conversion program encodes files into the AMC format and decodes them back into their original format, keeping the original name and extension.

NOTE: This is a WORK-IN-PROGRESS and this formatting style has not been finalized.

The Format:

Each AMC file is laid out as follows:

  1. The file header "AMC", which identifies the file type.
  2. Whitespace (blanks, TABs, CRs, LFs). This puts the next piece of header info on the next line.
  3. Original file name, without extension, in quotation marks.
  4. Whitespace.
  5. Original file extension, in quotation marks.
  6. All following lines contain the data from the original file, encoded in the LZ-77 style format described above.

The first line of an encoded file contains the header AMC, which the program uses to determine whether the file is in the AMC format without needing to know the file's extension.

The second line contains the original name of the encoded file, without the extension, in quotation marks. It is used to restore the file name when the file is decoded back into its original form. The name is quoted because the program looks for quotes to locate the file name and extension. The third line works the same way: it contains the file extension in quotes and is also used to restore the original file name.
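As a rough illustration of the three header lines, the following Python sketch builds and parses an AMC header. The names build_header and parse_header are hypothetical, and the sketch assumes that exactly one line break separates the quoted extension from the encoded data; the actual program may handle this differently.

```python
import re

# "AMC", whitespace, quoted name, whitespace, quoted extension, line break.
# Assumption: a single line break ends the header; the rest is encoded data.
HEADER = re.compile(rb'AMC\s+"([^"]*)"\s+"([^"]*)"\r?\n')


def build_header(filename):
    """Return the header bytes for a file name such as 'example.text'."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    return b'AMC\n"' + stem.encode() + b'"\n"' + ext.encode() + b'"\n'


def parse_header(blob):
    """Return (original file name, encoded data) or raise if not AMC."""
    match = HEADER.match(blob)
    if match is None:
        raise ValueError("not an AMC file")
    name = match.group(1).decode()
    ext = match.group(2).decode()
    original = f"{name}.{ext}" if ext else name
    return original, blob[match.end():]


header = build_header("example.text")
name, body = parse_header(header + b"encoded data")
print(name)  # example.text
```

Matching on the "AMC" prefix rather than the file extension mirrors the way the format identifies itself from its first line.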

File Example:

                                    AMC
                                    "example"
                                    "text"
                                    {data encoded is here}
                                    {data encoded cont...}
                                    {data encoded cont...}
                                

kcachegrind Screenshot