4350 - Advanced Software Engineering
Week-2 activities
1. Define a file specification for a compressed file.
-----------------------------------------------------
   File components should include:
   a. Compression type
   b. File-name of compressed file
      Files in the current directory will have no path
      Files in a subdirectory will include the path
   c. Uncompressed size of each file, in bytes
   d. The compressed data stream
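
One possible way to lay out components a through d is sketched below. The
struct name FileEntry, the helper names, the field widths, the little-endian
byte order, and the extra data-length field are all assumptions made for
illustration; the assignment leaves these choices to you.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical archive entry. Names, widths, and ordering are assumptions.
struct FileEntry {
    std::uint8_t compressionType = 0;   // a. e.g. 0 = byte codes, 1 = nibble codes
    std::string fileName;               // b. relative path, e.g. "docs/a.txt"
    std::uint32_t originalSize = 0;     // c. uncompressed size in bytes
    std::vector<std::uint8_t> data;     // d. compressed data stream
};

// Append `value` as `count` little-endian bytes.
void putLE(std::vector<std::uint8_t>& out, std::uint32_t value, int count) {
    for (int i = 0; i < count; ++i) {
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }
}

// Layout: type(1) nameLength(2) name originalSize(4) dataLength(4) data
std::vector<std::uint8_t> serializeEntry(const FileEntry& e) {
    std::vector<std::uint8_t> out;
    out.push_back(e.compressionType);
    putLE(out, static_cast<std::uint32_t>(e.fileName.size()), 2);
    out.insert(out.end(), e.fileName.begin(), e.fileName.end());
    putLE(out, e.originalSize, 4);
    putLE(out, static_cast<std::uint32_t>(e.data.size()), 4);
    out.insert(out.end(), e.data.begin(), e.data.end());
    return out;
}
```

Storing explicit lengths for the name and the data stream is one way to let
a reader find where each entry ends when several files share one archive.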
2. Write a C++ compression (encode) function.
---------------------------------------------
   Start with your lab-1 spell.cpp program.
   Write a function that will write an output file containing LZ-77
   compression codes as we learned.
   Build in a unit test that will compress a known string such as "banana". 
   Give your program the ability to get input from a user to determine
   what data should be compressed.
      a. command-line input
      b. file input
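
A minimal sketch of an encode function with a built-in unit test follows. It
assumes the three-byte code format (offset, size, next character) with the
offset and size stored as raw byte values, and it assumes every code ends
with a literal character, so a match never consumes the last input byte.
The names lz77Encode and testEncodeBanana are hypothetical, and the
brute-force match search is chosen for clarity, not speed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode `input` as 3-byte codes: offset, length, next character.
std::vector<std::uint8_t> lz77Encode(const std::string& input) {
    const std::size_t kMax = 255;  // largest value one byte can hold
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t bestLen = 0, bestOff = 0;
        // Leave at least one byte for the "next character" field.
        const std::size_t maxLen = std::min(kMax, input.size() - pos - 1);
        const std::size_t maxOff = std::min(kMax, pos);
        // Try every distance back and keep the longest match found.
        for (std::size_t off = 1; off <= maxOff; ++off) {
            std::size_t len = 0;
            while (len < maxLen && input[pos - off + len] == input[pos + len]) {
                ++len;
            }
            if (len > bestLen) { bestLen = len; bestOff = off; }
        }
        out.push_back(static_cast<std::uint8_t>(bestOff));
        out.push_back(static_cast<std::uint8_t>(bestLen));
        out.push_back(static_cast<std::uint8_t>(input[pos + bestLen]));
        pos += bestLen + 1;
    }
    return out;
}

// Built-in unit test on a known string.
void testEncodeBanana() {
    const std::vector<std::uint8_t> expected = {
        0, 0, 'b',  0, 0, 'a',  0, 0, 'n',  2, 2, 'a'};
    assert(lz77Encode("banana") == expected);
}
```

Under these assumptions "banana" becomes four codes: three literals, then
one code that copies "an" from two positions back and ends with 'a'.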
3. Write a C++ decode function.
------------------------------
   Your function will read LZ-77 compressed codes and produce an output
   file of uncompressed data.
   a. Use the compression type stored in your file spec to select the
      decode function that matches your encode function.
   b. Write the output to the path and filename stored in the file.
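
A matching decode function might look like the sketch below. It assumes the
three-byte codes listed under "Compression details" (offset, size, next
character) and treats compression type 0 as the id for that format; the
type id and the names lz77Decode and decodeEntry are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode 3-byte codes (offset, length, next character) back to text.
std::string lz77Decode(const std::vector<std::uint8_t>& codes) {
    if (codes.size() % 3 != 0) {
        throw std::invalid_argument("truncated code stream");
    }
    std::string out;
    for (std::size_t i = 0; i < codes.size(); i += 3) {
        const std::size_t offset = codes[i];
        const std::size_t length = codes[i + 1];
        if (length > 0 && (offset == 0 || offset > out.size())) {
            throw std::invalid_argument("offset points outside decoded data");
        }
        // Copy one byte at a time so a match may overlap the bytes
        // that are being written.
        for (std::size_t k = 0; k < length; ++k) {
            out.push_back(out[out.size() - offset]);
        }
        out.push_back(static_cast<char>(codes[i + 2]));
    }
    return out;
}

// Pick the decoder from the compression type stored in the file spec.
std::string decodeEntry(std::uint8_t compressionType,
                        const std::vector<std::uint8_t>& codes) {
    switch (compressionType) {
        case 0:  // assumed id for the byte-code format
            return lz77Decode(codes);
        default:
            throw std::invalid_argument("unknown compression type");
    }
}
```

Writing the decoded text to the stored path is left out of the sketch; a
std::ofstream opened in binary mode is one option.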
Compression details
-------------------
Start with an encode function that simply writes LZ-77 codes without trying
to compress the data.
LZ-77 codes (three fields per code):
byte: offset (distance back to the repeated data; 0 if none)
byte: size of repeated data (0 if none)
byte: next character
The word "book" can be stored like this (right column: bytes per code):
   '0' '0' 'b'   3
   '0' '0' 'o'   3
   '0' '0' 'o'   3
   '0' '0' 'k'   3
               ---
                12
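
The uncompressed starting point can be sketched as below: every input
character becomes one code with offset 0 and size 0. The function name
encodeLiteralsOnly is hypothetical, and the fields are assumed to be
stored as raw byte values.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Write one (0, 0, character) code per input byte; no matching is attempted.
std::vector<std::uint8_t> encodeLiteralsOnly(const std::string& input) {
    std::vector<std::uint8_t> out;
    out.reserve(input.size() * 3);
    for (char c : input) {
        out.push_back(0);                             // offset: nothing to copy
        out.push_back(0);                             // size of repeated data
        out.push_back(static_cast<std::uint8_t>(c));  // next character
    }
    return out;
}
```

For "book" this produces 4 codes of 3 bytes each, matching the 12-byte
total in the table.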
The word "book" can also be stored like this, with the second "o" copied
from one position back:
   '0' '0' 'b'   3
   '0' '0' 'o'   3
   '1' '1' 'k'   3
               ---
                 9   (improved compression)
Assuming 1 byte per field (3 bytes per code), the compressed data occupies
12 or 9 bytes for 4 bytes of input, so there is no actual compression for
this small data sample.
Ideas on how to improve compression
-----------------------------------
Specify a compression type with the following LZ-77 codes (a nibble is
4 bits, so the offset and size are each limited to 0-15):
nibble: offset
nibble: size of repeated data
byte: next character
The word "book" can be stored like this (right column: bytes per code):
   '0' '0' 'b'   2
   '0' '0' 'o'   2
   '0' '0' 'o'   2
   '0' '0' 'k'   2
               ---
                 8
The word "book" can also be stored like this, with the second "o" copied
from one position back:
   '0' '0' 'b'   2
   '0' '0' 'o'   2
   '1' '1' 'k'   2
               ---
                 6   (improved compression)
With this change in the file specification, "book" goes from 12 bytes to
6 bytes, although that is still more than the 4 bytes of input.
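
The nibble format can be sketched as a pack/unpack pair. Placing the offset
in the high nibble and the size in the low nibble is an assumption, as are
the names Code, packNibbleCodes, and unpackNibbleCodes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// One LZ-77 code before packing.
struct Code {
    std::uint8_t offset;  // must fit in 4 bits (0-15)
    std::uint8_t length;  // must fit in 4 bits (0-15)
    char next;            // literal that follows the copied data
};

// Pack each code into 2 bytes: (offset << 4 | length), then next character.
std::vector<std::uint8_t> packNibbleCodes(const std::vector<Code>& codes) {
    std::vector<std::uint8_t> out;
    for (const Code& c : codes) {
        if (c.offset > 15 || c.length > 15) {
            throw std::out_of_range("field does not fit in a nibble");
        }
        out.push_back(static_cast<std::uint8_t>((c.offset << 4) | c.length));
        out.push_back(static_cast<std::uint8_t>(c.next));
    }
    return out;
}

// Reverse of packNibbleCodes.
std::vector<Code> unpackNibbleCodes(const std::vector<std::uint8_t>& bytes) {
    if (bytes.size() % 2 != 0) {
        throw std::invalid_argument("truncated code stream");
    }
    std::vector<Code> codes;
    for (std::size_t i = 0; i < bytes.size(); i += 2) {
        codes.push_back({static_cast<std::uint8_t>(bytes[i] >> 4),
                         static_cast<std::uint8_t>(bytes[i] & 0x0F),
                         static_cast<char>(bytes[i + 1])});
    }
    return codes;
}
```

The trade-off is range: with 4-bit fields a match can start at most 15
bytes back and copy at most 15 bytes, so longer repeats need several codes.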