4350 - Advanced Software Engineering
Week-2 activities


1. Define a file specification for a compressed file.
-----------------------------------------------------

   File components should include:

   a. Compression type

   b. File-name of compressed file
      Files in the current directory have no path
      Files in a subdirectory include the path

   c. Uncompressed size of each file, in bytes

   d. The compressed data stream
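
   One possible layout for components a through d is sketched below. The
   struct name FileEntry, the function serialize, the field widths, the
   1-byte name length, and the little-endian size are all assumptions made
   for illustration. They are not part of the assignment, and your own
   specification may differ.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory form of one compressed file entry.
// Field names and widths are assumptions, not a required format.
struct FileEntry {
    std::uint8_t compressionType = 0;      // a. which code layout was used
    std::string fileName;                  // b. name, with path if in a subdirectory
    std::uint32_t originalSize = 0;        // c. uncompressed size in bytes
    std::vector<std::uint8_t> data;        // d. the compressed data stream
};

// Flatten an entry to bytes in the order a, b, c, d.
// Assumes the file name is shorter than 256 characters.
std::vector<std::uint8_t> serialize(const FileEntry& e) {
    std::vector<std::uint8_t> out;
    out.push_back(e.compressionType);
    out.push_back(static_cast<std::uint8_t>(e.fileName.size()));
    out.insert(out.end(), e.fileName.begin(), e.fileName.end());
    // Store the size as 4 bytes, least significant byte first.
    for (int shift = 0; shift < 32; shift += 8) {
        out.push_back(static_cast<std::uint8_t>((e.originalSize >> shift) & 0xFF));
    }
    out.insert(out.end(), e.data.begin(), e.data.end());
    return out;
}
```

   Storing the name length ahead of the name lets a decoder know where the
   name ends without relying on a terminator character.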

2. Write a C++ compression (encode) function.
---------------------------------------------

   Start with your lab-1 spell.cpp program.

   Write a function that will write an output file containing LZ-77
   compression codes as we learned.

   Build in a unit test that will compress a known string such as "banana". 

   Give your program the ability to get input from a user to determine
   what data should be compressed.

      a. command-line input
      b. file input
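
   A minimal sketch of supporting both input sources follows. The function
   name loadInput and the "-f" flag are hypothetical choices for this
   example. The assignment does not prescribe any particular command-line
   syntax.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Return the data to compress.
//   args = { "text" }          -> compress the text itself
//   args = { "-f", "path" }    -> compress the contents of the file at path
// The "-f" flag is an assumption made for this sketch.
std::string loadInput(const std::vector<std::string>& args) {
    if (args.size() == 2 && args[0] == "-f") {
        std::ifstream in(args[1], std::ios::binary);
        if (!in) {
            throw std::runtime_error("cannot open " + args[1]);
        }
        // Read the whole file, including any whitespace, into a string.
        return std::string(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
    }
    if (args.size() == 1) {
        return args[0];
    }
    throw std::invalid_argument("expected <text> or -f <file>");
}
```

   From main, the arguments could be passed as
   std::vector<std::string>(argv + 1, argv + argc).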

3. Write a C++ decode function.
-------------------------------

   Your function will read LZ-77 compressed codes and produce an output
   file of uncompressed data.

   a. Your file-spec compression type will guide your program to use a
      decode function that matches your encode function.

   b. Direct the output to the path and filename stored.
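
   The decode step can be sketched as follows, assuming the 3-byte code
   layout described under "Compression details" (offset, size of repeated
   data, next character) with offset and size stored as numeric byte
   values. The name lz77Decode is hypothetical, and the sketch works on
   data in memory rather than on files.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Rebuild the original data from a sequence of 3-byte codes.
std::string lz77Decode(const std::vector<std::uint8_t>& codes) {
    if (codes.size() % 3 != 0) {
        throw std::runtime_error("code stream is not a multiple of 3 bytes");
    }
    std::string out;
    for (std::size_t i = 0; i < codes.size(); i += 3) {
        const std::size_t offset = codes[i];
        const std::size_t length = codes[i + 1];
        if (offset > out.size() || (length > 0 && offset == 0)) {
            throw std::runtime_error("code refers to data that does not exist");
        }
        // Copy one character at a time so that a repeat may overlap
        // the characters it is currently producing.
        const std::size_t start = out.size() - offset;
        for (std::size_t k = 0; k < length; ++k) {
            const char c = out[start + k];
            out.push_back(c);
        }
        out.push_back(static_cast<char>(codes[i + 2]));
    }
    return out;
}
```

   Feeding it the three codes (0,0,'b') (0,0,'o') (1,1,'k') reproduces
   "book".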


Compression details
-------------------

Start with an encode function that simply writes LZ-77 codes without trying
to compress the data.

LZ-77 codes:

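byte: offset (distance back to the start of the repeated data)
byte: size of repeated data (0 if nothing is repeated)
byte: next character (the literal character that follows the repeated data)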

The word "book" can be stored like this (the right-hand column is the number
of bytes per code):

   '0' '0' 'b'   3
   '0' '0' 'o'   3
   '0' '0' 'o'   3
   '0' '0' 'k'   3
               ---
                12

The word "book" can also be stored like this:

   '0' '0' 'b'   3
   '0' '0' 'o'   3
   '1' '1' 'k'   3
               ---
                 9   (improved compression)

Assuming 1 byte per field (3 bytes per code), the compressed data occupies
12 or 9 bytes for 4 bytes of input. There is no actual compression for a data
sample this small.
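
The second "book" encoding can be produced by a simple greedy search, sketched
below. This is one possible approach, not the required implementation. The
name lz77Encode is hypothetical, offset and size are stored as numeric byte
values (an assumption about how the '0' and '1' in the tables are written),
and the search looks back at most 255 bytes so that both fields fit in a byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Encode input as 3-byte codes: offset, size of repeated data, next character.
std::vector<std::uint8_t> lz77Encode(const std::string& input) {
    const std::size_t maxField = 255;  // largest value one byte can hold
    std::vector<std::uint8_t> out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::size_t bestLen = 0;
        std::size_t bestOff = 0;
        const std::size_t maxOff = std::min(pos, maxField);
        // Try every offset back into the data already encoded.
        for (std::size_t off = 1; off <= maxOff; ++off) {
            std::size_t len = 0;
            // Stop one character early so a "next character" always remains.
            while (len < maxField &&
                   pos + len + 1 < input.size() &&
                   input[pos + len - off] == input[pos + len]) {
                ++len;
            }
            if (len > bestLen) {
                bestLen = len;
                bestOff = off;
            }
        }
        out.push_back(static_cast<std::uint8_t>(bestOff));
        out.push_back(static_cast<std::uint8_t>(bestLen));
        out.push_back(static_cast<std::uint8_t>(input[pos + bestLen]));
        pos += bestLen + 1;
    }
    return out;
}
```

For "book" this yields the 9-byte sequence (0,0,'b') (0,0,'o') (1,1,'k'). The
search is quadratic in the worst case, which is acceptable for small test
strings such as "banana".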


Ideas on how to improve compression
-----------------------------------

Specify a compression type with the following LZ-77 codes:

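nibble: offset (4 bits, values 0-15)
nibble: size of repeated data (4 bits, values 0-15)
byte: next character

The offset and size now share one byte, so each code takes 2 bytes instead
of 3.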

The word "book" can be stored like this:

   '0' '0' 'b'   2
   '0' '0' 'o'   2
   '0' '0' 'o'   2
   '0' '0' 'k'   2
               ---
                 8

The word "book" can also be stored like this:

   '0' '0' 'b'   2
   '0' '0' 'o'   2
   '1' '1' 'k'   2
               ---
                 6   (improved compression)


By changing the file specification, we have gone from 12 bytes to 6 bytes
for the same input data.
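
Packing two nibbles into one byte can be sketched as follows. The names
packCode, unpackCode, and Code are hypothetical, and placing the offset in the
high nibble is an assumption. Either order works as long as the encoder and
decoder agree.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

// One decoded LZ-77 code.
struct Code {
    unsigned offset;
    unsigned length;
    char next;
};

// Pack a code into 2 bytes: offset in the high nibble, size in the low
// nibble, followed by the next character.
std::array<std::uint8_t, 2> packCode(unsigned offset, unsigned length, char next) {
    if (offset > 15 || length > 15) {
        throw std::out_of_range("offset and length must each fit in 4 bits");
    }
    return { static_cast<std::uint8_t>((offset << 4) | length),
             static_cast<std::uint8_t>(next) };
}

// Reverse of packCode.
Code unpackCode(const std::array<std::uint8_t, 2>& bytes) {
    return { static_cast<unsigned>(bytes[0] >> 4),
             static_cast<unsigned>(bytes[0] & 0x0F),
             static_cast<char>(bytes[1]) };
}
```

The trade-off is reach: with 4-bit fields the encoder can look back at most
15 bytes and repeat at most 15 bytes per code.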