Assignment 1 – text compression

due at 23:59 on   +60

Introduction

In this activity, we will investigate the Huffman algorithm for text compression. You’ve already seen one example of a Huffman encoding, represented by the strange-looking tree on the handout labeled “variable-bit Huffman encoding.”

You will follow the Huffman algorithm and create a tree of your own, based on the character frequencies of a message that I provide. The video below illustrates the algorithm on paper. I apologize that the resolution and audio quality aren’t great, but it should be understandable. The final encoding and tree are also pictured below.

*Error at 21:54 – 39×5 should be 195 bits.

The tree I produced

The tree I produced

Encoding the phrase using that tree, result is 148 bits

Encoding the phrase using that tree, result is 148 bits

You should also answer the following questions.

  1. How many distinct characters did your message contain?

  2. If we were using a fixed-width encoding, how many bits (per character) would you need to represent just those characters?

  3. What is the most frequent character in your message, and how many times did it appear?

  4. How many bits are used to represent the most frequent character in your message?

  5. What is the most number of bits used to encode any character in your message?

  6. Use the tree you produced to encode the entire message you were given. How many bits are used, in total?

The technique in the rest of this document refers to using sticky notes and only drawing the tree at the end. Either way, the result should be the same. (The sticky notes were a little harder to manage on video.) If you were able to get your tree and answer the six questions above then you’re done – submit as follows.

How to submit

Take a photo of the tree you drew that represents your variable-width Huffman encoding. Try to make it as legible as possible – redraw on a fresh sheet of paper if needed.

Write your answers to the six questions into a text document – the formats .doc, .docx, .txt, or odt are all fine.

Upload both files to this dropbox for assignment 1.