huffman encoding algorithm

huffman_encoding

Data transmission and storage cost money. The more information being dealt with, the more it costs. In spite of this, most digital data are not stored in the most compact form. Rather, they are stored in whatever way makes them easiest to use, such as: ASCII text from word processors, binary code that can be executed on a computer, individual samples from a data acquisition system, etc.

Data compression is the general term for the various algorithms and programs developed to address this problem. A compression program is used to convert data from an easy-to-use format to one optimized for compactness. Likewise, an uncompression program returns the information to its original form.

There are many different reasons for and ways of encoding data, and one of these ways is Huffman coding. This is used as a compression method in digital imaging and video as well as in other areas. The idea behind Huffman coding is simply to use shorter bit patterns for more common characters, and longer bit patterns for less common characters.

Huffman coding is based on the frequency of occurance of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

The first version of Huffman encoding


#include 
#include   // a multimap will be used so we have to include it

using namespace std;

void binary(int number) {  //thise procedure is used to convert a
int remainder;          //number into a binary number

if(number <= 1) {
cout << number;
return;
}

remainder = number%2;
binary(number >> 1);
cout << remainder;
}

int main()
{
char ch, alphabet[31];   //store each char in ch, 31 chars used because of 26 alphabet, dot, comma, space
typedef multimap huffman; //make a multimap that will be sorted by integers, we use a multimap
huffman sentence;                   //because some characters may have the same number of occurences

int i=0,j=0;

for(int j=0; j<31; j++)      //alphabet[0..30] = 0;
alphabet[j]=0;

while((ch=getchar())!=‘\n‘) //insert chars into ch until enter(return) is pressed
{
if(ch==‘ ‘)
alphabet[28]++;     //increment number of spaces
else if(ch==‘,’)
alphabet[29]++;     //increment number of commas
else if(ch==‘.’)
alphabet[30]++;     //increment number of dots
else{
ch = toupper(ch);    //convert all chars to upper case
alphabet[(static_cast(ch))-65]++;//increment (char ASCII value) - 65 (this is the index number)
}
}

for(int j=0; j<31; j++)
{
if(alphabet[j]>0)   //check if char occured in the text
{
i++;  //count the number of found chars, commas, dots, spaces
if(j==28)
sentence.insert(make_pair(alphabet[j],‘ ‘));  //make the pair from space and its occurences
else if(j==29)
sentence.insert(make_pair(alphabet[j],‘,’));  //make the part from comma and its occurences
else if(j==30)
sentence.insert(make_pair(alphabet[j],‘.’));  //make the pair from dot and its occurences
else
sentence.insert(make_pair(alphabet[j],static_cast(65+j))); //make the pair from chars and its occurences
}
}

huffman::iterator pos;   //make an iterator for the multimap
for(pos = sentence.begin(); pos!=sentence.end(); ++pos)
{
j++;  //number of different chars
cout << “Letter “ << pos->second << ” “;  //print out the letter and then its occurence code
binary(i-j);   //convert the number to binary
cout << endl;
}
return 0;
}

The second version of Huffman encoding by Una Benlic


#include
#include
#include
#include

using namespace std;

char ch[100]; //the characters in the text
string binary[100]; //binary representation of each character
int c=0; //number of distinct characters

string reverse(string line) //procedure that reverses a line
{
string result=“”;
for(int i=line.size()-1; i>=0; i–)
result+=line[i];
return result;
}
string to_binary(int num) //procedure that converts a number to it’s binary representaion
{
string result=“”;
while(num/2!=0)
{
result+=(num%2)+‘0′;
num/=2;
}
result+=(num%2)+‘0′;
return reverse(result);
}

//Once a Huffman code has been generated, data may be encoded simply by
//replacing each symbol with it’s code
string *encode(string text)
{
string* array= new string[text.size()]; //array that stores a binary number associated
//with each character of the text
for(int i=0; icounter; //holds thee character and the number of their occurence
while(getline(cin, line) and line.size()>0)
text+=line;
for(i=0; i::const_iterator iter = counter.begin(); iter!=counter.end(); iter++)
{
ch[c]=iter->first;
occurence[c]=iter->second;
c++;
}
for(i=0; i

One thought on “huffman encoding algorithm

  1. Ankur January 24, 2007 at 4:26 am

    during college days external links on wikipedia data structures pages served as a good src for finding and learning abut data struture codes. This should be helpful for students. The images bring more clarity to the subject. Bravo ….

Comments are closed.