what are hash tables and how do they work?

normal

As we saw with binary search, certain ageless data structures such as a binary search tree can help improve the efficiency of searches. From linear search to binary search, we improved our search efficiency from O(n) to O(logn) beautifully. We now present a new data structure, called a hash table, that will increase our efficiency to O(1), or constant time.

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person’s name), find the corresponding value (e.g. that person’s telephone number). It works by transforming the key using a hash function into a hash, a number that is used to index into an array to locate the desired location (”bucket”) where the values should be.

Hash tables are often used to implement associative arrays, sets and caches. Like arrays, hash tables provide constant-time O(1) lookup on average, regardless of the number of items in the table. (O(1) means that it takes a constant amount of time independent of the number of items involved.) However, the rare worst-case lookup time can be as bad as O(n). Compared to other associative array data structures, hash tables are most useful when large numbers of records are to be stored, especially if the size of the data set can be predicted.

hash

Hash tables may be used as in-memory data structures. Hash tables may also be adopted for use with persistent data structures; database indexes sometimes use disk-based data structures based on hash tables, although balanced trees are more popular.

//this program creates a hash table as a set of linked lists
// here 10 pointers are used to address 10 linked lists which form a hash tablw
#include
#include

int main()
{
int i,k,num;
struct node
{
int n;
struct node* next;
};

struct node *temp,*p1,*AOP[10];

for(i=0;i<10;i++)
AOP[i]= NULL;
for( i=0;i<20;i++)
{
scanf(“%d”, &num);
k=num%10;
p1=( struct node*) malloc ( sizeof(struct node));

p1->n=num;
p1->next=NULL;

if(AOP[k]==NULL)
AOP[k]=p1;
else
{
p1->next=AOP[k];
AOP[k]=p1;
}
}

for(i=0;i<10;i++)
{
temp=AOP[i];
printf(“List number   %d :”, i);
while(temp!=NULL)
{
printf(“%5d”,temp->n);
temp=temp->next;
}
printf(“\n“);
}
}

huffman encoding algorithm

huffman_encoding

Data transmission and storage cost money. The more information being dealt with, the more it costs. In spite of this, most digital data are not stored in the most compact form. Rather, they are stored in whatever way makes them easiest to use, such as: ASCII text from word processors, binary code that can be executed on a computer, individual samples from a data acquisition system, etc.

Data compression is the general term for the various algorithms and programs developed to address this problem. A compression program is used to convert data from an easy-to-use format to one optimized for compactness. Likewise, an uncompression program returns the information to its original form.

There are many different reasons for and ways of encoding data, and one of these ways is Huffman coding. This is used as a compression method in digital imaging and video as well as in other areas. The idea behind Huffman coding is simply to use shorter bit patterns for more common characters, and longer bit patterns for less common characters.

Huffman coding is based on the frequency of occurance of a data item (pixel in images). The principle is to use a lower number of bits to encode the data that occurs more frequently.

The first version of Huffman encoding


#include 
#include   // a multimap will be used so we have to include it

using namespace std;

void binary(int number) {  //thise procedure is used to convert a
int remainder;          //number into a binary number

if(number <= 1) {
cout << number;
return;
}

remainder = number%2;
binary(number >> 1);
cout << remainder;
}

int main()
{
char ch, alphabet[31];   //store each char in ch, 31 chars used because of 26 alphabet, dot, comma, space
typedef multimap huffman; //make a multimap that will be sorted by integers, we use a multimap
huffman sentence;                   //because some characters may have the same number of occurences

int i=0,j=0;

for(int j=0; j<31; j++)      //alphabet[0..30] = 0;
alphabet[j]=0;

while((ch=getchar())!=‘\n‘) //insert chars into ch until enter(return) is pressed
{
if(ch==‘ ‘)
alphabet[28]++;     //increment number of spaces
else if(ch==‘,’)
alphabet[29]++;     //increment number of commas
else if(ch==‘.’)
alphabet[30]++;     //increment number of dots
else{
ch = toupper(ch);    //convert all chars to upper case
alphabet[(static_cast(ch))-65]++;//increment (char ASCII value) - 65 (this is the index number)
}
}

for(int j=0; j<31; j++)
{
if(alphabet[j]>0)   //check if char occured in the text
{
i++;  //count the number of found chars, commas, dots, spaces
if(j==28)
sentence.insert(make_pair(alphabet[j],‘ ‘));  //make the pair from space and its occurences
else if(j==29)
sentence.insert(make_pair(alphabet[j],‘,’));  //make the part from comma and its occurences
else if(j==30)
sentence.insert(make_pair(alphabet[j],‘.’));  //make the pair from dot and its occurences
else
sentence.insert(make_pair(alphabet[j],static_cast(65+j))); //make the pair from chars and its occurences
}
}

huffman::iterator pos;   //make an iterator for the multimap
for(pos = sentence.begin(); pos!=sentence.end(); ++pos)
{
j++;  //number of different chars
cout << “Letter “ << pos->second << ” “;  //print out the letter and then its occurence code
binary(i-j);   //convert the number to binary
cout << endl;
}
return 0;
}

The second version of Huffman encoding by Una Benlic


#include
#include
#include
#include

using namespace std;

char ch[100]; //the characters in the text
string binary[100]; //binary representation of each character
int c=0; //number of distinct characters

string reverse(string line) //procedure that reverses a line
{
string result=“”;
for(int i=line.size()-1; i>=0; i–)
result+=line[i];
return result;
}
string to_binary(int num) //procedure that converts a number to it’s binary representaion
{
string result=“”;
while(num/2!=0)
{
result+=(num%2)+‘0′;
num/=2;
}
result+=(num%2)+‘0′;
return reverse(result);
}

//Once a Huffman code has been generated, data may be encoded simply by
//replacing each symbol with it’s code
string *encode(string text)
{
string* array= new string[text.size()]; //array that stores a binary number associated
//with each character of the text
for(int i=0; icounter; //holds thee character and the number of their occurence
while(getline(cin, line) and line.size()>0)
text+=line;
for(i=0; i::const_iterator iter = counter.begin(); iter!=counter.end(); iter++)
{
ch[c]=iter->first;
occurence[c]=iter->second;
c++;
}
for(i=0; i