what are hash tables and how do they work?


As we saw with binary search, the right data structure, such as a binary search tree, can greatly improve the efficiency of a search. Moving from linear search to binary search improved our search efficiency from O(n) to O(log n). We now present a new data structure, called a hash table, that increases our efficiency further, to O(1), or constant time.

In computer science, a hash table, or a hash map, is a data structure that associates keys with values. The primary operation it supports efficiently is a lookup: given a key (e.g. a person's name), find the corresponding value (e.g. that person's telephone number). It works by transforming the key using a hash function into a hash, a number that is used as an index into an array to locate the desired slot ("bucket") where the value should be.
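
As a minimal sketch of that transformation (our own illustration, not part of the original article), a toy hash function for string keys might simply sum the character codes and take the remainder modulo the number of buckets; real implementations use stronger functions to spread keys more evenly:

#include <string>

// Toy hash function (illustrative only): sum the character codes of the
// key and reduce modulo the bucket count to get an array index.
int hash_bucket(const std::string& key, int buckets)
{
    int sum = 0;
    for (size_t i = 0; i < key.size(); i++)
        sum += key[i];       // accumulate the character codes
    return sum % buckets;    // bucket index in [0, buckets)
}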

Hash tables are often used to implement associative arrays, sets, and caches. Like arrays, hash tables provide constant-time O(1) lookup on average, regardless of the number of items in the table. (O(1) means that the lookup takes a constant amount of time, independent of the number of items involved.) In the rare worst case, however, a lookup can take O(n) time. Compared to other associative array data structures, hash tables are most useful when large numbers of records are to be stored, especially if the size of the data set can be predicted.
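
For example, C++'s standard library provides a ready-made hash table, std::unordered_map, which can serve as the name-to-phone-number associative array described above (a minimal sketch; the names and numbers here are made up):

#include <iostream>
#include <string>
#include <unordered_map>

int main()
{
    // keys are names, values are phone numbers (both invented for the example)
    std::unordered_map<std::string, std::string> phone_book;
    phone_book["Alice"] = "555-0101";
    phone_book["Bob"]   = "555-0202";

    // average-case O(1) lookup by key
    std::cout << "Alice: " << phone_book["Alice"] << std::endl;
    return 0;
}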


Hash tables are typically used as in-memory data structures, but they may also be adapted for persistent data; database indexes sometimes use disk-based structures built on hash tables, although balanced trees are more popular there.

// This program creates a hash table as a set of linked lists.
// Here 10 pointers are used to address 10 linked lists which form a hash table.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    int i, k, num;
    struct node
    {
        int n;
        struct node* next;
    };

    struct node *temp, *p1, *AOP[10];   // AOP: array of pointers, one per bucket

    for(i = 0; i < 10; i++)
        AOP[i] = NULL;                  // all buckets start out empty
    for(i = 0; i < 20; i++)
    {
        scanf("%d", &num);
        k = num % 10;                   // hash function: the last digit picks the bucket
        p1 = (struct node*) malloc(sizeof(struct node));

        p1->n = num;
        p1->next = NULL;

        if(AOP[k] == NULL)              // first entry in this bucket
            AOP[k] = p1;
        else                            // collision: prepend to the bucket's list
        {
            p1->next = AOP[k];
            AOP[k] = p1;
        }
    }

    for(i = 0; i < 10; i++)             // print the contents of each bucket
    {
        temp = AOP[i];
        printf("List number   %d :", i);
        while(temp != NULL)
        {
            printf("%5d", temp->n);
            temp = temp->next;
        }
        printf("\n");
    }
    return 0;
}

huffman encoding algorithm


Data transmission and storage cost money. The more information being dealt with, the more it costs. In spite of this, most digital data are not stored in the most compact form. Rather, they are stored in whatever way makes them easiest to use, such as: ASCII text from word processors, binary code that can be executed on a computer, individual samples from a data acquisition system, etc.

Data compression is the general term for the various algorithms and programs developed to address this problem. A compression program is used to convert data from an easy-to-use format to one optimized for compactness. Likewise, a decompression program returns the information to its original form.

There are many different reasons for and ways of encoding data, and one of these ways is Huffman coding. This is used as a compression method in digital imaging and video as well as in other areas. The idea behind Huffman coding is simply to use shorter bit patterns for more common characters, and longer bit patterns for less common characters.

Huffman coding is based on the frequency of occurrence of a data item (a pixel, in images). The principle is to use a smaller number of bits to encode the data that occur more frequently.
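
Before looking at the two versions below, here is a minimal sketch of the classic construction (our own illustration, not either of the versions that follow): count frequencies, repeatedly merge the two least frequent nodes using a priority queue, and read each character's code off the resulting tree (left = 0, right = 1):

#include <iostream>
#include <queue>
#include <map>
#include <string>
#include <vector>
#include <cstddef>

using namespace std;

// one node of the Huffman tree
struct Node {
    int freq;
    char symbol;        // meaningful only for leaves
    Node *left, *right;
    Node(int f, char s, Node *l = NULL, Node *r = NULL)
        : freq(f), symbol(s), left(l), right(r) {}
};

// order nodes by ascending frequency, so the queue pops the rarest first
struct ByFreq {
    bool operator()(const Node *a, const Node *b) const {
        return a->freq > b->freq;
    }
};

// walk the tree, building each leaf's code: left = 0, right = 1
void assign_codes(const Node *n, const string &prefix, map<char, string> &codes) {
    if (!n->left && !n->right) {                     // leaf: record its code
        codes[n->symbol] = prefix.empty() ? "0" : prefix;
        return;
    }
    assign_codes(n->left, prefix + "0", codes);
    assign_codes(n->right, prefix + "1", codes);
}

int main() {
    string text = "this is an example";
    map<char, int> freq;
    for (size_t i = 0; i < text.size(); i++)
        freq[text[i]]++;                             // count occurrences

    // repeatedly merge the two least frequent nodes into one
    priority_queue<Node*, vector<Node*>, ByFreq> pq;
    for (map<char, int>::iterator it = freq.begin(); it != freq.end(); ++it)
        pq.push(new Node(it->second, it->first));
    while (pq.size() > 1) {
        Node *a = pq.top(); pq.pop();
        Node *b = pq.top(); pq.pop();
        pq.push(new Node(a->freq + b->freq, 0, a, b));
    }

    map<char, string> codes;
    assign_codes(pq.top(), "", codes);               // derive the bit patterns
    for (map<char, string>::iterator it = codes.begin(); it != codes.end(); ++it)
        cout << "'" << it->first << "' -> " << it->second << endl;
    return 0;
}

Notice that the most frequent characters end up closest to the root of the tree, so they receive the shortest bit patterns.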

The first version of Huffman encoding


#include <iostream>
#include <map>      // a multimap will be used, so we have to include it
#include <cctype>   // for toupper

using namespace std;

void binary(int number) {   // this procedure prints a number
    int remainder;          // in its binary representation

    if(number <= 1) {
        cout << number;
        return;
    }

    remainder = number % 2;
    binary(number >> 1);
    cout << remainder;
}

int main()
{
    char ch, alphabet[31];  // counters for the 26 letters plus dot, comma, space
    typedef multimap<int, char> huffman;  // a multimap sorted by occurrence counts; we use a multimap
    huffman sentence;                     // because some characters may have the same number of occurrences

    int i = 0, j = 0;

    for(int j = 0; j < 31; j++)           // alphabet[0..30] = 0;
        alphabet[j] = 0;

    while((ch = getchar()) != '\n')       // read chars until enter (return) is pressed
    {
        if(ch == ' ')
            alphabet[28]++;               // increment number of spaces
        else if(ch == ',')
            alphabet[29]++;               // increment number of commas
        else if(ch == '.')
            alphabet[30]++;               // increment number of dots
        else {
            ch = toupper(ch);             // convert all chars to upper case
            alphabet[static_cast<int>(ch) - 65]++;  // (char ASCII value) - 65 is the letter's index
        }
    }

    for(int j = 0; j < 31; j++)
    {
        if(alphabet[j] > 0)               // check if the char occurred in the text
        {
            i++;                          // count the number of distinct chars, commas, dots, spaces
            if(j == 28)
                sentence.insert(make_pair((int)alphabet[j], ' '));  // pair a space with its occurrences
            else if(j == 29)
                sentence.insert(make_pair((int)alphabet[j], ','));  // pair a comma with its occurrences
            else if(j == 30)
                sentence.insert(make_pair((int)alphabet[j], '.'));  // pair a dot with its occurrences
            else
                sentence.insert(make_pair((int)alphabet[j], static_cast<char>(65 + j)));  // pair a letter with its occurrences
        }
    }

    // the multimap iterates in ascending order of count, so the most frequent
    // characters come last and receive the smallest (shortest) numbers
    huffman::iterator pos;                // an iterator for the multimap
    for(pos = sentence.begin(); pos != sentence.end(); ++pos)
    {
        j++;                              // number of chars printed so far
        cout << "Letter " << pos->second << " ";  // print the letter, then its code
        binary(i - j);                    // print the assigned number in binary
        cout << endl;
    }
    return 0;
}

The second version of Huffman encoding by Una Benlic


#include <iostream>
#include <string>
#include <map>

using namespace std;

char ch[100];        // the distinct characters in the text
int occurence[100];  // the number of occurrences of each character
string binary[100];  // binary representation assigned to each character
int c = 0;           // number of distinct characters

string reverse(string line)  // procedure that reverses a line
{
    string result = "";
    for(int i = line.size() - 1; i >= 0; i--)
        result += line[i];
    return result;
}

string to_binary(int num)    // procedure that converts a number to its binary representation
{
    string result = "";
    while(num / 2 != 0)
    {
        result += (num % 2) + '0';
        num /= 2;
    }
    result += (num % 2) + '0';
    return reverse(result);
}

// Once a code has been generated, data may be encoded simply by
// replacing each symbol with its code
string *encode(string text)
{
    string *array = new string[text.size()];  // array that stores the binary pattern associated
                                              // with each character of the text
    for(int i = 0; i < (int)text.size(); i++)
        for(int j = 0; j < c; j++)
            if(ch[j] == text[i])
                array[i] = binary[j];
    return array;
}

int main()
{
    int i;
    string line, text;
    map<char, int> counter;  // holds the characters and the number of their occurrences
    while(getline(cin, line) && line.size() > 0)
        text += line;
    for(i = 0; i < (int)text.size(); i++)
        counter[text[i]]++;
    for(map<char, int>::const_iterator iter = counter.begin(); iter != counter.end(); ++iter)
    {
        ch[c] = iter->first;
        occurence[c] = iter->second;
        c++;
    }
    // sort the characters by descending occurrence, so that the most
    // frequent character receives the shortest binary pattern
    for(i = 0; i < c; i++)
        for(int j = i + 1; j < c; j++)
            if(occurence[j] > occurence[i])
            {
                int t = occurence[i]; occurence[i] = occurence[j]; occurence[j] = t;
                char tc = ch[i]; ch[i] = ch[j]; ch[j] = tc;
            }
    for(i = 0; i < c; i++)
        binary[i] = to_binary(i);  // the character at rank i gets the pattern for i

    string *coded = encode(text); // encode the text and print the result
    for(i = 0; i < (int)text.size(); i++)
        cout << coded[i] << " ";
    cout << endl;
    delete[] coded;
    return 0;
}

what is quicksort and how does it work? video of cards sorted by quick sort


Pick an element from the array (the pivot), partition the remaining elements into those less than and those greater than this pivot, and recursively sort the two partitions. There are many variants of this basic scheme: how to select the pivot, how to partition the array, whether to stop the recursion on small partitions, etc.

Quick-sort is a randomized sorting algorithm based on the divide-and-conquer method.

The worst-case efficiency of quicksort, O(n²), occurs when the list is already sorted and the left-most element is chosen as the pivot. Randomly choosing a pivot rather than using the left-most element is recommended if the data to be sorted isn't random. As long as the pivot is chosen randomly, quicksort has an expected complexity of O(n log n).
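
As a sketch of that recommendation (our own helper, not part of the listing further below), one can move a randomly chosen element to the right end of the range, so that a partition routine which always uses the last element as its pivot effectively becomes randomized:

#include <cstdlib>    // std::rand
#include <algorithm>  // std::swap

// Move a uniformly chosen element of a[left..right] to position right,
// so a last-element partition scheme effectively uses a random pivot.
void randomize_pivot(int a[], int left, int right)
{
    int r = left + std::rand() % (right - left + 1);  // random index in [left, right]
    std::swap(a[r], a[right]);
}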

Video of how quick sort works with cards:

View this video on YouTube

The quick sort is by far the fastest of the common sorting algorithms. It's possible to write a special-purpose sorting algorithm that beats the quick sort for some data sets, but for general-case sorting there isn't anything faster. Quick sort is not stable. Its worst-case space complexity is O(n), but this can be limited to O(log n) if the code is modified (by recursing into the smaller partition first).

#include <iostream>
using namespace std;

int a[10];
int partition(int left, int right);
void swap(int i, int j);
void sort(int i, int j);

int main()
{
    int i, j = 0, k = 9;
    for(i = 0; i < 10; i++)
        cin >> a[i];

    sort(j, k);
    for(i = 0; i < 10; i++)
        cout << a[i] << endl;
}

void sort(int left, int right)
{
    int p;
    if(left >= right)
        return;
    p = partition(left, right);

    sort(left, p - 1);                  // sort the elements before the pivot
    sort(p + 1, right);                 // sort the elements after the pivot
}

int partition(int left, int right)
{
    int first = left, pivot = right--;  // the last element is the pivot
    while(left <= right)
    {
        while(a[left] < a[pivot])       // skip elements already on the correct side
            left++;
        while((right >= first) && (a[right] >= a[pivot]))
            right--;
        if(left < right)
        {
            swap(left, right);          // move the pair to their correct sides
            left++;
        }
    }
    if(left != pivot)
        swap(left, pivot);              // put the pivot into its final place

    return left;
}

void swap(int i, int j)
{
    int temp = a[i];
    a[i] = a[j];
    a[j] = temp;
}

the monte carlo algorithm/method

The Monte Carlo method is a way of solving problems using statistics and probability; it is a stochastic technique, which means it relies on random numbers.

Given some probability, P, that an event will occur in certain conditions, a computer can be used to generate those conditions repeatedly. The number of times the event occurs divided by the number of times the conditions are generated should be approximately equal to P.
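
As a small illustration of this idea (our own example, not from the original text), the following sketch estimates the probability that two dice sum to 7; the exact value is 6/36 ≈ 0.167, and the printed estimate should come out close to that:

#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

int main()
{
    srand(time(NULL));                  // seed the random generator
    int trials = 100000, hits = 0;
    for(int i = 0; i < trials; i++)
    {
        int d1 = rand() % 6 + 1;        // first die, 1..6
        int d2 = rand() % 6 + 1;        // second die, 1..6
        if(d1 + d2 == 7)                // did the event occur?
            hits++;
    }
    cout << "Estimated P = " << (double)hits / trials << endl;
    return 0;
}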

Scope: Monte Carlo methods are applied in various fields, including mathematics, physics, and economics. Monte Carlo simulation methods are especially useful in studying systems with a large number of coupled degrees of freedom, such as liquids, disordered materials, strongly coupled solids, and cellular structures. The Monte Carlo method is often useful for solving problems in physics and mathematics that cannot be solved by analytical means. It is also useful for modeling phenomena with significant uncertainty in the inputs, such as the calculation of risk in business.

Calculation of Pi: Let's inscribe a circle inside a square and take one quadrant of it.

[Figure: one quadrant of a circle inscribed in a square]

If we take a die and throw it randomly inside the square, the probability that it ends up in the circle's area is circle_area/square_area. Replacing these with formulas, for just one quadrant, we get:
Pi = 4 * dice_hitting_circle_area / dice_hitting_square_area. So by the Pythagorean theorem we just have to take random values for the x and y coordinates, square them, sum them up, take the square root of the sum, and if the result is less than or equal to 1 then the die landed inside the circle. By repeating this many times we can get an approximation of the number Pi.


#include <iostream>
#include <cstdlib>
#include <ctime>
using namespace std;

int main()
{
    double x, y;                        // x and y coordinates
    int count_inside = 0, till = 1000;  // how many tries you want
    srand(time(NULL));                  // seed the random generator
    for(int i = 0; i < till; i++)
    {
        x = (double)rand() / RAND_MAX;  // random point in the unit square
        y = (double)rand() / RAND_MAX;
        if(x * x + y * y <= 1.0)        // inside the quarter circle?
            count_inside++;
    }
    cout << "Pi is approximately " << 4.0 * count_inside / till << endl;
    return 0;
}