Ways to Calculate Hashing in Data Structure
This article was published as a part of the Data Science Blogathon.
Hashing is the process of mapping keys and values into a hash table by using a hash function. It makes elements more accessible faster. The efficiency of the hash function determines how well it can handle the mapping.
When you have 20000 numbers in a list and you are asked to search for a number from within the list, you will scan each number until you find a match.
The process of searching an entire list for that specific number takes a lot of time. Not only is this manual process of scanning time-consuming, but it is also inefficient. The data structure reduces the search time and allows you to find the number within seconds due to its hashing function.
Key Terminologies of Hashing
Key – An Identifier to uniquely identify the Data(Entity).
Value – The Data Entity (with its associated details) that we are storing.
Hashing works in two steps:
- The algorithm accepts any Entity (as a key) as input. If that key isn’t an integer, we will need to provide it with some way to get an integer value from the entity(N).
- We cannot use this integer as the Memory address to place this Key-Value once we have got an initial Integral value (N). This will not work, but the hash function converts this integer into another integer value that can be used to store the key-value in memory.
What is Hashing?
The hashing algorithm is designed to solve the problem of efficiently finding and storing data within an array.
For example: A student is an entity. Each student can have a name, address, phone number, unique ID (Roll number).
To make it a key-value pair we take the roll number as the key and the pair will be rest attributes like address, name, phone number, etc. This way we created a mapping of students with roll number as key and basic information as value. Now if we want to know a student’s details we can just know his/her roll number.
What is Data Here to be Stored?
You can think of it as an object in OOPS terms, or as a set of properties that all belong to a specific thing.
As shown above, Student data can include information like his Roll Number, his name, his address, and so on.
We can think of all these details as “data” we want to store. The Roll Number is the key since it is unique to each student.
Roll Number = Key, and Student Details = Value.
The idea of hashing is used in the world of Data Structures to find the Memory address or Index in which to store this Key-Value pair in a given data structure. We can’t place these data at either 100 or any Student detail if Key=100 and Value=certain Student detail, for example.
As a result, we use the hash function which is able to take a key as input and return a memory address(or an index inside the Data structure) to place a key-value pair at.
Later in this section, we’ll examine how this works internally in detail.
Hash Table in Data Structure
Key-value pairs are stored in hash tables in data structures using hashing.
When storing some data (e.g. Value) we can use a Hash Table as long as we have a unique key.
Methods to Calculate Hashing in Data Structure
Basically, the hash function is a mathematical formula that will return a small integer value (within an array size) for certain big keys.
The following are three methods of how this method works internally:
1) Division Method – Among all the methods, this is the easiest to understand. Consider the scenario where there are only 10 cells in an array, and the Key is 15, or greater than 10. If we don’t have a method that takes 15 and returns 10, we need a method that returns an index < 10. This can be done pretty easily by modulus operators.
2) Method for dividing: HashFunction = Key % size
If Key is divided by Size, the remainder is returned by the Modulus Operator here. We can insert this Data at the 5th index of the array of size 10 if key=15. For example, 15%10=5 and 5*10, so if key=15, then we insert it at 5 in the array of size 10.
3) Multiplication Method – Works similarly to Division by using the following function:
h(k) = floor( n( kA mod 1 ) )
The key k will be the constant value between 0 and 1, and A will be any constant value between 0 and 1.
Both k and A are multiplied, and their fractional part is separated using the modulus operator. This is then multiplied with n to get the hash value.
k=123 n=100 A=0.618033 (Has to be between 0 and 1) h(123) = 100 (123 * 0.618033 mod 1) = 100 (76.018059 mod 1) = 100 (0.018059) = 1
The value K=123 can be found at index 1 if the Key is K.
- Universal Hashing – At the heart of universal hashing is the idea that we do not use a fixed hash function but rather can randomly choose any of many different hash functions. There must be no collisions for the key x, as the basic principle of Universal hashing states that there is * 1 collision for the key x. We can choose any hash function from this set at runtime after we have such a family of hash functions. The randomness of this hashing technique ensures that its spread is most evenly distributed among the other hashing techniques.
- Folding Method – Folding is a method of finding an index such that it fits within the given array size. A folded key is one that is divided into parts of 2 if the array size is 100, meaning that 2-digit numbers can only be contained in it. i.e., if Key=20574, then fold it into 2 parts of 20, 57, and 4. From these parts, we can sum them up to get an index of 100, i.e., The sum of 20 + 57 + 4 is 81. In this case, 81 identifies the index where Key=20574 may be placed.
Please note that there can be cases where even after folding, we get a number > Size e.g., Key=56571 then breaking it down into parts of 2= 56+57+1=114
Therefore, this Data cannot be placed at index 114 of the array of size 100, so we can either use the algorithm again or the Division method (most commonly used in such scenarios) to obtain 114%100=14.
Therefore, this Data should be placed at index 14 of this array.
Likewise, if the array size is 1000, then we can insert three-digit numbers, so key=20574 has the value 205+74=279.
In an array of Size 1000, Data with Key=20574 will be at the 279th Index.
- Mid Square Method – To find the index of a squared number, we start by squaring the number. We then pick the middle numbers of that squared number. In the example above, if the array size is 100, which means that numbers can only be 2 digits, let’s take N as 2 (N can be anything).
- A hash of 931 equals 931^2 = 866761. We pick the middle element, 67, from the given N(=2) elements.
- 93 can be placed at an index of 67, so 67 would be a possible index to place it at.
- The Division method can again be used in case the index obtained from the mid square method is greater than the array size.
- Consider N=2, Array size=11, and Array size=12.
- In this case, hash(93) = 93^2 = 8649. We can apply the Division method to get 64%11=9 when using N(=2) middle elements, i.e. 64 Now that 64 > 11, we can apply the Division method to get 64%11 = 9.
- So 93 can be placed at the 9th Index in the array of size 11.
In computing, hashing is the process of finding a small Number from a Data set. Using this number, you can determine where to store the data. If the database is stored in memory, this could be the index in an Array or a database location on the disk.
- Indexing and retrieving items using the hashing method is faster than using the original Key since the shorter hashing key can be used instead.
- There are several terms used in hashing, including bucket, Key, hash function, linear probing, quadratic probing, hash index, and collisions.
- It’s called a collision when the index obtained from two different inputs is the same.
- There are two methods that help you avoid collisions in hashing, rehashing, and chaining.
- Chaining is when both the hashes are placed at the same index, with the values chained together. Rehashing is when we keep on hashing until we find the vacant index.
- The Insertion and Search can be performed using hashing in Constant Time, which is O(1) for most cases.
- If it is possible to distribute the elements as evenly as possible, a good hash function can bring the complexity close to O(1).
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.