Question

1. Consider the following dataset:

[‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’, ‘bc01’]

a. Assume a hashing function that makes an assignment based on the first character of the string. So ‘bab3’ goes into Bucket1 since it starts with ‘b’, and ‘cc2’ goes into Bucket2 since it starts with ‘c’. (Yes, it is a very crude hash function.)

[a-b] -> Bucket1

[c-d] -> Bucket2

[e-f] -> Bucket3

[g-z] -> Bucket4

Why (or why not?) would you consider it a good hashing function? Please note that an answer of yes or no (without an explanation) will not be credited.

b. Design your own (good) hash function based on the given data and using exactly 5 buckets. In this case, the goodness of the function is measured based on load-balancing of the data.

c. Suppose that the input dataset is:

[‘a1’, ‘a1’, ‘b1’, ‘d1’, ‘a1’, ‘a1’, ‘b1’, ‘c1’, ‘a2’, ‘c1’, ‘c1’, ‘a1’, ‘d2’, ‘d1’].

How would you design a hash function to partition this data into 3 buckets? Once again, the goodness of the hash function is measured by how evenly the data is distributed (as evenly as possible).

Answer #1

a. Using the hashing function described in the question, the number of elements in each bucket is as follows:

Bucket1 -> [ ‘bab3’, ‘bc01’, ‘bc01’ ]: 3 elements

Bucket2 -> [ ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’] : 5 elements

Bucket3 -> [ ‘e01’]: 1 element

Bucket4 -> [ ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’]: 7 elements

The hashing function is not good for the given data because the elements are not evenly distributed across the buckets. With 16 elements and 4 buckets, an even split would put 4 elements in each bucket, but Bucket3 contains only a single element whereas Bucket4 contains 7.
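The counts above can be checked with a short Python sketch. The names `first_letter_bucket` and `bucket_sizes` are illustrative, not from the question, and the code assumes every key starts with a lowercase letter.

```python
from collections import defaultdict

DATA = ["bab3", "bc01", "cc2", "cd5", "cd3", "cdx2", "cdx1", "e01",
        "g02", "ha1", "hb1", "hc8", "hz5", "z00", "z01", "bc01"]

def first_letter_bucket(key: str) -> int:
    """Crude hash from the question: the bucket depends only on the first character."""
    c = key[0]
    if "a" <= c <= "b":
        return 1
    if "c" <= c <= "d":
        return 2
    if "e" <= c <= "f":
        return 3
    return 4  # assumption: anything else is a lowercase letter in g-z

def bucket_sizes(keys, hash_fn):
    """Group keys by bucket number and return {bucket: number of keys}."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[hash_fn(k)].append(k)
    return {b: len(v) for b, v in sorted(buckets.items())}

sizes_a = bucket_sizes(DATA, first_letter_bucket)
print(sizes_a)  # {1: 3, 2: 5, 3: 1, 4: 7}
```

The spread between the smallest and largest bucket (1 versus 7) is what makes this function a poor fit for the data.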

b. For the given data, a better hash function (using exactly 5 buckets, as the question requires) is the following:

Map each string to a number using the following method:
Map each letter to its position in the alphabet: a to 1, b to 2, and so on, up to z to 26.

Map each digit to its own value, then sum the values of all characters in the string.

E.g., for bab3: b is 2, a is 1, b is 2, and 3 is 3, hence the sum is 2 + 1 + 2 + 3 = 8.

hence the data:

[‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’, ‘bc01’]

will become:

[8, 6, 8, 12, 10, 33, 32, 6, 9, 10, 11, 19, 39, 26, 27, 6]

[0-6] : bucket 1

[7-9] : bucket 2

[10-11]: bucket 3

[12-26]: bucket 4

[greater than 26]: bucket 5

Now the content of the buckets will be :

bucket 1: ['bc01', 'e01', 'bc01']

bucket 2: ['bab3' , 'cc2', 'g02']

bucket 3: ['cd3', 'ha1','hb1']

bucket 4: ['cd5', 'hc8' , 'z00']

bucket 5: ['z01','cdx1','cdx2','hz5']

Now the contents of the buckets are evenly distributed: four buckets hold 3 elements each and one holds 4.

Hence it is a good hash function for this data.
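As a rough sketch of the method in part b, the following Python code sums the character values and then maps the sum to one of the five ranges. The names `char_sum`, `UPPER_BOUNDS`, and `range_bucket` are illustrative, and the code assumes keys contain only lowercase ASCII letters and digits.

```python
from bisect import bisect_left
from collections import Counter

DATA = ["bab3", "bc01", "cc2", "cd5", "cd3", "cdx2", "cdx1", "e01",
        "g02", "ha1", "hb1", "hc8", "hz5", "z00", "z01", "bc01"]

def char_sum(key: str) -> int:
    """Letters count as their alphabet position (a=1 .. z=26); digits count as themselves."""
    total = 0
    for ch in key:
        if ch.isdigit():
            total += int(ch)
        else:
            total += ord(ch) - ord("a") + 1  # assumption: lowercase ASCII letter
    return total

# Inclusive upper bounds of buckets 1-4; any larger sum falls into bucket 5.
UPPER_BOUNDS = [6, 9, 11, 26]

def range_bucket(key: str) -> int:
    """Return the bucket number (1-5) for the key's character sum."""
    return bisect_left(UPPER_BOUNDS, char_sum(key)) + 1

print(char_sum("bab3"))  # 8
sizes_b = dict(sorted(Counter(range_bucket(k) for k in DATA).items()))
print(sizes_b)  # {1: 3, 2: 3, 3: 3, 4: 3, 5: 4}
```

Note that the range boundaries were chosen by looking at this particular dataset, so the balance is not guaranteed for other inputs.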

c.

For the data:

[‘a1’, ‘a1’, ‘b1’, ‘d1’, ‘a1’, ‘a1’, ‘b1’, ‘c1’, ‘a2’, ‘c1’, ‘c1’, ‘a1’, ‘d2’, ‘d1’]

the method from part b can also be applied.

Using that method, a good hash function for the given dataset maps the sums to buckets as follows:

[0-2] : bucket 1

[3-4] : bucket 2

[5 and above]: bucket 3

The mapped data will be:

[2,2,3,5,2,2,3,4,3,4,4,2,6,5]

Bucket1: ['a1' ,'a1' ,'a1', 'a1' , 'a1']

Bucket2: ['a2', 'b1', 'b1', 'c1', 'c1', 'c1']

Bucket3: ['d1','d1','d2']

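The buckets hold 5, 6, and 3 elements, so they are not as evenly balanced as in part b. Duplicate values such as ‘a1’, which occurs five times, always hash to the same bucket, which limits how evenly contiguous ranges of the sum can split this data. It is still a reasonable distribution of the elements.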
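A minimal Python sketch of part c follows, reusing the same character-sum idea with three ranges. The names `char_sum` and `three_way_bucket` are illustrative, and the code assumes keys contain only lowercase ASCII letters and digits.

```python
from collections import Counter

DATA_C = ["a1", "a1", "b1", "d1", "a1", "a1", "b1",
          "c1", "a2", "c1", "c1", "a1", "d2", "d1"]

def char_sum(key: str) -> int:
    """Letters count as their alphabet position (a=1 .. z=26); digits count as themselves."""
    return sum(int(ch) if ch.isdigit() else ord(ch) - ord("a") + 1 for ch in key)

def three_way_bucket(key: str) -> int:
    """Sums 0-2 go to bucket 1, sums 3-4 to bucket 2, sums 5 and above to bucket 3."""
    s = char_sum(key)
    if s <= 2:
        return 1
    if s <= 4:
        return 2
    return 3

sizes_c = Counter(three_way_bucket(k) for k in DATA_C)
print(sorted(sizes_c.items()))        # [(1, 5), (2, 6), (3, 3)]

# Equal keys always share a bucket, so the most frequent key sets a floor on bucket size.
print(Counter(DATA_C).most_common(1))  # [('a1', 5)]
```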
