1. Consider the following dataset: [‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’,...

Question


1. Consider the following dataset:

[‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’, ‘bc01’]

a. Assume a hash function that assigns a string to a bucket based on its first symbol. So ‘bab3’ goes into Bucket1 since it starts with ‘b’, and ‘cc1’ goes into Bucket2 since it starts with ‘c’. (Yes, it is a very crude hash function.)

[a-b] -> Bucket1

[c-d] -> Bucket2

[e-f] -> Bucket3

[g-z] -> Bucket4

Why (or why not) would you consider it a good hash function? Please note that an answer of yes or no (without an explanation) will not be credited.

b. Design your own (good) hash function based on the given data and using exactly 5 buckets. In this case, the goodness of the function is measured based on load-balancing of the data.

c. Suppose that the input dataset is:

[‘a1’, ‘a1’, ‘b1’, ‘d1’, ‘a1’, ‘a1’, ‘b1’, ‘c1’, ‘a2’, ‘c1’, ‘c1’, ‘a1’, ‘d2’, ‘d1’].

How would you design a hash function to partition this data into 3 buckets? Once again, the goodness of the hash function is measured by how evenly it distributes the data (as evenly as possible).

Answer 1

a. Using the hash function described in the question, the buckets are filled as follows:

Bucket1 -> [‘bab3’, ‘bc01’, ‘bc01’]: 3 elements

Bucket2 -> [‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’]: 5 elements

Bucket3 -> [‘e01’]: 1 element

Bucket4 -> [‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’]: 7 elements

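This hash function is not good for the given data because it does not distribute the elements evenly: Bucket3 holds a single element, whereas Bucket4 holds 7. The function looks only at the first character, its ranges are unequal in size ([g-z] covers 20 letters while each other bucket covers only 2), and the first letters in the data are concentrated on a few letters such as ‘c’ (5 strings) and ‘h’ (4 strings).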
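As a quick check of these counts, here is a minimal Python sketch of the crude first-letter hash. The function name `first_letter_bucket` is ours, for illustration only:

```python
from collections import Counter

# Dataset from the question (note that 'bc01' appears twice).
data = ['bab3', 'bc01', 'cc2', 'cd5', 'cd3', 'cdx2', 'cdx1', 'e01',
        'g02', 'ha1', 'hb1', 'hc8', 'hz5', 'z00', 'z01', 'bc01']

def first_letter_bucket(s: str) -> int:
    """Crude hash from part a: the bucket depends on the first character only."""
    c = s[0]
    if 'a' <= c <= 'b':
        return 1
    if 'c' <= c <= 'd':
        return 2
    if 'e' <= c <= 'f':
        return 3
    return 4  # any other first character; here 'g'..'z' (20 letters in one bucket)

counts = Counter(first_letter_bucket(s) for s in data)
print(sorted(counts.items()))  # [(1, 3), (2, 5), (3, 1), (4, 7)]
```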

b. For the given data, a better hash function (using exactly 5 buckets, as the question requires) is:

Map each string to a number as follows: map each letter to its position in the alphabet (a to 1, b to 2, and so on up to z to 26), map each digit to its own value, and add up the values of all characters in the string.

For example, for ‘bab3’: b is 2, a is 1 and the digit 3 is 3, so the sum is 2 + 1 + 2 + 3 = 8.
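A minimal Python sketch of this mapping, assuming the strings contain only lowercase letters and decimal digits (the name `char_sum` is illustrative, not part of the question):

```python
def char_sum(s: str) -> int:
    """Sum of character values: 'a'..'z' -> 1..26, '0'..'9' -> 0..9."""
    total = 0
    for ch in s:
        if '0' <= ch <= '9':
            total += int(ch)                  # a digit maps to its own value
        elif 'a' <= ch <= 'z':
            total += ord(ch) - ord('a') + 1   # a letter maps to its alphabet position
        else:
            raise ValueError(f"unexpected character {ch!r} in {s!r}")
    return total

data = ['bab3', 'bc01', 'cc2', 'cd5', 'cd3', 'cdx2', 'cdx1', 'e01',
        'g02', 'ha1', 'hb1', 'hc8', 'hz5', 'z00', 'z01', 'bc01']

print(char_sum('bab3'))             # 8
print([char_sum(s) for s in data])  # [8, 6, 8, 12, 10, 33, 32, 6, 9, 10, 11, 19, 39, 26, 27, 6]
```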

Applying this mapping to the data

[‘bab3’, ‘bc01’, ‘cc2’, ‘cd5’, ‘cd3’, ‘cdx2’, ‘cdx1’, ‘e01’, ‘g02’, ‘ha1’, ‘hb1’, ‘hc8’, ‘hz5’, ‘z00’, ‘z01’, ‘bc01’]

gives:

[8, 6, 8, 12, 10, 33, 32, 6, 9, 10, 11, 19, 39, 26, 27, 6]

Each sum is then assigned to a bucket by range:

[0-6]: bucket 1

[7-9]: bucket 2

[10-11]: bucket 3

[12-26]: bucket 4

[27 and above]: bucket 5
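Putting the mapping and the ranges together, the part b hash can be sketched in Python as follows. The helper names are hypothetical, and the range boundaries are the ones listed above, which were picked by looking at this particular dataset:

```python
from collections import defaultdict

def char_sum(s: str) -> int:
    """Digits count as their own value; every other character is assumed
    to be a lowercase letter and counts as its alphabet position (1..26)."""
    return sum(int(ch) if '0' <= ch <= '9' else ord(ch) - ord('a') + 1
               for ch in s)

def sum_bucket(s: str) -> int:
    """Five-bucket hash: choose the bucket by the range the sum falls in."""
    v = char_sum(s)
    if v <= 6:
        return 1
    if v <= 9:
        return 2
    if v <= 11:
        return 3
    if v <= 26:
        return 4
    return 5

data = ['bab3', 'bc01', 'cc2', 'cd5', 'cd3', 'cdx2', 'cdx1', 'e01',
        'g02', 'ha1', 'hb1', 'hc8', 'hz5', 'z00', 'z01', 'bc01']

buckets = defaultdict(list)
for s in data:
    buckets[sum_bucket(s)].append(s)

for b in sorted(buckets):
    print(b, len(buckets[b]), buckets[b])
```

Because the boundaries are tuned to these 16 strings, a different dataset would likely need different ranges.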

The buckets then contain:

bucket 1: ['bc01', 'e01', 'bc01']

bucket 2: ['bab3', 'cc2', 'g02']

bucket 3: ['cd3', 'ha1', 'hb1']

bucket 4: ['cd5', 'hc8', 'z00']

bucket 5: ['z01', 'cdx1', 'cdx2', 'hz5']

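The elements are now spread almost evenly across the buckets (3, 3, 3, 3 and 4 elements). Since 16 elements cannot be split into 5 equal groups, this is as even as possible, so it is a good hash function for this dataset.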

c. For the data:

[‘a1’, ‘a1’, ‘b1’, ‘d1’, ‘a1’, ‘a1’, ‘b1’, ‘c1’, ‘a2’, ‘c1’, ‘c1’, ‘a1’, ‘d2’, ‘d1’]

The same character-sum method from part b can be applied to this data. With it, a good hash function for this dataset is:

[0-2]: bucket 1

[3-4]: bucket 2

[5 and above]: bucket 3

Mapping each string to its character sum gives:

[2,2,3,5,2,2,3,4,3,4,4,2,6,5]

Bucket1: ['a1', 'a1', 'a1', 'a1', 'a1']

Bucket2: ['a2', 'b1', 'b1', 'c1', 'c1', 'c1']

Bucket3: ['d1', 'd1', 'd2']
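The same approach with the three ranges above can be sketched in Python (again, the helper names are hypothetical):

```python
from collections import Counter

def char_sum(s: str) -> int:
    """Digits count as their own value; other characters are assumed to be
    lowercase letters and count as their alphabet position (1..26)."""
    return sum(int(ch) if '0' <= ch <= '9' else ord(ch) - ord('a') + 1
               for ch in s)

def sum_bucket_3(s: str) -> int:
    """Three-bucket hash: sums 0-2, 3-4, and 5 or more."""
    v = char_sum(s)
    if v <= 2:
        return 1
    if v <= 4:
        return 2
    return 3

data_c = ['a1', 'a1', 'b1', 'd1', 'a1', 'a1', 'b1', 'c1',
          'a2', 'c1', 'c1', 'a1', 'd2', 'd1']

print([char_sum(s) for s in data_c])                           # [2, 2, 3, 5, 2, 2, 3, 4, 3, 4, 4, 2, 6, 5]
print(sorted(Counter(sum_bucket_3(s) for s in data_c).items()))  # [(1, 5), (2, 6), (3, 3)]
```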

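The buckets are not as evenly filled as in part b (5/6/3 instead of 3/3/3/3/4), but this is still a reasonably even distribution. Part of the imbalance comes from the data itself: identical strings always hash to the same bucket, so the five copies of ‘a1’ must share one bucket.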
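Duplicates limit how even any hash function can be: equal strings always land in the same bucket, so the most frequent string sets a floor on the size of the largest bucket, alongside the ideal size of ceil(n / k). A small Python sketch that computes this floor (the helper name `balance_floor` is illustrative):

```python
import math
from collections import Counter

def balance_floor(data: list[str], k: int) -> int:
    """Lower bound on the largest bucket for ANY hash into k buckets.

    Identical keys must share a bucket, so the largest bucket holds at
    least the count of the most frequent key, and at least ceil(n / k).
    """
    most_common_count = Counter(data).most_common(1)[0][1]
    return max(most_common_count, math.ceil(len(data) / k))

data_c = ['a1', 'a1', 'b1', 'd1', 'a1', 'a1', 'b1', 'c1',
          'a2', 'c1', 'c1', 'a1', 'd2', 'd1']

print(Counter(data_c).most_common(1))  # [('a1', 5)]
print(balance_floor(data_c, 3))        # 5
```

For this data the floor is 5, while the sum-based ranges give a largest bucket of 6. A 5/5/4 split is reachable, for example with a hand-made lookup table that sends ‘a1’ to one bucket, ‘b1’ and ‘c1’ to another, and ‘a2’, ‘d1’ and ‘d2’ to the third; no sum-based range can produce it, because ‘a2’ and ‘b1’ both sum to 3.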
