16: hashing #2

Perfect Hashing

Perfect Hashing

Perfect Hashing asks the question: If you are given all of the strings to hash at once, can you create a hash table that guarentees no collisions?

The answer is, yes you can, using random graphs.

Mapping Step
repeat
  G = empty_graph;
  randomly generate tables T1 and T2;
  foreach(word in words)
    f1(w) = applyHash(word, T1) % n;
    f2(w) = applyHash(word, T2) % n;
    add the undirected edge (f1(w), f2(w)) to graph G;
  end loop;
until G is acyclic with no self loops and no duplicate edges;

procedure applyHash(word : string, randomTable : vector<int>);
  ret = 0;
  for(i = 0; i < word.size(); ++i)
    c = word[i];
    mult = randomTable[i];
    ret += (c * mult);
  end loop;
  return ret % n;
end traverse;
Assignment Step
begin
  visited = false for all vertices;
  functionG = 0 for all vertices;
  foreach(vertex v in vertices)
    if not visited[v] then
      traverse(v);
    end if;
  end loop;
end

procedure traverse(u : vertex);
begin
  visited[u] = TRUE;
  foreach(vertex w in neighbors(u))
    if not visited[w] then
      functionG(w) = word_index[w] - functionG(u) % m;
      traverse(w);
    end if;
  end loop;
end traverse;
Table1
10
33
14
Table2
12
1
31
m = words.size()
n = 3 * m

perfect hash


hash1("JAN") = ((10 * 74) + (33 * 65) + (14 * 78)) % n = 3977 % 36 = 17
hash2("JAN") = ((12 * 74) + (1 * 65) + (31 * 78)) % n = 3371 % 36 = 23


words
indexword
0JAN
1FEB
2MAR
3APR
4MAY
5JUN
6JUL
7AUG
8SEP
9OCT
10NOV
11DEC
functionG
indexvalue
00
10
20
30
40
50
60
70
80
91
1011
110
120
131
140
152
160
176
18-8
199
200
214
220
23-6
240
250
260
270
280
295
300
312
ASCII Table
A     65     
B66
C67
D68
E69
F70
G71
H72
I73
J74
K75
L76
M77
N78
O79
P80
Q81
R82
S83
T84
U85
V86
W87
X88
Y89
Z90

References

  1. http://cmph.sourceforge.net/papers/chm92.pdf