25: parallel algorithms #2

Batcher's Even-Odd Sort | Parallel Merge | Sorting Networks | Counting Networks
Parallel Prefix Sum | Stream Compaction | Peer-to-Peer Overlay | DHT | p2pstm

Batcher's Even-Odd Sort [1]

Batcher's even-odd sort is built from compare_exchange operations. Algorithms like this are non-adaptive: the sequence of comparisons does not depend on the input data, so they can be written as a "straight-line" sequence of instructions.

Bubble sort of 3 numbers using only compare exchange:
// bubbleCompareExchange.cpp - download here

#include <vector>
#include <iostream>

//swap so the smaller value ends up at index1
void compare_exchange(std::vector<int> & vec, int index1, int index2){
  int lhs = vec[index1];
  int rhs = vec[index2];
  if(rhs < lhs){
    vec[index1] = rhs;
    vec[index2] = lhs;
  }
}

//a fixed "straight-line" sequence of compare exchanges that sorts 3 values
void bubble3(std::vector<int> & vec){
  compare_exchange(vec, 0, 1);
  compare_exchange(vec, 0, 2);
  compare_exchange(vec, 1, 2);
}

int main(int argc, char * argv[]){

  std::vector<int> to_sort;
  to_sort.push_back(3);
  to_sort.push_back(2);
  to_sort.push_back(1);

  bubble3(to_sort);

  for(size_t i = 0; i < to_sort.size(); ++i){
    std::cout << to_sort[i] << std::endl;
  }

  //prints:
  //  1
  //  2
  //  3

  return 0;
}

In this version of Batcher's even-odd sort the number of processors is N / 2.
It is a parallel comparison sort that takes O(n) time, assuming the N / 2 processors are available.
// gpuBatchers.cpp - download here

#include <vector>
#include <iostream>

void compare_exchange(std::vector<int> & vec, int left, int right){
  int lhs = vec[left];
  int rhs = vec[right];
  if(rhs < lhs){
    vec[left] = rhs;
    vec[right] = lhs;
  }
}

void batchersEvenOdd(std::vector<int> & vec, int procs){
  int num = vec.size();
  //outer loop: sequential passes; inner loop: one compare exchange per processor
  for(int i = 0; i < num; ++i){
    for(int proc = 0; proc < procs; ++proc){
      int left;
      int right;
      if(i % 2 == 0){
        //even pass: compare pairs (0,1), (2,3), ...
        left = proc * 2;
        right = (proc * 2) + 1;
      } else {
        //odd pass: compare pairs (1,2), (3,4), ...
        left = (proc * 2) + 1;
        right = (proc * 2) + 2;
      }
      if(right < num){
        compare_exchange(vec, left, right);
      }
    }
  }
}

int main(int argc, char * argv[]){

  std::vector<int> vec;
  int size = 8;
  for(int i = size - 1; i >= 0; --i){
    vec.push_back(i);
  }

  std::cout << "values: " << std::endl;
  for(size_t i = 0; i < vec.size(); ++i){
    std::cout << vec[i] << " ";
  }
  std::cout << std::endl;

  batchersEvenOdd(vec, size / 2);

  std::cout << "values: " << std::endl;
  for(size_t i = 0; i < vec.size(); ++i){
    std::cout << vec[i] << " ";
  }
  std::cout << std::endl;

  //prints:
  //  values: 
  //  7 6 5 4 3 2 1 0 
  //  values: 
  //  0 1 2 3 4 5 6 7 

  return 0;
}

Example passes of the even-odd exchanges on another 8-element array:

  start:           12 10 11 14 13 20 17 16
  after even pass: 10 12 11 14 13 20 16 17
  after odd pass:  10 11 12 13 14 16 20 17
  after even pass: 10 11 12 13 14 16 17 20

Parallel Merge

For practical reasons, Batcher's even-odd sort sometimes cannot be run over an entire large array, and an efficient parallel merge of already-sorted sections is needed instead.

Each of the P processors can view the entire array, which is segmented into P sorted sections. In parallel, each processor computes the rank of every element in its own section. Afterwards, on one core, all the elements are placed into their final positions.

Example of ranks computed by 4 processors:

  value:  10 11 15 |  6  8 12 | 20 23 50 |  9 16 90
  rank:    3  4  6 |  0  1  5 |  8  9 10 |  2  7 11

(each group of three values is one processor's sorted section; an element's rank is the number of elements in the whole array that are smaller than it)

Algorithm:
algorithm computeRank(vector vec, int left, int right)
  vector ret;
  size = right - left
  for i = left to right
    int rank = i - left;   // elements before it in its own sorted section
    int key = vec[i];
    for offset = left + size to vec.size(), incrementing by size and wrapping around, until left is hit
      for j = 0 to size
        if vec[offset + j] < key
          ++rank;
        else
          break out of loop
    ret[i - left] = rank
  return ret

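The rank computation can also be sketched in C++, as below. This is only a sketch, not the book's code: it assumes distinct values and equally sized segments, uses std::lower_bound on the other (already sorted) segments instead of the wrap-around linear scan above, and the names computeRanks and seg_size are made up. Each call for a different segment could run on its own processor.

// parallelMergeRanks.cpp - illustrative sketch (assumes distinct values and
// that vec.size() is a multiple of the number of segments P)
#include <algorithm>
#include <iostream>
#include <vector>

// compute the final rank of every element in the segment [left, right)
void computeRanks(const std::vector<int>& vec, int P,
                  int left, int right, std::vector<int>& rank)
{
  int seg_size = vec.size() / P;
  for (int i = left; i < right; ++i) {
    int r = i - left;                       // smaller elements in its own segment
    for (int s = 0; s < P; ++s) {
      int s_left = s * seg_size;
      if (s_left == left) continue;         // skip our own segment
      r += std::lower_bound(vec.begin() + s_left,
                            vec.begin() + s_left + seg_size,
                            vec[i]) - (vec.begin() + s_left);
    }
    rank[i] = r;
  }
}

int main()
{
  // the 4 sorted segments from the table above
  std::vector<int> vec = {10, 11, 15,  6, 8, 12,  20, 23, 50,  9, 16, 90};
  int P = 4;
  int seg = vec.size() / P;
  std::vector<int> rank(vec.size());
  for (int s = 0; s < P; ++s)               // "in parallel" over segments
    computeRanks(vec, P, s * seg, (s + 1) * seg, rank);

  std::vector<int> out(vec.size());
  for (size_t i = 0; i < vec.size(); ++i)   // scatter on one core
    out[rank[i]] = vec[i];

  for (size_t i = 0; i < out.size(); ++i) std::cout << out[i] << " ";
  std::cout << std::endl;                   // prints: 6 8 9 10 11 12 15 16 20 23 50 90
  return 0;
}
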
Sorting Network [1]

A simple sorting network has a comparator at each vertical line. Two inputs come in from the left; the smaller value leaves on the top output wire and the larger on the bottom output wire.
SortingNetwork
Larger sorting network (Batcher's even-odd merging network)
SortingNetwork

Block Merge with Sorting Network [1]

Each merge node merges its two input blocks, then sends the smaller half of the values out the top and the larger half out the bottom.
MergingNetwork

Counting Network [5]

In a counting network, a process puts a value on any input wire and the value is propagated through the network. Each balancer alternates the output wire that the next value will go to. This is useful for low-contention counting in hardware. In this figure each balancer starts in the up position.
CountingNetwork
This counting network is very similar to a work distribution scheme using queues.
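A minimal sketch of a single balancer (not the whole network in the figure): it is just a toggle bit, so tokens arriving on either input leave alternately on the top and bottom output wires. In a real concurrent counting network the toggle would have to be flipped atomically (for example with a compare-and-swap); the names here are illustrative.

// balancer.cpp - illustrative sketch of one balancer
#include <iostream>

struct Balancer {
  bool up = true;          // starts in the "up" position, as in the figure
  int traverse() {         // returns 0 for the top wire, 1 for the bottom wire
    int out = up ? 0 : 1;
    up = !up;              // toggle for the next token
    return out;
  }
};

int main() {
  Balancer b;
  int counts[2] = {0, 0};
  for (int token = 0; token < 7; ++token)
    ++counts[b.traverse()];
  // the two outputs never differ by more than one token
  std::cout << "top: " << counts[0] << " bottom: " << counts[1] << std::endl;
  // prints: top: 4 bottom: 3
  return 0;
}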

Parallel Prefix Sum [6,7]

Prefix sum calculates, for each position in the array, the sum of all elements before it plus the element itself.
PrefixSum
(Image from wikipedia [6])

  value:       6  5  4  3 10 12  1  4  3  9  7  8
  prefix sum:  6 11 15 18 28 40 41 45 48 57 64 72

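A minimal sketch of the O(lg n) scan pattern from the figure, written with plain serial loops; the inner loop over i is the part that would run on n processors in parallel. The prefixSum name and the double-buffering are illustrative choices, not part of the original notes.

// prefixSum.cpp - illustrative sketch: in pass p every element adds the value
// 2^p positions to its left, so lg n passes produce the inclusive prefix sum
#include <iostream>
#include <vector>

void prefixSum(std::vector<int>& vec) {
  int n = vec.size();
  for (int offset = 1; offset < n; offset *= 2) {
    std::vector<int> prev = vec;                 // double-buffer one pass
    for (int i = offset; i < n; ++i)             // "in parallel"
      vec[i] = prev[i] + prev[i - offset];
  }
}

int main() {
  std::vector<int> vec = {6, 5, 4, 3, 10, 12, 1, 4, 3, 9, 7, 8};
  prefixSum(vec);
  for (size_t i = 0; i < vec.size(); ++i) std::cout << vec[i] << " ";
  std::cout << std::endl;
  // prints: 6 11 15 18 28 40 41 45 48 57 64 72
  return 0;
}
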
Prefix sum can be used in a version of counting sort that is stable with key/value pairs.
algorithm prefixSumCountingSort(array a){
  int k = max_value(a) + 1;       // keys are in the range 0..max_value(a)
  array b = new array[a.length];  // output array
  array c = new array[k];         // counts, then prefix sums
  for(i = 0; i < k; ++i){
    c[i] = 0;
  }
  for(i = 0; i < a.length; ++i){
    c[a[i]] += 1;
  }
  calculate_prefix_sum(c);
  for(i = a.length - 1; i >= 0; --i){
    b[c[a[i]] - 1] = a[i];        // scatter from the back to keep the sort stable
    c[a[i]] -= 1;
  }
}

  orig array:  2 5 3 0 2 3 0 3
  counts:      2 0 2 3 0 1       (counts of keys 0 through 5)
  prefix sum:  2 2 4 7 7 8

  1. Start:
       out array:  0 0 0 0 0 0 0 0
       prefix sum: 2 2 4 7 7 8

  2. a[length-1] = 3. c[3] - 1 = 6, so b[6] = 3. --c[3] leaves c[3] = 6.
       out array:  0 0 0 0 0 0 3 0
       prefix sum: 2 2 4 6 7 8

  3. a[length-2] = 0. c[0] - 1 = 1, so b[1] = 0. --c[0] leaves c[0] = 1.
       out array:  0 0 0 0 0 0 3 0
       prefix sum: 1 2 4 6 7 8

  4. a[length-3] = 3. c[3] - 1 = 5, so b[5] = 3. --c[3] leaves c[3] = 5.
       out array:  0 0 0 0 0 3 3 0
       prefix sum: 1 2 4 5 7 8

  5. a[length-4] = 2. c[2] - 1 = 3, so b[3] = 2. --c[2] leaves c[2] = 3.
       out array:  0 0 0 2 0 3 3 0
       prefix sum: 1 2 3 5 7 8

  6. a[length-5] = 0. c[0] - 1 = 0, so b[0] = 0. --c[0] leaves c[0] = 0.
       out array:  0 0 0 2 0 3 3 0
       prefix sum: 0 2 3 5 7 8

  7. a[length-6] = 3. c[3] - 1 = 4, so b[4] = 3. --c[3] leaves c[3] = 4.
       out array:  0 0 0 2 3 3 3 0
       prefix sum: 0 2 3 4 7 8

  8. a[length-7] = 5. c[5] - 1 = 7, so b[7] = 5. --c[5] leaves c[5] = 7.
       out array:  0 0 0 2 3 3 3 5
       prefix sum: 0 2 3 4 7 7

  9. a[length-8] = 2. c[2] - 1 = 2, so b[2] = 2. --c[2] leaves c[2] = 2.
       out array:  0 0 2 2 3 3 3 5
       prefix sum: 0 2 2 4 7 7

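The walkthrough above corresponds to the following minimal C++ sketch (keys only; a key/value version would move the attached values in the same scatter step). countingSort is an illustrative name, not code from the notes.

// countingSort.cpp - illustrative sketch of the stable counting sort above
#include <iostream>
#include <vector>

std::vector<int> countingSort(const std::vector<int>& a, int k) {
  std::vector<int> c(k, 0);
  std::vector<int> b(a.size());
  for (size_t i = 0; i < a.size(); ++i)            // histogram
    c[a[i]] += 1;
  for (int i = 1; i < k; ++i)                      // prefix sum of the counts
    c[i] += c[i - 1];
  for (int i = (int)a.size() - 1; i >= 0; --i) {   // scatter, back to front
    b[c[a[i]] - 1] = a[i];                         // keeps equal keys in order
    c[a[i]] -= 1;
  }
  return b;
}

int main() {
  std::vector<int> a = {2, 5, 3, 0, 2, 3, 0, 3};
  std::vector<int> b = countingSort(a, 6);         // keys are in the range 0..5
  for (size_t i = 0; i < b.size(); ++i) std::cout << b[i] << " ";
  std::cout << std::endl;                          // prints: 0 0 2 2 3 3 3 5
  return 0;
}
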
  • Prefix sum time complexity: 2*lgn or O(lgn)
  • Stable Serial Counting Sort time complexity: 2*k + 2*n = O(k+n)
  • Stable Parallel Counting Sort time complexity: k + 2*lgk + 2*n = O(k+n)
  • The parallel version can have a faster wall clock time assuming k/2 processors, minimal communication time and equal cache effects
Also note that a stable counting sort can be used as the sorting function inside radix sort; a sketch follows the pseudocode below.
struct SortPair {
  int digit;
  int array_element;
};

algorithm countingRadixSort(array a)
  int d = max_digits_in_any_element(a);
  array sort_pairs = new array[a.length];
  for(j = 0; j < a.length; ++j)
    sort_pairs[j].array_element = a[j];
  for(i = 0; i < d; ++i)
    for(j = 0; j < sort_pairs.length; ++j)
      sort_pairs[j].digit = extract_digit(i, sort_pairs[j].array_element);
    use counting sort to order sort_pairs based on digit (range 0-9);

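A minimal sketch of that radix sort, assuming non-negative integers and decimal digits; unlike the pseudocode above it recomputes each digit on the fly instead of storing SortPair records, and the example values are made up.

// countingRadixSort.cpp - illustrative sketch of LSD radix sort built on a
// stable counting sort over one decimal digit per pass
#include <iostream>
#include <vector>

void countingSortByDigit(std::vector<int>& a, int divisor) {
  std::vector<int> c(10, 0);
  std::vector<int> b(a.size());
  for (size_t i = 0; i < a.size(); ++i)            // histogram of this digit
    c[(a[i] / divisor) % 10] += 1;
  for (int d = 1; d < 10; ++d)                     // prefix sum of the counts
    c[d] += c[d - 1];
  for (int i = (int)a.size() - 1; i >= 0; --i) {   // stable scatter
    int digit = (a[i] / divisor) % 10;
    b[c[digit] - 1] = a[i];
    c[digit] -= 1;
  }
  a = b;
}

int main() {
  std::vector<int> a = {329, 457, 657, 839, 436, 720, 355};
  int max_value = 839;                             // known here; normally computed
  for (int divisor = 1; max_value / divisor > 0; divisor *= 10)
    countingSortByDigit(a, divisor);               // one pass per decimal digit
  for (size_t i = 0; i < a.size(); ++i) std::cout << a[i] << " ";
  std::cout << std::endl;                          // prints: 329 355 436 457 657 720 839
  return 0;
}
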
Parallel Prefix Sum Stream Compaction [8]

When processing a stream in parallel on a graphics card, some elements need to be kept and processed while others need to be skipped. An exclusive prefix sum of the keep flags gives each kept element its position in the compacted output.

  Stream:      A B C D E F G H
  Keep flags:  1 0 1 1 0 0 1 0
  Prefix sum:  0 1 1 2 3 3 3 4
  Result:      A C D G

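A minimal serial sketch of the same idea: the exclusive prefix sum of the keep flags gives every kept element its output index, so on a GPU all of the scatter writes could be issued in parallel. The variable names are illustrative.

// streamCompaction.cpp - illustrative sketch of compaction via prefix sum
#include <iostream>
#include <vector>

int main() {
  std::vector<char> stream = {'A','B','C','D','E','F','G','H'};
  std::vector<int>  keep   = { 1,  0,  1,  1,  0,  0,  1,  0 };

  std::vector<int> index(keep.size(), 0);          // exclusive prefix sum
  for (size_t i = 1; i < keep.size(); ++i)
    index[i] = index[i - 1] + keep[i - 1];

  int total = index.back() + keep.back();
  std::vector<char> result(total);
  for (size_t i = 0; i < stream.size(); ++i)       // "in parallel" scatter
    if (keep[i])
      result[index[i]] = stream[i];

  for (size_t i = 0; i < result.size(); ++i) std::cout << result[i] << " ";
  std::cout << std::endl;                          // prints: A C D G
  return 0;
}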

Peer-to-Peer Network Overlay

The Napster music sharing service created a new research topic in computer science: fully decentralized, fault-tolerant, scalable networking. Napster itself used centralized servers to index which songs were on which computers.

Peer-to-peer network overlays were created in academia to solve the indexing problem Napster had.

Today BitTorrent [2], the Storm Botnet [2] and Skype are based on advances in peer-to-peer network overlays.

FreePastry is one of the peer-to-peer network overlays from academia:

p2p_ring
(Image from [4])

A node joins the ring by contacting a node already in the ring by IP address. A routing table is kept at each node. At each hop, a message is forwarded to a node whose node-id shares at least one more digit with the destination id; a sketch of this forwarding rule is shown below.

Messages are guaranteed to be routed in O(lg n) hops.
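A minimal sketch of just that forwarding rule, with made-up hexadecimal node-ids: from the nodes this node happens to know about, pick one whose id shares a strictly longer digit prefix with the destination. Real Pastry maintains a structured routing table and a leaf set, which this sketch does not model.

// prefixRouting.cpp - illustrative sketch of prefix-based forwarding
#include <iostream>
#include <string>
#include <vector>

int sharedPrefixLen(const std::string& a, const std::string& b) {
  int n = 0;
  while (n < (int)a.size() && n < (int)b.size() && a[n] == b[n]) ++n;
  return n;
}

std::string nextHop(const std::string& self, const std::string& dest,
                    const std::vector<std::string>& known) {
  std::string best = self;
  int best_len = sharedPrefixLen(self, dest);
  for (size_t i = 0; i < known.size(); ++i)
    if (sharedPrefixLen(known[i], dest) > best_len) {
      best = known[i];
      best_len = sharedPrefixLen(known[i], dest);
    }
  return best;   // best == self means this node knows no closer node
}

int main() {
  std::vector<std::string> known = {"65a1fc", "d13da3", "d4213f", "d462ba"};
  std::cout << nextHop("d46a1c", "d46213", known) << std::endl;  // prints: d462ba
  return 0;
}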

Distributed Hash Table

A distributed hash table is a hash table built on top of a peer-to-peer network overlay. To store a file, a hash of the filename is produced and the file is sent to the node whose node-id is closest to that hash (a small sketch of this placement rule follows the list below).
  • The file is replicated on the k nearest nodes.
  • If the primary computer the file is stored on leaves the network, the next request for the file is routed to the node that is now closest to the hash. Since the file was replicated, that node also has an exact copy.
  • When a new closest node notices that it now serves a file, it can again replicate the file to the k nearest nodes.
  • Once a file is in the system, it can never be changed, due to concurrency issues.
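A minimal sketch of the placement rule only, with made-up node-ids and std::hash standing in for the real cryptographic hash; a real DHT works in a circular id space and also replicates the file to the k next-closest nodes.

// dhtStore.cpp - illustrative sketch of choosing the node closest to a file's key
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <vector>

int main() {
  std::vector<uint64_t> node_ids = {1200, 5000, 9100, 14000};
  uint64_t key = std::hash<std::string>()("song.mp3") % 16384;   // id space [0, 16384)

  uint64_t best = node_ids[0];
  for (size_t i = 1; i < node_ids.size(); ++i) {
    uint64_t d_best = best > key ? best - key : key - best;
    uint64_t d_cur  = node_ids[i] > key ? node_ids[i] - key : key - node_ids[i];
    if (d_cur < d_best) best = node_ids[i];                      // numerically closest id
  }
  std::cout << "key " << key << " is stored on node " << best << std::endl;
  return 0;
}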

p2pstm [3]

p2pstm addresses the limitation that data in a distributed hash table can only be written once. It provides a software transactional memory concurrency control abstraction rather than a lock-based abstraction.

Lock-Based Concurrency Control

Our bank account concurrency problems from the previous lecture can be solved with locks:
// bankAccountLocks.cpp - download here

#include <iostream>
#include <pthread.h>
#include <unistd.h>
#include <cstdlib>

class Account {
public:
  Account(bool use_locking);
  ~Account();
  int getBalance();
  void setBalance(int value);
  void acquireLock();
  void releaseLock();
private:
  int m_Balance;
  pthread_mutex_t m_Mutex;
  bool m_UseLocking;
};

Account::Account(bool use_locking)
  : m_Balance(0), m_UseLocking(use_locking)
{
  pthread_mutex_init(&m_Mutex, NULL);
}

Account::~Account()
{
  pthread_mutex_destroy(&m_Mutex);
}

int Account::getBalance()
{
  return m_Balance;
}

void Account::setBalance(int value)
{
  m_Balance = value;
}

void Account::acquireLock()
{
  if(m_UseLocking){
    pthread_mutex_lock(&m_Mutex);
  }
}

void Account::releaseLock()
{
  if(m_UseLocking){
    pthread_mutex_unlock(&m_Mutex);
  }
}

void * threadProc(void * data){

  Account * account = (Account *) data;
  for(int i = 0; i < 100000; ++i){
    //guard the read-modify-write of the balance (only when locking is enabled)
    account->acquireLock();
    int bal = account->getBalance();
    bal += 20;
    account->setBalance(bal);
    account->releaseLock();
  }
  return NULL;
}

int main(int argc, char * argv[])
{

  const int num_threads = 2;
  pthread_t threads1[num_threads];
  pthread_t threads2[num_threads];
  Account account1(true);
  Account account2(false);

  for(int i = 0; i < num_threads; ++i){
    pthread_create(&threads1[i], NULL, &threadProc, &account1);
    pthread_create(&threads2[i], NULL, &threadProc, &account2);
  }

  for(int i = 0; i < num_threads; ++i){
    pthread_join(threads1[i], NULL);
    pthread_join(threads2[i], NULL);
  }

  std::cout << "locked final balance: " << account1.getBalance() << std::endl;
  std::cout << "unlocked final balance: " << account2.getBalance() << std::endl;
  //prints:
  //  locked final balance: 4000000
  //  unlocked final balance: 2583880

  return 0;
}

Software Transactional Memory Concurrency Control

In STM, no locks are used by the programmer. A version number is kept for each variable, and updates are grouped into transactions.
algorithm updateBalanceThreadProc(account)
  atomic {
    int bal = account.getBalance();
    bal += 20;
    account.setBalance(bal);
  }

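A minimal single-variable sketch of what the atomic block relies on, using C++11 atomics and assuming the balance and its version are packed into one atomic 64-bit word so they can be checked and committed together; real STMs (and p2pstm) keep a version per variable and validate a whole read/write set.

// stmSketch.cpp - illustrative sketch of version-and-retry commit
#include <atomic>
#include <cstdint>
#include <iostream>
#include <thread>

// one variable: high 32 bits hold the version, low 32 bits hold the balance
std::atomic<uint64_t> cell(0);

void updateBalanceThreadProc() {
  for (int i = 0; i < 100000; ++i) {
    while (true) {
      uint64_t old_cell = cell.load();
      uint32_t version = (uint32_t)(old_cell >> 32);         // read the version
      int32_t  bal     = (int32_t)(old_cell & 0xffffffffu);  // read the balance
      bal += 20;                                             // run the transaction locally
      uint64_t new_cell = ((uint64_t)(version + 1) << 32) | (uint32_t)bal;
      if (cell.compare_exchange_strong(old_cell, new_cell))
        break;          // committed: balance and version changed together
      // another thread committed first, so our version was stale: retry
    }
  }
}

int main() {
  std::thread t1(updateBalanceThreadProc);
  std::thread t2(updateBalanceThreadProc);
  t1.join();
  t2.join();
  std::cout << "final balance: " << (int32_t)(cell.load() & 0xffffffffu) << std::endl;
  // prints: final balance: 4000000
  return 0;
}
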
p2pstm algorithm

Each variable in the program is represented by a (keyword:version) pair. The distributed memory addresses are computed by transforming the (keyword:version) 2-dimensional pair into a 1-dimensional node-id using Hilbert Space Filling Curves.
p2p_hilbert
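Below is a sketch of the standard iterative Hilbert-curve mapping from a grid point to a distance along the curve; how p2pstm hashes the keyword and version onto the n x n grid is assumed, not shown.

// hilbert.cpp - illustrative sketch of the xy2d Hilbert-curve mapping
// (n must be a power of two)
#include <iostream>

// rotate/flip a quadrant so the recursion pattern lines up
void rot(int n, int& x, int& y, int rx, int ry) {
  if (ry == 0) {
    if (rx == 1) {
      x = n - 1 - x;
      y = n - 1 - y;
    }
    int t = x; x = y; y = t;   // swap x and y
  }
}

// convert a grid point (x, y) to its distance d along the Hilbert curve
int xy2d(int n, int x, int y) {
  int d = 0;
  for (int s = n / 2; s > 0; s /= 2) {
    int rx = (x & s) > 0;
    int ry = (y & s) > 0;
    d += s * s * ((3 * rx) ^ ry);
    rot(n, x, y, rx, ry);
  }
  return d;
}

int main() {
  // e.g. x = hash(keyword) mod n, y = version mod n
  std::cout << xy2d(8, 5, 2) << std::endl;   // prints: 55
  return 0;
}
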
Then the following commit algorithm is used. The reads and writes to the distributed memory are done in sorted order of the (keyword:version) pairs to prevent deadlocks.
algorithm p2pstmCommit(transaction)
  while(not committed)
    read all the variables and versions in the transaction;
    execute the transaction on the variables;
    read the versions again;
    write the variables and versions to the distributed memory;
    if(any write failed)
      restart the transaction;
    send validations for each variable and version to the distributed memory;
    if(any validation failed)
      restart the transaction;

Example
Client Node 1:
atomic transaction: {
  add name1 to authors;
  add book_uuid1 to books;
  add book_uuid1 to compsci_books;
}

Client Node 2:
atomic transaction: {
  add name2 to authors;
  add book_uuid2 to books;
  add book_uuid2 to compsci_books;
}

Operations
t1.read(authors);
t1.read(books);
t1.read(compsci_books);
t1.exec_transaction();
t1.write(authors);
t1.write(books);
t1.write(compsci_books);
t1.validate(authors);
t1.validate(books);
t1.validate(compsci_books);

t2.read(authors);
t2.read(books);
t2.read(compsci_books);
t2.exec_transaction();
t2.write(authors);
t2.write(books);
t2.write(compsci_books);
t2.validate(authors);
t2.validate(books);
t2.validate(compsci_books);


Execution Possibility #1
t1.read(authors);
    local authors version 1
t1.read(books);
    local books version 1
t1.read(compsci_books);
    local compsci_books version 1

t2.read(authors);
    local authors version 1
t2.read(books);
    local books version 1
t2.read(compsci_books);
    local compsci_books version 1

t1.exec_transaction();
    local authors version 2
    local books version 2
    local compsci_books version 2

t2.exec_transaction();
    local authors version 2
    local books version 2
    local compsci_books version 2

t1.write(authors);
    success. reserved remote authors version 2

t2.write(authors);
    failure. remote authors version already reserved
t2.restart_transaction();

t1.write(books);
    success. reserved remote books version 2
t1.write(compsci_books);
    success. reserved remote compsci_books version 2

t1.validate(authors);
    success. final remote authors version 2
t1.validate(books);
    success. final remote books version 2
t1.validate(compsci_books);
    success. final remote compsci_books version 2

Execution Possibility #2
t1.read(authors);
    local authors version 1
t1.read(books);
    local books version 1
t1.read(compsci_books);
    local compsci_books version 1

t2.read(authors);
    local authors version 1
t2.read(books);
    local books version 1
t2.read(compsci_books);
    local compsci_books version 1
t2.read(elec_books);
    local elec_books version 1

t1.exec_transaction();
    local authors version 2
    local books version 2
    local compsci_books version 2

t2.exec_transaction();
    local authors version 2
    local books version 2
    local compsci_books version 2

t1.write(authors);
    success. reserved remote authors version 2

t2.write(authors);
    failure. remote authors version already reserved
t2.restart_transaction();

t2.read(authors);
    local authors version 2
t2.read(books);
    local books version 1
t2.read(compsci_books);
    local compsci_books version 1

t2.exec_transaction();
    local authors version 3
    local books version 2
    local compsci_books version 2

t2.write(authors);
    success. reserved remote authors version 3
t2.write(books);
    success. reserved remote books version 2
t2.write(compsci_books);
    success. reserved remote compsci_books version 2

t2.validate(authors);
    success. final remote authors version 3
t2.validate(books);
    success. final remote books version 2
t2.validate(compsci_books);
    success. final remote compsci_books version 2


There are more execution possibilities, but the partial ordering of the (keyword:version) pairs guarantees that there are no deadlocks in any of them.

References

  1. Robert Sedgewick, Algorithms in C++ Parts 1-4, pages 453 to 483
  2. http://en.wikipedia.org/wiki/Distributed_hash_table
  3. Pratt-Szeliga, Phil and Fawcett, Jim, "p2pstm: A Peer-to-Peer Software Transactional Memory" (2010). Electrical Engineering and Computer Science. Paper 170. http://surface.syr.edu/eecs/170
  4. A. Rowstron and P. Druschel, "Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems". IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), Heidelberg, Germany, pages 329-350, November, 2001.
  5. http://www.cs.yale.edu/homes/aspnes/papers/ahs.pdf
  6. http://en.wikipedia.org/wiki/Prefix_sum
  7. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2001), "8.2 Counting Sort", Introduction to Algorithms (2nd ed.), MIT Press and McGraw-Hill, pp. 168–170, ISBN 0-262-03293-7.
  8. http://http.developer.nvidia.com/GPUGems3/gpugems3_ch39.html