I'm trying to write a thread-safe sorted singly linked list. I wrote two versions: one with coarse-grained synchronization and one with fine-grained synchronization. Here are the two implementations:
Fine grained:
    public void add(T t) {
        Node curr = head;
        curr.lock.lock();
        while (curr.next != null) {
            // Invariant: curr is locked
            // Invariant: curr.data < t
            curr.next.lock.lock();
            if (t.compareTo(curr.next.data) <= 0) {
                break;
            }
            Node tmp = curr.next;
            curr.lock.unlock();
            curr = tmp;
        }
        // curr is acquired
        curr.next = new Node(curr.next, t);
        if (curr.next.next != null) { // old curr's next is acquired
            curr.next.next.lock.unlock();
        }
        curr.lock.unlock();
    }
Coarse grained:
    public void add(T t) {
        lock.lock();
        Node curr = head;
        while (curr.next != null) {
            if (t.compareTo(curr.next.data) <= 0) {
                break;
            }
            curr = curr.next;
        }
        curr.next = new Node(curr.next, t);
        lock.unlock();
    }
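Neither snippet shows the class around it, so here is a minimal sketch of the declarations both methods appear to rely on. The class name `SortedLinkedList`, the method names `addFine` and `addCoarse`, the `toList` helper, and the use of `ReentrantLock` with a data-less sentinel `head` are all assumptions made so that the code compiles on its own. The `try`/`finally` in the coarse-grained version is also an addition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical enclosing class; only add/head/lock/Node/data/next come from the snippets above.
class SortedLinkedList<T extends Comparable<T>> {
    // Each node carries its own lock (used only by the fine-grained version).
    private final class Node {
        final ReentrantLock lock = new ReentrantLock();
        final T data;
        Node next;

        Node(Node next, T data) {
            this.next = next;
            this.data = data;
        }
    }

    // Assumed sentinel: holds no data, so the traversal never compares against it.
    private final Node head = new Node(null, null);
    // Single list-wide lock (used only by the coarse-grained version).
    private final ReentrantLock lock = new ReentrantLock();

    // Fine-grained add: hand-over-hand locking, same logic as the first snippet.
    public void addFine(T t) {
        Node curr = head;
        curr.lock.lock();
        while (curr.next != null) {
            curr.next.lock.lock();          // lock the successor before inspecting it
            if (t.compareTo(curr.next.data) <= 0) {
                break;                      // insert between curr and curr.next
            }
            Node tmp = curr.next;
            curr.lock.unlock();             // release the predecessor, keep the successor
            curr = tmp;
        }
        curr.next = new Node(curr.next, t);
        if (curr.next.next != null) {
            curr.next.next.lock.unlock();   // the old successor was locked on break
        }
        curr.lock.unlock();
    }

    // Coarse-grained add: one lock around the whole traversal and insert.
    public void addCoarse(T t) {
        lock.lock();
        try {
            Node curr = head;
            while (curr.next != null && t.compareTo(curr.next.data) > 0) {
                curr = curr.next;
            }
            curr.next = new Node(curr.next, t);
        } finally {
            lock.unlock();
        }
    }

    // Snapshot of the contents; not thread-safe, call only after all writers have finished.
    public List<T> toList() {
        List<T> out = new ArrayList<>();
        for (Node n = head.next; n != null; n = n.next) {
            out.add(n.data);
        }
        return out;
    }
}
```

Note that in the fine-grained version every step of the traversal costs one `lock()` and one `unlock()` on a different node, whereas the coarse-grained version pays for a single lock/unlock pair per insert.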
I timed the two versions on 4 threads (on 4 logical CPU cores) inserting 20000 integers. The time per thread is CPU time (i.e. it does not include waiting time).
Fine grained:
Worked 1 spent 1080 ms
Worked 2 spent 1230 ms
Worked 0 spent 1250 ms
Worked 3 spent 1260 ms
wall time: 1620 ms
Coarse grained:
Worked 1 spent 190 ms
Worked 2 spent 270 ms
Worked 3 spent 410 ms
Worked 0 spent 280 ms
wall time: 1298 ms
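For anyone who wants to reproduce this, a measurement of this kind could be set up roughly as follows. This is a sketch rather than the exact harness used for the numbers above: the class name `InsertBenchmark`, the `Result` type, the use of `ThreadMXBean.getCurrentThreadCpuTime()` for per-thread CPU time, random input values, and splitting the 20000 inserts evenly across the threads are all assumptions. The `add` operation is passed in as a `Consumer<Integer>` so that either list version can be plugged in.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Consumer;

// Hypothetical harness: per-thread CPU time plus overall wall time.
class InsertBenchmark {
    static final class Result {
        final long[] cpuMillis;   // CPU time consumed by each worker thread
        final long wallMillis;    // elapsed time from first start to last join

        Result(long[] cpuMillis, long wallMillis) {
            this.cpuMillis = cpuMillis;
            this.wallMillis = wallMillis;
        }
    }

    static Result run(int threads, int totalInserts, Consumer<Integer> add) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] cpu = new long[threads];
        Thread[] workers = new Thread[threads];
        int perThread = totalInserts / threads;   // assumption: work is split evenly

        long wallStart = System.nanoTime();
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                // CPU time excludes time spent blocked waiting for a lock.
                // Returns -1 if CPU time measurement is disabled on this JVM.
                long start = bean.getCurrentThreadCpuTime();
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                for (int k = 0; k < perThread; k++) {
                    add.accept(rnd.nextInt());
                }
                cpu[id] = (bean.getCurrentThreadCpuTime() - start) / 1_000_000;
            });
            workers[i].start();
        }
        try {
            for (Thread w : workers) {
                w.join();   // join also makes each worker's write to cpu[] visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("benchmark interrupted", e);
        }
        long wall = (System.nanoTime() - wallStart) / 1_000_000;
        return new Result(cpu, wall);
    }

    public static void main(String[] args) {
        // Stand-in target so the harness runs on its own; swap in the list's add method.
        List<Integer> target = Collections.synchronizedList(new ArrayList<>());
        Result r = run(4, 20000, x -> target.add(x));
        for (int i = 0; i < r.cpuMillis.length; i++) {
            System.out.println("Worker " + i + " spent " + r.cpuMillis[i] + " ms");
        }
        System.out.println("wall time: " + r.wallMillis + " ms");
    }
}
```

A single cold run like this includes JIT warm-up, so repeating the measurement several times (or using a benchmarking framework) would give more trustworthy numbers.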
My initial thought was that .lock() and .unlock() are the problem, but I profiled the implementation and together they consumed only 30% of the time. My second guess is that the fine-grained solution has more cache misses, but I doubt it, because a singly linked list, unlike an array, is inherently prone to cache misses.
Any idea why I don't get the expected parallelization?