I've been having some problems with using BerkeleyDB. I have multiple instances of the same code pointed to a single repository of DB files, and everything runs fine for 5-32 hours, then suddenly there is a deadlock. The command prompts stop right before executing a db_get or db_put or cursor creation call. So I'm simply asking for the proper way to handle these calls. Here's my general layout:
This is how the environment and DBs are created:
my $env = new BerkeleyDB::Env (
-Home => "$dbFolder\\" ,
-Flags => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL)
or die "cannot open environment: $BerkeleyDB::Error\n";
my $unsortedHash = BerkeleyDB::Hash->new (
-Filename => "$dbFolder/Unsorted.db",
-Flags => DB_CREATE,
-Env => $env
) or die "couldn't create: $!, $BerkeleyDB::Error.\n";
A single instance of this code runs, goes to a site and saves URLs to be parsed by another instance (I have the flag set so that every DB is locked when one is locked):
$lk = $unsortedHash->cds_lock();
while(@urlsToAdd){
my $currUrl = shift @urlsToAdd;
$unsortedHash->db_put($currUrl, '0');
}
$lk->cds_unlock();
It periodically checks if a certain number of items are in Unsorted:
$refer = $unsortedHash->db_stat();
$elements = $refer->{'hash_ndata'};
Before adding any element to any DB, it first checks all DBs to see if that element is already present:
if ($unsortedHash->db_get($search, $value) == 0){
$value = "1:$value";
}elsif ($badHash->db_get($search, $value) == 0){
$value = "2:$value";
....
This next code comes after, and many instances of it are run in parallel. First, it gets the next item in unsorted (that does not have the busy value '1'), then sets the value to busy '1', then does something with it, then moves the DB entry completely to another DB (it is removed from unsorted and stored in another DB):
my $pageUrl = '';
my $busy = '1';
my $curs;
my $lk = $unsortedHash->cds_lock(); #lock, change status to 1, unlock
########## GET AN ELEMENT FROM THE UNSORTED HASH #######
while(1){
$busy = '1';
$curs = $unsortedHash->db_cursor();
while ($busy){
$curs->c_get($pageUrl, $busy, DB_NEXT);
print "$pageUrl:$busy:\n";
if ($pageUrl eq ''){
$busy = 0;
}
}
$curs->c_close();
$curs = undef;
if ($pageUrl eq ''){
print "Database empty. Sleeping...\n";
$lk->cds_unlock();
sleep(30);
$lk = $unsortedHash->cds_lock();
}else{
last;
}
}
####### MAKE THE ELEMENT 'BUSY' AND DOWNLOAD IT
$unsortedHash->db_put($pageUrl, '1');
$lk->cds_unlock();
$lk = undef;
And in every other place, if I call db_put or db_del on ANY DB, it is wrapped with a lock like so:
print "\n\nBad.\n\n";
$lk = $badHash->cds_lock();
$badHash->db_put($pageUrl, '0');
$unsortedHash->db_del($pageUrl);
$lk->cds_unlock();
$lk = undef;
However, my db_get commands are free-floating with no lock, because I don't think reading needs a lock.
I have looked over this code a million times and the algorithm is airtight. So I am just wondering if I am implementing any part of this wrong, using the locks wrong, etc. Or if there is a better way to prevent deadlocking (or even diagnose deadlocking) with BerkeleyDB and Strawberry Perl?
UPDATE: To be more specific, the problem is occurring on a Windows 2003 server (1.5 GB RAM, not sure if that is important). I can run this whole setup fine on my Windows 7 machine (4GB RAM). I also started printing out the lock stats using the following:
Adding this flag to the environment creation:
-MsgFile => "$dbFolder/lockData.txt"
And then calling this every 60 seconds:
my $status = $env->lock_stat_print();
print "Status:$status:\n";
The status is always returned as 0, which is success. Here is the last stat report:
29 Last allocated locker ID
0x7fffffff Current maximum unused locker ID
5 Number of lock modes
1000 Maximum number of locks possible
1000 Maximum number of lockers possible
1000 Maximum number of lock objects possible
40 Number of lock object partitions
24 Number of current locks
42 Maximum number of locks at any one time
5 Maximum number of locks in any one bucket
0 Maximum number of locks stolen by for an empty partition
0 Maximum number of locks stolen for any one partition
29 Number of current lockers
29 Maximum number of lockers at any one time
6 Number of current lock objects
13 Maximum number of lock objects at any one time
1 Maximum number of lock objects in any one bucket
0 Maximum number of objects stolen by for an empty partition
0 Maximum number of objects stolen for any one partition
3121958 Total number of locks requested
3121926 Total number of locks released
0 Total number of locks upgraded
24 Total number of locks downgraded
9310 Lock requests not available due to conflicts, for which we waited
0 Lock requests not available due to conflicts, for which we did not wait
8 Number of deadlocks
1000000 Lock timeout value
0 Number of locks that have timed out
1000000 Transaction timeout value
0 Number of transactions that have timed out
792KB The size of the lock region
59 The number of partition locks that required waiting (0%)
46 The maximum number of times any partition lock was waited for (0%)
0 The number of object queue operations that required waiting (0%)
27 The number of locker allocations that required waiting (0%)
0 The number of region locks that required waiting (0%)
1 Maximum hash bucket length
Of which I am wary of this:
8 Number of deadlocks
How did these deadlocks occur, and how were they resolved? (all parts of the code are still running). What exactly is a deadlock, in this case?