可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I've been having some problems with using BerkeleyDB. I have multiple instances of the same code pointed to a single repository of DB files, and everything runs fine for 5-32 hours, then suddenly there is a deadlock. The command prompts stop right before executing a db_get or db_put or cursor creation call. So I'm simply asking for the proper way to handle these calls. Here's my general layout:

This is how the environment and DBs are created:

my $env = new BerkeleyDB::Env ( 
   -Home   => "$dbFolder\\" , 
   -Flags  => DB_CREATE | DB_INIT_CDB | DB_INIT_MPOOL) 
   or die "cannot open environment: $BerkeleyDB::Error\n";

my $unsortedHash  = BerkeleyDB::Hash->new (
   -Filename => "$dbFolder/Unsorted.db", 
   -Flags => DB_CREATE,
   -Env  => $env
   ) or die "couldn't create: $!, $BerkeleyDB::Error.\n";

A single instance of this code runs, goes to a site and saves URLs to be parsed by another instance (I have the flag set so that every DB is locked when one is locked):

        $lk = $unsortedHash->cds_lock();
        while(@urlsToAdd){
            my $currUrl = shift @urlsToAdd;
            $unsortedHash->db_put($currUrl, '0');
        }
        $lk->cds_unlock();

It periodically checks if a certain number of items are in Unsorted:

$refer = $unsortedHash->db_stat();
$elements = $refer->{'hash_ndata'};

Before adding any element to any DB, it first checks all DBs to see if that element is already present:

if ($unsortedHash->db_get($search, $value) == 0){
    $value = "1:$value";
}elsif ($badHash->db_get($search, $value) == 0){
    $value =  "2:$value";
....

This next code comes after, and many instances of it are run in parallel. First, it gets the next item in unsorted (that does not have the busy value '1'), then sets the value to busy '1', then does something with it, then moves the DB entry completely to another DB (it is removed from unsorted and stored in another DB):

my $pageUrl = '';
my $busy = '1';
my $curs;
my $lk = $unsortedHash->cds_lock(); #lock, change status to 1, unlock
########## GET AN ELEMENT FROM THE UNSORTED HASH #######
while(1){
    $busy = '1';
    $curs = $unsortedHash->db_cursor();
    while ($busy){
        $curs->c_get($pageUrl, $busy, DB_NEXT);
        print "$pageUrl:$busy:\n";
        if ($pageUrl eq ''){
            $busy = 0;
        }
    }
    $curs->c_close();
    $curs = undef;

    if ($pageUrl eq ''){
        print "Database empty. Sleeping...\n";
        $lk->cds_unlock();
        sleep(30);
        $lk = $unsortedHash->cds_lock();
    }else{
        last;
    }
}

####### MAKE THE ELEMENT 'BUSY' AND DOWNLOAD IT 


$unsortedHash->db_put($pageUrl, '1');
$lk->cds_unlock();
$lk = undef;

And in every other place, if I call db_put or db_del on ANY DB, it is wrapped with a lock like so:

print "\n\nBad.\n\n";
        $lk = $badHash->cds_lock();
        $badHash->db_put($pageUrl, '0');
        $unsortedHash->db_del($pageUrl);
        $lk->cds_unlock();
        $lk = undef;

However, my db_get commands are free-floating with no lock, because I don't think reading needs a lock.

I have looked over this code a million times and the algorithm is airtight. So I am just wondering if I am implementing any part of this wrong, using the locks wrong, etc. Or if there is a better way to prevent deadlocking (or even diagnose deadlocking) with BerkeleyDB and Strawberry Perl?

UPDATE: To be more specific, the problem is occurring on a Windows 2003 server (1.5 GB RAM, not sure if that is important). I can run this whole setup fine on my Windows 7 machine (4GB RAM). I also started printing out the lock stats using the following:

Adding this flag to the environment creation:

-MsgFile => "$dbFolder/lockData.txt"

And then calling this every 60 seconds:

my $status = $env->lock_stat_print();
print "Status:$status:\n";

The status is always returned as 0, which is success. Here is the last stat report:

29  Last allocated locker ID
0x7fffffff  Current maximum unused locker ID
5   Number of lock modes
1000    Maximum number of locks possible
1000    Maximum number of lockers possible
1000    Maximum number of lock objects possible
40  Number of lock object partitions
24  Number of current locks
42  Maximum number of locks at any one time
5   Maximum number of locks in any one bucket
0   Maximum number of locks stolen by for an empty partition
0   Maximum number of locks stolen for any one partition
29  Number of current lockers
29  Maximum number of lockers at any one time
6   Number of current lock objects
13  Maximum number of lock objects at any one time
1   Maximum number of lock objects in any one bucket
0   Maximum number of objects stolen by for an empty partition
0   Maximum number of objects stolen for any one partition
3121958 Total number of locks requested
3121926 Total number of locks released
0   Total number of locks upgraded
24  Total number of locks downgraded
9310    Lock requests not available due to conflicts, for which we waited
0   Lock requests not available due to conflicts, for which we did not wait
8   Number of deadlocks
1000000 Lock timeout value
0   Number of locks that have timed out
1000000 Transaction timeout value
0   Number of transactions that have timed out
792KB   The size of the lock region
59  The number of partition locks that required waiting (0%)
46  The maximum number of times any partition lock was waited for (0%)
0   The number of object queue operations that required waiting (0%)
27  The number of locker allocations that required waiting (0%)
0   The number of region locks that required waiting (0%)
1   Maximum hash bucket length

Of which I am wary of this:

8   Number of deadlocks

How did these deadlocks occur, and how were they resolved? (all parts of the code are still running). What exactly is a deadlock, in this case?

回答1:

In short, you need to do deadlock detection. I can see two possibilities to do that. First, you can use the db_deadlock utility. Second, and perhaps more conveniently, you can specify the -LockDetect flag when opening your environment, a flag that's not exactly explained in depth in the Perl docs for BerkeleyDB.pm.

Both ways appear to work fine for me in version 4.5.20. (What's your version, by the way?)

Now for the detail.

Specifying the -LockDetect flag is really just that. There are a couple of values to choose from. I chose DB_LOCK_DEFAULT and it appeared to work just fine. With more clues as to what's going on you could certainly get more fancy.

Running the db_deadlock utility could be done like this:

db_deadlock -h your/env/dir -v -t 3   # run as daemon, check every 3 seconds
db_deadlock -h your/env/dir -v        # run once

Here's a quote from the db_deadlock manual:

This utility should be run as a background daemon, or the underlying Berkeley DB deadlock detection interfaces should be called in some other way, whenever there are multiple threads or processes accessing a database and at least one of them is modifying it.

I arrived at the conclusion that both ways do work fine by repeatedly performing a test with two writers and one reader, which would deadlock a couple times while putting new entries in the database in rapid succession (100 per second), or going through a cursor of all keys in the database.

The flag method appears to deal with deadlocks very quickly, they didn't become noticeable in my tests.

On the other hand, running the db_deadlock utility with verbose output in paralles with the scripts is instructive in that you see how they block and then continue after lockers have been aborted, especially when combined with the db_stat utility:

db_stat -Cl # Locks grouped by lockers
db_stat -Co # Locks grouped by object
db_stat -Cp # need_dd = 1 ?
db_stat -CA # all of the above plus more

I lack the expertise to explain all the details, but you can see that in blocked situations there are certain entries there while in others there aren't. Also see the section entitled Berkeley DB Concurrent Data Store locking conventions(what is IWRITE?) in the Berkeley DB Programmer's Reference Guide.

You're asking how these deadlocks did occur. Can't say exactly, but I do see that they are occurring with concurrent access. You're also asking how they were resolved. I have no idea. In my test scenarios, blocked scripts will simply hang. Maybe in your scenario someone ran deadlock detection without you knowing about it?

For completeness, your application might simply hang because a thread has not closed resources before exiting. Might happen if you just Ctrl-C a process and there is no clean-up handler in place to close resources. But that doesn't appear to be your problem.

If it does become your problem, you should review the section on Handling failure in Data Store and Concurrent Data Store applications in the Reference Guide.

CDS and DS have no concept of recovery. Since CDS and DS don't support transactions and don't maintain a recovery log, they cannot run recovery. If the database gets corrupted in DS or CDS, you can only remove it and recreate it. (Taken moreless verbatim from the Berkeley DB Book by Himanshu Yadava.)

Finally, there are video tutorials on the Oracle site, including one on using CDS by Margo Seltzer.

回答2:

However, my db_get commands are free-floating with no lock, because I don't think reading needs a lock.

This assumption is wrong. As http://pybsddb.sourceforge.net/ref/lock/page.html says, BerkeleyDB has to issue read locks internally because otherwise you could get undefined behavior if a reader tried to read data that was being changed out from under it. Therefore reads can easily be part of in a deadlock situation.

This is particularly true in the presence of cursors. Read cursors maintain locks on everything that has been read until the cursor is closed. See http://pybsddb.sourceforge.net/ref/lock/am_conv.html for more details an ways that you can get into deadlock (in fact you can even deadlock yourself).

回答3:

While not a BerkeleyDB solution, you might be able to use alternative locking though Win32::Mutex, which uses underlying Windows mutexes. A very simple example is below:

#!perl -w
use strict;
use warnings;

use Win32::Mutex; # from Win32::IPC

my $mutex = Win32::Mutex->new(0, 'MyAppBerkeleyLock');

for (1..10) {
    $mutex->wait(10*1000) or die "Failed to lock mutex $!";
    print "$$ has lock\n";
    sleep(rand(7));
    $mutex->release();
}