-->

Python multithreading

Posted 2019-07-29 12:55

Question:

I have this scenario:

A website built with Zope/Plone and some Python APIs of my own. There's a web page, call it "a", that calls a database (Postgres) through a Python method and returns some information. On page "a" you can modify database data "offline" (I mean that the changes aren't written to the database instantly, but later, when you press "save" and a Python API method is called). So, imagine this scenario: a user called "Sam" loads the page and starts to modify data. Meanwhile a user called "Sara" modifies the database through page "a" by clicking "save". Now Sam no longer has the current database data: he'll press "save" and overwrite Sara's changes.

I would like to show an alert on my page in real time. I thought I could do something like this:

Make an AJAX call that is non-blocking and carry on with the page render. The AJAX call invokes a Python method that creates a thread running an infinite loop (until an "X" condition is met). When I write data to the database, I'll call a function that changes the "X" condition, stopping the thread and returning to the AJAX call.

Moreover, I can't lock the database because I have to give free access to every user who wants to modify my database.

My problem is: how can I identify a Python thread? I've just seen that every method on a class that inherits from Thread takes "self" as a parameter. Moreover, I have to start the thread when page "a" is accessed, and that code will live somewhere (say, in the "threads" module), but the inserts are in another module. So, how can I implement my idea?

And if someone has an alternative idea, feel free to tell me :)

Answer 1:

The class of problem you're describing is generally called concurrency control. One way to handle it is to keep track of what the item looked like when it was selected, and only update if the database version still looks exactly like the version you selected, or has not been updated since a certain time (a timestamp field may be helpful). Strictly speaking, this check-at-save approach is usually called optimistic concurrency control; pessimistic concurrency control means locking the item so that nobody else can edit it, which you've said you can't do. A finer-grained variant checks only that the fields one user has changed and is saving back to the datastore were not updated by the other user. Both approaches are easiest if you choose an ORM library that supports concurrency checks.
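The "only update if the database version still matches what you selected" check can be written as a conditional UPDATE. Here is a minimal sketch, assuming an in-memory SQLite database standing in for Postgres and a hypothetical `items` table with an integer `version` column; none of these names come from your setup.

```python
import sqlite3

def save_if_unchanged(conn, item_id, new_value, version_seen):
    """Write new_value only if the row still has the version the user loaded.

    Returns True if the update was applied, False if someone else saved first.
    """
    cur = conn.execute(
        "UPDATE items SET value = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_value, item_id, version_seen),
    )
    conn.commit()
    return cur.rowcount == 1

# Hypothetical table standing in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT, version INTEGER)")
conn.execute("INSERT INTO items VALUES (1, 'original', 1)")
conn.commit()

# Sam and Sara both load the row while it is at version 1.
sam_version = sara_version = 1

sara_saved = save_if_unchanged(conn, 1, "Sara's edit", sara_version)  # applied
sam_saved = save_if_unchanged(conn, 1, "Sam's edit", sam_version)     # rejected: version is now 2
final_value = conn.execute("SELECT value FROM items WHERE id = 1").fetchone()[0]
```

When the function returns False, the page can tell Sam that the item changed and ask him to reload instead of silently overwriting Sara's save. A timestamp column would work the same way as the version counter.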

My favorite Python web framework is Django, and here is a question on SO about the same situation you are trying to solve: "Django: How can I protect against concurrent modification of database entries". I hope it helps.

Handling concurrency in the manner you suggest is doable but should be avoided in most situations. I've done it before when adding concurrency to a large system with complex objects that had wide-ranging side effects and no unified data access (there were about 5 methods of data access over the lifetime of the system; it was a colorful system). It's a bug-prone and complex way to handle concurrency (I think I had a client app that kicked off a watcher thread after marking items "checked out" in a data table that described the type and identifier of the object, the user who checked it out, when they checked it out, and how long the checkout was valid for, in case the client who checked the object out failed to check it in when finished).

If you are set on not using an ORM and want to display a message to the user when the item has changed, try going off a last-updated timestamp column and just have your AJAX call check whether the last update time is later than it was when you first loaded the item. So, if you were coding a generic way to do this, you would simply need the table name, the primary key, and the timestamp.

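A web service method might look like this (the table name is validated before being put into the query, because it can't be passed as a query parameter):

import MySQLdb

def is_most_current(table_name, id, loaded_at):
    if not table_name.isidentifier():  # reject anything that isn't a plain name
        raise ValueError("invalid table name")
    db = MySQLdb.connect(passwd="moonpie", db="thangs")
    c = db.cursor()
    c.execute("SELECT last_updated FROM `%s` WHERE id = %%s" % table_name, (id,))
    row = c.fetchone()
    return row is not None and row[0] <= loaded_at  # True if unchanged since load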

As for the Python multithreading libraries: Python threads can be confusing, and they perform poorly for CPU-bound work because of Python's global interpreter lock (GIL). In many cases you may actually want to spawn a new process instead (the multiprocessing library has a very similar API and performs better in parallel processing scenarios). As for "self", that's the Python convention for the reference to the instance of the class you're dealing with, much like "this" in C-like languages. You could easily identify a thread by giving it a unique name when you construct it. See the multiprocessing or threading docs for more info. If you can avoid threading for this problem, I recommend that you do so.
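If you do go the threading route, naming a thread and waking it from another module could look roughly like this. It's a minimal sketch using only the standard `threading` module; the helper names (`start_watcher`, `notify`, `find_thread`), the `stop_events` registry, and the thread name are made up for illustration, and it only works while everything runs inside a single Python process.

```python
import threading

def find_thread(name):
    """Return the live thread with the given name, or None."""
    for t in threading.enumerate():
        if t.name == name:
            return t
    return None

# Hypothetical registry mapping a watcher name to the Event that wakes it.
stop_events = {}

def start_watcher(name):
    """Start a named thread that sleeps until its Event is set."""
    event = threading.Event()
    stop_events[name] = event
    # Event.wait blocks without spinning, unlike an infinite polling loop.
    thread = threading.Thread(target=event.wait, name=name, daemon=True)
    thread.start()
    return thread

def notify(name):
    """Called from another module (e.g. after a database write) to release the watcher."""
    stop_events.pop(name).set()

watcher = start_watcher("watch-page-a-sam")
found = find_thread("watch-page-a-sam")  # look the thread up by its name
notify("watch-page-a-sam")               # this is the "X condition" changing
watcher.join(timeout=5)
still_running = watcher.is_alive()
```

The `threading.Event` plays the role of your "X condition": the watcher blocks on it, and the code that writes to the database only needs to know the name to release it.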