Django + FastCGI - randomly raising OperationalErr

2019-02-06 22:48发布

问题:

I'm running a Django application. Had it under Apache + mod_python before, and it was all OK. Switched to Lighttpd + FastCGI. Now I randomly get the following exception (neither the place nor the time where it appears seem to be predictable). Since it's random, and it appears only after switching to FastCGI, I assume it has something to do with some settings.

Found a few results when googleing, but they seem to be related to setting maxrequests=1. However, I use the default, which is 0.

Any ideas where to look for?

PS. I'm using PostgreSQL. Might be related to that as well, since the exception appears when making a database query.

 File "/usr/lib/python2.6/site-packages/django/core/handlers/base.py", line 86, in get_response
   response = callback(request, *callback_args, **callback_kwargs)

 File "/usr/lib/python2.6/site-packages/django/contrib/admin/sites.py", line 140, in root
   if not self.has_permission(request):

 File "/usr/lib/python2.6/site-packages/django/contrib/admin/sites.py", line 99, in has_permission
   return request.user.is_authenticated() and request.user.is_staff

 File "/usr/lib/python2.6/site-packages/django/contrib/auth/middleware.py", line 5, in __get__
   request._cached_user = get_user(request)

 File "/usr/lib/python2.6/site-packages/django/contrib/auth/__init__.py", line 83, in get_user
   user_id = request.session[SESSION_KEY]

 File "/usr/lib/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 46, in __getitem__
   return self._session[key]

 File "/usr/lib/python2.6/site-packages/django/contrib/sessions/backends/base.py", line 172, in _get_session
   self._session_cache = self.load()

 File "/usr/lib/python2.6/site-packages/django/contrib/sessions/backends/db.py", line 16, in load
   expire_date__gt=datetime.datetime.now()

 File "/usr/lib/python2.6/site-packages/django/db/models/manager.py", line 93, in get
   return self.get_query_set().get(*args, **kwargs)

 File "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 304, in get
   num = len(clone)

 File "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 160, in __len__
   self._result_cache = list(self.iterator())

 File "/usr/lib/python2.6/site-packages/django/db/models/query.py", line 275, in iterator
   for row in self.query.results_iter():

 File "/usr/lib/python2.6/site-packages/django/db/models/sql/query.py", line 206, in results_iter
   for rows in self.execute_sql(MULTI):

 File "/usr/lib/python2.6/site-packages/django/db/models/sql/query.py", line 1734, in execute_sql
   cursor.execute(sql, params)

OperationalError: server closed the connection unexpectedly
       This probably means the server terminated abnormally
       before or while processing the request.

回答1:

Possible solution: http://groups.google.com/group/django-users/browse_thread/thread/2c7421cdb9b99e48

Until recently I was curious to test this on Django 1.1.1. Will this exception be thrown again... surprise, there it was again. It took me some time to debug this, helpful hint was that it only shows when (pre)forking. So for those who getting randomly those exceptions, I can say... fix your code :) Ok.. seriously, there are always few ways of doing this, so let me firs explain where is a problem first. If you access database when any of your modules will import as, e.g. reading configuration from database then you will get this error. When your fastcgi-prefork application starts, first it imports all modules, and only after this forks children. If you have established db connection during import all children processes will have an exact copy of that object. This connection is being closed at the end of request phase (request_finished signal). So first child which will be called to process your request, will close this connection. But what will happen to the rest of the child processes? They will believe that they have open and presumably working connection to the db, so any db operation will cause an exception. Why this is not showing in threaded execution model? I suppose because threads are using same object and know when any other thread is closing connection. How to fix this? Best way is to fix your code... but this can be difficult sometimes. Other option, in my opinion quite clean, is to write somewhere in your application small piece of code:

from django.db import connection 
from django.core import signals 
def close_connection(**kwargs): 
    connection.close() 
signals.request_started.connect(close_connection) 

Not ideal thought, connecting twice to the DB is a workaround at best.


Possible solution: using connection pooling (pgpool, pgbouncer), so you have DB connections pooled and stable, and handed fast to your FCGI daemons.

The problem is that this triggers another bug, psycopg2 raising an InterfaceError because it's trying to disconnect twice (pgbouncer already handled this).

Now the culprit is Django signal request_finished triggering connection.close(), and failing loud even if it was already disconnected. I don't think this behavior is desired, as if the request already finished, we don't care about the DB connection anymore. A patch for correcting this should be simple.

The relevant traceback:

 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/core/handlers/wsgi.py in __call__(self=<django.core.handlers.wsgi.WSGIHandler object at 0x24fb210>, environ={'AUTH_TYPE': 'Basic', 'DOCUMENT_ROOT': '/storage/test', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTPS': 'off', 'HTTP_ACCEPT': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_AUTHORIZATION': 'Basic dGVzdGU6c3VjZXNzbw==', 'HTTP_CONNECTION': 'keep-alive', 'HTTP_COOKIE': '__utma=175602209.1371964931.1269354495.126938948...none); sessionid=a1990f0d8d32c78a285489586c510e8c', 'HTTP_HOST': 'www.rede-colibri.com', ...}, start_response=<function start_response at 0x24f87d0>)
  246                 response = self.apply_response_fixes(request, response)
  247         finally:
  248             signals.request_finished.send(sender=self.__class__)
  249 
  250         try:
global signals = <module 'django.core.signals' from '/usr/local/l.../Django-1.1.1-py2.6.egg/django/core/signals.pyc'>, signals.request_finished = <django.dispatch.dispatcher.Signal object at 0x1975710>, signals.request_finished.send = <bound method Signal.send of <django.dispatch.dispatcher.Signal object at 0x1975710>>, sender undefined, self = <django.core.handlers.wsgi.WSGIHandler object at 0x24fb210>, self.__class__ = <class 'django.core.handlers.wsgi.WSGIHandler'>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/dispatch/dispatcher.py in send(self=<django.dispatch.dispatcher.Signal object at 0x1975710>, sender=<class 'django.core.handlers.wsgi.WSGIHandler'>, **named={})
  164 
  165         for receiver in self._live_receivers(_make_id(sender)):
  166             response = receiver(signal=self, sender=sender, **named)
  167             responses.append((receiver, response))
  168         return responses
response undefined, receiver = <function close_connection at 0x197b050>, signal undefined, self = <django.dispatch.dispatcher.Signal object at 0x1975710>, sender = <class 'django.core.handlers.wsgi.WSGIHandler'>, named = {}
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/__init__.py in close_connection(**kwargs={'sender': <class 'django.core.handlers.wsgi.WSGIHandler'>, 'signal': <django.dispatch.dispatcher.Signal object at 0x1975710>})
   63 # when a Django request is finished.
   64 def close_connection(**kwargs):
   65     connection.close()
   66 signals.request_finished.connect(close_connection)
   67 
global connection = <django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>, connection.close = <bound method DatabaseWrapper.close of <django.d...ycopg2.base.DatabaseWrapper object at 0x17b14c8>>
 /usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/backends/__init__.py in close(self=<django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>)
   74     def close(self):
   75         if self.connection is not None:
   76             self.connection.close()
   77             self.connection = None
   78 
self = <django.db.backends.postgresql_psycopg2.base.DatabaseWrapper object at 0x17b14c8>, self.connection = <connection object at 0x1f80870; dsn: 'dbname=co...st=127.0.0.1 port=6432 user=postgres', closed: 2>, self.connection.close = <built-in method close of psycopg2._psycopg.connection object at 0x1f80870>

Exception handling here could add more leniency:

/usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/__init__.py

   63 # when a Django request is finished.
   64 def close_connection(**kwargs):
   65     connection.close()
   66 signals.request_finished.connect(close_connection)

Or it could be handled better on psycopg2, so to not throw fatal errors if all we're trying to do is disconnect and it already is:

/usr/local/lib/python2.6/dist-packages/Django-1.1.1-py2.6.egg/django/db/backends/__init__.py

   74     def close(self):
   75         if self.connection is not None:
   76             self.connection.close()
   77             self.connection = None

Other than that, I'm short on ideas.



回答2:

In the switch, did you change PostgreSQL client/server versions?

I have seen similar problems with php+mysql, and the culprit was an incompatibility between the client/server versions (even though they had the same major version!)



回答3:

Smells like a possible threading problem. Django is not guaranteed thread-safe although the in-file docs seem to indicate that Django/FCGI can be run that way. Try running with prefork and then beat the crap out of the server. If the problem goes away ...



回答4:

Maybe the PYTHONPATH and PATH environment variable is different for both setups (Apache+mod_python and lighttpd + FastCGI).



回答5:

In the end I switched back to Apache + mod_python (I was having other random errors with fcgi, besides this one) and everything is good and stable now.

The question still remains open. In case anybody has this problem in the future and solves it they can record the solution here for future reference. :)



回答6:

I fixed a similar issue when using a geodjango model that was not using the default ORM for one of its functions. When I added a line to manually close the connection the error went away.

http://code.djangoproject.com/ticket/9437

I still see the error randomly (~50% of requests) when doing stuff with user login/sessions however.



回答7:

I went through the same problem recently (lighttpd, fastcgi & postgre). Searched for a solution for days without success, and as a last resort switched to mysql. The problem is gone.



回答8:

Why not storing session in cache? Set

SESSION_ENGINE = "django.contrib.sessions.backends.cache"

Also you can try use postgres with pgbouncer (postgres - prefork server and don't like many connects/disconnects per time), but firstly check your postgresql.log.

Another version - you have many records in session tables and django-admin.py cleanup can help.



回答9:

The problem could be mainly with Imports. Atleast thats what happened to me. I wrote my own solution after finding nothing from the web. Please check my blogpost here: Simple Python Utility to check all Imports in your project

Ofcourse this will only help you to get to the solution of the original issue pretty quickly and not the actual solution for your problem by itself.



回答10:

Change from method=prefork to method=threaded solved the problem for me.



回答11:

I try to give an answer to this even if I'am not using django but pyramid as the framework. I was running into this problem since a long time. Problem was, that it was really difficult to produce this error for tests... Anyway. Finally I solved it by digging through the whole stuff of sessions, scoped sessions, instances of sessions, engines and connections etc. I found this:

http://docs.sqlalchemy.org/en/rel_0_7/core/pooling.html#disconnect-handling-pessimistic

This approach simply adds a listener to the connection pool of the engine. In the listener a static select is queried to the database. If it fails the pool try to establish a new connection to the database before it fails at all. Important: This happens before any other stuff is thrown to the database. So it is possible to pre check connection what prevents the rest of your code from failing.

This is not a clean solution since it don't solve the error itself but it works like a charm. Hope this helps someone.



回答12:

Have you considered downgrading to Python 2.5.x (2.5.4 specifically)? I don't think Django would be considered mature on Python 2.6 since there are some backwards incompatible changes. However, I doubt this will fix your problem.

Also, Django 1.0.2 fixed some nefarious little bugs so make sure you're running that. This very well could fix your problem.