Start background process/daemon from CGI script

Posted 2019-01-18 09:44

I'm trying to launch a background process from a CGI script. Basically, when a form is submitted, the CGI script will tell the user that the request is being processed, while the background script does the actual processing (because the processing tends to take a long time). The problem I'm facing is that Apache won't send the output of the parent CGI script to the browser until the child script terminates.

I've been told by a colleague that what I want to do is impossible because there is no way to prevent Apache from waiting for the entire process tree of a CGI script to die. However, I've also seen numerous references around the web to a "double fork" trick which is supposed to do the job. The trick is described succinctly in this Stack Overflow answer, but I've seen similar code elsewhere.

Here's a short script I wrote to test the double-fork trick in Python:

import os
import sys

if os.fork():
    print 'Content-type: text/html\n\n Done'
    sys.exit(0)

if os.fork():
    os.setsid()
    sys.exit(0)

# Second child
os.chdir("/")
sys.stdout.close()
sys.stderr.close()
sys.stdin.close()

f = open('/tmp/lol.txt', 'w')

while 1:
     f.write('test\n')

If I run this from the shell, it does exactly what I'd expect: the original script and first descendant die, and the second descendant keeps running until it's killed manually. But if I access it through CGI, the page won't load until I kill the second descendant or Apache kills it because of the CGI timeout. I've also tried replacing the second sys.exit(0) with os._exit(0), but there is no difference.

What am I doing wrong?

10 Answers
小情绪 Triste *
#2 · 2019-01-18 10:24

OK, I'm adding a simpler solution for the case where you don't need to start another script and can instead continue in the same one to do the long processing in the background. It lets you show a waiting message that the client sees instantly, and the server-side processing continues even if the client closes the browser session:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import os
import sys
import time
import datetime

print "Content-Type: text/html;charset=ISO-8859-1\n\n"
print "<html>Please wait...</html>\n"
sys.stdout.flush()
os.close(sys.stdout.fileno()) # Break web pipe
if os.fork(): # Get out parent process
   sys.exit()

# Continue with the new child process
time.sleep(1)  # Give the parent process time to reach its exit call.
os.setsid() # Start a new session (this process becomes session and process group leader)

# From here on I cannot print to the web server.
# But I can write to other files or run any long process.
f=open('long_process.log', 'a+')
f.write( "Starting {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(15)
f.write( "Still working {0} ...\n".format(datetime.datetime.now()) )
f.flush()
time.sleep(300)
f.write( "Still alive - Apache didn't scalp me!\n" )
f.flush()
time.sleep(150)
f.write( "Finishing {0} ...\n".format(datetime.datetime.now()) )
f.flush()
f.close()

I spent a week reading half the Internet without success on this one. Finally I tested whether there is a difference between sys.stdout.close() and os.close(sys.stdout.fileno()), and there is a huge one: the first did nothing, while the second closed the pipe from the web server and completely disconnected the script from the client. The fork is only necessary because the web server will kill its processes after a while, and your long process probably needs more time to complete.
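
The effect of closing the descriptor can be reproduced without a web server. The sketch below (Python 3, Unix only) assumes the server reads the CGI output from a pipe until end-of-file, which is how the behavior described here is usually explained. The function name time_until_eof is made up for illustration. The parent plays the server and the forked child plays the CGI script.

```python
import os
import time


def time_until_eof(child_closes_pipe, child_runtime=1.0):
    """Return (data, seconds) the parent needed to read the pipe to EOF."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: stands in for the long-running CGI process.
        os.close(read_fd)
        os.write(write_fd, b"Please wait...")
        if child_closes_pipe:
            os.close(write_fd)      # last writer gone, so the reader sees EOF
        time.sleep(child_runtime)   # simulated long job
        os._exit(0)

    # Parent: stands in for the web server reading the script's output.
    os.close(write_fd)
    start = time.monotonic()
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:               # empty read means EOF
            break
        chunks.append(chunk)
    elapsed = time.monotonic() - start
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks), elapsed
```

With child_closes_pipe=True the reader should reach EOF almost immediately; with False it should only get there once the child has exited.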

Emotional °昔
#3 · 2019-01-18 10:27

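For those who get "sh: 1: Syntax error: redirection unexpected" with the at/batch solution: os.system runs the command through /bin/sh, and when /bin/sh is not bash (dash, for example) the <<< here-string is a syntax error. Make sure that the at command is installed and that the user running the application isn't listed in /etc/at.deny, then pipe the command into at with something like this: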

os.system("echo sudo /srv/scripts/myapp.py | /usr/bin/at now")
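
The shell can also be bypassed altogether by handing the job text to at on its standard input. The following is a minimal Python 3 sketch of that idea, not this answer's own code. The helper name submit_job and its scheduler parameter are hypothetical, and it assumes at is on PATH and reads the job from stdin, as the pipe above relies on.

```python
import subprocess


def submit_job(command, scheduler=("at", "now")):
    """Send `command` to the scheduler's stdin without involving a shell.

    Returns (exit_status, stdout, stderr) of the scheduler process.
    """
    result = subprocess.run(
        list(scheduler),
        input=command + "\n",   # the job text is read from stdin
        text=True,
        capture_output=True,
    )
    return result.returncode, result.stdout, result.stderr


# Example call (script path taken from the line above):
# status, out, err = submit_job("/srv/scripts/myapp.py")
```

Because no shell parses the command line, neither the here-string nor any quoting of the job text is needed.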
Viruses.
#4 · 2019-01-18 10:29

I think there are two issues: setsid is called in the wrong place, and buffered IO operations are being done in one of the transient children:

if os.fork():
  print "success"
  sys.exit(0)

if os.fork():
  os.setsid()
  sys.exit()

You've got the original process (grandparent, prints "success"), the middle parent, and the grandchild ("lol.txt").

The os.setsid() call is being performed in the middle parent after the grandchild has been spawned. The middle parent can't influence the grandchild's session after the grandchild has been created. Try this:

print "success"
sys.stdout.flush()
if os.fork():
    sys.exit(0)
os.setsid()
if os.fork():
    sys.exit(0)

This creates a new session before spawning the grandchild. Then the middle parent, which is the session leader, dies. Because the grandchild is not a session leader, it can never acquire a controlling terminal, so it will never block on terminal input or output and will never be sent unexpected terminal signals.

Note that I've also moved the success to the grandparent; there's no guarantee of which child will run first after calling fork(2), and you run the risk that the child would be spawned, and potentially try to write output to standard out or standard error, before the middle parent could have had a chance to write success to the remote client.

In this case, the streams are closed quickly, but still, mixing standard IO streams among multiple processes is bound to give difficulty: keep it all in one process, if you can.
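
Putting the pieces together, a minimal Python 3 sketch of the corrected sequence might look like the following. It is a variant, not the exact code above: the helper name spawn_detached is hypothetical, the original process returns instead of exiting so the CGI script can finish normally, and the grandchild points its standard descriptors at /dev/null so that it holds no pipe to the web server.

```python
import os
import sys


def spawn_detached(job):
    """Run job() in a grandchild that lives in its own session.

    The caller gets control back as soon as the middle process has exited.
    """
    # Flush first so buffered output is not duplicated in the children.
    sys.stdout.flush()
    sys.stderr.flush()

    pid = os.fork()
    if pid:
        os.waitpid(pid, 0)      # reap the short-lived middle process
        return

    # Middle process: create the new session BEFORE the second fork.
    os.setsid()
    if os.fork():
        os._exit(0)             # session leader exits immediately

    # Grandchild: not a session leader, detached from the caller's streams.
    os.chdir("/")
    devnull = os.open(os.devnull, os.O_RDWR)
    for fd in (0, 1, 2):
        os.dup2(devnull, fd)
    if devnull > 2:
        os.close(devnull)
    try:
        job()
    finally:
        os._exit(0)             # never fall back into the caller's code
```

Reaping the middle process in the caller avoids leaving a zombie behind, and os._exit keeps the children from running the caller's cleanup handlers.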

Edit: I've found a strange behavior I can't explain:

#!/usr/bin/python

import os
import sys
import time

print "Content-type: text/plain\r\n\r\npid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
sys.stdout.flush()

if os.fork():
    print "\nfirst fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)

os.setsid()

print "\nafter setsid pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())

sys.stdout.flush()

if os.fork():
    print "\nsecond fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())
    sys.exit(0)

#os.sleep(1) # comment me out, uncomment me, notice the following line appear and disappear
print "\nafter second fork pid: " + str(os.getpid()) + "\nppid: " + str(os.getppid())

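The last line, after second fork pid, only appears when the os.sleep(1) call is commented out. When the call is left in place, the last line never appears in the browser. (But otherwise all the content is printed to the browser.) Note that the os module has no sleep function (the function is time.sleep), so the uncommented call probably raises AttributeError before the final print runs.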

再贱就再见
#5 · 2019-01-18 10:32

Don't fork - run batch separately

This double-forking approach is something of a hack, which to me indicates it shouldn't be done :). For CGI, anyway. As a general principle, if something is too hard to accomplish, you are probably approaching it the wrong way.

Luckily, you give the background on what you need: a CGI call that initiates some processing which happens independently, and then returns to the caller. There are Unix commands that do just that: they schedule a command to run at a specific time (at) or whenever the system load permits (batch). So do this instead:

import os

os.system("batch <<< '/home/some_user/do_the_due.py'")
# or if you don't want to wait for system idle, 
#   os.system("at now <<< '/home/some_user/do_the_due.py'")

print 'Content-type: text/html\n'
print 'Done!'

And there you have it. Keep in mind that any output to stdout/stderr will be mailed to the user (which is good for debugging, but otherwise the script should probably keep quiet).

PS. I just remembered that Windows also has a version of at, so with a minor modification of the invocation you can have this work under Apache on Windows too (unlike the fork trick, which won't work on Windows).

PPS. Make sure the user running the CGI process is not listed in /etc/at.deny, which would prevent it from scheduling batch jobs.

仙女界的扛把子
#6 · 2019-01-18 10:33

As other answers have noted, it is tricky to start a persistent process from your CGI script because the process must cleanly dissociate itself from the CGI program. I have found that a great general-purpose program for this is daemon. It takes care of the messy details involving open file handles, process groups, the root directory, and so on for you. So the pattern of such a CGI program is:

#!/bin/sh
foo-service-ping || daemon --restart foo-service

# ... followed below by some CGI handler that uses the "foo" service

The original post describes the case where you want your CGI program to return quickly, while spawning off a background process to finish handling that one request. But there is also the case where your web application depends on a running service which must be kept alive. (Other people have talked about using beanstalkd to handle jobs. But how do you ensure that beanstalkd itself is alive?) One way to do this is to restart the service (if it's down) from within the CGI script. This approach makes sense in an environment where you have limited control over the server and can't rely on things like cron or an init.d mechanism.
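
The same check-then-start pattern can be written in Python when the CGI handler itself is a Python script. This is a rough Python 3 sketch with the hypothetical helper name ensure_service. The commands in the usage comment are the ones from the shell snippet above and are assumed to signal success with exit status 0.

```python
import subprocess


def ensure_service(ping_cmd, start_cmd):
    """Run start_cmd only if ping_cmd fails.

    Returns True if the ping succeeded or the start command exited with
    status 0, and False otherwise.
    """
    # Keep the helpers away from the CGI response and its pipes.
    quiet = dict(
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    if subprocess.call(ping_cmd, **quiet) == 0:
        return True
    return subprocess.call(start_cmd, **quiet) == 0


# Usage mirroring the shell one-liner above:
# ensure_service(["foo-service-ping"], ["daemon", "--restart", "foo-service"])
```

Redirecting the three standard streams matters here for the same reason as elsewhere in this thread: a started service that inherited the script's stdout would keep the web server waiting.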

Juvenile、少年°
#7 · 2019-01-18 10:38

I needed to break the stdout as well as the stderr like this:

sys.stdout.flush()
os.close(sys.stdout.fileno()) # Break web pipe
sys.stderr.flush()
os.close(sys.stderr.fileno()) # Break web pipe
if os.fork(): # Get out parent process
   sys.exit()
#background processing follows here
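
A possible refinement, sketched here in Python 3 under the assumption that the goal is only to release the web server's pipes: instead of closing descriptors 1 and 2 outright, point them at /dev/null. After a plain os.close, the next file the script opens is given the lowest free descriptor number, so a stray print could end up in that file. The helper name redirect_to_devnull is hypothetical.

```python
import os


def redirect_to_devnull(fds=(1, 2)):
    """Replace each descriptor in `fds` with a handle on /dev/null."""
    devnull = os.open(os.devnull, os.O_RDWR)
    try:
        for fd in fds:
            # dup2 closes the old target of fd (the pipe) atomically.
            os.dup2(devnull, fd)
    finally:
        if devnull not in fds:
            os.close(devnull)


# Typical use in the CGI script, after the response has been written:
# sys.stdout.flush(); sys.stderr.flush()
# redirect_to_devnull()
# if os.fork(): sys.exit()
```

The pipe still loses its writer, so the server sees end-of-file, but later writes to stdout or stderr are silently discarded instead of failing or landing in an unrelated file.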