I am using urllib2's build_opener() to create an OpenerDirector. I am using the OpenerDirector to fetch a slow page, so it has a large timeout.
So far, so good.
However, in another thread, I have been told to abort the download - let's say the user has selected to exit the program in the GUI.
Is there a way to signal that a urllib2 download should quit?
There is no clean answer. There are several ugly ones.
Initially, I was putting rejected ideas in the question. As it has become clear that there are no right answers, I decided to post the various sub-optimal alternatives as a list answer. Some of these are inspired by comments, thank you.
Library Support
An ideal solution would be if OpenerDirector offered a cancel operation.
It does not. Library writers take note: if you provide long slow operations, you need to provide a way to cancel them if people are to use them in real-world applications.
Reduce timeout
As a general solution for others, this may work. A smaller timeout makes the program more responsive to changes in circumstances. However, it also causes downloads to fail whenever the server takes longer than the timeout to respond, so this is a trade-off. In my situation, it is untenable.
Read the download in chunks.
Again, as a general solution, this may work. If the download consists of very large files, you can read them in small chunks, and abort after a chunk is read.
Unfortunately, if (as in my case) the delay is in receiving the first byte, rather than the size of the file, this will not help.
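The chunked approach can be sketched as follows. This is only a minimal sketch, not code from the original post: `read_with_cancel`, `cancel_event`, and the chunk size are hypothetical names and values, and it is written for Python 3. `response` can be any file-like object, such as the one an opener's `open()` returns; an in-memory stream stands in for it here.

```python
import io
import threading

def read_with_cancel(response, cancel_event, chunk_size=8192):
    """Read a file-like response in chunks; return None if cancelled."""
    chunks = []
    while True:
        # The flag is only checked between chunks, so a single read()
        # that blocks for a long time still cannot be interrupted.
        if cancel_event.is_set():
            return None
        chunk = response.read(chunk_size)
        if not chunk:  # an empty result means end of stream
            break
        chunks.append(chunk)
    return b"".join(chunks)

# Demonstration with an in-memory stream standing in for a real download.
cancel_event = threading.Event()
data = read_with_cancel(io.BytesIO(b"x" * 20000), cancel_event)

cancel_event.set()  # e.g. the GUI thread asks to quit
aborted = read_with_cancel(io.BytesIO(b"x" * 20000), cancel_event)
```

Because the flag is only consulted between reads, this does nothing for a request that is still waiting for its first byte.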
Kill the entire thread.
While there are some aggressive techniques to kill threads, depending on the operating system, they are not recommended. In particular, they can cause deadlocks. See Eli Bendersky's two articles (via @JBernardo).
Just be unresponsive
If the abort operation has been triggered by the user, it may be simplest to just be unresponsive, and not act on the request until the open operation has completed.
Whether this unresponsiveness is acceptable to your users (hint: no!) is up to your project.
It also continues to place a demand on the server, even if the result is known to be unneeded.
Let it peter out in another thread.
If you create a separate thread to run the operation, and then communicate with that thread in an interruptible manner, you can discard the blocked thread and start working on the next operation instead. Eventually, the thread will unblock, and then it can shut down gracefully.
The thread should be a daemon, so it doesn't block the total shut-down of the application.
This gives the user responsiveness, but it means that the server must continue to service the request, even though the result is no longer needed.
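A minimal sketch of this pattern, assuming Python 3 and using hypothetical names throughout (`fetch_in_background`, `wait_for_result`, `slow_fetch`). The `slow_fetch` function merely sleeps; in real use it would be the blocking `open()` call on the opener.

```python
import queue
import threading
import time

def fetch_in_background(fetch, results):
    """Run fetch() in a daemon thread and put its return value on results."""
    def worker():
        results.put(fetch())
    thread = threading.Thread(target=worker)
    thread.daemon = True  # a daemon thread does not block interpreter exit
    thread.start()
    return thread

def wait_for_result(results, cancel_event, poll_interval=0.05):
    """Poll for a result; give up (return None) once cancel_event is set."""
    while not cancel_event.is_set():
        try:
            return results.get(timeout=poll_interval)
        except queue.Empty:
            pass  # nothing yet: go round again and re-check the cancel flag
    return None

def slow_fetch():
    # Stand-in for the blocking download on a slow server.
    time.sleep(0.5)
    return b"payload"

cancel_event = threading.Event()
results = queue.Queue()
worker_thread = fetch_in_background(slow_fetch, results)

# Simulate the user cancelling shortly after the request starts.
threading.Timer(0.1, cancel_event.set).start()
outcome = wait_for_result(results, cancel_event)
```

The caller stops waiting as soon as the flag is set, while the abandoned worker finishes on its own later and its result is simply never collected.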
Rewrite the socket methods to be polling-based.
As described in @Luke's answer, it may be possible to provide (fragile?, unportable?) extensions to the standard Python libraries.
His solution changes the socket operations from blocking to polling. Another might allow shutdown through the socket.shutdown() method (if that, indeed, will interrupt a blocked socket - not tested.)
A solution based on Twisted may be cleaner. See below.
Replace the sockets with asynchronous, non-thread-based libraries.
The Twisted framework provides a replacement set of event-driven libraries for network operations. As I understand it, this means that all of the different communications can be handled by a single thread, with no blocking.
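Twisted itself is not shown here. As a stand-in for the same event-driven idea, the following sketch uses `asyncio` from the Python 3 standard library (3.7 or later for `asyncio.run`), which is a different library from the one named above. `slow_fetch` is a hypothetical coroutine that only sleeps, in place of a real non-blocking request.

```python
import asyncio

async def slow_fetch():
    # Stand-in for a slow, non-blocking network request.
    await asyncio.sleep(10)
    return b"payload"

async def main():
    task = asyncio.ensure_future(slow_fetch())
    await asyncio.sleep(0.1)  # the event loop stays free for other work
    task.cancel()             # e.g. the user chose to exit
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"

status = asyncio.run(main())
```

Because nothing ever blocks the single thread, the pending request can be cancelled at the next point where it yields to the event loop.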
Sabotage
It may be possible to navigate the OpenerDirector, to find the base-level socket that is blocking, and sabotage it directly (Will socket.shutdown() be sufficient?) to make it return.
Yuck.
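A small experiment can at least probe the socket.shutdown() question in isolation. The sketch below assumes Python 3 and uses a local `socket.socketpair()` as a stand-in for the real connection; it says nothing about how to reach the socket inside an OpenerDirector, and the behavior of `shutdown()` on a socket that another thread is blocked on may differ between platforms.

```python
import socket
import threading
import time

# A connected pair of sockets stands in for the client/server connection.
client, server = socket.socketpair()
received = []

def blocked_reader():
    # Blocks here because the "server" end never sends anything.
    received.append(client.recv(1024))

reader = threading.Thread(target=blocked_reader)
reader.daemon = True
reader.start()

time.sleep(0.2)                    # give the reader time to block in recv()
client.shutdown(socket.SHUT_RDWR)  # "sabotage" the socket from this thread
reader.join(timeout=5)
unblocked = not reader.is_alive()

client.close()
server.close()
```

If `unblocked` is true on your platform, the blocked `recv()` returned once the socket was shut down, which is the effect the sabotage idea relies on.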
Put it in a separate (killable) process
The thread that reads the socket can be moved into a separate process, and interprocess communication can be used to transmit the result. This IPC can be aborted early by the client, and then the whole process can be killed.
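A minimal sketch of the killable-process idea, assuming Python 3 and using `subprocess` to launch the worker. `CHILD_CODE` is a hypothetical stand-in: it only sleeps, where a real child would perform the download and write the result to stdout for the parent to read.

```python
import subprocess
import sys
import time

# Hypothetical child program standing in for a script that performs the
# slow download and prints the result.
CHILD_CODE = "import time; time.sleep(30); print('payload')"

start = time.time()
child = subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                         stdout=subprocess.PIPE)

time.sleep(0.2)  # ...the user asks to quit while the child is still blocked
child.kill()     # unlike a thread, a process can be killed from outside
child.wait()     # reap the child so it does not linger as a zombie
child.stdout.close()
elapsed = time.time() - start
```

The pipe on stdout plays the role of the interprocess communication: the parent can simply stop reading from it and kill the child.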
Ask the Web Server to cancel
If you have control over the web-server being read, it could be sent a separate message asking it to close the socket. That should cause the blocked client to react.
I don't see any built-in mechanism to accomplish this. I would just move the OpenerDirector out to its own process so it would be safe to kill it.
Note: there is no way to 'kill' a thread in Python (thanks JBernardo). It may, however, be possible to raise an exception in the thread, but this likely won't work while the thread is blocked on a socket.
Here's a start for another approach. It works by extending part of the httplib stack to include a non-blocking check for the server response. You would have to make a few changes to implement this within your thread. Also note that it uses some undocumented bits of urllib2 and httplib, so the final solution for you will probably depend on the version of Python you are using (I have 2.7.3). Poke around in your urllib2.py and httplib.py files; they're quite readable.
import urllib2, httplib, select, time

class Response(httplib.HTTPResponse):
    def _read_status(self):
        ## Do non-blocking checks for server response until something arrives.
        while True:
            sel = select.select([self.fp.fileno()], [], [], 0)
            if len(sel[0]) > 0:
                break
            ## <--- Right here, check to see whether thread has requested to stop
            ##      Also check to see whether timeout has elapsed
            time.sleep(0.1)
        return httplib.HTTPResponse._read_status(self)

class Connection(httplib.HTTPConnection):
    response_class = Response

class Handler(urllib2.HTTPHandler):
    def http_open(self, req):
        return self.do_open(Connection, req)

h = Handler()
o = urllib2.build_opener(h)
f = o.open(url)
print f.read()
Also note that there are many places in the stack that could potentially block; this example only covers one of them: the server has received the request but takes a long time to respond.
I find that placing all your urllib-related jobs in threads is the most appropriate approach, because of the blocking nature of urllib. It then becomes possible to abort tasks altogether, including requests. Killing threads is indeed unsafe, but raising exceptions in them should be safe.
So this is how to raise an exception in a thread (doc):
import ctypes
ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(your_thread.ident),
                                           ctypes.py_object(your_exception))
If the socket is in a blocking (connecting) state at that moment, the exception will be raised as soon as the thread becomes active again.
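A fuller sketch of the same call, assuming CPython 3.7 or later, where the thread id parameter is an unsigned long (hence `c_ulong` rather than the `c_long` used above). `AbortRequested`, `async_raise`, and `worker` are hypothetical names, and the worker only sleeps in short steps instead of doing real network I/O.

```python
import ctypes
import threading
import time

class AbortRequested(Exception):
    """Hypothetical exception type used to ask a worker thread to stop."""

def async_raise(thread, exc_type):
    """Ask CPython to raise exc_type in thread; True if it was scheduled."""
    affected = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
    return affected == 1

outcome = []

def worker():
    try:
        while True:
            # Stand-in for work that returns to Python code regularly.
            time.sleep(0.01)
    except AbortRequested:
        outcome.append("aborted")

t = threading.Thread(target=worker)
t.daemon = True
t.start()

time.sleep(0.1)
scheduled = async_raise(t, AbortRequested)
t.join(timeout=5)
```

The exception is only delivered once the target thread is executing Python code again, so a thread stuck inside a long blocking call will not notice it until that call returns.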