Monday, 20 August 2007

Handling timeouts with urllib2 in Python

On a more technical note than usual, I recently came across a bug that took a very long time to solve. The problem was related to using urllib2 in Python to fetch a URL.

Essentially, urllib2 does not set a timeout by default, so your code will stall on an opener.open() call if the server goes silent without closing the connection. This doesn't happen very often, so your code can run for several weeks before it hangs.

What is more, there appears to be no direct way to set this timeout. However, there is a way around it (ref). The trick is to set the default timeout for all sockets before creating a connection. The setting is process-wide and only affects sockets created after it is made, so enter the following lines before you begin using urllib2:

import socket
socket.setdefaulttimeout(timeout)

(where timeout is the number of seconds to wait before giving up.)
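
To make the idea concrete, here is a minimal sketch of how the two lines above might be wrapped around a fetch. The helper name fetch_with_timeout and its parameters are made up for illustration, it uses urllib2.urlopen() rather than a custom opener, and the fallback import for Python 3 is an assumption that goes beyond the urllib2 module discussed here.

```python
import socket

try:
    import urllib2  # Python 2, as used in this post
except ImportError:
    # Assumption: on Python 3, urllib.request offers the same urlopen/URLError names.
    import urllib.request as urllib2


def fetch_with_timeout(url, timeout_seconds):
    """Return the response body, or None on a timeout or URLError.

    Hypothetical helper for illustration; it is not part of urllib2.
    """
    # Process-wide setting: applies to every socket created after this call.
    socket.setdefaulttimeout(timeout_seconds)
    try:
        response = urllib2.urlopen(url)
        try:
            return response.read()
        finally:
            response.close()
    except (socket.timeout, urllib2.URLError):
        # Either a socket operation timed out or urllib2 reported a failure.
        return None
```

Calling fetch_with_timeout(some_url, 10) would then return None instead of hanging if the server goes quiet. Note that a socket timeout bounds each blocking socket operation separately, not the total time taken by the request.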

This should set the timeout for the socket that will be opened by urllib2. Hope this is as useful to someone else as it was to me...

1 comment:

Pranshu Sharma said...

there is another problem here. the timeout is only a connect timeout. if the connection is made and the server then does not respond, the request will not time out (as it would in a normal browser). (ref)