Answer a question

After 2 days of debugging, I nailed down my time hog: the Python garbage collector.
My application holds a lot of objects in memory. And it works well.
The GC does the usual rounds (I have not played with the default thresholds of (700, 10, 10)).
Once in a while, in the middle of an important transaction, the 2nd generation sweep kicks in and reviews my ~1.5M generation 2 objects.
This takes 2 seconds! The nominal transaction takes less than 0.1 seconds.

My question is: what should I do?
I can turn off generation 2 sweeps (by setting a very high threshold - is this the right way?) and the GC is obedient.
When should I turn them on?
We implemented a web service using Django, and each user request takes about 0.1 seconds.
Ideally, I would run these GC gen 2 cycles between user API requests. But how do I do that?
My view ends with return HttpResponse(), and I would like to run a gen 2 GC sweep AFTER that.
How do I do that? Does this approach even make sense?
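
For reference, the threshold trick mentioned above could look roughly like this. It is only a minimal sketch, assuming CPython's classic three-generation collector (as in 2.6); the helper names disable_gen2_sweeps and full_sweep are hypothetical, and 10**9 is just an arbitrarily large value.

```python
import gc


def disable_gen2_sweeps(huge=10**9):
    """Raise the third threshold so automatic gen 2 sweeps are
    effectively never triggered. Returns the previous thresholds
    so they can be restored later."""
    previous = gc.get_threshold()
    threshold0, threshold1, _ = previous
    # Generations 0 and 1 keep their existing thresholds.
    gc.set_threshold(threshold0, threshold1, huge)
    return previous


def full_sweep():
    """Run an explicit full (generation 2) collection.
    Returns the number of unreachable objects found."""
    return gc.collect(2)


previous = disable_gen2_sweeps()
# ... serve latency-sensitive work here ...
unreachable = full_sweep()      # sweep at a moment of our own choosing
gc.set_threshold(*previous)     # restore the original thresholds
```

The open part is still when to call full_sweep(), which is what the rest of the question is about.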

Can I mark the objects that NEVER need to be garbage collected so the GC will not test them on every gen 2 cycle?
How can I configure the GC to run full sweeps when the Django server is relatively idle?

Python 2.6.6 on multiple platforms (Windows / Linux).

Answers

We did something like this for gunicorn. Depending on which WSGI server you use, you need to find the right hooks that run AFTER the response, not before. Django has a request_finished signal, but that signal is still sent before the response is delivered.

For gunicorn, you need to define two hook functions in the config file, like so:

import gc


def pre_request(worker, req):
    # disable automatic gc until the end of the request
    gc.disable()


def post_request(worker, req, environ, resp):
    # re-enable automatic gc once the request has been handled
    gc.enable()
The post_request hook here runs after the HTTP response has been delivered, so it is a very good time for garbage collection.
