Question

I know that an app server can be configured to:

  1. Launch new process per request

  2. Launch new thread per request

This question is about using Python multiprocessing (or multithreading) code inside a Flask endpoint: for example, multiprocessing for CPU-intensive work, or multithreading for I/O-intensive work.

I have a Flask endpoint whose code takes 40 seconds to run (CPU-intensive work). I used a Python multiprocessing pool inside the endpoint code, so that certain CPU-intensive parts run in parallel across multiple processes, and the endpoint now takes 4 seconds.
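A minimal sketch of the setup described above, assuming a hypothetical route and workload (the route name, chunk sizes, and pool size are illustrative, not from the original post):

```python
from multiprocessing import Pool

from flask import Flask, jsonify

app = Flask(__name__)

def crunch(n):
    # Stand-in for one CPU-intensive chunk of the long-running job.
    return sum(i * i for i in range(n))

@app.route("/heavy")
def heavy():
    chunks = [200_000] * 10
    # Each chunk runs in its own process, so the chunks execute in
    # parallel instead of serially inside the request thread.
    with Pool(processes=4) as pool:
        results = pool.map(crunch, chunks)
    return jsonify(total=sum(results))
```

Note that the pool is created and torn down inside the request handler; the request still blocks until all chunks finish, it just finishes sooner.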

Is it OK to use Python multiprocessing (or multithreading) inside an endpoint under either of the two app server configurations above (that is, when the app server serves each request in a new thread, or each request in a new process)? Thread-per-request is the default for the Flask development server, whereas with gunicorn I can choose either. Is there anything I need to consider when using multiprocessing (or multithreading) inside a Flask endpoint so that I don't interfere with Flask's own processes or threads?

I know that a better solution is to use a task queue, but this question is specifically about using multithreading/multiprocessing.

Answers

In short, don't. It's tempting to try to negotiate some way to do a lot of work directly within a request handler, but that path leads to pain.

Consider instead one of the frameworks that allows a request handler (e.g., a Flask route) to queue up a task to be run asynchronously. The handler queues the work and gets back a task id, saving it in some way that allows the UI to poll for task completion. Meanwhile, an entirely separate process outside of Flask picks up the work, performs it, and returns a response through the framework (or separately via a shared data store).

Celery and RQ are two such frameworks. (The Flask Mega-Tutorial has a chapter on RQ that's worth a read.) They do require some additional setup. At a minimum, you'll need a shared Redis instance.
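The enqueue-and-poll pattern might look like the following sketch using RQ, assuming a Redis instance on localhost and a hypothetical `long_task` function (route names and payloads are illustrative; newer RQ versions also offer `job.return_value()` in place of `job.result`):

```python
from redis import Redis
from rq import Queue

from flask import Flask, jsonify

app = Flask(__name__)
queue = Queue(connection=Redis())  # assumes a local Redis server

def long_task(n):
    # Runs in a separate "rq worker" process, never inside Flask.
    return sum(i * i for i in range(n))

@app.route("/start")
def start():
    # Enqueue the work and return immediately with a task id.
    job = queue.enqueue(long_task, 50_000_000)
    return jsonify(task_id=job.get_id()), 202

@app.route("/status/<task_id>")
def status(task_id):
    # The UI polls this endpoint until the job is finished.
    job = queue.fetch_job(task_id)
    if job is None:
        return jsonify(error="unknown task"), 404
    return jsonify(finished=job.is_finished,
                   result=job.result if job.is_finished else None)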

This approach has several benefits. First, it allows your web app to remain responsive: if your 40-second task grows into an 80-, 160-, or several-thousand-second task, you won't tie up a Flask thread. Second, it protects Flask against memory growth and fragmentation; the task is performed in an entirely separate process, which releases its memory when it exits.

What you do in those tasks is isolated from Flask. Want to use multiple processes or thread pools within a task? Fine. There's very little* risk of interfering with Flask. (* You could exhaust memory if you're running task workers on the same server as Flask.)
