Part 2: Deploy Flask API in production using WSGI gunicorn with nginx reverse proxy

Upasana | August 31, 2019 | 7 min read | 3,644 views | Flask - Python micro web framework

This concise tutorial will walk you through taking a Flask REST API from development to production. It is a continuation of the Part 1 article, where we discussed creating a Flask REST API.

What will you learn?

  1. Production readiness & deployment - using wsgi + gunicorn

    1. Why WSGI Gunicorn

    2. Creating systemd service for gunicorn

  2. Nginx setup as the reverse proxy

    1. Why use nginx?

    2. Configuring nginx

    3. Configuring load balancing using nginx

  3. HTTP Benchmarking using h2load

Production readiness & deployment - using wsgi + gunicorn

Why use WSGI Gunicorn?

When we run a Flask app using the inbuilt development server, we get the below warning on the console:

flask production warning

Fig. development server warning

Flask’s official documentation suggests not using the inbuilt flask server in production deployments.

While lightweight and easy to use, Flask’s built-in server is not suitable for production as it doesn’t scale well and by default serves only one request at a time.

Flask uses Werkzeug’s WSGI development server, which has the following issues:

  1. Inbuilt development server doesn’t scale well.

  2. If you leave debug mode on and an error pops up, it opens up a shell that allows for arbitrary code to be executed on your server.

  3. It will not handle more than one request at a time by default.

The right way to run a Flask app in production is to use a production-grade WSGI server, such as gunicorn.

What is WSGI?

WSGI is not a server, a Python module, or a framework. Rather, it is just an interface specification by which server and application communicate. Both the server and application sides of the interface are described in detail by PEP 3333.

A WSGI-compliant server only receives the request from the client, passes it to the application, and then sends the response returned by the application back to the client. That’s all it does; nothing else.
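To make the specification concrete, here is a minimal sketch of a WSGI application with no Flask involved (the name `application` is just a convention, any callable works):

```python
# A minimal WSGI application per PEP 3333: a callable that takes the
# request environ dict and a start_response callback, and returns an
# iterable of byte strings.
def application(environ, start_response):
    body = b"Hello, WSGI!"
    status = "200 OK"
    headers = [("Content-Type", "text/plain"),
               ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

Any WSGI server, gunicorn included, can serve this callable directly; Flask's `app` object is simply a more elaborate callable with the same signature.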

What is gunicorn?

Gunicorn is a WSGI-compatible, production-ready application server.

Gunicorn 'Green Unicorn' is a Python WSGI HTTP server for UNIX. It uses a pre-fork worker model. The Gunicorn server is broadly compatible with various web frameworks, simply implemented, light on server resources, and fairly speedy.

gunicorn setup

Install gunicorn using the below command:

$ pip install gunicorn

Now let's create a separate WSGI entry point, src/wsgi.py, for our application:

from src.main import app

if __name__ == "__main__":
    app.run()

Run the gunicorn from command line for testing:

$ gunicorn -w 4 -b 0.0.0.0:5000 src.wsgi:app

This will spawn 4 worker processes for the gunicorn server. We can see it in the logs:

Program Output
[2019-04-03 09:02:55 +0530] [11333] [INFO] Starting gunicorn 19.9.0
[2019-04-03 09:02:55 +0530] [11333] [INFO] Listening at: http://0.0.0.0:5000 (11333)
[2019-04-03 09:02:55 +0530] [11333] [INFO] Using worker: sync
[2019-04-03 09:02:55 +0530] [11338] [INFO] Booting worker with pid: 11338
[2019-04-03 09:02:55 +0530] [11339] [INFO] Booting worker with pid: 11339
[2019-04-03 09:02:55 +0530] [11342] [INFO] Booting worker with pid: 11342
[2019-04-03 09:02:55 +0530] [11343] [INFO] Booting worker with pid: 11343
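Instead of passing flags on every invocation, gunicorn can also read its settings from a Python config file, loaded with `gunicorn -c gunicorn.conf.py src.wsgi:app`. A minimal sketch (the specific values are illustrative choices, not part of the original setup):

```python
# gunicorn.conf.py -- settings picked up when gunicorn is started
# with `gunicorn -c gunicorn.conf.py src.wsgi:app`
import multiprocessing

bind = "0.0.0.0:5000"                          # same address as the -b flag
workers = multiprocessing.cpu_count() * 2 + 1  # rule of thumb from the gunicorn docs
worker_class = "sync"                          # the default pre-fork sync worker
timeout = 30                                   # restart workers silent for 30 seconds
```

Keeping these in a file makes the systemd unit below shorter and the deployment reproducible.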

Systemd Service

We can create a systemd service for our flask application. This service will automatically start our app server upon system reboot.

We need to create a unit file with extension .service within the /etc/systemd/system directory:

$ sudo vi /etc/systemd/system/wsgi-app.service

Here is the content for this service unit file. The [Unit]/[Service]/[Install] layout is the standard systemd structure; the user and working directory are inferred from this tutorial's paths and should be adjusted for your environment:

[Unit]
Description=Gunicorn instance to serve flask app
After=network.target

[Service]
User=ubuntu
WorkingDirectory=/home/ubuntu/wsgi-app
ExecStart=/home/ubuntu/wsgi-app/venv/bin/gunicorn --workers 3 --bind unix:wsgi-app.sock -m 007 src.wsgi:app

[Install]
WantedBy=multi-user.target


Now our systemd service file is complete and we can save and close it.

Reload systemd daemon to reflect changes
$ sudo systemctl daemon-reload

This command reloads the systemd manager configuration, re-reads all unit files, and recreates the entire dependency tree.

Start the service
$ sudo systemctl start wsgi-app
Stop the service
$ sudo systemctl stop wsgi-app
Check status of service
$ sudo systemctl status wsgi-app
Enable service to start at system reboot
$ sudo systemctl enable wsgi-app

Configure Ubuntu Firewall

We need to open port 5000 in the Ubuntu firewall to allow traffic from the outside world.

$ sudo ufw allow 5000

Nginx Setup & Configuration

What is nginx?

nginx is a front-facing web server that most commonly acts as a reverse proxy for an application server.

Any non-trivial production setup for flask may look like this:

typical flask setup

Fig. Nginx as front facing web server

Why use nginx on top of wsgi gunicorn?

Regardless of the app server in use (gunicorn, mod_wsgi, etc.), any production deployment will have something like nginx configured upstream as a reverse proxy, for various reasons:

  1. nginx can handle requests that gunicorn should not be handling, like serving static files (CSS assets, JS bundles, images). Thus, only dynamic requests are passed on to the gunicorn application server. More importantly, nginx can easily cache these static files and boost performance.

  2. nginx can act as a load balancer, evenly routing requests across multiple gunicorn instances in round-robin fashion.

  3. It is easy to configure nginx for request throttling, API rate limiting, and blocking potentially unwanted calls from insecure origins.

  4. gunicorn does not need to worry about slow clients, since nginx takes care of that complex part, making gunicorn's processing model embarrassingly simple. Slow clients could otherwise make your application simply stop handling new requests.

  5. SSL/TLS & HTTP/2 can be configured at the nginx level, since nginx is the only front-facing web server exposed to the internet. As wsgi gunicorn is never exposed to the internet, all internal communication can happen over plain HTTP/1 without extra security overhead. Additionally, nginx can optimize SSL/TLS via session caching, session tickets, etc.

  6. GZIP compression can be handled at nginx level, which will reduce network bandwidth requirements for clients.
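As a sketch of points 3 and 6 above, both rate limiting and gzip compression take only a few lines of nginx config. The zone name `api_limit` and the specific limits below are arbitrary examples, not part of this tutorial's setup:

```nginx
http {
    # allow each client IP 10 requests/second, bursting to 20
    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

    # compress text responses before sending them to clients
    gzip on;
    gzip_types text/plain application/json application/javascript text/css;

    server {
        location /api/ {
            limit_req  zone=api_limit burst=20 nodelay;
            proxy_pass http://127.0.0.1:5000;
        }
    }
}
```

Requests beyond the configured rate receive a 503 by default, before ever reaching gunicorn.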


Installing nginx on Linux is easy; you just need root privileges on the VPS, then run the below command:

$ sudo apt-get install nginx

On macOS, it can be installed using Homebrew:

Installing nginx on MacOS
$ brew install nginx
Stop server on MacOS
$ sudo nginx -s stop
Start nginx
$ sudo nginx
Configure nginx on macOS
$ vim /usr/local/etc/nginx/nginx.conf


Configuring nginx as the reverse proxy in front of our wsgi deployment is very simple. We just need to add the below configuration to our nginx server.

server {
    listen 80;
    listen [::]:80;

    access_log  /var/log/nginx/wsgi-app.access.log;
    error_log   /var/log/nginx/wsgi-app.error.log;

    location / {
        proxy_pass       http://127.0.0.1:5000;   (1)
        proxy_redirect   off;
        proxy_set_header Host $http_host;   (2)
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
1 5000 is the port on which the wsgi gunicorn server is running.
2 We need to configure the proxy server to pass these headers, especially $http_host and $remote_addr, to make the WSGI server work properly behind the reverse proxy.
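To see what these headers buy us on the application side, here is a stdlib-only sketch of a WSGI middleware that restores the real client address from X-Forwarded-For. It assumes nginx is the only proxy in front, so the header can be trusted; with Flask you would normally use Werkzeug's ProxyFix middleware, which implements this idea properly, rather than rolling your own:

```python
# Hypothetical middleware: rewrites REMOTE_ADDR from the
# X-Forwarded-For header set by the nginx config above.
class ForwardedForMiddleware:
    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        forwarded = environ.get("HTTP_X_FORWARDED_FOR")
        if forwarded:
            # the first address in the chain is the original client
            environ["REMOTE_ADDR"] = forwarded.split(",")[0].strip()
        return self.app(environ, start_response)
```

Without the proxy_set_header lines, every request would appear to the Flask app to originate from 127.0.0.1 (nginx itself).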

Restart the nginx web server:

Test and restart the service
$ sudo nginx -t
$ sudo service nginx restart

Using nginx as load balancer for multiple wsgi gunicorn instances

nginx wsgi setup

Fig. Nginx as load balancer

$ sudo nano /etc/nginx/sites-available/default

upstream backend {
    # gunicorn instances to balance across; the second port is an
    # example -- adjust to wherever your instances actually listen
    server 127.0.0.1:5000;
    server 127.0.0.1:5001;
}

server {
    listen 80;

    location / {
        proxy_pass  http://backend;
    }
}
Test and restart the nginx server

$ sudo nginx -t
$ sudo service nginx restart

HTTP Server Benchmarking using h2load

h2load is a modern HTTP benchmarking tool, often used to benchmark server capabilities for a given deployment. We can use h2load to send 10,000 requests over 10 concurrent connections against our recently deployed server (-n sets the total requests, -c the number of connections, and --h1 forces HTTP/1.1).

$ h2load -n10000 -c10 -m1 --h1 http://localhost:5000/health.json

Typical results on my machine look like this:

h2load results
finished in 3.08s, 3246.23 req/s, 155.21KB/s
requests: 10000 total, 10000 started, 10000 done, 10000 succeeded, 0 failed, 0 errored, 0 timeout
status codes: 10000 2xx, 0 3xx, 0 4xx, 0 5xx
traffic: 478.12KB (489592) total, 1.08MB (1130000) headers (space savings 0.00%), 156.25KB (160000) data
                     min         max         mean         sd        +/- sd
time for request:      794us     78.51ms      2.94ms      3.12ms    99.63%
time for connect:       78us       170us       116us        26us    70.00%
time to 1st byte:      898us      2.43ms      1.69ms       628us    50.00%
req/s           :     324.64      326.36      325.31        0.63    70.00%
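As a quick sanity check on the headline number, throughput is simply total requests divided by wall-clock time:

```python
requests = 10_000    # -n10000
duration_s = 3.08    # "finished in 3.08s" from the run above

# roughly 3247 req/s, in line with the reported 3246.23 req/s
# (the small difference comes from the rounded duration)
print(requests / duration_s)
```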

