How to Develop and Deploy a Python Application with Docker and Kubernetes

Running Python Applications Directly on Web Servers

In the past, I’ve created a number of Python applications and configured/deployed them as web applications using WSGI and the Apache webserver. I did this for engineering calculators on my website, the Is Texas On Fire? website/map, calculators that used compiled Fortran fire models (CFAST and FDS), and many other tools/experiments.

Traditionally, I would version control my code on GitHub, clone the latest version on the web server machines, obtain the appropriate system/Python libraries, configure and restart Apache, and things ran fairly smoothly. For this reason, I gravitated towards flexible webhosts such as NearlyFreeSpeech.net and WebFaction that provided a compilation toolchain and allowed you to freely run executables via WSGI without needing to open a support ticket for each action.

This method of deploying Python apps made sense to me after years of running websites and applications on tools such as WordPress, which used server-side execution, shared-hosting machines, and system libraries at runtime. And it worked, so I stuck with it for a long time.

Over time, I experimented, for both work and play, with various deployment tools/platforms such as Heroku, AWS, Salt, Ansible, Google App Engine, and many more. They all worked fairly well, but I stuck with WSGI for most of my “production” applications.

Exploring a Container-Based Approach to Deployment

In the past few years, I’ve worked a lot with various Python/R applications, dashboards, notebooks, models, REST APIs, and other types of data science assets that you’d want to deploy. During that time, I’ve used Docker, docker-compose, Kubernetes, and many other tools in the container-based ecosystem that work well to create reproducible applications and automate the testing, build, deployment, maintenance, and other steps needed to get your code up, running, and deployed to production.

It’s been fascinating to watch the evolution of schedulers, containers, dependency management/packaging, deployment mechanisms, and overall DevOps tooling over the past 10-15 years. In the span of about a decade, I went from configuring custom scheduler rules in PBS/Torque/SLURM/Maui on HPC clusters, to managing masters/minions in Salt across cloud-based clusters, to configuring load balancers for container orchestration in Kubernetes, all the while appreciating the effort and intentional design of each iteration of DevOps tooling along the way.

So, I decided it’s finally time to catch up on my personal/side projects and move from a WSGI-based approach to a containerized approach so that I can develop applications more quickly, worry less about maintaining uptime and fault tolerance through manual intervention, and have the freedom to create or update an application with low friction.

This blog post is a brief walkthrough of how I created a web application with Python, containerized it with Docker, and deployed it to Kubernetes.

Part 1 – Developing a Web Application in Python

This part is pretty familiar: formulate and solve a problem by writing some backend Python code and a simple frontend.

In this case, I wanted a web application that would quickly tell me how much rainfall has occurred in my home area in Central Texas, especially since rainfall in the summer of 2018 has been pretty scarce (up until two weeks ago, at least).

First, I needed to find a localized data source for historical rainfall. There are a number of spatial- and time-averaged data sets from commercial weather sites. However, I came across real-time data from the LCRA Hydromet that includes streamflow, lake levels, rainfall amounts, temperature, and relative humidity based on hundreds of sensors. The Hydromet site even provides a nice interactive map that shows this data.

However, I wanted a minimal application that showed me data most relevant to my immediate area without having to filter and zoom on the map each time.

After locating the tabular and CSV versions of the relevant weather data on the LCRA Hydromet media page, I created a script to obtain the latest data and parse it using pandas:

#!/usr/bin/env python

import pandas as pd
from collections import OrderedDict

def fetch_data():
    # Latest 24-hour rainfall data, with column names normalized to
    # lowercase_with_underscores and parentheses stripped
    rainfall_one_day = pd.read_csv('http://hydromet.lcra.org/media/Rainfall.csv')
    rainfall_one_day.columns = (rainfall_one_day.columns
                                .str.strip()
                                .str.lower()
                                .str.replace(' ', '_')
                                .str.replace('(', '')
                                .str.replace(')', ''))
    rainfall_one_day.set_index('site', inplace=True)

    names = ['site', 'location', 'basin', 'today', 'last24', '1_day_ago',
             '2_days_ago', '3_days_ago', '4_days_ago', '5_day_total',
             'report_date']
    rainfall_five_day = pd.read_csv('http://hydromet.lcra.org/media/Rain5Day.csv',
                                     names=names, skiprows=1)
    rainfall_five_day.set_index('site', inplace=True)

    rainfall = rainfall_one_day.join(rainfall_five_day, rsuffix='2')
    stations = pd.Series(rainfall.location.values, index=rainfall.index).to_dict(into=OrderedDict)
    return rainfall, stations
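
The Flask frontend below imports two helper functions, get_stations and rainfall_total, from this rainfall module. Their exact implementations live in the repository; a minimal sketch of what they might look like (the duration keywords and column names here are illustrative assumptions, not the actual code) is:

def get_stations():
    # Return the site -> location mapping used to populate the dropdown
    _, stations = fetch_data()
    return stations

def rainfall_total(site_id=5619, duration='24_hour'):
    # Hypothetical helper: map a duration keyword to a column of the joined
    # rainfall DataFrame and return the total for the requested site
    duration_columns = {'24_hour': ('last24', 'the last 24 hours'),
                        '5_day': ('5_day_total', 'the last 5 days')}
    rainfall, stations = fetch_data()
    error = None
    try:
        column, duration_natural = duration_columns[duration]
        rainfall_amount = rainfall.loc[site_id, column]
        site_location = stations[site_id]
    except KeyError:
        rainfall_amount = site_location = duration_natural = None
        error = 'Unknown site or duration'
    return (rainfall_amount, site_id, site_location,
            duration_natural, stations, error)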

Then I created another script that serves a frontend using Flask with a Bootstrap template; the script accepts POST requests with inputs for the sensor site location and the desired rainfall duration:

#!/usr/bin/env python


from flask import Flask, jsonify, request, render_template
from rainfall import get_stations, rainfall_total


app = Flask(__name__)




@app.route('/', defaults={'path': ''})
@app.route("/<path:path>")
def rainfall(path):
    error = None
    stations = get_stations()
    selected_station = 5619
    duration = '24_hour'
    return render_template('index.html',
                           duration=duration,
                           stations=stations,
                           selected_station=selected_station,
                           error=error)




@app.route('/', defaults={'path': ''}, methods=['POST'])
@app.route("/<path:path>", methods=['POST'])
def rainfall_post(path):
    site_id = int(request.form['site_id'])
    duration = request.form['duration']
    rainfall_amount, site_id, site_location, duration_natural, stations, error = rainfall_total(site_id=site_id, duration=duration)
    selected_station = site_id
    return render_template('index.html',
                           rainfall_amount=rainfall_amount,
                           duration_natural=duration_natural,
                           site_id=site_id,
                           site_location=site_location,
                           duration=duration,
                           stations=stations,
                           selected_station=selected_station,
                           error=error)




if __name__ == '__main__':
    app.run(host='0.0.0.0', debug=True)
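
With the app running locally (e.g., python app.py), the POST interface can be exercised directly. For example, using the requests library (the site ID and duration values below are just the defaults shown above):

import requests

# Request the 24-hour rainfall total for station 5619 from the running app
response = requests.post('http://localhost:5000/',
                         data={'site_id': 5619, 'duration': '24_hour'})
print(response.status_code)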

The result is this rainfall-totals Python application:

https://github.com/koverholt/rainfall-totals

Now, we can move on to containerizing our Python application in a Docker image.

Part 2 – Containerizing the Python Application in a Docker Image

For this step, we can start with a base image (Debian 7, in this case), install Miniconda and the Python dependencies (pandas and flask), copy in our Python application source code, expose the web server port (5000), and then start the Flask application.

FROM debian:7

RUN apt-get update && apt-get install -y curl bzip2

RUN curl -L -o /tmp/anaconda.sh https://repo.anaconda.com/miniconda/Miniconda3-4.5.4-Linux-x86_64.sh
RUN bash /tmp/anaconda.sh -p /opt/anaconda -b
RUN rm /tmp/anaconda.sh
ENV PATH=/opt/anaconda/bin:${PATH}
RUN /opt/anaconda/bin/conda install -y pandas=0.23.4 flask=1.0.2 nomkl

COPY . /
EXPOSE 5000
CMD /opt/anaconda/bin/python /app.py

We can build and run the Docker container using:

docker build -t rainfall-app:1.0 .
docker run -d -p 5000:5000 rainfall-app:1.0

And access the application in our browser:

Nice! We have the application containerized and running locally on our machine. Time to deploy it out into the world on a Kubernetes cluster.

Part 3 – Deploying the Application to Kubernetes

I created a Kubernetes cluster on Google Kubernetes Engine (GKE), then defined the Kubernetes resources in a separate Git repository, specifically to decouple the rainfall totals application from the deployment infrastructure. In the future, I can release new versions of the rainfall application or create additional applications and separately update the Kubernetes resources to deploy everything with a single command (and eventually automate this!).

First, I used the Google Cloud Shell to clone the rainfall totals application repository, build the Docker image, and push it to the Google Container Registry for later use.
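
In practice, this amounts to roughly the following commands in Cloud Shell (a sketch; the image name and tag match the deployment below, and pushing may first require running gcloud auth configure-docker):

git clone https://github.com/koverholt/rainfall-totals.git
cd rainfall-totals
docker build -t gcr.io/koverholt-apps/rainfall:1.4 .
docker push gcr.io/koverholt-apps/rainfall:1.4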

Then, I created a Kubernetes deployment to run three replicas of the rainfall totals application:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: rainfall-app
spec:
  selector:
    matchLabels:
      run: rainfall-app
  replicas: 3
  template:
    metadata:
      labels:
        run: rainfall-app
    spec:
      containers:
      - name: rainfall-app
        image: gcr.io/koverholt-apps/rainfall:1.4
        ports:
        - containerPort: 5000

And a Kubernetes Service to expose the deployment via a NodePort:

apiVersion: v1
kind: Service
metadata:
  name: rainfall-service
  labels:
    run: rainfall-service
spec:
  ports:
  - port: 5000
    targetPort: 5000
    protocol: TCP
  type: NodePort
  selector:
    run: rainfall-app

Finally, I created a Kubernetes Ingress resource to route various paths to specific services/applications. This will be useful for deploying additional applications in the future:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress
  annotations:
    kubernetes.io/ingress.global-static-ip-name: "koverholt-apps-ip"
spec:
  rules:
  - http:
      paths:
      - path: /*
        backend:
          serviceName: default-service
          servicePort: 8080
      - path: /rainfall/*
        backend:
          serviceName: rainfall-service
          servicePort: 5000

All of the Kubernetes resources live in the following GitHub repository:

https://github.com/koverholt/koverholt-apps

With all of these resources created, I can deploy the application, service, and ingress/load balancer to the Kubernetes cluster on GKE, either from my local machine, or using Google Cloud Shell:

kubectl apply -f .

I can view all of the Kubernetes resources on my cluster to confirm that things are up and running:

$ kubectl get all
NAME                  DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deploy/default-app    2         2         2            2           13d
deploy/rainfall-app   3         3         3            3           13d

NAME                         DESIRED   CURRENT   READY     AGE
rs/default-app-65fc9f58      2         2         2         13d
rs/rainfall-app-dffbff4f8    3         3         3         12d

NAME                              READY     STATUS    RESTARTS   AGE
po/default-app-65fc9f58-6rlcl     1/1       Running   0          1d
po/default-app-65fc9f58-tmmk5     1/1       Running   0          23h
po/rainfall-app-dffbff4f8-j6cdg   1/1       Running   0          23h
po/rainfall-app-dffbff4f8-tj6mf   1/1       Running   0          23h
po/rainfall-app-dffbff4f8-z5grf   1/1       Running   0          23h

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
svc/default-service    NodePort    10.43.243.28    <none>        8080:30289/TCP   13d
svc/kubernetes         ClusterIP   10.43.240.1     <none>        443/TCP          13d
svc/rainfall-service   NodePort    10.43.255.248   <none>        5000:30210/TCP   13d

Looks good! The deployment, replica set, service, and three replica pods for the rainfall application are up and running, along with the default application that serves the bare route and the load balancer.

And finally, I configured a static IP address for the load balancer and a DNS A record for apps.koverholt.com to point to that static IP address.
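
For reference, a global static IP address like the one referenced by the Ingress annotation above can be reserved and looked up with gcloud (a sketch, using the address name from the annotation):

gcloud compute addresses create koverholt-apps-ip --global
gcloud compute addresses describe koverholt-apps-ip --global --format='value(address)'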

That’s it!

After the nodes pull the rainfall totals application and the load balancer is up and running, I can view the application by pointing my browser to http://apps.koverholt.com/rainfall/:

Summary

The purpose of this post was to discuss the traditional method I used to deploy Python web applications and walk through the new container-based method of deployment. There are a lot of details around the machinery used in Docker and Kubernetes along the way, but the above steps provide a minimal working example of how this can be achieved. There are also a lot of steps that can be automated, such as the image build and deployment, which I’m sure I will address as I release more versions and additional applications.

Now, I don’t need to worry about specific machines going down, maintaining multiple WSGI environments, or synchronizing Git commits to the latest production version of my application. Reproducible Docker images, load balancers, and replicas ensure that my application will stay alive as I intended without manual intervention.

I’ve used this rainfall totals application multiple times a day, especially since we’ve been getting some rain in the past couple of weeks (and it’s simple and mobile friendly). I can get the information I need with one click instead of fiddling with a non-mobile map or text data.

And, most exciting, it’s easy for me to release and deploy a new version of my application or new applications using this development and deployment cycle.

Fire Dynamics – Heat Fluxes in FDS

Various heat flux quantities can be used to output the heat flux to walls and surfaces in Fire Dynamics Simulator (FDS). This post explains the differences between these heat flux output quantities.

The options for outputting heat fluxes (kW/m2) at a point location (on a surface) or boundary (along a wall or surface) are as follows:

  • Radiative heat flux – \(\dot q_{rad}''\)
  • Convective heat flux – \(\dot q_{conv}''\)
  • Net heat flux – \(\dot q_{net}''\)
  • Incident heat flux – \(\dot q_{inc}''\)
  • Gauge heat flux – \(\dot q_{gauge}''\)
  • Radiometer – \(\dot q_{radiometer}''\)
  • Radiative heat flux gas – \(\dot q_{rad}''\)

Radiative Heat Flux

Consider an energy balance on a surface or wall. The net radiative heat flux \(\dot q_{rad}''\) is given by the difference between the incoming (or absorbed) radiation \(\dot q_{rad,in}''\) and the outgoing (or emitted) radiation \(\dot q_{rad,out}''\):

\(\dot q_{rad}'' = \dot q_{rad,in}'' - \dot q_{rad,out}''\)

where the outgoing (emitted) radiation is given by

\(\dot q_{rad,out}'' = \varepsilon \sigma T_w^4\)

where \(\varepsilon\) is the emissivity, \(\sigma\) is the Stefan-Boltzmann constant, and \(T_w\) is the wall temperature.

The ‘RADIATIVE HEAT FLUX’ quantity can be used to output the net radiative heat flux to a surface.

Convective Heat Flux

The convective heat flux \(\dot q_{conv}''\) is given by

\(\dot q_{conv}'' = h (T_g - T_w)\)

where \(h\) is the heat transfer coefficient, and \(T_g\) is the local gas temperature.

The ‘CONVECTIVE HEAT FLUX’ quantity can be used to output the convective heat flux to a surface.

Net Heat Flux

The net heat flux is the sum of the radiative heat flux and convective heat flux and is given by

\(\dot q_{net}'' = \dot q_{rad}'' + \dot q_{conv}''\)

The ‘NET HEAT FLUX’ quantity can be used to output the combined radiative and convective heat fluxes to a surface.

Incident Heat Flux

The incident heat flux is a diagnostic output and is the sum of the incoming radiation and convection. It does not include outgoing radiation and is given by

\(\dot q_{inc}'' = \dot q_{rad}''/\varepsilon + \sigma T_w^4 + \dot q_{conv}''\)

Substituting in the definition of the net radiative heat flux results in

\(\dot q_{inc}'' = (\dot q_{rad,in}'' - \dot q_{rad,out}'')/\varepsilon + \sigma T_w^4 + \dot q_{conv}''\)

Substituting \(\dot q_{rad,out}'' = \varepsilon \sigma T_w^4\) results in

\(\dot q_{inc}'' = (\dot q_{rad,in}'' - \varepsilon \sigma T_w^4)/\varepsilon + \sigma T_w^4 + \dot q_{conv}''\)

Expanding the first term and simplifying results in

\(\dot q_{inc}'' = \dot q_{rad,in}''/\varepsilon - \sigma T_w^4 + \sigma T_w^4 + \dot q_{conv}''\)

Further simplification results in

\(\dot q_{inc}'' = \dot q_{rad,in}''/\varepsilon + \dot q_{conv}''\)

The ‘INCIDENT HEAT FLUX’ quantity can be used as a diagnostic to output the convective and incoming radiative heat fluxes to a surface.

Gauge Heat Flux

The gauge heat flux can be used when comparing predictions to experimentally measured heat fluxes from a gauge that is held at a fixed temperature. The gauge heat flux accounts for the incoming and outgoing radiation and convection and adjusts the heat fluxes based on a fixed (specified) gauge temperature. The gauge heat flux is given by

\(\dot q_{gauge}'' = \dot q_{rad}''/\varepsilon + \dot q_{conv}'' + \sigma(T_w^4 - T_G^4) + h(T_w - T_G)\)

You must specify the gauge temperature \(T_G\) for this output.

The ‘GAUGE HEAT FLUX’ quantity can be used when comparing heat flux predictions to experimentally measured heat fluxes for a gauge at a fixed temperature.

Radiometer

The radiometer output quantity is similar to the gauge heat flux output quantity, except that convection is neglected. It is given by

\(\dot q_{radiometer}'' = \dot q_{rad}''/\varepsilon + \sigma(T_w^4 - T_\infty^4)\)

The ‘RADIOMETER’ quantity can be used when comparing heat flux predictions to experimentally measured heat fluxes from a radiometer.

Radiative Heat Flux Gas

The ‘RADIATIVE HEAT FLUX GAS’ quantity is the same as the radiative heat flux \(\dot q_{rad}''\), except that the device can be placed away from a solid surface to output the radiative heat flux as if a surface were present at the specified location.
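
To make the relationships between these quantities concrete, here is a small worked example in Python. The input values (incoming radiation, temperatures, emissivity, and heat transfer coefficient) are arbitrary assumptions for illustration, not output from FDS:

# Assumed inputs for illustration only
sigma = 5.67e-11     # Stefan-Boltzmann constant [kW/m^2/K^4]
eps = 0.5            # surface emissivity
h = 0.01             # heat transfer coefficient [kW/m^2/K]
q_rad_in = 10.0      # incoming (absorbed) radiation [kW/m^2]
T_w = 400.0          # wall temperature [K]
T_g = 500.0          # local gas temperature [K]
T_gauge = 293.15     # gauge temperature [K]
T_inf = 293.15       # ambient temperature [K]

q_rad_out = eps * sigma * T_w**4                # outgoing (emitted) radiation
q_rad = q_rad_in - q_rad_out                    # net radiative heat flux
q_conv = h * (T_g - T_w)                        # convective heat flux
q_net = q_rad + q_conv                          # net heat flux
q_inc = q_rad / eps + sigma * T_w**4 + q_conv   # incident heat flux
q_gauge = (q_rad / eps + q_conv
           + sigma * (T_w**4 - T_gauge**4)
           + h * (T_w - T_gauge))               # gauge heat flux
q_radiometer = q_rad / eps + sigma * (T_w**4 - T_inf**4)  # radiometer

print(q_rad, q_conv, q_net, q_inc, q_gauge, q_radiometer)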

Example

An example case can be used to demonstrate the different heat flux output quantities in FDS. The source code for this example can be found on the fire-tools repository.

In this example, a 200 kW propane fire measuring 0.4 m by 0.4 m is placed 0.3 m from a single wall in a domain measuring 1 m by 1 m by 2 m. The grid cells are 10 cm on each side. The wall is divided vertically into two halves: one with the thermal properties of gypsum (and an emissivity of 0.5 to exaggerate the results), and another specified as an ‘INERT’ surface with a fixed temperature of 20 °C. The other boundaries of the domain are open to ambient air. Two measurement locations are placed on the wall at a height of 0.5 m: one on the gypsum portion of the wall, and one on the INERT portion of the wall. The following snapshot from Smokeview shows the fire in front of the gypsum wall (left) and inert wall (right).

[Image: heat_flux_0423]

Six heat flux output quantities were placed at the two measurement locations as follows: net heat flux, convective heat flux, radiative heat flux, incident heat flux, gauge heat flux, and radiometer.

The following snapshot shows the convective heat flux on the wall. If the radiative, net, or convective heat flux quantities are visualized via a boundary file, there is a difference between the heat flux values on the two materials because the gypsum material is warmer than ambient air, which results in a negative convective heat flux (heat transfer from the wall to the air), whereas the inert (cold) wall does not heat up, which results in a positive convective heat flux (heat transfer from the air to the wall):

[Image: heat_flux_0500]

The following snapshot shows the gauge heat flux on the wall. Note that the incident heat flux and gauge heat flux do not depend on the emissivity or properties of the materials, so the two different wall materials show no difference in the incident and gauge heat flux quantities. In other words, this is the heat flux “incident” upon the material and does not depend on the properties of the material (a similar effect is observed for the ‘WALL TEMPERATURE’ vs. the ‘ADIABATIC SURFACE TEMPERATURE’ output quantities):

[Image: heat_flux_0498]

The following plot shows the different heat fluxes on the inert wall:

[Image: heat_flux_inert]

The highest heat fluxes are the incident and gauge heat fluxes, which are between 10 kW/m2 and 15 kW/m2, followed closely by the radiometer. This is expected because the incident and gauge heat fluxes do not account for radiative losses, the gauge heat flux is the heat flux to an inert (cold) wall, and the wall and gauge temperatures for the inert wall are fixed at 20 °C. The radiative and net heat fluxes are the next highest, between 8 kW/m2 and 14 kW/m2, and they overlap because the convective heat flux is close to 0 kW/m2.

The following plot shows the different heat fluxes on the gypsum wall:

[Image: heat_flux_gypsum]

The various heat flux quantities are in the same order as for the inert wall. The incident heat flux, gauge heat flux, and radiometer are overlapping from 10 kW/m2 to 15 kW/m2. Because the gypsum material has an emissivity of 0.5, the radiative heat flux is lower (approximately 5 kW/m2), since half of the radiation is reflected away. As the gypsum wall heats up, it begins to transfer heat outwards via convection, hence the negative convective heat flux of approximately -1 kW/m2. Because of the reduced radiative heat flux and the negative convective heat flux, the net heat flux (the sum of the radiative and convective heat fluxes) is lower than on the inert wall, at approximately 5 kW/m2.

Conclusion

Use the ‘RADIATIVE HEAT FLUX’, ‘CONVECTIVE HEAT FLUX’, or ‘NET HEAT FLUX’ output quantities to obtain the heat flux to a surface that accounts for both incoming and outgoing radiative and convective heat transfer and uses the actual wall temperature in the heat transfer calculations.

Use the ‘GAUGE HEAT FLUX’ or ‘RADIOMETER’ output quantities when comparing to experimental measurements. Specify a gauge temperature when using the gauge heat flux output quantity.

Use the ‘INCIDENT HEAT FLUX’ output quantity as a diagnostic output to check the heat flux value (neglecting radiative losses).

Walls with the default ‘INERT’ boundary condition should not be used in a realistic scenario because they remain at a fixed temperature. They should only be used for diagnostic purposes.

The source code for this heat flux example can be found on the fire-tools repository, including the FDS input file and the Python script to generate the plots.

Forward and Inverse Modeling of Fire Physics Towards Fire Scene Reconstructions


My PhD dissertation on “Forward and Inverse Modeling of Fire Physics Towards Fire Scene Reconstructions” has been made available and can be downloaded from:

http://repositories.lib.utexas.edu/handle/2152/21971
