Published on December 30, 2020

Python In Production

Setting up production-ready Python apps

Image via Pexels

You’ve written your program in Python and you want to take it live. It would be great if you could push your app from the development environment to the production server without complications, but unfortunately, deployment rarely works that simply: things are known to break in a production environment. In development, everything works fine because you focus only on your app’s functionality, while the production environment has different configuration settings and its own loose screws. Deploying a Python app is fairly straightforward when the code base has little to no dependencies, but it gets complicated when the program carries a lot of dependencies into the production environment.

Below are some guidelines to follow to make your Python app production-ready.

1. Use a Virtual Environment to Isolate Your Program

As mentioned before, the development environment will in most cases differ from the production environment. For instance, the development environment for a program will be the programmer’s machine (laptop, desktop, or tablet), while the production environment will be a virtual machine instance (AWS instance, Azure Virtual Machine, or Linode Standard) or a containerized instance (Docker, Kubernetes). To isolate the program, together with its Python version and modules, use a Python virtual environment in both environments.

The example below shows how to create and activate an isolated environment:

# Create virtual environment.
> python -m venv ./env

# Activate virtual environment on Windows
> env\Scripts\activate

# Activate virtual environment on Mac
$ source env/bin/activate

Furthermore, the required Python packages (with their required versions) can be installed and recorded in a file that lists the program’s dependencies.

# Install the Scrapy module via pip
(env) > pip install scrapy==2.4.1

# Record installed packages in a requirements file
(env) > pip freeze > requirements.txt

The example above installs a specific package at the required version, then records it in a requirements.txt file using the pip freeze command. The requirements.txt file specifies the Python packages the program requires, while the pip freeze command outputs the installed packages’ names with their exact versions.
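For reference, the generated requirements file for this example might look something like this (the exact pinned versions are illustrative):

# requirements.txt (illustrative)
scrapy==2.4.1
# ...plus the transitive dependencies that pip freeze captured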

In a different environment, the virtual environment can be recreated, and the packages pinned in the requirements.txt file can be installed into it. Installing the packages works the same way on Mac and Windows: run the pip install command with the -r switch on the requirements.txt file.

# Installing required packages from the requirements file
(env) > pip install -r requirements.txt

2. Use Config Files to Define the Deployment Environment

The deployment environment for an app has to maintain the proper configuration for the app to work correctly. The deployment environment is where the app runs, whether in development or production; an app is said to be in production once it leaves the development stage.

The configuration settings of the production environment differ from those of the development environment. To modify a program running on the production server, you first change and run it in the development environment, then push the new changes to production.

The issue with these environments is the difference in configurations. The production environment requires settings that may be hard to reproduce in development. For instance, suppose you have a program running on a web server with access to external data via an API. To modify the code, you’d have to start the web server container and set up the API configuration with the appropriate keys needed to access the external data. These steps are unnecessary and time-consuming if all you want to do is modify one part of your program and make sure everything still works.

A workaround for the above scenario is to override parts of your program at startup to provide different functionality depending on the deployment environment. Keep a dedicated configuration for the program in both the development and production environments:

# settings.py

# Don't run with 'TESTING' turned on in production
TESTING = True

# API credentials should be kept secret in production
API_CREDENTIALS = {
    "consumer_key": "XXXXXXXXXXXXXX",
    "consumer_secret": "XXXXXXXXXXXXXX",
    "access_token": "XXXXXXXXXXXXXX",
    "access_secret": "XXXXXXXXXXXXXX",
}

The TESTING constant is set to the Boolean value True by default; it determines how the app behaves in development versus production. The API_CREDENTIALS constant is a dictionary of the required API keys.
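As the comment in settings.py hints, hard-coding secrets isn’t ideal for production. A common pattern, sketched below with hypothetical environment variable names, is to pull credentials from environment variables instead:

# settings.py (variant: read secrets from the environment)
import os

API_CREDENTIALS = {
    "consumer_key": os.environ.get("CONSUMER_KEY", ""),
    "consumer_secret": os.environ.get("CONSUMER_SECRET", ""),
    "access_token": os.environ.get("ACCESS_TOKEN", ""),
    "access_secret": os.environ.get("ACCESS_SECRET", ""),
}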

Depending on the value of the TESTING constant, other modules in the app can import settings.py and decide how to implement their attributes:

# main.py

import settings


class TestingAPI:
    """
    Uses mock data in development
    """

    # ...


class RealAPI:
    """
    Uses real data via API in production
    """

    def __init__(self, api_credentials: dict) -> None:
        self.api_credentials = api_credentials
        # ...

    # ...


if settings.TESTING:
    api = TestingAPI()
else:
    api = RealAPI(settings.API_CREDENTIALS)

With the approach above, modules can be customized to behave differently per deployment environment, which makes it easy to skip hard-to-reproduce pieces like API or database connections when they’re not needed. Mock API data can be generated and injected into the program when testing or developing, as the sketch below shows.
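As a minimal sketch of that idea (the fetch method and its mock payload here are hypothetical, not part of the original code):

class TestingAPI:
    """Uses mock data in development."""

    def fetch(self, query: str) -> dict:
        # Canned payload: no network access or credentials needed.
        return {"query": query, "results": ["mock item 1", "mock item 2"]}


class RealAPI:
    """Uses real data via the API in production."""

    def __init__(self, api_credentials: dict) -> None:
        self.api_credentials = api_credentials

    def fetch(self, query: str) -> dict:
        # Perform the real, authenticated request here.
        ...

Because both classes expose the same fetch interface, the rest of the program behaves identically no matter which one settings.py selected.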

Another, similar case is making the app work differently based on the operating system. If the host server used in production runs a different OS (say, Linux) from the development machine (say, Windows), the differences between the two can break the app.

The Python sys module can be used to inspect the platform and determine the OS type:

# main.py

import sys

class LinuxEnv:
    ...  # Linux-specific configuration

class WindowsEnv:
    ...  # Windows-specific configuration

if sys.platform.startswith('linux'):
    config = LinuxEnv
elif sys.platform.startswith('win32'):
    config = WindowsEnv
else:
    ...  # handle other platforms (e.g. 'darwin' for macOS)

3. Debug with the repr Built-in Function

The print function outputs the human-readable string version of a value, which hides type information and doesn’t help much when debugging.

Python provides the repr built-in function to return the printable representation of an object. It can be used in conjunction with the print function to check value types while debugging:

# Number
print(repr(1024))

# String
print(repr("1024"))

# printable representation output
>>>
1024 # number
'1024' # string

The same result can be produced with the C-style %r format string and the % operator:

# Number
print("%r" % 1024)

# String
print("%r" % "1024")

# printable representation output
>>>
1024 # number
'1024' # string
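In modern Python (3.6+), the same representation is also available through the !r conversion inside an f-string:

value = "1024"
print(f"{value!r}")  # prints '1024', quotes included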

Debugging Dynamic Objects

When debugging plain object instances, the human-readable string gives the same value as the repr function: the default, uninformative object representation. That means neither print nor repr is helpful on its own for instances of classes that don’t define their own representation.

class Person(object):
    def __init__(self, name, age, height):
        self.name = name
        self.age = age
        self.height = height

obj = Person('John', 25, "6'5")

print(obj)
print(repr(obj))

>>>
<__main__.Person object at 0x0000011671395160> # human-readable string
<__main__.Person object at 0x0000011671395160> # object representation

There are two ways to resolve this problem:

  • Use the __repr__ special method.
  • Use the object’s instance dictionary (__dict__) when you don’t control the class.

The __repr__ special method can only be defined in classes that you control. It should return a string representation of the created object:

class Person(object):
    def __init__(self, name, age, height):
        self.name = name
        self.age = age
        self.height = height

    def __repr__(self):
        return f'Person({self.name}, {self.age}, {self.height})'

obj = Person('John', 25, "6'5")

print(repr(obj))

>>>
Person(John, 25, 6'5)

When you don’t have control over the class, use the __dict__ special attribute to access the object’s instance dictionary. The __dict__ attribute holds the instance’s attributes as a dictionary:

class Person(object):
    def __init__(self, name, age, height):
        self.name = name
        self.age = age
        self.height = height

obj = Person('John', 25, "6'5")

print(obj.__dict__)

>>>
{'name': 'John', 'age': 25, 'height': "6'5"}

4. Use Reusable Components

Write functions that can be reused in other parts of the program to create a flow. For instance, consider a program that reads data from a data source (API, database, AWS S3), loads a model from a pickle file, uses the model to generate predictions on the dataset, and saves the predictions in a database.

To achieve this, the code responsible for the process can be divided into components rather than a single function or class. Each component implements a distinct step and can be assembled with other components into a pipeline for the required flow:

def read_data_from_api(args, info):
    # ...
    return data

def load_model(info):
    # ...
    return model

def run_predictions(data, model):
    # ...
    return predictions

def save_predictions_to_db(args, predictions):
    # ...


def main(args, info):
    """Prediction pipeline"""

    data = read_data_from_api(args, info)
    model = load_model(info)
    predictions = run_predictions(data, model)
    save_predictions_to_db(args, predictions)
    # ...

Benefits of using the component approach:

  • Components can be reused in other pipelines (see the sketch below).
  • Components are easy to improve and modify over time.

In the code sample above, all four components are assembled as a pipeline in the main() function.
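To illustrate the reuse benefit, the same components could be recombined into a different pipeline. The following is a hypothetical example that evaluates the model instead of persisting its predictions:

def evaluation_pipeline(args, info):
    """Hypothetical second pipeline reusing three of the four components."""
    data = read_data_from_api(args, info)
    model = load_model(info)
    predictions = run_predictions(data, model)
    # Compare predictions against known labels here instead of saving them.
    # ...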

5. Test Your Program with unittest

Python is a dynamically typed programming language, which means it doesn’t have a static type checker by default. Without static type checking, type errors only surface at runtime.
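As a minimal illustration, the bad call below raises no complaint until the line actually executes:

def add_tax(price):
    return price * 1.08

add_tax(100)    # fine
add_tax("100")  # TypeError at runtime: can't multiply str by float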

All programs should be tested regardless of the programming language used, but Python programs are especially exposed since, by default, nothing checks types before the code runs. Fortunately, the built-in unittest module can be used to test Python programs, and Python’s dynamism makes tests easy to write.

Testing ensures good code quality. It gives the programmer assurance that the program will work as expected when deployed. A responsible programmer always builds with testing in mind.

To use the built-in unittest module on your code, import it in a separate Python file. For instance, say you have a utility function defined in utils.py:

# utils.py

def to_str(data):
    if isinstance(data, str):
        return data
    elif isinstance(data, bytes):
        return data.decode("utf-8")
    else:
        raise TypeError("Must supply str or bytes. Found: %r" % data)

To write the tests, create a second file named test_ followed by the name of the file you want to test, utils.py in this case. So the file will be named test_utils.py:

# test_utils.py

from unittest import TestCase, main
from utils import to_str

class UtilsTestCase(TestCase):
    def test_to_str_bytes(self):
        # Verifies equality for bytes input
        self.assertEqual("hello", to_str(b"hello"))

    def test_to_str_str(self):
        # Verifies equality for str input
        self.assertEqual("hello", to_str("hello"))

    def test_to_str_bad(self):
        # Verifies the exception is raised
        self.assertRaises(TypeError, to_str, object())

if __name__ == "__main__":
    main()

Each test method begins with the word test; if a test method runs without raising an exception, the test is successful. The tests are organized into test cases as TestCase subclasses, which include helper methods for making assertions, such as assertEqual, assertNotEqual, assertRaises, and assertTrue.
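For illustration, here is what the other two helpers named above might look like as extra methods on the same test case (a sketch):

# test_utils.py (additional methods on UtilsTestCase, for illustration)

class UtilsTestCase(TestCase):
    # ... existing tests ...

    def test_to_str_not_equal(self):
        # Verifies two values are not equal
        self.assertNotEqual("world", to_str(b"hello"))

    def test_to_str_true(self):
        # Verifies a boolean expression
        self.assertTrue(isinstance(to_str(b"hello"), str))

Because of the main() call at the bottom of the file, the suite runs with python test_utils.py; it can also be run with python -m unittest test_utils.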

NOTE:

To learn more about testing and the unittest module, see the unittest documentation on the Python website.

6. Harden Code That Interacts with Data Sources

There is no guarantee that fetching data from a data source will work as expected. Connections to data sources are unpredictable, especially in a production environment: an error could occur because the destination server is overloaded with too many requests, or a client’s connection to the server could be dropped. Cases like these require your program to be resilient enough to handle intermittent interruptions and to reconnect to data sources and destinations after a failure.

To achieve this, you can implement retries in the connection code. When the connection fails, the code retries it a certain number of times; if the problem persists, it’s likely an issue that reconnecting can’t fix, and the process should be aborted.

For instance, consider a connection to a MongoDB database to obtain data:

import pymongo

mongoDB_connection = pymongo.MongoClient("mongodb://localhost:27017/")

When the operation above is successful, it returns a client object used to fetch the required data. To make it safer in a production environment, however, it needs to be hardened against errors: as stated above, there should be retries when the connection fails.

The sample code below tries the connection up to 10 times before giving up:

# Using a while loop:
attempt = 1
while attempt <= 10:
    try:
        mongoDB_connection = pymongo.MongoClient("mongodb://localhost:27017/")
        # ...
        break
    except pymongo.errors.ConnectionFailure as e:
        print(f"Attempt: {attempt}, got error: {e}")
        attempt += 1
        continue

# Using a for loop:
for attempt in range(1, 11):
    try:
        mongoDB_connection = pymongo.MongoClient("mongodb://localhost:27017/")
        # ...
        break
    except pymongo.errors.ConnectionFailure as e:
        print(f"Attempt: {attempt}, got error: {e}")
        continue
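A further refinement worth considering (a sketch, not part of the loops above) is to wait between attempts with an increasing delay, verify the connection with a ping, and abort explicitly once the retries are exhausted:

import time

import pymongo

for attempt in range(1, 11):
    try:
        mongoDB_connection = pymongo.MongoClient(
            "mongodb://localhost:27017/", serverSelectionTimeoutMS=2000
        )
        # MongoClient connects lazily, so force a round trip to verify.
        mongoDB_connection.admin.command("ping")
        break
    except pymongo.errors.ConnectionFailure as e:
        print(f"Attempt: {attempt}, got error: {e}")
        time.sleep(min(2 ** attempt, 30))  # exponential backoff, capped at 30s
else:
    raise SystemExit("Could not connect to MongoDB after 10 attempts.")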

7. Profile Program Before Optimizing

The source of slow-downs in a program can be obscure: parts you assume are slow may turn out to be fast, and vice versa. The best way to tell which parts are slow is to profile your program. Let go of instinct and intuition if you want to optimize and refactor your code for the better.

Python has a built-in profiler that measures performance and helps determine which parts of the program account for its execution time. This helps you find the source of slow-downs; with the information the profiler provides, the responsible parts can then be refactored and optimized.

For instance, say you want to find out why an insertion sort algorithm is slow.

def insert_value(array, value):
    """Finds the insertion point for a value.
    Inefficient because it uses a linear scan over the array."""

    for i, existing in enumerate(array):
        if existing > value:
            array.insert(i, value)
            return
    array.append(value)

def insertion_sort(data):
    """Insertion sorts a list of data"""
    result = []
    for value in data:
        insert_value(result, value)
    return result

The first function is the core mechanism: it finds the insertion point for each piece of data. It’s inefficient because it does a linear scan over the array input. The second function sorts the list of data by calling insert_value.

To profile insertion_sort and insert_value, a set of random data is created, and a test function is defined to pass the program code to the profiler:

from random import randint

# ...

if __name__ == "__main__":
    max_size = 10**4
    data = [randint(0, max_size) for _ in range(max_size)]
    test = lambda: insertion_sort(data)

Python provides a built-in C extension module called cProfile for profiling. The cProfile module is fast and has minimal impact on the performance of the program being profiled.

NOTE:

Python provides two modules for profiling: a pure Python module (profile) and a C extension module (cProfile). Visit the Python Profilers documentation to learn more.

To profile the program, instantiate a Profile object from the cProfile module and run the test function through its runcall method:

# ...
from cProfile import Profile

if __name__ == "__main__":
    #...

    profile = Profile()
    profile.runcall(test)

Once the test function passed to runcall has finished running, the performance stats can be pulled using the Stats object from the pstats module and its various methods for sorting and printing profile information:

# ...
from pstats import Stats

if __name__ == "__main__":
    #...

    stats = Stats(profile)
    stats.strip_dirs()
    stats.sort_stats("cumulative")
    stats.print_stats()

The extracted stats are output as a table of data organized by function call, covering the period during which the runcall method executed:

 20003 function calls in 3.171 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.171    3.171 sort_data_list.py:29(<lambda>)
        1    0.004    0.004    3.171    3.171 sort_data_list.py:18(insertion_sort)
    10000    3.142    0.000    3.166    0.000 sort_data_list.py:7(insert_value)
     9991    0.025    0.000    0.025    0.000 {method 'insert' of 'list' objects}
        9    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Below is a guide to what the profiler stats columns mean:

  • ncalls: The number of calls to the function during the profiling period.
  • tottime: The number of seconds spent executing the function, excluding time spent executing other functions it calls.
  • tottime percall: The average number of seconds spent in the function each time it was called, excluding time spent executing other functions it calls. This is tottime divided by ncalls.
  • cumtime: The cumulative number of seconds spent executing the function, including time spent in all other functions it calls.
  • cumtime percall: The average number of seconds spent in the function each time it was called, including time spent in all other functions it calls. This is cumtime divided by ncalls.

Inspecting the profile stats above, the biggest use of CPU time is the cumulative time spent in the insert_value function. It can be refactored to use the bisect_left function from the built-in bisect module:

# ...
from bisect import bisect_left

def insert_value(array, value):
    """Finds the insertion point for a value.
    Efficient because it uses a binary search over the array."""
    i = bisect_left(array, value)
    array.insert(i, value)
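A quick check of what bisect_left actually returns (illustrative values):

from bisect import bisect_left

array = [10, 20, 30, 40]
print(bisect_left(array, 25))  # 2: the index where 25 keeps the array sorted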

The bisect_left function performs an efficient binary search over the sorted array and returns the index of the insertion point. After running the profiler again, the insert_value function executes far faster than before: the cumulative time spent is nearly 200x lower than in the previous run.

   30003 function calls in 0.018 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.018    0.018 sort_data_list.py:33(<lambda>)
        1    0.003    0.003    0.018    0.018 sort_data_list.py:21(insertion_sort)
    10000    0.005    0.000    0.016    0.000 sort_data_list.py:9(insert_value)
    10000    0.009    0.000    0.009    0.000 {built-in method _bisect.bisect_left}
    10000    0.001    0.000    0.001    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

NOTE:

“When profiling a Python program, be sure that what you’re measuring is the code itself and not any external systems. Beware of functions that access the network or resources on disk. These may appear to have a large impact on your program’s execution time because of the slowness of the underlying systems. If your program uses a cache to mask the latency of slow resources like these, you should also ensure that it’s properly warmed up before you start profiling.”

source: Effective Python by Brett Slatkin

Summary

Let’s look at the steps we’ve covered so far:

  • Use a virtual environment.
  • Use config settings to define the deployment environment.
  • Debug with the repr built-in function.
  • Use reusable components.
  • Test your program with the unittest module.
  • Harden code that interacts with data sources.
  • Profile your program before optimizing.

Although these steps are specific to Python, the ideas apply to other programming languages.

If you found this article useful, then don’t forget to share with others that may benefit from it.


Hi, my name is Romeo Peter. I'm a self-taught software developer and blogger --- I build web applications and automate business processes. I don't know it all, but what I do know, I share for others to learn from.

Follow me on Twitter. I share my thoughts there.

I'm available for work. Contact me to work on a project or to work with your team. I usually get back immediately.