Application Server Optimization

For the past few weeks, I have been working on improving the latency of our system. In this article, I am sharing some recipes that I found helpful in my work.

This article is specific to Python and Django, but I think the ideas should be generally applicable to other similar systems.

Disclaimer: These are not facts. Because of the inherent complexity and unpredictability of distributed systems, we cannot make specific theoretical predictions about their performance characteristics. If you want to apply these findings to your work, make measurements first.

Let me define a couple of terms first:

  • By API I mean inter-modular boundaries.
  • By backend services I mean the software that the app server talks to, for example Postgres, Redis or Elasticsearch. It excludes the app servers themselves.

Application structure

An application structure that I found helpful in this work has two layers: a data layer and a presentation layer.

Data layer

This layer gets data and functionality from backend services and exposes an API to other layers. The individual API calls do two things:

  • Make calls to backend services to get data.
  • Construct a data representation for other layers.

Users of the API should not be able to see what is behind it. For example, "user.get" should not give any clues about whether the data came from disk or memory. The advantages are:

  • We can change implementation and data access patterns without altering the API.
  • We can treat it as the single source for a particular kind of data. For example, the cart object will contain all the behaviors of carts in our system.
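
A minimal sketch of such a data layer, assuming a hypothetical "UserStore" class whose "get" method plays the role of "user.get". The in-memory dict and the "fake_backend" function are made-up stand-ins for a real cache and a real backend service.

```python
class UserStore:
    """Data-layer API for users; callers cannot tell where the data came from."""

    def __init__(self, fetch_from_backend):
        # fetch_from_backend is a hypothetical callable standing in for a
        # real backend call (for example, a database query).
        self._fetch_from_backend = fetch_from_backend
        self._cache = {}  # stand-in for an in-memory cache

    def get(self, user_id):
        # Serve from memory when possible, otherwise go to the backend.
        if user_id not in self._cache:
            self._cache[user_id] = self._fetch_from_backend(user_id)
        # Return a copy so callers cannot mutate the cached representation.
        return dict(self._cache[user_id])


backend_calls = []


def fake_backend(user_id):
    # Records each call so we can see how often the backend is hit.
    backend_calls.append(user_id)
    return {"id": user_id, "name": "user-%d" % user_id}


users = UserStore(fake_backend)
first = users.get(42)   # goes to the backend
second = users.get(42)  # served from memory, same representation
```

Both calls return the same dictionary, and the caller has no way to tell which one touched the backend. Swapping the dict for Redis, or the fake backend for a Postgres query, would not change the API.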

Presentation layer

This layer uses the data API to create a response to be sent back to the client. The presentation layer never makes direct requests to backend services.


Now, let's get started with the recipes. In this article, we will focus on the data layer. We will talk about optimizing the presentation layer in a future article. For illustration, I will consider this problem:

We have Cart, User and Address models. The users and addresses are related via UserAddressMapping. We will assume that our hypothetical user has seven addresses.

Measure first

Measurements are absolutely critical if you want to make any progress. This simple point gets overlooked in crunch times or with urgent business requirements. Optimizing without measurements is like solving a problem you don't know anything about.

Here is a stab at the steps we can take for measurements:

  • Determine the metrics you want to optimize. It can be latency, cache misses or open connections.
  • Make measurements on the metrics decided in the first step. Profile the hell out of it.
  • Determine the actions needed to achieve the desired metric. These are often straightforward. For example, to minimize TCP connections to the database, minimize the queries sent to it.

For instance, my metric is latency and I want to minimize it. The key is to measure the execution time of individual requests. It would be nice if this also led to a decrease in the number of connections to the backend, but the priority is execution time.
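
As a rough sketch of what measuring the execution time of individual requests can look like, here is a small timing helper built only on the standard library. The "timed" context manager, the "timings" dict and "handle_request" are hypothetical names for illustration; a real setup would more likely rely on a profiler or the monitoring already in place.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> list of elapsed times in milliseconds


@contextmanager
def timed(label):
    # perf_counter is a monotonic clock suited to measuring short intervals.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        timings.setdefault(label, []).append(elapsed_ms)


def handle_request():
    # Hypothetical request handler; the sleep stands in for real work.
    time.sleep(0.01)


for _ in range(3):
    with timed("handle_request"):
        handle_request()

samples = timings["handle_request"]
print("min %.1f ms, max %.1f ms" % (min(samples), max(samples)))
```

Collecting several samples per label matters because a single measurement can be dominated by noise such as a cold cache or a busy machine.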

Minimize queries

Minimizing the number of queries to the backend leads to:

  • Fewer connections, and hence lower memory consumption on the backend, because there are fewer file descriptors for open socket connections.
  • Fewer forked processes if the server forks a process per connection, which in turn lowers CPU utilization. A good example is Postgres.

However, this step may lead to more complex individual queries. How do you make sure it is the right tradeoff? Measurement! For disk-based storage like databases, make sure the number of disk seeks stays the same or decreases (it may help to know whether your database uses magnetic or solid-state storage).

If you are using Django, read about the "select_related" and "prefetch_related" calls in the QuerySet API. If you are using Postgres, read about how data is laid out on disk and how queries are executed.

For example, suppose we want to get all the addresses of the user associated with a cart. The naive way to do it would be the following.

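def get_cart_addresses(cart_id):
    # Query 1: fetch the cart.
    cart = Cart.objects.get(id=cart_id)
    # Query 2: accessing the relation loads the full User row.
    owner = cart.owner
    # Query 3: fetch the mappings (runs when the queryset is iterated).
    mappings = UserAddressMapping.objects.filter(user=owner)
    # Queries 4-10: each mapping.address access triggers its own query.
    addresses = [mapping.address.get_dict_representation()
                 for mapping in mappings]
    return addresses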

Execution time in database machine: 17ms
Number of queries: 10

There is one query to get the cart, one query to get the cart's owner, one query to get the UserAddressMapping rows and seven individual queries to get the seven addresses.

Let's try to do it in a smarter way.

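def get_cart_addresses(cart_id):
    # Query 1: fetch the cart.
    cart = Cart.objects.get(id=cart_id)
    # owner_id is a column on the cart row, so no User query is needed.
    owner_id = cart.owner_id
    # Query 2: select_related joins Address into the mappings query.
    mappings = UserAddressMapping.objects.filter(user_id=owner_id)
    mappings = mappings.select_related('address')
    addresses = [mapping.address.get_dict_representation()
                 for mapping in mappings]
    return addresses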

Execution time in database machine: 6ms
Number of queries: 2
Improvement in execution time: 11ms
Decrease in number of queries: 8

We avoided the query to get the user (we only need the user id, which is already available on the cart as "owner_id"). We also got all the addresses together with the mappings in one query by using "select_related" on UserAddressMapping.

This is huge considering the fact that in web development, every millisecond is critical (a rule of thumb is every request should be served in less than 200 ms).
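
The same pattern can be reproduced without Django. The sketch below uses the standard library "sqlite3" module with made-up table names and a hypothetical "run" wrapper that counts round trips. It covers only the mapping and address part of the example, so the naive version issues eight queries rather than ten.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE user_address_mapping (
        id INTEGER PRIMARY KEY, user_id INTEGER, address_id INTEGER);
""")
for i in range(1, 8):  # one user (id 1) with seven addresses
    conn.execute("INSERT INTO address VALUES (?, ?)", (i, "city-%d" % i))
    conn.execute("INSERT INTO user_address_mapping VALUES (?, 1, ?)", (i, i))
conn.commit()

stats = {"queries": 0}


def run(sql, params=()):
    # Thin wrapper so we can count round trips to the database.
    stats["queries"] += 1
    return conn.execute(sql, params).fetchall()


def addresses_naive(user_id):
    # One query for the mappings, then one query per address.
    mappings = run("SELECT address_id FROM user_address_mapping "
                   "WHERE user_id = ? ORDER BY id", (user_id,))
    return [run("SELECT city FROM address WHERE id = ?", (address_id,))[0][0]
            for (address_id,) in mappings]


def addresses_joined(user_id):
    # A single join query, similar in spirit to select_related.
    rows = run("SELECT a.city FROM user_address_mapping m "
               "JOIN address a ON a.id = m.address_id "
               "WHERE m.user_id = ? ORDER BY m.id", (user_id,))
    return [city for (city,) in rows]


stats["queries"] = 0
naive_result = addresses_naive(1)
naive_queries = stats["queries"]    # 8: one for mappings, seven for addresses

stats["queries"] = 0
joined_result = addresses_joined(1)
joined_queries = stats["queries"]   # 1
```

Both functions return the same seven cities. Only the number of round trips differs, and that is the number to watch when measuring.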

Divide tasks between backend and application

You should determine which tasks should be done in the application process and which in the backend. Know the capabilities of your backend and your application platform (my hunch is that all "data-processing" tasks should be done in backend services, but I am reluctant to make it a general recipe).

For example, suppose we want to count all the active carts of a user. Let's suppose our hypothetical user has 37 carts. If we do it in application code, it would be something like this


Execution time in application server: 4.8ms

In this code, the application has to deserialize all the incoming data into Python objects. The better way would be


Execution time in application server: 1.7ms
Improvement in execution time: 3ms
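
To illustrate the difference between the two approaches outside Django, here is a sketch using the standard library "sqlite3" module. The "cart" table, its "is_active" column and both function names are assumptions made for this example. In Django terms, the first function corresponds roughly to fetching the carts and counting them in Python, and the second to calling "count()" on a filtered queryset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cart (id INTEGER PRIMARY KEY, owner_id INTEGER, "
             "is_active INTEGER)")
# 37 carts for a hypothetical user 5813; every third one is inactive.
conn.executemany("INSERT INTO cart (owner_id, is_active) VALUES (?, ?)",
                 [(5813, 0 if i % 3 == 0 else 1) for i in range(37)])
conn.commit()


def count_in_application(owner_id):
    # Pull every row into the application, build Python tuples, then filter.
    rows = conn.execute("SELECT id, owner_id, is_active FROM cart "
                        "WHERE owner_id = ?", (owner_id,)).fetchall()
    return len([row for row in rows if row[2] == 1])


def count_in_backend(owner_id):
    # Let the database filter and count; only one integer comes back.
    row = conn.execute("SELECT COUNT(*) FROM cart "
                       "WHERE owner_id = ? AND is_active = 1",
                       (owner_id,)).fetchone()
    return row[0]


app_count = count_in_application(5813)
backend_count = count_in_backend(5813)
```

Both functions give the same answer. The second one transfers a single integer instead of 37 rows, so the application spends no time turning rows into Python objects.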

Execute requests for data in parallel

The individual backend calls can be made from separate threads. The big caveat here is that the threaded work must only involve calls to the backend and should not involve any CPU work like serialization. The reason is that the Python GIL prevents multiple threads from executing Python code on separate processors at the same time. However, I/O requests can be made from separate threads, since what you are essentially doing is waiting. See this Stack Overflow answer for a discussion on this topic.

So, if what you are doing is CPU intensive, it may be a net loss to employ threads, considering how hard they are to get right in practice.

There are some ways to sidestep the GIL in Python. We will talk about them in a future article if we happen to use them in production.

For details regarding this issue, have a look at this excellent talk by David Beazley.

Let us try to count the number of carts of two users:

count1 = GrCart.objects.filter(owner_id=5813).count()
count2 = GrCart.objects.filter(owner_id=5814).count()

Execution time in application server: 3.4ms

If we use threads:

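from multiprocessing.pool import ThreadPool
pool = ThreadPool(processes=2)
target1 = GrCart.objects.filter(owner_id=5813).count
target2 = GrCart.objects.filter(owner_id=5814).count
count1_async = pool.apply_async(target1, args=())
count2_async = pool.apply_async(target2, args=())
# Block until both queries have returned.
count1 = count1_async.get()
count2 = count2_async.get()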

Execution time in application server excluding creation of thread pool: 2.5ms
Improvement in execution time: 1ms
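
To see why threads help when the work is mostly waiting, here is a self-contained sketch in which a hypothetical "fake_backend_count" function replaces the real query with a sleep. The sleep duration and return values are invented for illustration. As in the measurement above, the timing excludes creation of the thread pool.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_backend_count(owner_id):
    # Hypothetical stand-in for a backend call: the thread only waits,
    # which is the situation where threads help despite the GIL.
    time.sleep(0.2)
    return owner_id % 10


# Sequential: the two waits add up.
start = time.perf_counter()
sequential = [fake_backend_count(5813), fake_backend_count(5814)]
sequential_seconds = time.perf_counter() - start

# Threaded: the two waits overlap.
with ThreadPool(processes=2) as pool:
    start = time.perf_counter()
    async1 = pool.apply_async(fake_backend_count, (5813,))
    async2 = pool.apply_async(fake_backend_count, (5814,))
    # get() blocks until the corresponding call has finished.
    threaded = [async1.get(), async2.get()]
    threaded_seconds = time.perf_counter() - start
```

The results are identical in both cases, but the threaded version takes roughly as long as the slowest call rather than the sum of both. If "fake_backend_count" did CPU work instead of sleeping, the GIL would remove most of that gain.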

More resources

We cannot comprehensively discuss such a big topic in a small space. It involves lots of concepts from Operating Systems, Computer Networks and Compilers. But I hope it gives you a good starting point on what to do next. For details, have a look at these blogs and books.