Thinking about Application Performance

The following article is based on the book Patterns of Enterprise Application Architecture by Martin Fowler.



Performance is one of the most important measures of an application's success. An application with rich functionality but poor performance will still be considered a failure; it succeeds only when it delivers both. In this post my focus is on enterprise applications, whose performance can be measured in terms of several metrics. I describe some of them here.

  1. Response time
  2. Responsiveness
  3. Latency
  4. Throughput
  5. Load
  6. Load Sensitivity
  7. Efficiency
  8. Capacity
  9. Scalability

Response Time

Response time is the amount of time a system takes to process a request. The request can be a UI action, such as pressing a button.


Responsiveness

Responsiveness is how quickly the system acknowledges a request, as opposed to processing it. In contrast with response time, it is the amount of time the system takes to tell the user the status of their request.

You might have seen progress bars in web browsers or in application installers. These kinds of elements are used to keep the user informed about the status of their request.

If a system has low responsiveness, users might get frustrated.


Latency

Latency is the minimum time required to get any kind of response. It can have many causes, such as the network or processing delays.

As system architects we can't remove latency, but as a rule of thumb we can reduce the number of remote calls and I/O operations to improve system performance.
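The rule of thumb above can be sketched with a little arithmetic. The function names and the cost model (a fixed latency per remote call plus a per-item processing cost) are illustrative assumptions, not from the book:

```python
def time_per_item_calls(n_items, latency, per_item_cost):
    # One remote call per item: the fixed latency is paid n_items times.
    return n_items * (latency + per_item_cost)

def time_batched_call(n_items, latency, per_item_cost):
    # A single batched call: the fixed latency is paid only once.
    return latency + n_items * per_item_cost

# 50 items, 100 ms of latency per call, 2 ms of processing per item
print(time_per_item_calls(50, 0.100, 0.002))  # ~5.1 seconds
print(time_batched_call(50, 0.100, 0.002))    # ~0.2 seconds
```

Even with identical processing cost, batching the 50 fetches into one call cuts the total time dramatically because the latency is paid once instead of fifty times.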


Throughput

Throughput is the amount of work a system completes in a given span of time. Its unit varies with the scenario: for data transfer it might be bytes per second, while for enterprise applications we usually measure it in transactions per second (tps). Since transactions vary in cost, for your particular system you should pick a common set of transactions to measure against.
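A rough sketch of how tps might be measured; `handle_transaction` is a placeholder assumption standing in for whatever your chosen set of transactions does:

```python
import time

def measure_tps(handle_transaction, transactions):
    """Run a batch of transactions and report completed transactions per second."""
    start = time.perf_counter()
    for tx in transactions:
        handle_transaction(tx)
    elapsed = time.perf_counter() - start
    return len(transactions) / elapsed
```

In practice you would drive a realistic mix of transactions under a realistic load rather than a simple sequential loop, but the ratio of completed work to elapsed time is the same idea.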

In this terminology performance is either throughput or response time—whichever matters more to you. It can sometimes be difficult to talk about performance when a technique improves throughput but decreases response time, so it’s best to use the more precise term. From a user’s perspective responsiveness may be more important than response time, so improving responsiveness at a cost of response time or throughput will increase performance.


Load

Load is a statement of how much stress a system is under, which might be measured in how many users are currently connected to it. The load is usually a context for some other measurement, such as a response time. Thus, you may say that the response time for some request is 0.5 seconds with 10 users and 2 seconds with 20 users.

Load Sensitivity

It is an expression of how the response time varies with the load. Let’s say that system A has a response time of 0.5 seconds for 10 through 20 users and system B has a response time of 0.2 seconds for 10 users that rises to 2 seconds for 20 users. In this case system A has a lower load sensitivity than system B. We might also use the term degradation to say that system B degrades more than system A.
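The comparison above can be expressed as a simple slope: how much the response time rises per additional user. The helper below is an illustrative sketch, not a standard formula from the book:

```python
def load_sensitivity(rt_low, rt_high, load_low, load_high):
    # Change in response time per additional unit of load.
    return (rt_high - rt_low) / (load_high - load_low)

# System A: a flat 0.5 s from 10 through 20 users
# System B: 0.2 s at 10 users, rising to 2 s at 20 users
a = load_sensitivity(0.5, 0.5, 10, 20)  # 0.0 s per extra user
b = load_sensitivity(0.2, 2.0, 10, 20)  # ~0.18 s per extra user
# b > a: system B degrades more than system A under growing load
```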


Efficiency

Efficiency is performance divided by resources. A system that gets 30 tps on two CPUs is more efficient than a system that gets 40 tps on four identical CPUs.
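The comparison works out like this (a trivial sketch using the figures from the text):

```python
def efficiency(tps, cpus):
    # Performance divided by resources: here, throughput per CPU.
    return tps / cpus

print(efficiency(30, 2))  # 15.0 tps per CPU
print(efficiency(40, 4))  # 10.0 tps per CPU: higher throughput, lower efficiency
```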


Capacity

The capacity of a system is an indication of its maximum effective throughput or load. This might be an absolute maximum or a point at which the performance dips below an acceptable threshold.


Scalability

Scalability is a measure of how adding resources (usually hardware) affects performance. A scalable system is one that lets you add hardware and get a commensurate performance improvement, such as doubling your throughput by doubling the number of servers. Vertical scalability, or scaling up, means adding more power to a single server, such as more memory. Horizontal scalability, or scaling out, means adding more servers.

The problem here is that design decisions don't affect all of these performance factors equally. Say we have two software systems running on a server: Swordfish's capacity is 20 tps while Camel's capacity is 40 tps. Which has better performance? Which is more scalable? We can't answer the scalability question from this data; we can only say that Camel is more efficient on a single server. If we add another server, we notice that Swordfish now handles 35 tps and Camel handles 50 tps. Camel's capacity is still better, but Swordfish looks like it may scale out better. If we continue adding servers, we'll discover that Swordfish gains 15 tps per extra server while Camel gains only 10. Given this data we can say that Swordfish has better horizontal scalability, even though Camel is more efficient with fewer than five servers.
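The Swordfish/Camel figures can be tabulated to see where the crossover happens. A small sketch using the per-server numbers from the text:

```python
def swordfish_tps(servers):
    # 20 tps on one server, plus 15 tps for each extra server
    return 20 + 15 * (servers - 1)

def camel_tps(servers):
    # 40 tps on one server, plus 10 tps for each extra server
    return 40 + 10 * (servers - 1)

for n in range(1, 7):
    print(n, swordfish_tps(n), camel_tps(n))
# Camel leads below five servers, the two tie at 80 tps on five,
# and Swordfish pulls ahead from six servers onward.
```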

When building enterprise systems, it often makes sense to build for hardware scalability rather than capacity or even efficiency. Scalability gives you the option of better performance if you need it. Scalability can also be easier to do. Often designers do complicated things that improve the capacity on a particular hardware platform when it might actually be cheaper to buy more hardware. If Camel has a greater cost than Swordfish, and that greater cost is equivalent to a couple of servers, then Swordfish ends up being cheaper even if you only need 40 tps.