Managing the Effect of Slow Back-end Systems


API deployments need to maintain stable network behaviour in front of widely varying back end systems. In a previous article, I talked about the need to maintain user experience. In a Layer7 API Gateway based deployment, back end systems with high latency have some significant side effects.

To manage a gateway effectively in a production environment, you must manage maximum latency, concurrency, and transaction rate so that you can set reasonable expectations throughout your organization. In many ways, the product puts control of these directly in the hands of the policy author for an API Gateway.

Understand the latency/concurrency/TPS relationship

On any system that handles multiple concurrent requests, Maximum Transactions Per Second (TPS) = Maximum Concurrency / Average Latency, with latency expressed in seconds.

  • Concurrency is the size of a thread pool. 
  • In the factory configuration, the concurrency of the standard message processing thread pool is 500 per gateway. This is version dependent, so check the documentation for your version for the io.httpCoreConcurrency cluster property.
  • You can create additional thread pools and increase the size of the main pool.  

As an example, with our standard configuration and an average back end latency of 1 second, the maximum TPS is 500. If the latency is 3.5 seconds, the maximum TPS is roughly 140 (500 / 3.5 ≈ 143).
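The arithmetic above can be sketched in a few lines of Python. This is only an illustration of the formula; the function name max_tps is a hypothetical helper, not part of the gateway.

```python
def max_tps(concurrency: int, avg_latency_s: float) -> float:
    """Upper bound on sustained TPS for a fixed-size thread pool.

    concurrency   -- number of threads in the pool (500 in the factory config)
    avg_latency_s -- average time a thread is held per request, in seconds
    """
    if avg_latency_s <= 0:
        raise ValueError("average latency must be positive")
    return concurrency / avg_latency_s


fast = max_tps(500, 1.0)            # 500.0
slow = round(max_tps(500, 3.5), 1)  # 142.9, which the text rounds to 140
print(fast, slow)
```

The same helper can be reused with whatever pool size your version of the gateway actually reports.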

This means that, with our standard tuning and a slow back end API that averages a 3.5 second response time, any more than roughly 140 transactions per second per gateway to the slow API will eventually cause a denial of service for the entire set of published APIs. As time goes on, more and more requests end up waiting to be serviced, because all of the other threads are tied up waiting for the slow back end.
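To see why the backlog keeps growing, here is a rough sketch that treats the gateway as a simple fluid model: anything arriving faster than the pool can drain accumulates as waiting requests. The function name and the 150 TPS arrival rate are illustrative assumptions, and real queues, timeouts, and client retries are ignored.

```python
def backlog_after(seconds: float, arrival_tps: float,
                  concurrency: int, avg_latency_s: float) -> float:
    """Approximate number of requests left waiting after `seconds` of steady load."""
    capacity_tps = concurrency / avg_latency_s       # most the pool can complete per second
    excess_tps = max(0.0, arrival_tps - capacity_tps)  # work that cannot be served
    return excess_tps * seconds


# At or below capacity nothing accumulates.
print(backlog_after(60, 140, 500, 3.5))
# Slightly above capacity, the backlog grows every second.
print(round(backlog_after(60, 150, 500, 3.5)))
```

Even a small excess over capacity produces a backlog that grows linearly with time, which is the effect described above.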

Tools to manage concurrency

Understand your TPS

Often, APIs that are slow are also supporting significant business functions. This might be the difference between displaying an account status and transferring money. 

Obviously, a customer will look at their account status far more often than they transfer money. The API for transferring money might be far slower than the one for account data because of the multi-party transaction needed.

Slow APIs like this often sit at the “end” of a long list of interactions and represent significant value. They usually see lower actual transaction rates than read-only APIs that let a UI display status.

As such, the “slow” API might also have a very low real TPS. It’s best to understand the required TPS for each API.

But it is impossible to predict the overall concurrency needed without knowing the latency. 
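Rearranging the earlier formula gives needed concurrency = required TPS × average latency, which can be totalled across APIs. The sketch below assumes two hypothetical APIs with made-up numbers; they are not measurements from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class ApiLoad:
    name: str
    required_tps: float    # business requirement for this API
    avg_latency_s: float   # measured average back end latency, in seconds

    @property
    def needed_concurrency(self) -> float:
        # Threads held on average = arrival rate x time each request holds a thread
        return self.required_tps * self.avg_latency_s


# Illustrative values only
apis = [
    ApiLoad("account-status", required_tps=400.0, avg_latency_s=0.05),
    ApiLoad("transfer-money", required_tps=20.0, avg_latency_s=3.5),
]

for api in apis:
    print(f"{api.name}: {api.needed_concurrency:.0f} threads")

total_needed = sum(api.needed_concurrency for api in apis)
print(f"total: {total_needed:.0f} of 500")
```

In this made-up case the slow API carries one twentieth of the traffic but needs several times the threads, which is why latency has to be known before concurrency can be planned.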

How to Measure Latency

There are three main ways: with dashboarding, with analytics, and with customized reporting policy that uses the latency context variables (see Service/Policy Context Variables).

Both the Influx/Grafana dashboard and the Policy Manager dashboard display front end and back end latencies (see Gateway Dashboard and Configure Gateway for External Service Metrics).

Customized policy and external systems like ELK and Splunk can be used as well. We recommend that the externalized methods be used as they can be correlated with similar systems measuring back ends directly.

Manage the policy for APIs

Well written API policy in the product will prevent a denial of service to a back end system. Checking authorization and authentication (A&A) tokens first is our standard behavior, so a request that fails A&A never involves the potentially slow back end system at all.

Optionally, when a message fails authorization or authentication, you can drop the connection immediately instead of continuing policy operation. This immediately reduces the concurrency consumed by a denial of service. This is done with the Customize Error Response Assertion.

With policy logic, this customized error response is also used to manage the message presented to customers during concurrency limiting.

Rate Limit Assertion has a concurrency setting

By choosing a high TPS but a relatively low concurrency, you can control how much of the default 500 concurrent connections that a single gateway allows this individual API will consume.
Click here for an example using Layer7
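The idea behind a per-API concurrency cap can be sketched generically with a semaphore. This is not Layer7 policy and not how the Rate Limit Assertion is implemented; the class name and the limit of 2 are assumptions chosen to keep the example small.

```python
import threading


class ConcurrencyLimiter:
    """Rejects new work once `max_concurrent` requests are already in flight."""

    def __init__(self, max_concurrent: int) -> None:
        self._slots = threading.BoundedSemaphore(max_concurrent)

    def try_enter(self) -> bool:
        # Non-blocking: fail fast instead of queuing behind a slow back end
        return self._slots.acquire(blocking=False)

    def leave(self) -> None:
        self._slots.release()


limiter = ConcurrencyLimiter(max_concurrent=2)

# Three requests arrive while none has finished: the third is turned away.
results = [limiter.try_enter() for _ in range(3)]
print(results)

# Once one request completes, a slot is available again.
limiter.leave()
after_release = limiter.try_enter()
print(after_release)
```

Failing fast is the design choice that matters here: a rejected request costs almost nothing, while a queued one holds a thread that other APIs could have used.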

Manage Listen Ports allows you to do two different things that help manage behavior

This is an “advanced setting” that will remove part of the queuing effect that is sometimes noted when needed concurrency exceeds the available concurrency. 

Create an additional Listener Port with a limited size private thread pool

This is a common way to keep a single slow API from affecting the rest of the published APIs. It is an alternative to the rate limit assertion.
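The isolation a private thread pool provides can be illustrated with two ordinary Python executors. This is a generic sketch of the pattern, not gateway configuration; the pool sizes, function names, and the 50 ms sleep standing in for a slow back end are all assumptions.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

slow_pool = ThreadPoolExecutor(max_workers=2)  # private pool for the slow API
main_pool = ThreadPoolExecutor(max_workers=8)  # pool for every other API

lock = threading.Lock()
stats = {"in_flight": 0, "peak": 0}


def call_slow_backend() -> None:
    """Stand-in for a slow back end call; records how many run at once."""
    with lock:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
    time.sleep(0.05)
    with lock:
        stats["in_flight"] -= 1


def call_fast_backend() -> str:
    return "ok"


# Six slow requests arrive, but only two can hold threads at a time.
slow_futures = [slow_pool.submit(call_slow_backend) for _ in range(6)]

# A fast request is served from the main pool without waiting behind them.
fast_result = main_pool.submit(call_fast_backend).result(timeout=1)

for future in slow_futures:
    future.result()
slow_pool.shutdown()
main_pool.shutdown()

print(fast_result, stats["peak"])
```

The slow requests queue up behind their own two threads, so the damage is contained to the one API that causes it.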

Tools to manage TPS

As above, the Rate Limit Assertion is the tool for limiting the TPS consumed by an individual API.
Click here for an example of managing TPS in Layer7

Tools to manage Latency

The HTTP routing assertion allows you to set a maximum latency.
Click here to see an example of setting maximum latency in Layer7

Specifically, the connection timeout and the read timeout should be set to values that reflect the average back end response times.

Given what a slow back end can do to your overall system responsiveness, it is imperative that you actively manage these timeouts. In my view, as described in the PAER posting, shorter timeouts lead to a much better customer perceived experience.

In the documentation, there’s a specific item that speaks to read timeout:

Tip: The Read Timeout should not exceed the SLA that is defined for your service. If a SLA is not defined, one should be created and communicated to your clients. If the client is expecting a response from the Gateway within 2000ms, the Read Timeout should be approximately 1500ms. It should never be greater than the SLA for the service, otherwise the Gateway would potentially fail the SLA if or when the backend service is unavailable.
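The tip can be turned into a small calculation. The 25% headroom below is only inferred from the single 2000 ms / 1500 ms example in the tip; it is an assumption, not a documented rule, and both function names are hypothetical. The second helper applies the earlier TPS formula to the case where the back end hangs and every thread is held for the full timeout.

```python
def suggested_read_timeout_ms(sla_ms: int, headroom: float = 0.25) -> int:
    """Read timeout that leaves `headroom` of the SLA for the gateway itself."""
    if not 0 < headroom < 1:
        raise ValueError("headroom must be between 0 and 1")
    return int(sla_ms * (1 - headroom))


def tps_when_backend_hangs(concurrency: int, read_timeout_ms: int) -> float:
    """Throughput if every request holds a thread until the read timeout fires."""
    return concurrency / (read_timeout_ms / 1000)


timeout_ms = suggested_read_timeout_ms(2000)
print(timeout_ms)

# With 500 threads, a shorter timeout frees threads sooner during an outage.
print(round(tps_when_backend_hangs(500, timeout_ms), 1))
print(round(tps_when_backend_hangs(500, 30000), 1))
```

The last two lines show the trade-off: the shorter the read timeout, the more requests the gateway can still turn around per second while a back end is unavailable.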


In conclusion, the first step toward a good overall user experience in an API based system is understanding your latencies and their relationship with concurrency and transaction rate. In general, if latencies are measured in tens of milliseconds, most problems are solvable. If your latencies are measured in multiple seconds or tens of seconds, the first thing to pay attention to is how they affect the availability of your other published services.
