Inside Queue Models: Markov Chains

So far in this series on queueing theory, we’ve seen single server queues, bounded queues, multi-server queues, and most recently queue networks. A fascinating result from queueing theory is that wait time degrades significantly as utilisation tends towards 100%. We saw that $M/M/c$ queues, which are unbounded, have degenerate behaviour under heavy load when utilisation hits dangerous levels.

Queue Networks

In my previous post I presented the $M/M/c$ queue as a model for multi-server architectures. As discussed at the end of that post, the $M/M/c$ model has two main drawbacks: each server must have the same service rate, and there’s no mechanism for modelling the overhead of routing between servers. Modelling a multi-server system using a single queue - even a queue with multiple servers - ignores important real-world system characteristics. In this post, I’ll explain how we can arrange queues into networks that capture the cost of routing and allow for servers with different service rates.

Modelling Multi-Server Queues

A few questions seem to come up again and again from the people who’ve been reading my posts on queue theory. Perhaps, the most common question is: “How do I model multi-server applications using queues?”. This in an excellent question since most of us will be running production systems with more than one server, be that multiple collaborating services or just a simple load-balanced service that has a few servers sharing the same incoming queue of customers.

Reject Them or Make Them Wait?

After showing my previous post around at work, a colleague responded with this article in which the author compares the performance of a Java EE application running on Windows and on Linux. When running on Linux, the application exhibits the performance characteristics outlined in my post: at high utilisation, latency grows uncontrollably. What might be surprising however is that on Windows, latency doesn’t change much at all, even at high utilisation. Does this mean that the results we saw for $M/M/1$ queues are wrong? Not quite! Whereas the Linux results show increased latency at high utilisation, the Windows results show an increased error count; at high utilisation Windows is simply dropping connections and kicking waiting customers out of the queue.

Relating Service Utilisation to Latency

At Skipjaq, we are interested in how applications perform as they approach the maximum sustainable load. We don’t want to completely saturate an application so it falls over, but we also don’t want to under-load the application and miss out on true performance numbers. In particular, we are interested in finding points in the load where latencies are on the precipice of moving outside acceptable limits.