So far in this series on queueing theory, we’ve seen single-server queues, bounded queues, multi-server queues, and most recently queue networks. A fascinating result from queueing theory is that wait time degrades significantly as utilisation tends towards 100%. We saw that $M/M/1$ queues, which are unbounded, have degenerate behaviour under heavy load when utilisation hits dangerous levels.

Perhaps more interestingly, we saw that bounded queues limit
customer ingress to prevent this undesirable behaviour; in essence we learned
that we can reject some customers to ensure good service for those who *do* make
it into the system. Those customers who get rejected might not be entirely
happy with this arrangement, so the question is: can we do better?

Over the course of the next two entries, I want to dig deeper into the internals
of queuing models so that we can explore sophisticated ways to better capture
how modern systems behave. The ultimate goal of this series is to learn how we
can build models of reactive systems, paying particular attention to how we
model **back pressure** in those systems.

A reactive system, using back pressure, signals to its source of traffic when it’s ready for more customers. At first glance, this might sound incredibly impractical, but in practice back pressure works for any system that is completely in control of both the traffic source and the processing service.

For a typical service-based architecture, back pressure is the perfect mechanism
for regulating traffic flow between services. One can imagine that as services
*pull* more traffic in from their upstream provider, this *pull* propagates
towards the outside of the system until finally it hits the boundary where
traffic is coming from the outside world. Even here, back pressure can be
applied to some level. Both TCP and HTTP traffic can be limited to a certain
degree with back-pressure. Eventually though, back pressure will no longer
suffice to limit resource usage and we’ll need to start dropping customers.

In practice then, a reactive system bounds all in-flight processing, but uses back pressure to regulate the amount of in-flight work and thus reduce the number of cases where work must be rejected. We can model queues with back pressure by replacing the Poisson arrival process used by all the queues we’ve seen so far with something more sophisticated.

Before we look at other arrival processes though, we should first ensure that we understand how a simple queue, like an $M/M/1$ queue, really functions. In particular, we are interested in analysing our queue as a particular type of continuous-time Markov chain called a birth-death process.

## A Quick Recap

Before we proceed, let’s remind ourselves of the basics of queue models. Arrivals into the queue are modelled as a Poisson process where the arrival rate is designated $\lambda$. Service times have rate $\mu$ and are exponentially distributed with a mean service time of $1/\mu$.

The ratio of arrival rate to service rate is denoted $\rho = \lambda/\mu$. For unbounded queues, $\rho < 1$ ensures that the queue is stable; if $\rho \geq 1$, then both queue size and latency tend towards infinity.

## Markov Chains in Two Minutes

A Markov chain is a random process described by states and the transitions
between those states. Transitions between states are probabilistic and exhibit a
property called *memorylessness*. The memorylessness property ensures that the
probability distribution for the next state depends only on the current state.
Put another way, the history of a Markov process is unimportant when considering
what the next transition will be.

The diagram above shows a simple Markov chain with three states: *in bed*, *at
the gym* and *at work*. The transitions between each state to the next state are
labelled with the respective probabilities. For example, the probability of
going from *in bed* to *at work* is 30%. Note also, that the probability of
remaining *in bed* is 20%; there’s no requirement that we actually leave the
current state.

We can represent these transition probabilities using a transition probability matrix $P$:

The probability of moving from state $i$ to state $j$ is given by
$p_{ij}$. Each row in the matrix must sum to $1$, indicating that the
probability of doing *something* when in a given state is always $1$.
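
To make the row-sum property concrete, here is a minimal Python sketch of a three-state chain. Only the *in bed* row follows from the probabilities quoted above (20% to stay, 30% to go to work, which leaves 50% for the gym). The other two rows, and the names `STATES`, `P` and `step`, are invented purely for illustration.

```python
# States in the order used for the rows and columns of P.
STATES = ["in bed", "at the gym", "at work"]

# Transition probability matrix. Row 0 uses the figures from the text;
# rows 1 and 2 are made-up values for illustration only.
P = [
    [0.2, 0.5, 0.3],  # from "in bed"
    [0.1, 0.3, 0.6],  # from "at the gym" (assumed)
    [0.7, 0.2, 0.1],  # from "at work" (assumed)
]


def step(dist, matrix):
    """Advance a probability distribution over states by one time step."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Every row must sum to 1: we always do *something*.
for row in P:
    assert abs(sum(row) - 1.0) < 1e-12

# Start in bed with certainty and look one and two steps ahead.
start = [1.0, 0.0, 0.0]
after_one = step(start, P)
after_two = step(after_one, P)
print(dict(zip(STATES, after_two)))
```

Because of memorylessness, the distribution after two steps depends only on the distribution after one step, which is why `step` needs nothing but the current distribution and the matrix.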

This kind of Markov chain is called a *discrete-time Markov chain* (DTMC), where
the time parameter is discrete and the state changes randomly between each
discrete step in the process. The models we’ve seen so far have a continuous
time parameter resulting in *continuous-time Markov chains* (CTMC).

We can recast our discrete-time process as a continuous-time process. We use a
slightly different representation for our continuous-time chains. Rather than
modelling the transition probabilities, we model the *transition rates*:

Note that we omit rates for staying in the same state: it makes little sense to talk about the rate at which a process remains stationary. Just as we used a transition probability matrix $P$ for the discrete-time chain, we use a transition rate matrix $Q$ for the continuous-time chain:

Here, $q_{ij}$ is the *rate* of transition from state $i$ to state $j$.
Diagonals ($q_{ii}$) are constructed such that each row sums to $0$, unlike
the diagonals for the transition probability matrix, which ensure that each row
sums to $1$. The diagram and the matrix show the rate at which our
continuous-time chain moves from the *in bed* state to the *at the gym* state:
the entry $q_{12}$.
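
The diagonal rule can be sketched in a few lines of Python. The off-diagonal rates below are hypothetical placeholders rather than the values from the diagram, and `build_rate_matrix` is a name chosen for this example. Note that the code indexes states from 0.

```python
def build_rate_matrix(rates, n):
    """Build an n x n transition rate matrix from off-diagonal rates.

    `rates` maps (i, j) pairs with i != j to a non-negative rate.
    Each diagonal entry is set to minus the sum of the other entries
    in its row, so that every row sums to zero.
    """
    q = [[0.0] * n for _ in range(n)]
    for (i, j), rate in rates.items():
        if i == j:
            raise ValueError("no rate is defined for staying in a state")
        q[i][j] = rate
    for i in range(n):
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q


# Hypothetical rates between 0 = in bed, 1 = at the gym, 2 = at work.
example_rates = {
    (0, 1): 0.5, (0, 2): 0.3,
    (1, 0): 0.1, (1, 2): 0.6,
    (2, 0): 0.7, (2, 1): 0.2,
}
Q = build_rate_matrix(example_rates, 3)
for row in Q:
    print(row)
```

A useful reading of the diagonal: the time spent in state $i$ before leaving is exponentially distributed with rate $-q_{ii}$, so its mean is $1/(-q_{ii})$.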

## Poisson Processes

Now that we understand how to construct continuous-time Markov chains, we can explore Markovian queues in more detail. Recall that for an $M/M/1$ queue, both arrivals and service completions are Poisson processes; that is, events occur independently at a constant average rate, so the gaps between consecutive events are exponentially distributed.

We can model a Poisson process, and thus the arrivals and service processes, as a CTMC where each state in the chain corresponds to a given population size. Consider the arrivals process in an $M/M/1$ queue. We know that arrivals are a Poisson process with rate $\lambda$. At the start of the process, there have been no arrivals. The first arrival occurs with rate $\lambda$, as do the second, the third and so on for as long as the process continues. We can model this as a Markov chain where the states correspond to the arrivals count:

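When we translate this into a transition rate matrix we get:

$$
Q = \begin{pmatrix}
-\lambda & \lambda & 0 & 0 & \cdots \\
0 & -\lambda & \lambda & 0 & \cdots \\
0 & 0 & -\lambda & \lambda & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$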

This matrix continues unbounded since the number of arrivals is effectively unbounded.
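
A Poisson process is easy to simulate, which gives a feel for what "each arrival occurs with rate $\lambda$" means. The sketch below draws exponentially distributed gaps between arrivals using Python's standard `random.expovariate`. The function name `poisson_arrival_times`, the rate, the horizon and the seed are all assumptions made for this example.

```python
import random


def poisson_arrival_times(rate, horizon, rng):
    """Return the arrival times of a Poisson process on [0, horizon).

    Gaps between consecutive arrivals are drawn from an exponential
    distribution with the given rate.
    """
    times = []
    t = rng.expovariate(rate)
    while t < horizon:
        times.append(t)
        t += rng.expovariate(rate)
    return times


rng = random.Random(42)   # fixed seed so runs are repeatable
arrival_rate = 2.0        # lambda, arrivals per unit time (assumed value)
horizon = 10_000.0
arrivals = poisson_arrival_times(arrival_rate, horizon, rng)

# The state of the chain at time t is simply the number of arrivals so far.
print(len(arrivals), "arrivals; expected about", arrival_rate * horizon)
```

Each arrival moves the chain one state to the right, and nothing ever moves it back, which is exactly the structure of the chain described above.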

## Birth-Death Processes

An $M/M/1$ queue is composed of two Poisson processes working in tandem: the
arrivals process and the service process. As we saw, each of these processes can
be described by a Markov chain. We can go further and describe the queue as a
whole using a special kind of Markov chain process called a **birth-death
process**. Birth-death processes are processes where the states represent the
population count and transitions correspond to either **births**, which
increment the population count by one, or **deaths** which decrease the
population count by one. Note that Poisson processes are themselves birth-death
processes, just with zero deaths.

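This diagram shows the Markov chain for an $M/M/1$ queue with arrival rate $\lambda$ and service rate $\mu$. As you can see, the population state increases as customers arrive at the queue and decreases as customers are served. We can translate this simple diagram into a transition rate matrix for the queue:

$$
Q = \begin{pmatrix}
-\lambda & \lambda & 0 & 0 & \cdots \\
\mu & -(\lambda + \mu) & \lambda & 0 & \cdots \\
0 & \mu & -(\lambda + \mu) & \lambda & \cdots \\
\vdots & \vdots & \vdots & \vdots & \ddots
\end{pmatrix}
$$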

When the process starts, the only possible transition is from zero customers to one with rate $\lambda$ ($q_{01}$). After this, at each state, the process can transition to having one more customer, again at rate $\lambda$, or to having one fewer customer with rate $\mu$.
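
The birth and death transitions just described can be sketched as a small Python function that builds a finite corner of the rate matrix. The name `mm1_generator` and the example rates are assumptions for illustration. Truncating the infinite matrix means the last state has no birth transition, which is an approximation made only so the matrix fits in memory.

```python
def mm1_generator(arrival_rate, service_rate, size):
    """Return a size x size truncation of the M/M/1 transition rate matrix.

    State n is the number of customers in the system.
    """
    q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n + 1 < size:
            q[n][n + 1] = arrival_rate   # birth: n -> n + 1
        if n > 0:
            q[n][n - 1] = service_rate   # death: n -> n - 1
        q[n][n] = -sum(q[n])             # diagonal makes the row sum to zero
    return q


for row in mm1_generator(arrival_rate=1.0, service_rate=2.0, size=4):
    print(row)
```

Setting `service_rate` to zero recovers the pure arrivals chain from the previous section, which is the sense in which a Poisson process is a birth-death process with no deaths.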

## Steady-State Probabilities

With the transition rate matrix in hand, we can calculate the steady-state probabilities for the queue. Recall that the steady-state probabilities tell us the probability of the queue being in state $n$, that is the probability of having $n$ customers in the system. More formally:

$$
p_n = \lim_{t \to \infty} P_n(t)
$$

Where $P_n(t)$ is the probability of having $n$ customers in the system at time $t$. Note that the steady-state probabilities are time-independent and, as the name implies, steady. More precisely, we expect that:

$$
\lim_{t \to \infty} \frac{dP_n(t)}{dt} = 0
$$

That is, we expect the rate of change of the probabilities to be zero in the limit. Let’s think about $\frac{dP_n(t)}{dt}$ for a while. The transition rate matrix tells us how the process flows between states. We can see that each state $n$ can be entered from state $n-1$ and state $n+1$. Entry from state $n-1$ corresponds to a customer arriving in the system and has rate $\lambda$. Entry from state $n+1$ corresponds to a customer completing service and leaving the system with rate $\mu$.

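Each state $n$ can also exit to states $n-1$ and $n+1$ as customers are served (with rate $\mu$) and arrive (with rate $\lambda$). State $0$ is the exception: it can only be entered from state $1$ and can only exit to state $1$. This gives us:

$$
\begin{aligned}
\frac{dP_n(t)}{dt} &= \lambda P_{n-1}(t) + \mu P_{n+1}(t) - (\lambda + \mu) P_n(t), \quad n \geq 1 \\
\frac{dP_0(t)}{dt} &= \mu P_1(t) - \lambda P_0(t)
\end{aligned}
$$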

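Using our limit condition we find these steady-state flow equations:

$$
\begin{aligned}
0 &= \lambda p_{n-1} + \mu p_{n+1} - (\lambda + \mu) p_n, \quad n \geq 1 \\
0 &= \mu p_1 - \lambda p_0
\end{aligned}
$$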
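
The approach to the limit can be watched numerically. The sketch below moves probability mass between neighbouring states exactly as described above (arrivals push mass up at rate $\lambda$, services push it down at rate $\mu$) using simple forward Euler steps. The function name `evolve`, the truncation to 30 states, the step size and the example rates are all assumptions made for this illustration.

```python
def evolve(arrival_rate, service_rate, size, dt, steps):
    """Integrate the state probabilities forward in time with Euler steps.

    The chain is truncated to `size` states. The system starts empty
    (probability 1 of zero customers); the probabilities after
    `steps` steps of length `dt` are returned.
    """
    p = [1.0] + [0.0] * (size - 1)
    for _ in range(steps):
        dp = [0.0] * size
        for n in range(size):
            if n + 1 < size:
                flow = arrival_rate * p[n]   # arrivals move mass n -> n + 1
                dp[n] -= flow
                dp[n + 1] += flow
            if n > 0:
                flow = service_rate * p[n]   # services move mass n -> n - 1
                dp[n] -= flow
                dp[n - 1] += flow
        p = [p[n] + dt * dp[n] for n in range(size)]
    return p


probs = evolve(arrival_rate=1.0, service_rate=2.0, size=30, dt=0.01, steps=10_000)
print([round(x, 4) for x in probs[:4]])
```

With these rates the probabilities stop changing after a while and settle close to 0.5, 0.25, 0.125 and so on, each roughly half of the one before. That halving is the ratio $\lambda/\mu$ showing up in the numbers.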

Solving this recurrence relation with dependence on $p_0$ gives us:

$$
p_n = \left(\frac{\lambda}{\mu}\right)^n p_0 = \rho^n p_0
$$

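Since we know that all probabilities must sum to $1$ we can derive $p_0$ (assuming $\rho < 1$ so that the geometric series converges):

$$
\sum_{n=0}^{\infty} p_n = p_0 \sum_{n=0}^{\infty} \rho^n = \frac{p_0}{1 - \rho} = 1
\quad\Longrightarrow\quad
p_0 = 1 - \rho, \qquad p_n = (1 - \rho)\,\rho^n
$$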
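
As a sanity check, the standard $M/M/1$ result $p_n = (1 - \rho)\rho^n$ can be plugged back into the flow balance conditions numerically. This is a small sketch: `steady_state` is a name chosen here and the rates are assumed example values.

```python
def steady_state(rho, n):
    """Steady-state probability of n customers in an M/M/1 queue (rho < 1)."""
    if not 0 <= rho < 1:
        raise ValueError("steady state only exists for 0 <= rho < 1")
    return (1 - rho) * rho ** n


arrival_rate, service_rate = 3.0, 4.0   # assumed example rates
rho = arrival_rate / service_rate

# Flow balance for n >= 1: rate in from the neighbours equals rate out of n.
for n in range(1, 50):
    flow_in = (arrival_rate * steady_state(rho, n - 1)
               + service_rate * steady_state(rho, n + 1))
    flow_out = (arrival_rate + service_rate) * steady_state(rho, n)
    assert abs(flow_in - flow_out) < 1e-12

# Boundary: arrivals leaving state 0 balance services leaving state 1.
assert abs(arrival_rate * steady_state(rho, 0)
           - service_rate * steady_state(rho, 1)) < 1e-12

# The probabilities sum to 1 (up to the truncated tail).
total = sum(steady_state(rho, n) for n in range(500))
print(rho, steady_state(rho, 0), total)
```

Note that $p_0 = 1 - \rho$ is the probability that the system is empty, so $\rho$ is the fraction of time the server is busy.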

## Coming full circle

You might recall that, in my first post in this series, I mentioned that the equation for the mean number of customers in an $M/M/1$ queue follows from the steady-state probabilities. Let’s see how that works. The mean number of customers for an $M/M/1$ queue is:

$$
L = \frac{\rho}{1 - \rho}
$$

To get here from the steady-state probabilities let’s start by simply defining $L$ in terms of $p_n$:

$$
L = \sum_{n=0}^{\infty} n\, p_n
$$

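We’re saying that the mean number of customers is simply the sum of each possible value weighted by its probability. Let’s expand on this using $p_n = (1 - \rho)\rho^n$:

$$
L = \sum_{n=0}^{\infty} n (1 - \rho) \rho^n = (1 - \rho)\,\rho \sum_{n=1}^{\infty} n \rho^{n-1}
$$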

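We know that queues have divergent behaviour if $\rho \geq 1$, and indeed the series $\sum_{n=1}^{\infty} n \rho^{n-1}$ only converges for $\rho < 1$. So, assuming we have $\rho < 1$ (otherwise $L$ is undefined):

$$
\sum_{n=1}^{\infty} n \rho^{n-1} = \frac{1}{(1 - \rho)^2}
\quad\Longrightarrow\quad
L = (1 - \rho)\,\rho \cdot \frac{1}{(1 - \rho)^2} = \frac{\rho}{1 - \rho}
$$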

And thus we arrive at the definition for $L$, the mean number of customers in an $M/M/1$ queue.
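
The closed form is easy to cross-check against a brute-force sum. The following is a minimal sketch with function names chosen for this example; the sum is truncated at a finite number of terms, which is harmless for $\rho < 1$ because the tail shrinks geometrically.

```python
def mean_customers_closed_form(rho):
    """Mean number of customers, rho / (1 - rho), valid for rho < 1."""
    if not 0 <= rho < 1:
        raise ValueError("mean is only defined for 0 <= rho < 1")
    return rho / (1 - rho)


def mean_customers_by_sum(rho, terms=10_000):
    """Approximate the sum of n * p_n with p_n = (1 - rho) * rho**n."""
    return sum(n * (1 - rho) * rho ** n for n in range(terms))


for rho in (0.5, 0.9, 0.99):
    print(rho, mean_customers_closed_form(rho), mean_customers_by_sum(rho))
```

Running this for utilisations of 50%, 90% and 99% shows how sharply the mean population grows as $\rho$ approaches $1$, which is the degradation under heavy load that motivated this series.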

## What’s next?

With an understanding of how Markov chains are used to construct queue models, we can start looking at some more complex models. In particular, the next post in this series will introduce Markov-modulated Arrival Processes (MMAP). An MMAP composes two or more Markov arrival processes and switches between them. The switching is itself modelled as a Markov chain. MMAPs are a great way of creating a rudimentary model of how back-pressure works.