The Art of Queueing – Simulated

[Reading Time: 30 min]

I don’t know a lot of people who enjoy queueing. But queueing is a fact of life: whenever we compete for access to some limited resource, the natural tendency is to form queues: taxi ranks, social security counters, Eiffel Tower entrances, Disneyland’s latest attraction, Saturday’s overcrowded supermarket tills, annual iPhone launch day at any Apple Store. The list goes on and on.

Last week I was standing in a queue which had formed in front of 2 tube ticket machines at the St Lazare station in Paris, France. The line was short but growing. Not a good sign, but the Metro hall at St Lazare is huge: it can fit many more people before crowding becomes a problem. And in any case, my turn was close.

Suddenly, a nervous young man arrived and, instead of taking his turn in the line, walked past everybody and planted himself in front of one of the machines, speaking loudly as if to intimidate people: “Hey you all! There are 2 machines, why don’t you wait in 2 lines?” If some qualify queueing as an art, jumping queues might be another one.
Anyway, someone in the queue objected: “Excuse me Sir, but there is one waiting line before these 2 machines, would you please take your turn and not overtake everyone?
– Well, I see you don’t understand”, replied the aggressive man, turning around. “I said there are 2 machines, so there should be 2 queues. Isn’t that simple enough?” And he turned his attention back to the machine he was about to start using.
– There are several aspects here”, objected another queuer. “But the social fact is that everybody has been waiting for their turn and you should do the same”.

At that point, the overtaker turned again and came closer to us. Still in physical-intimidation mode, he said: “Look, I am going to remove my sunglasses and repeat it to you all very slowly, looking you in the eyes, as I see you don’t understand French very well. There are 2 machines serving tickets, so there should be 2 queues. You understand? If there were 2 queues, it would go faster! That’s easy to understand, no? 2 better than 1? Got it?
– Wrong!” said someone else. “Actually, it will be exactly the same: there is no difference between having 1 or 2 queues!”
– Nonsense!” said yet another voice from the queue. “We should have 2 queues and choose the smaller one! I’m typically good at that!”

Then we all laughed. It was turning into a good joke, the kind of semi-logical riddle youngsters exchange at school. But the aggressive guy was taken aback – he wasn’t joking, he was profoundly convinced of his maths and of his opinion. I left him wondering whether we were all dumb and proceeded to the machine to get the tube tickets I had been waiting for.

2 better than 1?

Riding downtown, I kept thinking about this encounter and was a bit puzzled. Queueing is a very common life habit. As such, I would expect an almost unconscious decision-making process on these occasions. But if my earlier experience was any indication, then no, it wasn’t so obvious.
For a pretty common queueing experience, we actually had three different opinions as to which system would be more efficient: one queue per machine with balanced allocation (let’s call this one the parallel “round-robin” option), one queue per machine with people choosing the smallest upon arrival (the parallel “smallest” option), and one common queue for all machines (the “series” option). As pictured below.

[Diagrams: the three queueing options – parallel “round-robin”, parallel “smallest”, and series]

But what exactly is our question? There are several, depending on the viewpoint, but I think we can focus on the following two:
– Which queue model is more efficient in terms of machine usage?
– Which queue model is more efficient in terms of customer waiting time?
It turns out that a specific branch of mathematics (of probability theory, more precisely) is devoted to the study of waiting lines, and it is called Queueing Theory.

What does the theory say?

The theory is about constructing models to predict queue lengths and waiting times, so as to make the right economic decisions regarding efficiency: should we add more machines to serve tickets more rapidly, spread their usage more evenly so that they don’t sit idle too much, reduce the capacity (the number of people allowed to enter the system), etc.?

It all started with the telecommunications industry in the early 20th century. Agner Krarup Erlang,[1] a Danish engineer working for the Copenhagen Telephone Exchange, published the basis of queueing theory in 1909, building on prior probabilistic work by Andrey Markov. He wanted to model the number of phone calls arriving at an exchange, a system described in Kendall’s notation[2] as M/D/1:

– Call arrivals follow a Poisson process,[3] which is called memoryless or Markovian (M)
– Service times are fixed or deterministic (D)
– There is a single server (1).

The model he solved was further expanded to cover more specific models. Here are a few of them:

– M/M/1: Poisson arrivals / exponentially distributed (Markovian) service times / 1 server.
– M/G/1: Poisson arrivals / service times following an arbitrary (general) distribution / 1 server.
– G/G/1: inter-arrival times following an arbitrary distribution / service times following an arbitrary distribution as well / 1 server.

Of course, the above-mentioned models can support more than one server, in which case they become M/M/c, M/G/c and G/G/c, again as per Kendall’s notation.

Anything missing?

Well, if you look at this succinct recap of the theory, you’ll see that there is no mention of the number of waiting lines. By default, it is assumed that each server has its own queue (also called a mailbox), and clients may have specific strategies for allocating themselves to the servers. But there’s no real treatment of the number of lines – or rather, of having just one waiting line as opposed to one per server.
It turns out that some mathematicians have spent time on the question, and the answer is not that obvious. According to B.W. Conolly,[4] there are no significant differences between an M/M/2 system with one queue per server and an M/M/2 system with one common queue – except for server occupation and fairness to the customers (more on this later). But a few years later, J.P.C. Blanc[5] studied the question further and found that there are indeed differences in favour of a common waiting line. These differences tend to increase as the number of servers grows and the service rates become unbalanced.
Why not simulate a system with the same client arrival and processing distributions, varying only our queueing hypotheses? We’ll be able to compare the results and see which system is more efficient.
But before we jump into some code to build the simulation, we need a few more requirements.

Tube Station Models

Let’s recap what we need and what we have (simplifying as much as we can):

– Customers arrive at the Tube Station following a Poisson process, let’s say with an average arrival rate of two clients per minute (it’s a busy station).
– There are 2 working machines. They need no lead time to start servicing new clients. Just like the arrivals, service durations are random: they follow the exponential distribution underlying a Poisson process, with an average processing time of 90 seconds. This means that some clients will take far longer than others to complete their purchase – a good approximation of real-life experience. (A quick sketch of how these timings can be sampled follows at the end of this section.)

So far, it means we are in the M/M/2 case. But we do have other characteristics:

Queue dispatching: this describes our 3 options:

– Series means that we have only one common queue and clients go to the next available machine in order. This is the queue I joined at the Tube Station.
– Parallel “smallest” means we have one queue per server, and clients choose the smaller queue when they arrive. This is probably what the aggressive man had in mind.
– Parallel “round-robin” means we have one queue per server and clients pick queues in circular order, alternating between them. I don’t think this one is really practical, but it was mentioned by someone in the waiting line so we’ll study it as well.

Queue discipline is FIFO (first in, first out, a.k.a. first come, first served). There are other theoretical possibilities like LIFO (last in, first out) or SIRO (served in random order), but they don’t apply to our case.
Client discipline: to simplify, we’ll consider that no balking is allowed (clients not joining when the queue is too long), nor reneging (clients leaving when they’re fed up with waiting), nor jockeying (jumping queues to be served more rapidly). In the parallel scenario, we’ll consider that upon arriving, clients choose the smaller queue and then won’t jump to the other one. All very disciplined, if perhaps uncommon.
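To make the timings concrete, here is a minimal sketch (the names are mine, not the repository’s) of how they can be drawn: the inter-arrival times of a Poisson process are exponentially distributed, so a uniform draw can simply be inverted.

```scala
import scala.util.Random

object PoissonTimings extends App {
  val random = new Random()

  // Inter-event delay (in ms) of a Poisson process with the given mean delay,
  // via inverse-transform sampling of the exponential distribution.
  def nextDelayMillis(meanMillis: Double): Long =
    (-meanMillis * math.log(1.0 - random.nextDouble())).toLong

  // Clients arrive every 30 s on average (2 per minute)...
  println(Seq.fill(5)(nextDelayMillis(30000.0)).mkString("arrivals (ms): ", ", ", ""))
  // ...and each purchase takes 90 s on average.
  println(Seq.fill(5)(nextDelayMillis(90000.0)).mkString("services (ms): ", ", ", ""))
}
```

Running it a few times gives a feel for the disparity: most delays sit well under the mean, with occasional much longer ones – exactly the behaviour we want for our customers.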

The Simulation: Scala and Akka

The domain has been clarified, so we can now design a computer-based simulation and test our three hypotheses.

Building a queueing system on a single machine involves multi-threading, and multi-threaded programming is probably one of the most arduous tasks a developer can face.
Well, it turns out that it depends on the language and framework used. The usual curly-brace languages (C, C++, Java, C#) don’t offer much help: without a dedicated thread-management framework, you need to take care of thread pools and manage shared state yourself using some kind of locking. So instead of focusing on the domain, you end up focusing on multi-threaded programming plumbing.

Let’s turn to a more functional approach and use the Actor concept made popular by the Erlang language. We’ll use the implementation which comes with the Akka[6] framework and program it using the Scala[7] language.
As a quick reminder, Scala is a multi-paradigm JVM language supporting both Object-Oriented and Functional designs. Akka is a JVM-based framework dedicated to building concurrent and distributed applications. Scala used to have its own Actors implementation, but it is being phased out in favour of Akka’s.

I recommend you have an in-depth look at the related documentation and its numerous examples. Briefly, using Actors to build your concurrent program allows you to drop the pseudo-imperative, sequential paradigm which proves so difficult in a multi-threaded context. Instead, you define responsibilities and messages: actors are simple units which encapsulate some logic triggered by the reception of specific messages. In turn, they can perform some computation, or delegate it to other actors, and send results back to their parent. Decoupling is reinforced by the fact that the messages exchanged are asynchronous: you send a message to an actor and go on with your other occupations. Of course, this kind of programming is made even more natural by functional constructs (higher-order functions, immutable values, collections and sequences, etc.), so it is particularly well suited to the Scala language, and more specifically to its functionally-oriented part (some might say the best part of it).
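To give a flavour of the model, here is a toy actor (not part of the simulation): it encapsulates a tiny bit of state and reacts to the two messages it understands, one at a time, with no locks in sight.

```scala
import akka.actor.{Actor, ActorLogging, ActorSystem, Props}

// A toy actor: private state, mutated only from within message handling.
class Counter extends Actor with ActorLogging {
  var count = 0

  def receive = {
    case "increment" => count += 1
    case "report"    => log.info(s"current count: $count")
  }
}

object CounterDemo extends App {
  val system = ActorSystem("demo")
  val counter = system.actorOf(Props[Counter], "counter")

  // `!` is fire-and-forget: it returns immediately, the actor processes later.
  counter ! "increment"
  counter ! "increment"
  counter ! "report"

  Thread.sleep(500)  // give the actor time to process before shutting down
  system.shutdown()  // Akka 2.3-era API
}
```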

The Design

Here’s a quick design diagram of the implementation:

[Diagram: the simulation design – QueueSimul, TubeStation, the router and its TicketMachine actors]

The most important artefact is actually provided by Akka itself: the ActorSystem. Here’s how our simulation uses it:

The QueueSimul object is our starting point (it extends App). It starts the ActorSystem, kicks off a TubeStation actor, sends it a “Start” message and then goes to sleep for 10 minutes (configurable). When it wakes up, it sends a “Stop” message to the TubeStation actor, thereby initiating a shutdown of the ActorSystem and the end of the program. In other words, the simulation creates new clients and sends them to be served by ticket machines for 10 minutes, after which no new customer is created but existing ones are still processed. When the last client has been served, the program ceases execution.
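In code, the entry point could look roughly like this (a sketch using the article’s actor and message names; the real implementation in the repository may differ in detail):

```scala
import akka.actor.{Actor, ActorSystem, Props}
import scala.concurrent.duration._

// Minimal stand-in for the TubeStation actor detailed below.
class TubeStation extends Actor {
  def receive = {
    case "Start" => // create the TicketMachine router, schedule the first arrival
    case "Stop"  => context.system.shutdown() // the real actor drains customers first
  }
}

object QueueSimul extends App {
  val system = ActorSystem("queueSimul")
  val tubeStation = system.actorOf(Props[TubeStation], "tubeStation")

  tubeStation ! "Start"               // customers start flowing in
  Thread.sleep(10.minutes.toMillis)   // simulation duration (configurable)
  tubeStation ! "Stop"                // stop creating customers, drain, shut down
  system.awaitTermination()           // block until the ActorSystem is fully down
}
```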

The TubeStation object is an actor managed by the ActorSystem (it extends Actor). As you will see in the code, we use logging extensively, so all Actor instances also extend ActorLogging to report on their internal condition during execution.
The entry point into TubeStation is the “Start” message sent by its parent, which triggers 2 things: 1. the creation of a router of TicketMachine actors, and 2. the start of a timer waiting for a new customer arrival. Let’s discuss these points in some more detail.
1. Router: actually, we won’t just create one TicketMachine. Using the configuration file, you can activate as many ticket machines as you see fit. When you do so, you create an implicit router, giving it 3 pieces of information: the kind of actor it will manage for you, how many of them, and what kind of router you want. Akka offers several kinds of routers, of which we experiment with 3: round-robin (the router distributes messages over its actors one after the other, circularly), smallest-mailbox (the next message is sent to the actor with the smallest mailbox) and balancing-pool (there is no longer one queue per actor instance: the router manages one single queue).
Other than that, you don’t really manage a router. You just send it the messages you would send to a TicketMachine. The router distributes them to the TicketMachine actors it has created, and it forwards the return messages they send back to you.
This Router concept is the part which will help us test our various hypotheses. Using smallest-mailbox or round-robin, we simulate one queue per server – or here, per ticket machine. Using balancing-pool, we simulate the usage of one common, single queue: the router keeps the messages sent by the parent object in a temporary mailbox and distributes them to TicketMachine actors as they become available. At any one time, a TicketMachine actor holds at most one message, which it processes right away; the burden of managing waiting messages – or customers – falls to the router itself. The wiring is sketched below.
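With Akka 2.3, wiring the three router flavours could look as follows (illustrative names; the real code picks the flavour from its configuration file):

```scala
import akka.actor.{Actor, ActorRef, Props}
import akka.routing.{BalancingPool, Pool, RoundRobinPool, SmallestMailboxPool}

class TicketMachine extends Actor { def receive = { case _ => } } // stand-in

class TubeStation extends Actor {
  // Hard-coded here; the real code reads this from configuration.
  val routerFlavour = "balancing-pool"

  val pool: Pool = routerFlavour match {
    case "round-robin"      => RoundRobinPool(5)      // one queue per machine, circular dispatch
    case "smallest-mailbox" => SmallestMailboxPool(5) // one queue per machine, shortest wins
    case "balancing-pool"   => BalancingPool(5)       // one shared queue for all machines
  }

  // The router looks like a single actor; it creates and manages the 5 machines.
  val router: ActorRef = context.actorOf(pool.props(Props[TicketMachine]), "machines")

  def receive = {
    case customer => router ! customer // the router decides which machine gets it
  }
}
```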
2. Customer arrival. This is a “self” message which TubeStation sends to itself each time a timer ticks, and the interval until the next tick is based on a Poisson process. It can be generated dynamically (from a configurable average arrival rate) or read from an array of pre-configured values. This last option is of great help when you want to compare 2 runs while changing just some parameters: you are assured that exactly the same customers arrive, with the same distribution, across your tests. The same goes for customers’ processing times (managed in TicketMachine). So when a new customer arrives, TubeStation packages some information (the customer’s name and processing time) into an object which it sends to TicketMachine (through the router). The timer mechanism is sketched below.
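The self-timer can be sketched with Akka’s scheduler (again with illustrative names): each tick will enqueue a new customer, and the next tick is armed with a fresh exponential delay.

```scala
import akka.actor.Actor
import scala.concurrent.duration._
import scala.util.Random

// Sketch of the arrival loop inside TubeStation (illustrative names).
trait ArrivalScheduling { this: Actor =>
  import context.dispatcher // ExecutionContext required by the scheduler

  val meanArrivalMillis = 30000.0 // two customers per minute on average
  val random = new Random()

  // Exponential inter-arrival delays yield a Poisson arrival process;
  // the real code can also read pre-computed delays from configuration.
  def nextArrivalDelay: FiniteDuration =
    (-meanArrivalMillis * math.log(1.0 - random.nextDouble())).millis

  // Arm the timer: TubeStation will receive "CustomerArrives" after the delay.
  def scheduleNextArrival(): Unit =
    context.system.scheduler.scheduleOnce(nextArrivalDelay, self, "CustomerArrives")
}
```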
The TubeStation actor also listens for the return messages from TicketMachine actors. As we’ll see, these messages are objects too, packing execution metrics so we can report useful information and cross-compare runs.
Finally, TubeStation also handles the “Stop” message it receives from QueueSimul. When it arrives, TubeStation sends a so-called “PoisonPill” towards the router: this is a message which tells its destination to kill itself, and it gets passed on to the TicketMachine instances the router created. When TicketMachine actors receive the poison pill, they stop accepting new messages (any new message would be dropped into a “dead letter” queue), but they still process the messages already accumulated in their mailbox before stopping execution. When all TicketMachine actors have stopped, the router reports to its parent that everyone is dead (itself included) via a Terminated message. This is when TubeStation initiates the shutdown of the ActorSystem, as sketched below.
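One way to realise this choreography (sketched here with a per-mailbox pool; note that Akka handles a plain PoisonPill at the router itself, so Broadcast is used to reach every routee):

```scala
import akka.actor.{Actor, ActorLogging, ActorRef, PoisonPill, Props, Terminated}
import akka.routing.{Broadcast, RoundRobinPool}

class TicketMachine extends Actor { def receive = { case _ => } } // stand-in

class TubeStation extends Actor with ActorLogging {
  // Watch the router so we learn when it (and all its machines) are gone.
  val router: ActorRef =
    context.watch(context.actorOf(RoundRobinPool(5).props(Props[TicketMachine]), "machines"))

  def receive = {
    case "Stop" =>
      // Broadcast a PoisonPill to every machine: each one finishes the
      // customers already waiting in its mailbox, then stops.
      router ! Broadcast(PoisonPill)

    case Terminated(`router`) =>
      // The pool terminates once all its routees have stopped.
      log.info("All ticket machines stopped, shutting the system down")
      context.system.shutdown()
  }
}
```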

TicketMachine. We have said a lot about this actor already, and there’s not much more to uncover. It is a rather simple actor which receives a new customer message from TubeStation via its implicit router parent and simulates a processing time by way of a sleep timer. Then, it packages some metrics inside a CustomerServiced object which it sends back to its parent (the router is clever enough to know it is not the real parent, and it forwards the message back to TubeStation).
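A sketch of that actor, with hypothetical message classes mirroring the description:

```scala
import akka.actor.{Actor, ActorLogging}

// Hypothetical message shapes, mirroring the description above.
case class Customer(name: String, processingMillis: Long)
case class CustomerServiced(name: String, processingMillis: Long, finishedAt: Long)

class TicketMachine extends Actor with ActorLogging {
  def receive = {
    case Customer(name, processingMillis) =>
      Thread.sleep(processingMillis) // simulate the purchase duration
      log.info(s"serviced $name in ${processingMillis}ms")
      // With forwarding routers, `sender()` is the original TubeStation.
      sender() ! CustomerServiced(name, processingMillis, System.currentTimeMillis)
  }
}
```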

So?

Across the 50+ tests I have run using this simulation program, the series scenario (one common queue) yields better results than the parallel scenarios (one queue per server).

From a pure social-interaction standpoint, it is obvious that a single queue is fairer: at least, each customer accesses a ticket machine in her order of arrival. But what the simulation shows is that the series scenario also has the fairest waiting times (more evenly distributed), and that waiting times are on average smaller than under the parallel hypotheses.

A more objective observation is that ticket machine efficiency is also better in the series scenario: looking at the occupation percentages, one can see that they are more evenly distributed across TicketMachine instances than in the parallel scenario.
One last observation is that the results in favour of a series execution tend to intensify with the number of servers and with the disparity of customers’ processing times.

Sample executions for a runtime of 10 minutes, with 5 ticket machines, an average client arrival rate of 1 per 20 seconds, and an average request processing time of 90 seconds. The first batch of results is for the smallest-mailbox router (parallel scenario, one queue per server), the second for the balancing-pool router (series scenario, one common queue for all servers).

Parallel Execution

– 42 clients served
– Ticket Machine $e terminating. Some stats: 7 clients serviced, average service duration 51877 ms, servicing time 363145 ms out of 600062 ms total operating time, or 60.52 %
– Ticket Machine $d terminating. Some stats: 11 clients serviced, average service duration 42957 ms, servicing time 472527 ms out of 607845 ms total operating time, or 77.74 %
– Ticket Machine $a terminating. Some stats: 12 clients serviced, average service duration 53248 ms, servicing time 638976 ms out of 661619 ms total operating time, or 96.58 %
– Ticket Machine $b terminating. Some stats: 7 clients serviced, average service duration 88999 ms, servicing time 622999 ms out of 667121 ms total operating time, or 93.39 %
– Ticket Machine $c terminating. Some stats: 5 clients serviced, average service duration 109916 ms, servicing time 549580 ms out of 667296 ms total operating time, or 82.36 %
– Average wait time: 24772 ms

Series Execution

– 42 clients served
– Ticket Machine $e terminating. Some stats: 7 clients serviced, average service duration 69633 ms, servicing time 487436 ms out of 614257 ms total operating time, or 79.35 %
– Ticket Machine $d terminating. Some stats: 6 clients serviced, average service duration 80942 ms, servicing time 485654 ms out of 615536 ms total operating time, or 78.90 %
– Ticket Machine $a terminating. Some stats: 11 clients serviced, average service duration 53747 ms, servicing time 591218 ms out of 616023 ms total operating time, or 95.97 %
– Ticket Machine $c terminating. Some stats: 10 clients serviced, average service duration 55943 ms, servicing time 559436 ms out of 620882 ms total operating time, or 90.10 %
– Ticket Machine $b terminating. Some stats: 8 clients serviced, average service duration 65435 ms, servicing time 523483 ms out of 677391 ms total operating time, or 77.28 %
– Average wait time: 16916 ms

I’ll try to keep these results in mind for my next Tube Station visit, in case someone questions the queueing organisation again. In the meantime, please have a look for yourself and play around with the simulation – you’ll see that the metrics I collect can be expanded, and further parameters of the queueing system can be exposed, so other interesting effects can be studied (e.g. adding capacity thresholds, changing the stochastic processing-time distribution, adding priority-based ticket machine interruption, etc.).

What I will also keep in mind is that Scala and, above all, the Akka Actors library made all this work efficient, easy and enjoyable. This is the actual killer takeaway from this little experiment. I definitely recommend that anyone interested in the functional take on programming paradigms spend some quality time studying them. I know I will.

GitHub & Tools

The code of the simulation as designed and explained above is available on GitHub, so don’t hesitate to clone, run and otherwise play with it. And don’t hesitate to suggest improvements, so that everyone can benefit from them as well. Here’s the link to the repository.

As you’ll see, I have simply configured an SBT[8] project, typed the code in Sublime Text (personal choice, any code editor will do) and managed SBT tasks from the command line. You might prefer the luxury of a graphical IDE with debugging facilities, in which case I’d definitely recommend IntelliJ with the Scala plugin,[9] which can actually import your SBT project. Nice!
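For the record, the SBT setup needs very little; a minimal build.sbt along these lines should do (the versions are an assumption on my part, aligned with the Akka 2.3.4 documentation linked below):

```scala
// build.sbt – minimal sketch for the simulation project
name := "queue-simul"

scalaVersion := "2.11.2"

libraryDependencies += "com.typesafe.akka" %% "akka-actor" % "2.3.4"
```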

As always, thank you for stopping by.

References

– Akka’s Actors: http://doc.akka.io/docs/akka/2.3.4/scala/actors.html
– Queueing Theory: http://en.wikipedia.org/wiki/Queueing_theory#Single_queueing_nodes
– Kendall’s Notation: http://en.wikipedia.org/wiki/Kendall%27s_notation
– Poisson process: http://en.wikipedia.org/wiki/Poisson_process

Notes

[1] His name has become a measurement unit in telephony (the erlang), and is also well known to programmers in the telecommunications industry: a programming language introduced at Ericsson in 1986 was named after him.
[2] There are 6 positions in this notation, like so: A/S/c/K/N/D:
– A: distribution of the time between arrivals in the queue
– S: distribution of the size (service time) of jobs
– c: number of servers
– K: capacity of the queue / number of clients allowed (assumed infinite if omitted)
– N: size of the population of clients to process (assumed infinite if omitted)
– D: queue discipline (assumed FIFO if omitted; stands for First In, First Out, or First Come, First Served)
[3] A Poisson process is a random process counting the number of events (e.g. new customers arriving at the tube station) and the time points at which they occur in a given interval. The inter-arrival times between events are independent of one another and exponentially distributed, with a mean given by the average arrival rate.
A Poisson process is a very good mathematical approximation for modelling pseudo-random events, like the arrival of clients at a supermarket till – or at a tube station.
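In formulas: if λ is the average arrival rate, the number of arrivals N(t) in a window of length t and the waiting time T until the next arrival satisfy:

```latex
P\{N(t) = k\} = \frac{(\lambda t)^k \, e^{-\lambda t}}{k!},
\qquad
P\{T > t\} = e^{-\lambda t}
```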
[4] B. W. Conolly, “The autostrada queueing problem”, Journal of Applied Probability 21, 394–403 (1984).
[5] J.P.C. Blanc, “A note on waiting times in systems with queues in parallel”, Journal of Applied Probability 24, 540–546 (1987).
[6] http://akka.io/
[7] http://www.scala-lang.org/
[8] http://www.scala-sbt.org/
[9] http://www.jetbrains.com/idea/features/scala.html


[Updated: 2015-Feb-04]

Typos and format adjustments.
