The figure illustrates the well-known effect of throughput on queuing delay. As a queue begins to fill because traffic arrives faster than it can be processed, the delay a given packet experiences while traversing the queue increases. The rate at which the contents of a queue can be processed is a function of the transmission rate of the facility. This leads to the classic "delay curve," depicted in the image to the right.
Common sense suggests that it ought to be possible to maintain a traffic load equal to the transmission capacity of the facility. In practice, this turns out not to be true. The average delay any given packet is likely to experience is given by the formula 1/(μ − λ), where μ is the service rate (e.g., the number of packets per second the facility can sustain) and λ is the arrival rate (the average rate at which packets arrive to be serviced).
For example, if the service rate is 10 packets per second and the arrival rate is 5 packets per second, the average expected delay is 1/(10 − 5), or 0.2 seconds (200 milliseconds). Even though the facility is only 50% utilized, that average delay would be unacceptable for voice communication. With an arrival rate of 9 packets per second, the average expected delay becomes 1 second; at 9.5 packets per second it rises to 2 seconds; at 9.9 packets per second, to 10 seconds; and at 10 packets per second, delay rises to infinity.
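The arithmetic above can be reproduced with a short sketch (Python used purely for illustration; `average_delay` is a hypothetical helper name, not from the original text):

```python
import math

def average_delay(service_rate: float, arrival_rate: float) -> float:
    """Expected queuing delay in seconds: 1 / (mu - lambda).

    Returns infinity when the arrival rate reaches the service rate,
    matching the behavior described in the text.
    """
    if arrival_rate >= service_rate:
        return math.inf
    return 1.0 / (service_rate - arrival_rate)

mu = 10.0  # service rate: packets per second the facility can sustain
for lam in (5.0, 9.0, 9.5, 9.9, 10.0):
    print(f"arrival rate {lam:4.1f} pps -> average delay {average_delay(mu, lam)} s")
```

Note how delay stays modest at 50% load but blows up as λ approaches μ.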
The reason is that these figures are averages. Traffic in a packet network is bursty and can only be described statistically. If the average arrival rate equals the service rate and there is any lull that lets the queue empty (which is inevitable in a bursty network), the capacity idled during that lull is lost and can never be made up. Because the overall average arrival rate still equals the service rate, subsequent bursts build a backlog in the queue. Over time, this backlog grows longer and longer and the facility congests.
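A toy discrete-time simulation makes this concrete. The burst pattern below (alternating randomly between 0 and 20 packets per tick, against a capacity of 10) is an invented illustration, not from the text; the point is that its *average* arrival rate exactly equals capacity, yet capacity wasted during lulls accumulates as backlog:

```python
import random

random.seed(1)          # fixed seed so the run is reproducible
capacity = 10           # packets the facility can serve per tick
ticks = 10_000
backlog = 0             # packets waiting in the queue
wasted = 0              # service slots lost while the queue was empty

for _ in range(ticks):
    arrivals = random.choice((0, 20))   # bursty, but averages 10 = capacity
    backlog += arrivals
    served = min(backlog, capacity)
    wasted += capacity - served          # idle capacity can never be made up
    backlog -= served

print(f"wasted capacity: {wasted} packet-slots; final backlog: {backlog} packets")
```

Every slot counted in `wasted` is a packet's worth of capacity the facility can never recover, which is why the backlog does not average out to zero even at exactly 100% average load.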
The service rate is determined by dividing the transmission rate of the facility by the average expected packet size. This is represented as μ = T/P, where T is the transmission rate of the facility and P is the average packet size. This yields an interesting effect: the greater the ratio of transmission rate to average packet size (that is, the higher the service rate in packets per second), the greater the percentage load the facility can sustain without unacceptable delay.
Consider a 10 Mbps transmission facility carrying packets that average 10,000 bits (1,250 bytes) in length. The service rate (μ) would be 10,000,000/10,000, or 1,000 packets per second. If the arrival rate is 500 packets per second, the expected average delay is 1/(1000 − 500), or 1/500 (0.002 seconds, or 2 milliseconds). If the packets are reduced to 1,000 bits (125 bytes), the service rate becomes 10,000 packets per second. With 5,000 packets arriving per second, the average delay is 1/(10,000 − 5,000), or 1/5,000 (0.0002 seconds, or 0.2 milliseconds), yet the facility is still 50% loaded.
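The worked example can be checked directly. This sketch combines μ = T/P with the delay formula from earlier (`service_rate` and `average_delay` are hypothetical helper names used only for illustration):

```python
def service_rate(transmission_bps: float, avg_packet_bits: float) -> float:
    """mu = T / P: packets per second the facility can drain."""
    return transmission_bps / avg_packet_bits

def average_delay(mu: float, lam: float) -> float:
    """Expected queuing delay in seconds: 1 / (mu - lambda)."""
    return 1.0 / (mu - lam)

# 10 Mbps link, 10,000-bit packets, 50% load:
mu_big = service_rate(10_000_000, 10_000)    # 1,000 packets/second
print(average_delay(mu_big, 500))            # 0.002 s (2 ms)

# Same link, 1,000-bit packets, still 50% load:
mu_small = service_rate(10_000_000, 1_000)   # 10,000 packets/second
print(average_delay(mu_small, 5_000))        # 0.0002 s (0.2 ms)
```

Same link, same 50% utilization, but the smaller packets cut the average queuing delay by a factor of ten.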
The implication of this for network designers is profound. First, it suggests that core networks be designed to maximize the ratio of transmission rate to packet size. In other words, big pipes carrying smaller packets are better. For packet voice services, which naturally build small packets to minimize packetization delay, the situation is optimal. It is also important to note that a network can never be fully utilized. There always has to be a bit of extra elbow room in the design to allow for the vagaries of averages and to minimize queuing delay.
It also means that sophisticated queue management techniques are critical to keeping high-priority, delay-sensitive traffic at the low end of the delay curve and low-priority, delay-insensitive traffic at the upper end. This is a critical component of providing quality of service (QoS) to network traffic. Examples of these queue management strategies include first-in first-out, round robin, weighted round robin, deficit weighted round robin, weighted fair queuing, strict priority queuing, and low latency queuing.
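As one concrete example from that list, weighted round robin can be sketched in a few lines. This is a simplified model (the queue names, weights, and packet labels are invented for illustration): each round, every queue may send up to its weight in packets, so higher-weight traffic gets a larger share of the link without starving the rest.

```python
from collections import deque

# Two traffic classes with hypothetical names and weights:
# "voice" gets 3 transmission slots per round, "bulk" gets 1.
queues = {
    "voice": (deque(["v1", "v2", "v3", "v4"]), 3),
    "bulk":  (deque(["b1", "b2", "b3", "b4"]), 1),
}

def wrr_round() -> list[str]:
    """Serve one weighted-round-robin round across all queues."""
    sent = []
    for name, (q, weight) in queues.items():
        for _ in range(weight):        # up to `weight` packets per round
            if q:
                sent.append(q.popleft())
    return sent

print(wrr_round())  # -> ['v1', 'v2', 'v3', 'b1']
```

Real schedulers refine this basic pattern: deficit weighted round robin accounts for variable packet sizes, and low latency queuing adds a strict-priority queue ahead of the weighted ones.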
[Figure: Queuing delay]