Hoov's Musings  (volume 1, number 3)

 

Class Of Marketing
Mark Hoover, President, Acuitive, Inc.

I am going to scream the next time I hear about a multi-gigabit multi-layer switch with amazingly differentiating Class of Service (CoS) features. I am not interested in hearing about how one vendor's CoS is better than another vendor's CoS because I don't believe CoS in today's fast switches matters very much. It may as well be called "Class of Marketing" since that's the only place where the differentiation battle is being waged.

So why all the hoohah about Class Of Service?

First, let's define what it is. Basically, CoS is a way of providing preferential treatment to some types of traffic under conditions of temporary congestion of output links (in this case, Gigabit Ethernet and Fast Ethernet links). Occasionally, frames that arrive simultaneously on different input ports all need to be sent to the same output port. The first line of defense against this situation is buffering. Bursts of frames intended for the same output port are queued and sent out at a rate consistent with the output port speed.

The simplest mechanism to manage the flow of traffic under a condition of congestion is first-come-first-serve. As frames arrive they are sequentially buffered and pulled off the queue for transmission in the same exact order. You can only buffer so much -- if all the buffers are full, packets arriving later are dropped.
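The first-come-first-serve behavior just described can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the `FifoPort` class, its `capacity` parameter (counted in frames rather than bytes) and the frame names are all assumptions made for the example.

```python
from collections import deque


class FifoPort:
    """Output port with a bounded first-come-first-serve queue (tail drop)."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical buffer size, counted in frames
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, frame):
        """Buffer a frame, or drop it if every buffer is already in use."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(frame)
        return True

    def transmit(self):
        """Send the oldest buffered frame, or return None if nothing is queued."""
        return self.queue.popleft() if self.queue else None


# A burst of four frames hits a port that can only buffer three.
port = FifoPort(capacity=3)
accepted = [port.enqueue(f) for f in ["f1", "f2", "f3", "f4"]]
sent = [port.transmit() for _ in range(3)]
```

The late arrival `f4` is the one that gets dropped, and the three buffered frames leave in exactly the order they arrived.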

CoS defines more clever ways to manage the flow of traffic under conditions of congestion. CoS implies the implementation of two types of policies:

1. A policy for mapping incoming traffic into different Traffic Classes. Different vendors provide different flexibility in this regard, and this is an area where much of the marketing battle is being fought. The number of traffic classes supported can range from two to almost infinite (e.g. the MMC Networks-enabled per-flow queuing).

2. A policy for discriminating how traffic associated with different Traffic Classes is managed. Such discrimination is always associated with managing a number of data path queues, one per Traffic Class. For instance, let's say I have four different queues. I might choose a "drainage" policy that says I will always send something on queue A unless it is empty, in which case I'll go to queue B, then C, then D. This is called "strict priority." Another approach would be to give each Traffic Class a percentage of access to the output bandwidth, such as giving Class A 40% of the bandwidth, Class B 30%, Class C 20% and Class D 10%. This approach is called "weighted fair queuing" and helps to prevent any particular Traffic Class from being completely starved.
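The two drainage policies can be compared with a small Python sketch. Note the assumptions: the second function is plain weighted round robin, a simpler stand-in that only matches the 40/30/20/10 split when all frames are the same size (true weighted fair queuing accounts for frame lengths). The function names, the 4:3:2:1 weights and the sample frames are made up for the illustration.

```python
from collections import deque


def strict_priority(queues):
    """Yield frames, always serving the highest-priority non-empty queue.

    `queues` is a list of deques ordered from highest to lowest priority.
    """
    while any(queues):
        for q in queues:
            if q:
                yield q.popleft()
                break  # restart from the top-priority queue


def weighted_round_robin(queues, weights):
    """Yield frames, giving queue i up to weights[i] frames per round."""
    while any(queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if not q:
                    break
                yield q.popleft()


def sample_queues():
    # Four Traffic Classes, A (highest) to D, with four frames waiting in each.
    return [deque(f"{c}{i}" for i in range(1, 5)) for c in "ABCD"]


sp_order = list(strict_priority(sample_queues()))
wrr_order = list(weighted_round_robin(sample_queues(), weights=[4, 3, 2, 1]))
```

Under strict priority the first Class D frame waits behind every A, B and C frame. Under the weighted scheme it goes out at the end of the first round, which is the starvation protection the text refers to.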

The main thing to remember about CoS is that it only kicks in under periods of congestion. If the output port can handle the load of each packet sent to it, then each packet is sent in a first-come-first-serve manner, independent of the CoS policies in effect.

When CoS does kick in, it influences latency and packet loss effects. Let's say I don't have CoS. My important acknowledgement packet to a transaction completed by a customer might be behind 20 maximum-size file transfer packets associated with an e-mail transmission that arrived at the switch a split second before my packet. Without CoS, I have to wait until the 20 packets have been "drained" before my packet is passed along. With CoS, if my packet is mapped to a higher priority Traffic Class, I can go to the head-of-the-line and be transmitted before probably 19 of the 20 packets that arrived before me.

What does this gain me? Reduced latency, reduced latency variation, and reduced probability of dropped packets. The amount of reduced latency is related to the speed of the output link. In the above example, we can calculate the reduction in latency gained via CoS as a function of output port speed:

 

Output Link Speed     Latency per switch due     Latency per switch due
                      to queuing w/o CoS         to queuing w/ CoS
--------------------  -------------------------  -------------------------
56 Kbps               4.3 seconds                200 milliseconds
T1 (1.544 Mbps)       156 milliseconds           8 milliseconds
Ethernet              24 milliseconds            1.3 milliseconds
Fast Ethernet         2.4 milliseconds           130 microseconds
Gigabit Ethernet      240 microseconds           13 microseconds
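The arithmetic behind these figures is simply the time needed to clock the queued frames onto the wire. The sketch below assumes a 1518-byte maximum Ethernet frame and ignores preamble and inter-frame gap. The article does not state what it assumed, so the results land close to the table rather than matching it digit for digit. The names `queuing_delay`, `MAX_FRAME_BITS` and `FRAMES_AHEAD` are made up for the example.

```python
MAX_FRAME_BITS = 1518 * 8  # assumed maximum Ethernet frame size, in bits
FRAMES_AHEAD = 20          # file transfer frames queued ahead of my packet

LINK_SPEEDS_BPS = {
    "56 Kbps": 56e3,
    "T1 (1.544 Mbps)": 1.544e6,
    "Ethernet": 10e6,
    "Fast Ethernet": 100e6,
    "Gigabit Ethernet": 1e9,
}


def queuing_delay(link_bps, frames_waiting):
    """Seconds needed to transmit `frames_waiting` maximum-size frames."""
    return frames_waiting * MAX_FRAME_BITS / link_bps


for name, bps in LINK_SPEEDS_BPS.items():
    without_cos = queuing_delay(bps, FRAMES_AHEAD)  # wait behind all 20 frames
    with_cos = queuing_delay(bps, 1)                # wait only for the frame in flight
    print(f"{name:18s} {without_cos * 1e3:12.3f} ms {with_cos * 1e3:10.3f} ms")
```

The "with CoS" column is always one-twentieth of the "without CoS" column in this model, because a high-priority packet still has to wait for the one frame already being transmitted.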

For interactive applications, users usually start to become aware of delays over about 200 milliseconds. Since the latency without CoS per switch is about 2.4 milliseconds with Fast Ethernet links, I would have to be going through more than 80 Fast Ethernet switches, arriving behind 20 maximum-sized Ethernet frames at each, before I would even start to perceive a slight increase in latency or benefit from the CoS features of my switches.
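The same back-of-the-envelope division can be repeated for the other LAN speeds. This sketch takes the 200 millisecond threshold from the text and the per-switch latencies without CoS from the table above; the function name and the worst-case premise (20 maximum-size frames ahead of you at every single switch) are assumptions of the example.

```python
import math

PERCEPTION_THRESHOLD_S = 0.200  # delay at which users start to notice

# Per-switch queuing latency without CoS, taken from the table above.
PER_SWITCH_LATENCY_S = {
    "Ethernet": 24e-3,
    "Fast Ethernet": 2.4e-3,
    "Gigabit Ethernet": 240e-6,
}


def switches_until_noticeable(per_switch_s, threshold_s=PERCEPTION_THRESHOLD_S):
    """Smallest number of congested switches whose total delay reaches the threshold."""
    return math.ceil(threshold_s / per_switch_s)


hops = {
    name: switches_until_noticeable(latency)
    for name, latency in PER_SWITCH_LATENCY_S.items()
}
```

With these inputs, `hops` comes out as 9 switches for 10 Mbps Ethernet, 84 for Fast Ethernet and 834 for Gigabit Ethernet.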

This example focused on an interactive user application, but I could have provided an analysis of file transfer application throughput vs. latency or an analysis of a real-time application where latency variation is more important than absolute latency and arrived at exactly the same conclusion:

Hoov's CoS Rule #1: The higher the speed of the links, the lesser the importance of CoS.

  • In terms of reducing latency for critical applications, CoS is of minimal advantage over Fast Ethernet and essentially irrelevant over Gigabit links.
  • Some form of CoS may be important in the 10 Mbps portions of the network and potentially vital in your WAN links and in the Service Provider core WAN infrastructure (where, presumably, bandwidth is more expensive and thus less available).

CoS can help to reduce the probability of dropping packets for critical applications, but I like to approach this situation in a different way:

Hoov's CoS Rule #2: Congestion avoidance is almost always better than congestion behavior management.

  • Where bandwidth is cheap, in the LAN and Campus, the best congestion avoidance technique is the targeted use of excess bandwidth.
    • Instrument your network so that you know where packets are being dropped.
    • Implement wire speed L2 switches where needed to reduce congestion in workgroups.
    • Implement wire speed L3 switches where needed to reduce cross-subnet traffic congestion.
    • Monitor port frame dropping statistics and add parallel trunks (Link Aggregation for Fast Ethernet or Gigabit Ethernet trunks) where needed to ensure adequate bandwidth between switches.
    • Select high-speed switch vendors not on CoS features, but on capacity planning instrumentation and applications which allow you to follow the above guidelines in a simple, straightforward manner.
    • With today's technologies and today's price points, shame on you if you have congestion on Campus/LAN trunks for a significant amount of time.
  • Where bandwidth is expensive, invest in TCP Rate Shaping tools.
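The "instrument and monitor" advice boils down to watching dropped-frame counters and adding bandwidth where they climb. A minimal sketch, assuming you have already collected per-port forwarded and dropped counts by whatever means your switches offer; the port names, the counters and the 0.1% threshold are all hypothetical.

```python
def congested_ports(port_stats, max_drop_ratio=0.001):
    """Return the ports whose dropped-frame ratio exceeds `max_drop_ratio`.

    `port_stats` maps a port name to (frames_forwarded, frames_dropped).
    """
    flagged = []
    for port, (forwarded, dropped) in port_stats.items():
        offered = forwarded + dropped
        # Skip idle ports to avoid dividing by zero.
        if offered and dropped / offered > max_drop_ratio:
            flagged.append(port)
    return sorted(flagged)


# Hypothetical counters gathered over one measurement interval.
sample_stats = {
    "trunk-1": (9_950_000, 50_000),
    "trunk-2": (5_000_000, 0),
    "ws-port-7": (120_000, 600),
    "idle-port": (0, 0),
}
needs_bandwidth = congested_ports(sample_stats)
```

Here `trunk-1` and `ws-port-7` are flagged as candidates for a parallel trunk or a faster switch, while `trunk-2` is left alone.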

TCP Rate Shaping (TRS) is the art of reducing congestion by controlling the rate at which end systems communicate. TRS is a form of Quality-of-Service (QoS), but better. Why better? To explain this, let me digress into a discussion on QoS.

QoS functions, initially introduced to the mass market in connection with ATM, are oriented towards providing a guaranteed level of service across a network for selected applications or users. Such guarantees are implemented by setting aside resources on every switch and every trunk between the sender and the receiver. Resources can be CPU cycles, buffers (queues), memory bandwidth, trunk bandwidth, or other technical factors which influence the rate at which a cell passes through a switch. Such guarantees could be configured into every switch for every application flow, but the administrative cost of doing that would be enormous. So to enable usable QoS, signaling schemes have been invented that allow end-station applications to "negotiate" with the network for a specific level of service (usually bandwidth). In the IP world, RSVP (Resource Reservation Protocol) has been created to provide similar capabilities without the need for ATM.

In both cases (ATM and RSVP), very little use is being made of QoS in networks today and it appears to me that the implementation of QoS using these techniques will remain stalled.

Why is that?

For a fundamental reason. For ATM or RSVP to work, you need the technology to be implemented in every switch or router in the end-to-end path as well as the end systems. In networking, the technologies that get widely implemented are those that users can implement a little of and get some value, and then implement more over time. All-or-nothing never seems to take off.

So we're left with hundreds of articles and trade show sessions about QoS, millions of Engineer hours being spent figuring out how to implement QoS, and thousands of Marketing hours being spent figuring out how to position QoS that doesn't exist.

Which brings me back to TCP Rate Shaping (TRS). TRS is something that can be bought and used today to solve some real problems.

TRS is when you interdict TCP traffic flows between two end stations and modify the handshaking between them such that the end-to-end communication occurs at a desired rate (bandwidth). This becomes useful as a tool when you have a specific point of potential congestion in a network and want to implement your own policy about how you want that bandwidth utilized. For instance, let's say you have a T1 link from your site into a Frame Relay network. You know that link often becomes congested, but it is way too expensive to think about any higher speed WAN access right now. You just want to squeeze the most you can out of that T1 link.

In TRS, you identify the target bandwidth you want each flow in each Traffic Class to have available to it under periods of congestion. The sum total of all these guarantees for the identified traffic that you care about must not exceed the link speed (T1, for instance). During periods when there is excess bandwidth available, you can set policies for who gets first access to that excess bandwidth. As you head towards congestion, the TCP flows are rate shaped by slowing down the return of Acknowledgement packets and adjusting advertised window sizes, so that the end-to-end communication slows down towards the guaranteed rate. Since the sum of all the guaranteed rates does not exceed the link speed, no congestion occurs, just smooth slowing and acceleration of application flows as link usage ebbs and flows.
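The window-size half of this can be sketched with the usual rule of thumb that a TCP flow's throughput is bounded by its window divided by the round-trip time. This is only the arithmetic, not Packeteer's algorithm or any real product's: an actual rate shaper also paces Acknowledgements and tracks each flow's round-trip time. The function names, the traffic classes, the guaranteed rates, the 100 ms round-trip time and the 1460-byte segment size are all assumptions made for the example.

```python
def guarantees_fit(guaranteed_bps, link_bps):
    """True if the guaranteed rates, added together, do not exceed the link speed."""
    return sum(guaranteed_bps.values()) <= link_bps


def window_for_rate(target_bps, rtt_ms, mss_bytes=1460):
    """Advertised window (bytes) that holds a flow near `target_bps`.

    Based on throughput ~= window / round-trip time. The window is never
    set below one segment, so very slow targets cannot be hit by the
    window alone and would also need Acknowledgement pacing.
    """
    window_bytes = target_bps * rtt_ms // 8000  # bits/s * ms -> bytes
    return max(window_bytes, mss_bytes)


T1_BPS = 1_544_000

# Hypothetical guaranteed rates per flow, in bits per second.
guarantees = {"order-entry": 512_000, "email": 256_000, "web": 512_000}

fits = guarantees_fit(guarantees, T1_BPS)
windows = {
    name: window_for_rate(bps, rtt_ms=100) for name, bps in guarantees.items()
}
```

With these numbers the guarantees add up to 1.28 Mbps, which fits inside the T1, and the clamped windows come out at 6400 bytes for the 512 Kbps flows and 3200 bytes for the 256 Kbps flow.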

The biggest advantages of Rate Shaping are:

  • It allows you to "Hit'em Where It Helps." You buy a box and use it to manage the WAN links that you choose, implementing the policies that you choose. No ISP, Service Provider, IETF body, ATM Forum, multi-national committee or judicial body need be involved in the decision or implementation.
  • The implementation is transparent to end stations. They are just running normal TCP processes.
  • It's bi-directional. From a single point in the network you can influence the rate of traffic flows both into and out of your site.

I believe that TRS is how QoS will really get implemented in networks because it is useful and can be implemented independently by different end users in a point-wise manner. TRS is on my list as one of the Next Big Things in networking.

As far as I know, Packeteer (www.packeteer.com) is the only vendor with TRS capabilities right now. There are a multitude of other vendors with various "Traffic Shaping" functions available as features on some routers, firewalls, and web server load balancers. These features generally allow you to allocate bandwidth to specific Traffic Classes using traditional queuing and CoS techniques. The net effect is a virtual T-MUX, more flexible than allocating DS0s, but it still does nothing to avoid congestion.

To determine whether a vendor supports TRS, ask them if they can adjust the rate at which clients and servers communicate with one another. If the answer is yes, and it's not just Class of Marketing, you may have found a potential source of technology to manage your WAN bandwidth, complementing a "beat'em with cheap bandwidth" strategy on the LAN.

The downside of TRS is that the QoS influence is not global. You can manage your WAN access bandwidth extremely well, but if there is congestion within your ISP or Frame Relay network, you will still suffer performance degradation. But that's what buying Optimal Networks (www.optimal.com), Visual Networks (www.visualnetworks.com) or Vital Signs (www.vitalsigns.com) software to measure Service Provider performance is all about. You get to yell at your Provider about their service problems using documented data. Don't invest in a lot of technology that doesn't really work yet anyway (e.g. RSVP) to try to solve other people's problems.

Send email to info@acuitive.com with questions or comments about this web site.
Copyright ©1997-2001 Acuitive, Inc. All Rights Reserved