Hoov's Musings (volume 5, number 10)  

Don’t Stub Your TOE
Mark Hoover, President, Acuitive, Inc.

One area of relatively active investment and development energy in the last 18-24 months has been the TCP Offload Engine (TOE).  TOEs are devices that take some or all of the work of TCP processing off the central CPUs.  The expectation is that, by doing so, the performance of the system will improve considerably because cycles are returned to the central CPU for performing other functions that are kind of important – like running applications.

A handful of companies have been funded to build TOEs, and even more have been funded to build devices that include TOE functionality along with assist for other higher-level functions.  The interest in TOEs has been high because in theory all kinds of devices could use TCP assist – web servers, application servers, database servers, NAS servers, file servers, iSCSI endpoints and gateways, SAN gateways, SAN IP virtualization devices, proxy caches, proxy firewalls, web acceleration devices (such as SSL or XML accelerators), content switches, high speed clients, network test equipment, and so on.  Basically any device that terminates TCP is a candidate – and what doesn’t terminate TCP?  Having said that, since the value proposition is performance related, most of the focus has been on devices that terminate lots and lots of TCP connections and/or terminate big fat TCP connections.  (There are a company or two focused on the opposite end of the spectrum, very low power, small footprint TCP/IP solutions for small devices such as cell phones and toasters, but that is a different enough market that I choose not to address it here.)

My belief is that a couple of reasonably successful companies will result from the investments in this space, but the market isn’t big enough for everyone to share, so there will be lots of drop-outs.  So, who will win out?

The TOE market will be fundamentally capped by the fact that there are at least a couple of TOE-avoidance strategies that are viable in the majority of cases.  The main competition to a TOE is the same old approach: run TCP on your host processor.  In theory you get about one bps of unidirectional TCP processing out of every one Hz of CPU power.  In practice, it’s about half that (but increasing).  So, at 100 Mbps and below, TCP doesn’t take up a very large part of a one GHz processor today, and there is no real demand for TOEs at those speeds.  At higher speeds, like one Gbps, there can be some advantage today because the slice of CPU required to perform TCP may be too high.  But as Mr. Moore continually works to increase the performance of your host CPU (3 GHz processors shipping now!!) and, probably more importantly, software engineers work to squeeze every bit of inefficiency out of their protocol stacks, the line at which the performance advantage of TCP offload justifies its cost keeps moving up.  Even today, whether offload pays off at one Gbps is open to debate.
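To make the rule of thumb concrete, the sketch below turns it into arithmetic. It assumes the in-practice figure of roughly 0.5 bps per Hz, unidirectional traffic at full line rate, and a CPU doing nothing but TCP; the function name is hypothetical, and the numbers come from the rule of thumb, not from measurements.

```python
# Back-of-the-envelope sketch (assumption: the column's rule of thumb,
# ~1 bps of unidirectional TCP per Hz in theory, ~0.5 bps/Hz in practice).

def cpu_fraction_for_tcp(link_bps: float, cpu_hz: float, bps_per_hz: float = 0.5) -> float:
    """Fraction of one CPU consumed by TCP at full line rate (may exceed 1.0)."""
    required_hz = link_bps / bps_per_hz  # CPU cycles per second TCP would need
    return required_hz / cpu_hz


if __name__ == "__main__":
    GHZ = 1e9
    for label, rate in [("100 Mbps", 100e6), ("1 Gbps", 1e9), ("10 Gbps", 10e9)]:
        for cpu in (1 * GHZ, 3 * GHZ):
            frac = cpu_fraction_for_tcp(rate, cpu)
            print(f"{label:>8} on a {cpu / GHZ:.0f} GHz CPU: {frac:.0%} of the CPU")
```

At 0.5 bps/Hz, 100 Mbps needs about 200 MHz, roughly 20% of a 1 GHz part, while 1 Gbps needs about 2 GHz: more than an entire 1 GHz CPU and about two thirds of a 3 GHz one.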

Even if there is an advantage to off-loading the TCP function from the host CPU, one option is to execute it on just another CPU or multi-CPU chip, like a Broadcom (SiByte) chip with optimized code.  For Broadcom, the SiByte products are its TOE strategy.  Over time, they’ll just add more CPU cores and some modest hardware assist: checksum generation and verification, direct DMA of packets between the MAC and DRAM, and arbitrarily aligned DMA chains that allow packets to be formed on the fly without any copy, so data can fly directly from your application space onto the network.  You can’t do a whole lot better than that.  As these solutions become more optimized for network I/O and TCP off-load applications, the advantage of purpose-built TOEs narrows.  Today, this TOE-avoidance choice seems to be common for designs in the 100s of Mbps to one or two Gbps, even though most of the I/O optimization capabilities mentioned above haven’t rolled out yet.
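As an illustration of the kind of per-byte work that checksum assist takes off the CPU, here is a minimal software sketch of the standard Internet checksum (RFC 1071), computed directly over a chain of separately allocated, arbitrarily sized buffers, the way a gather-DMA list presents a packet without copying it together first. This is not SiByte's hardware or any vendor's API; the function name is hypothetical, and a real TCP checksum also covers a pseudo-header, which is omitted here.

```python
from typing import Iterable


def internet_checksum(chain: Iterable[bytes]) -> int:
    """RFC 1071 one's-complement checksum over a chain of buffers.

    The buffers are treated as one contiguous byte stream, as a gather-DMA
    engine would see them.  Buffers may have odd lengths, so a 16-bit word
    can straddle two buffers.  (Hypothetical helper for illustration.)
    """
    total = 0
    pending = None  # high byte of a 16-bit word split across two buffers
    for buf in chain:
        mv = memoryview(buf)
        start = 0
        if pending is not None and len(mv) > 0:
            # Complete the straddling word with this buffer's first byte.
            total += (pending << 8) | mv[0]
            pending = None
            start = 1
        end = start + ((len(mv) - start) // 2) * 2
        for i in range(start, end, 2):
            total += (mv[i] << 8) | mv[i + 1]  # big-endian 16-bit words
        if end < len(mv):
            pending = mv[end]  # odd byte left over; carry into next buffer
    if pending is not None:
        total += pending << 8  # pad a trailing odd byte with zero
    # Fold carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


if __name__ == "__main__":
    # Example data from RFC 1071, split at an odd boundary across two buffers.
    header, payload = bytes.fromhex("0001f2"), bytes.fromhex("03f4f5f6f7")
    print(hex(internet_checksum([header, payload])))  # prints 0x220d
```

Summing in place across the chain, rather than first concatenating the buffers, is what lets the data go from application space to the wire without a copy.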

So the vendor of a TOE-specific product has to achieve one of two things:

  1. Performance faster than the CPU-oriented alternatives can achieve, which today means multi-Gbps through 10 Gbps and in the future will mean 40 Gbps.
     

  2. Significantly better cost and power than SiByte, taking into account all of the peripheral circuitry (if any) required to implement each of the SiByte and TOE solutions.

This creates a problem for the TOE vendors, because semiconductor vendors generally need unit volume to be profitable.  I believe the TOE market is not large enough to be heavily subdivided, and therefore the winner needs to achieve both of the above.  The winning vendor will be the one who creates an architectural advantage and leverages it for the highest possible performance in one product and the lowest possible cost at a slower speed in another product.  The winner has to have a 10 Gbps solution and an architecture that can scale both up and down.  Lower-speed socket wins will represent a transient, yet still very important, market as processor-based solutions move up the ladder.

The biggest volume opportunities for high speed TOEs may be related to iSCSI (which, if you’ve read my previous two Musings, you’ll know I’m dubious about) and to high-density blade servers using 10 Gbps Ethernet internally to interconnect blades (which I am not dubious about).  But the latter application requires a lot of chips to co-reside in a small enclosure, which means low power.  So the solution needs to be fast and power efficient.  That’s quite a challenge.

To make matters even more inconvenient for the TOE vendors, it turns out that a TOE is not a TOE is not a TOE.  Accelerating TCP usually means not only treating TCP as a protocol in a manner consistent with the application’s needs, but also tuning the solution to fit the systemic needs of the particular application.  Web servers require different TCP optimizations than multimedia servers or NAS servers or database servers or content switches or proxy firewalls.  In addition, all of these applications differ in terms of how the TOE needs to integrate with other devices such as encryption co-processors, classification engines, and traffic managers.  Even more importantly, different applications vary in how the TCP function needs to interface optimally with Upper Layer Functions (HTTP/SSL for web servers, NFS/CIFS and/or file systems for NAS servers, RDMA for clustered application servers, iSCSI and FCP for SAN gateways, proprietary switch fabrics and operating environments for content switches, etc.).  So a TOE vendor has to either segment the market and go after specific niches, or figure out how to build a flexible generalized solution that can be easily configured or programmed to meet the optimizations required across a broad range of applications.  Like I said before, the market doesn’t look very interesting if divided up into too many segments.  So now we’re up to fast, power efficient and flexible.

In terms of flexibility and stepping up to being a solution rather than a component, the so-called Storage Network Processor (SNP) vendors may have gotten it a little more right than the so-called TOE vendors.  SNPs all implement TCP acceleration, but as a means to an end rather than the end itself.  In addition to TCP, they are implementing iSCSI processing, iFCP, FCIP, NFS, NAS, virtualization functions, RAID 0 and 1, mirroring, caching, and in some cases actual file systems within their devices.  Since these vendors push the external interface to a higher level, they can execute all kinds of optimizations between the functions they implement internally.  These devices are usually highly programmable, and each SNP vendor has a different focus and priority order for implementing the firmware that realizes these capabilities.  But in a roadmap sense, they all end up at about the same place.  They are biting off a lot, but if they can get it right, they represent a much larger advantage to an OEM than just a TOE.  The tradeoff is that they are even more specialized and application-specific than TOEs, and some of them are expending a lot of R&D energy on protocols and applications for which the market hasn’t developed yet.  It’s high risk, high reward.

To be fair, most TOE vendors are on a path to do higher-level functions and many SNPs can be cost effective if you just use them as a TOE.  So the lines of demarcation are very blurred.

Again, who will win?  I think the safest bet for now is on a TOE-oriented supplier with a high speed, power efficient architecture, where both the TCP implementation and the interface to Upper Layer Functions are very flexible, allowing OEMs to explore lots of optimizations across lots of product categories.  Expanded support for Upper Layer Functions can follow quickly for those optimizations that prove the most useful and the most prevalent (in other words, see where the market happens and follow it).

I have my own view of who is positioned to best meet these criteria, but I’ll keep it to myself for now, as most of the vendors are still in development.  I don’t want to champion PowerPoint slides; I want to promote good silicon.

Copyright ©1997-2002 Acuitive, Inc.  All Rights Reserved