Recovery Monkey: Musings on backups, storage, tuning and more

Wed 23 May '07

Data Domain Update

I’m not known for retractions and I’m not posting one. I did however check out the new DD boxes and the really big ones are far more capable than the old ones.

So, the techies (hats off for enduring a half hour with me) explained to me a few things:

  1. The smallest block is 4K
  2. The highest possible performance for the biggest box is 200MB/s
  3. The biggest box can do a bit over 30TB raw
  4. They scrub the disk continuously so it’s effectively defragged (see below for caveat) - they did admit performance totally sucks over time if you don’t do it (finally vindicated!)

This is good news, since it’s obviously far bigger than the old ones.

Some issues though (based on what the techies told me):

  1. The scrub relies on NBU deleting the old images; only then does the box know what to get rid of. If your retentions are long then you will have performance problems. They suggested just dumping it all to tape and starting afresh once in a while. Which just confirms my suspicions on how the stuff truly works.
  2. Each “controller” is really a separate box. The 16 controller limit does not mean it’s a larger appliance, it’s the limit of the management software.
  3. Ergo, each controller can be a separate VTL or separate NFS mount. You cannot aggregate all your controllers in one large VTL. This sucks since if you need to do backups at 1GB/s or so, you’ll need at least 5-6 boxes, and you will have to define a separate library and drives per box. If you do NFS, you need to define 1-2 shares per box. This is a management nightmare. Make it all a single library! Copan has the same issue. I don’t know how they can do it though based on their architecture.

So, it looks to me like it may be a fit for some people, though I have no idea about the price points. If you want performance then you’ll need a ton of the boxes, and you’ll need to spend time configuring them. If 10 maxed-out boxes cost the same as (or, worse, more than) a big EMC DL4400 (that can do 2.2GB/s) then it’s not an easy sell. Especially since EMC will be adding dedupe to their VTL - plus, you won’t have to define a bunch of separate libraries. Will EMC’s dedupe be similar? No idea, but if it doesn’t impact performance then it’s pretty compelling.
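
For anyone who wants to redo the box-count math with their own numbers, here is a minimal sketch. It assumes the 200MB/s per-box peak quoted above scales linearly across independent boxes (a best case, since nothing aggregates them), and the function name is made up for illustration.

```python
import math


def boxes_needed(target_mb_s: float, per_box_mb_s: float = 200.0) -> int:
    """Smallest number of independent boxes whose combined peak rate meets the target.

    Assumes perfect linear scaling, which is optimistic: each box is its own
    library or NFS mount, so the backup software has to spread the load itself.
    """
    return math.ceil(target_mb_s / per_box_mb_s)


if __name__ == "__main__":
    # 1GB/s backup target, counted as 1000MB/s and as 1024MB/s
    print(boxes_needed(1000), boxes_needed(1024))
    # Matching the 2.2GB/s quoted for the DL4400
    print(boxes_needed(2200))
```

Depending on whether a GB is counted as 1000MB or 1024MB, the 1GB/s case comes out at 5 or 6 boxes, and matching the DL4400 figure takes 11.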

Thoughts? You know the drill.

D

Storage Virtualization - is there a point?

This has been bothering me for a while, and I think I’m not alone.

Hitachi has been making great progress with their virtualization gear, as has IBM, Falconstor before them, etc.

They claim you’ll be freed from the vendors’ shackles, achieve greater utilization of your arrays, simplify administration, cure cancer etc.

Well, here’s what I think:

  1. You will instead be shackled to the virtualization provider
  2. You won’t have a clue where your stuff is
  3. If you want to retire an array you could have problems (imagine creating a LUN composed of LUNs from 3 different arrays)
  4. You STILL have to use the management interfaces of the back-end arrays, since you still have to provision the storage. Instead of provisioning to hosts you provision to the virtualizer.

 

So, what have you gained, exactly?

D

Ate at Del Frisco’s steakhouse in Orlando

Superb.

Not much fanfare, steaks wet-aged 21 days.

Got the strip. So much better than Charley’s it wasn’t even funny. Great flavor, tender, perfectly cooked. 8/10. (Charley’s claims double the aging time but their stuff just wasn’t that good).

Sides were maybe too rich (the spinach could clog a yak’s arteries). Bisque too thick and nowhere near my sublime experience in Savannah, GA. At least they gave me sherry to put in it; I can’t believe it’s not SOP in any place serving bisque. Heathens!

Dessert was just OK.

Something tells me (maybe it’s my impacted colon) that I should not eat steak again tonight.

D

Should EMC move to more multi-functional devices?

Here’s the deal: EMC has a lot of cool stuff. Lots of it came through acquisitions. Lots of it runs on the x86 platform, believe it or not.

At the moment one needs to buy multiple boxes from EMC to do NAS, SAN, archiving, etc.

Imagine if you instead got generic boxes (there could be a few models at different price/performance points).

In each box you could run a Clariion, Centera, Celerra, a print server, WAAS (even though it’s Cisco it’s really a Linux box), something like Recoverpoint, and so on.

All the products could be custom Virtual Machine Appliances, possibly running on a modified ESX platform (so you can’t just run them anywhere). You’d get all the benefits of cool technologies such as VMotion and HA. You could easily add to it.

This doesn’t preclude the use of specialized hardware to accelerate certain functions, though in this age of quad-core CPUs even that may be unnecessary.

Think about it. EMC owns the IP for all that technology.

They don’t need to make less money - if anything, since all the platforms would be virtual, production would be greatly streamlined. They could even have a single type of box (say a quad quad with tons of RAM and expansion capability) as the hardware. You need more speed for NAS? Add an extra box, an extra license and load-balance a new virtual data mover.

This of course is unattainable at the moment - I don’t think VMware can provide such low latency and high throughput but maybe I’m wrong.

Such a move won’t fix the proliferation of management interfaces, but EMC could build a common interface.

Thoughts?

D

Tue 22 May '07

Netbackup best practices for ridiculously busy environments (but not exclusively).

While waiting for another EMC World session to start (this one is at “Guru” level, let’s see) I thought I might share some of my experience regarding running Netbackup on very large setups - nothing like learning through pain.

Don’t get me wrong - NBU has its marketshare for a reason. However, I want to make sure I dispel everyone’s deluded romantic notions about NBU being the be-all, end-all backup tool. It can work well, but only if you truly know its idiosyncrasies.

I can’t say I was tending the busiest NBU systems but, at one point, just one of my environments was doing about 15,000 backup jobs a day. Which is way too much - we fixed that pronto…

I won’t go too deep into each point. If anyone cares then post a comment and I will expand on it.

If you have a small shop running NBU on a single server, much of this is not for you - but there may still be a nugget or two in there… However, if you don’t at least use barcodes, I will go after you. Use tar or Windows backup, or even a rusty abacus, go to your corner and be quiet.

 

  1. Have a dedicated master server - if there are many jobs, the last thing you want is your master also being busy doing backups and vaults. It’s the half-witted brains of the operation, don’t stress it.
  2. Go way beyond the tuning recommendations in the manual - if you know what you’re doing. For instance, I have some voodoo tunings for Solaris (up to 9) that make a huge difference. Prepare for comments from Veritas (Symantec, whatever) support… “no sir it’s not like in the book sir, we can’t guarantee it will work sir…” whatever, I’ve gotten such ridiculously bad advice from their support I still cringe (and sometimes pee a little) every time I get a flashback, not to mention the endless dreams and the screaming that wake me up at night.
  3. Separate HBA ports for disk and tape. No exceptions. I don’t care what vendors say.
  4. Separate TAN (Tape Area Network), if you can swing it.
  5. Separate backup LAN. And/or Ethernet port bonding/trunking/teaming (whatever nomenclature appears in your systems). 4 gig ports per media server. 10G if you have the dough. 4 10G ports teamed and I will do the Wayne’s World “we’re not worthy” bit in front of you. Offer ends Dec 2007.
  6. Experiment with TOE cards, such as the Alacritech ones. You will get closer to full gig, though they’re expensive. Bonding is way cheaper and effective if you have many clients.
  7. Try to use port bonding that works at the switch level, too - 802.3ad is the standard, Etherchannel is Cisco’s own flavor. The software on the server and the setting on the switch have to jibe. Half-assed intermediate approaches are just that.
  8. Don’t use weak switches at the core. I’m tired of seeing people with Cisco 4506 switches (6509 wannabe) and 8:1 oversubscribed 48-port cards. YOU WILL HAVE PROBLEMS!!!! Do your homework: find out whether or not the switch is oversubscribed, find out the total backplane throughput, figure out the blade throughput, and don’t plug everything into the same port octet if you’re going to be oversubscribed - i.e. a 4-port team going to the octet that shares 1Gbit in a 4506 will not give you 4Gbits, it will give you, at best, a thoroughly blocked 150Mbits per port, with problems. Did you know that if one of the 8 ports starts out before the rest and continues pumping, the rest will NOT make the first port reduce its speed but will instead trickle along at 10Mbits sometimes? Even after the initial transfer that was fast is finished and there’s nothing else going on? As Rutger Hauer said in Blade Runner, “I have… seen things you people wouldn’t believe”. Figure THAT one out when you’re having throughput problems.
  9. Use jumbo frames if you can. Bigger is better in this case. Do your homework, there are caveats.
  10. Use the right block size for your tape devices. Windows users, beware. Patches are necessary. SP1 broke block sizes over 64K on 2003 Server.
  11. Don’t go nuts with SSO! Among the myriad things Veritas doesn’t tell you unless you know the right people is that at around 250 instances of devices you will have weird device problems (25 tape drives shared among 10 media servers would make 250 instances). The safe number is closer to 150. Ignore this at your peril. If you use VTL just make more virtual drives.
  12. Use snapshots as much as possible.
  13. If you have more than a couple of media servers, consider a VTL.
  14. If you have DBAs that insist on flushing the redo logs to tape every few seconds, get a heavy-gauge jumpstart cable and a power supply that can put out, say, 20KV, a coat hanger, and wearing nothing but a stained leather apron go to work on them until they regain their senses (or not). Good times.
  15. If the DBAs can’t be persuaded even after their various body parts have been charred by high voltage, try to send the smaller backups to disk. Do NOT send frequent backups to tape. If a job is going to take less than 10min send it to disk.
  16. As a corollary to #15, only use tape for large jobs that will actually stream your tape drives.
  17. Know what your boxes can push. Most servers, even very large ones, will be hard-pressed to push 2 LTO3 drives, let alone LTO4. FYI, I’ve gotten LTO3 to go as fast as 130MB/s, sustained. Do the math. Beat the score! I cheated, BTW.
  18. Know what expansion slots to use - not all are equal, even if they look the same.
  19. Don’t push too much backup traffic over switch ISLs. Preferably don’t push any.
  20. Be super-careful with command-line manipulation of the NBU DB. Perfectly legitimate commands will not function as you might think due to silly heuristics (or lack thereof). Stay tuned, there will be a large post outing NBU in the future. The amount of dirt I have is beyond staggering. Maybe I shouldn’t have said that, I might have to look out for contract killers or Veritas people offering payola, not sure which is preferable. I’m 5 feet tall, with a goatee, skinny and blond, by the way. You can’t miss me. I also have a pronounced limp.
  21. Beware of multiplexing. Too much and restores take forever. Too little and you can’t stream your devices. Disk is your friend. Anything beyond 4-way multiplexing on tape is not.
  22. Do not send tapes offsite only once a week. You are asking for pervy uncle Murphy to pay you a visit, and he is a known repeat sex offender. He won’t discriminate, either.
  23. If you use tapes, have 2 copies of everything.
  24. Replicate to remote sites if at all possible. Tape should be a last resort.
  25. Use VMware if at all possible. Along with #12 and #24, this helps quick recovery.
  26. Do at least 2-3 different backups of the NBU catalog. In really busy systems it’s impossible to do it after each session - there’s just no quiet time. Just have a copy on disk and 2 on tape (you can do the tape copies inline, which creates both at the same time; it works), then send the ones on tape to 2 different offsite locations. Have NBU email you the tape(s) barcodes it used for the catalog if you’re doing a non-standard catalog backup. Send an extra email to an externally available address. You’re not paranoid if they’re really out to get you!
  27. Can you even read from disk as fast as you can write to your backup medium? Benchmark.
  28. What’s your current network throughput if you max out all the media servers? Benchmark.
  29. Don’t use your production systems as media servers. You are inviting uncle Murphy again and he’s feeling randy.
  30. Use storage unit groups. Why on earth would you not?
  31. Cluster the master.
  32. Do NOT put media traffic through firewalls, it’s too much. ACLs on switches can work just fine.
  33. Do NOT deploy a dedicated media server for a subset of your boxes that are secured from the main network. If they lose access to that media server, backups fail. At any rate you’ll have to allow a few ports for the master to communicate with the media server, so you might as well let media server traffic through. If it seems that #32 and #33 are somewhat self-contradictory, give yourself a cigar.
  34. Simplify your life. Elaborate and numerous policies are more ways to invite uncle Murphy.
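
Two of the points above (#8 and #11) boil down to simple arithmetic, so here is a rough sketch for plugging in your own numbers. The function names are made up for illustration, the 150-instance budget is the "safe number" from #11, and the per-port figure is an idealized fair share; as #8 says, real oversubscribed ports can behave a lot worse than this.

```python
def sso_device_instances(shared_drives: int, media_servers: int) -> int:
    """Each media server sees every shared drive, so instances multiply."""
    return shared_drives * media_servers


def max_media_servers(shared_drives: int, instance_budget: int = 150) -> int:
    """Largest number of media servers that keeps instances within the budget."""
    return instance_budget // shared_drives


def fair_share_mbit(shared_uplink_mbit: float, busy_ports: int) -> float:
    """Idealized per-port share when busy ports split one oversubscribed uplink."""
    return shared_uplink_mbit / busy_ports


if __name__ == "__main__":
    # 25 tape drives shared among 10 media servers, as in #11
    print(sso_device_instances(25, 10))
    # How many media servers can share 25 drives under the 150-instance budget
    print(max_media_servers(25))
    # 8 ports sharing 1Gbit, then a 4-port team landing in that same octet
    print(fair_share_mbit(1000, 8), fair_share_mbit(1000, 4))
```

Even in the ideal case the 4-port team tops out at 1Gbit total, nowhere near 4Gbits.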

 

That’s all I have for now. Is there more? Tons, but I need to pee.

D

EMC World: Replication Manager and Exchange 2007

Just attended a session. Seems like the new rev of RM supports Exchange 2007 fully. They also support RecoverPoint clones (or will, later this week).

For whoever is not aware of it, EMC Replication Manager is like a front-end that manages local replicas of your salient Exchange data for the purposes of backup and restore.

Can be fiddly to set up but if you have EMC gear and Exchange, you really should look at it.

D

Mon 21 May '07

Just ate at Charley’s steakhouse in Orlando

As has been my habit lately, I will comment on food.

Went to Charley’s steakhouse while attending EMC World.

They made a huge deal of showing off their steaks - which looked good. Wet-aged, 6 weeks for the bone-in ones, 4 weeks for the rest. Aged in-house. I prefer dry-aged but it’s hard to find outside NYC.

So I had a chunky strip, medium-rare.

Observations:

  1. Too seared on the outside, too rare on the inside (would be classified as rare in other places)
  2. Really not that tender
  3. Way too stringy
  4. Others complained theirs was too salty, mine was OK.
  5. Shoulda gone for the ribeye or porterhouse.

Escargot were OK but needed more salt and garlic.

Next time I’m getting fish, or maybe a fillet (which is too boring a cut but at least it’s hard to screw up).

D

At EMC World

Currently attending EMC World. The first day bored me to tears; I hope the rest will be more exciting (though it utterly depends on the presenters). Some of the material is too introductory, and even the advanced sessions are not that advanced.

More to follow.

D

Tue 8 May '07

I wonder when dedup will make it to the arrays

Anyone else feel that deduplication won’t find its final resting place in backups and WAN accelerators?

It’s only a matter of time before the algorithms are run as a matter of choice on the array processors.

Of course, that means fewer disk sales, but also bigger/faster/more expensive processors.

Replication will also become more efficient - see EMC’s recent acquisition of Kashya (now RecoverPoint - one of its functions is dedup during replication from array to array, how long do you think it will take them to move this functionality to the array processors?)

Just some random thoughts…

D

Fri 4 May '07

Another Windows tuning I forgot to mention

I use my laptop so much that I sometimes forget about some server-type tunings.

I resuscitated my hot-rod AMD box - it’s a grossly overclocked monster but only has 1GB RAM (since it’s hard to find that kind of fast RAM in bigger sizes, and using 4 sticks prohibits me from overclocking it so much). Let’s just say the CPU is running a full GHz faster than stock, and with air, not water or peltier coolers.

Anyway, since it only has 1GB RAM and I use it for Photoshop and games, I can’t really use something like Supercache or Uptempo on it.

So I tried O&O Software’s Clevercache. Nowhere near as good as the other 2 products - however, it does a decent job of automatically managing cache so you always have enough free RAM.

Then I tried the DisablePagingExecutive registry tweak - not that obscure, tons of references around.

BTW, there is a way to stop postmark from using caching - set buffering false is the command. However, I want to see the benchmark run on a system that would run normally, not measure the raw speed of my disks. Nobody cares about that anyway, especially in the big leagues (unless the config is truly moronic, of course). Cache is everything. But I digress.

So - postmark once more.

Stock:

Time:
177 seconds total
144 seconds of transactions (138 per second)

Files:
20092 created (113 per second)
Creation alone: 10000 files (333 per second)
Mixed with transactions: 10092 files (70 per second)
9935 read (68 per second)
10064 appended (69 per second)
20092 deleted (113 per second)
Deletion alone: 10184 files (3394 per second)
Mixed with transactions: 9908 files (68 per second)

Data:
548.25 megabytes read (3.10 megabytes per second)
1158.00 megabytes written (6.54 megabytes per second)

After tuning as server with the background process, large cache and fsutil as described previously:

Time:
107 seconds total
85 seconds of transactions (235 per second)

Files:
20092 created (187 per second)
Creation alone: 10000 files (526 per second)
Mixed with transactions: 10092 files (118 per second)
9935 read (116 per second)
10064 appended (118 per second)
20092 deleted (187 per second)
Deletion alone: 10184 files (3394 per second)
Mixed with transactions: 9908 files (116 per second)

Data:
548.25 megabytes read (5.12 megabytes per second)
1158.00 megabytes written (10.82 megabytes per second)

With Clevercache:

Time:
97 seconds total
71 seconds of transactions (281 per second)

Files:
20092 created (207 per second)
Creation alone: 10000 files (454 per second)
Mixed with transactions: 10092 files (142 per second)
9935 read (139 per second)
10064 appended (141 per second)
20092 deleted (207 per second)
Deletion alone: 10184 files (2546 per second)
Mixed with transactions: 9908 files (139 per second)

Data:
548.25 megabytes read (5.65 megabytes per second)
1158.00 megabytes written (11.94 megabytes per second)

Hell, I guess I might get Clevercache for this system - sped it up a bit and manages memory consumption.

But look at this:

All the above plus using the DisablePagingExecutive registry tweak: BOOYA!

Time:
45 seconds total
28 seconds of transactions (714 per second)

Files:
20092 created (446 per second)
Creation alone: 10000 files (1111 per second)
Mixed with transactions: 10092 files (360 per second)
9935 read (354 per second)
10064 appended (359 per second)
20092 deleted (446 per second)
Deletion alone: 10184 files (1273 per second)
Mixed with transactions: 9908 files (353 per second)

Data:
548.25 megabytes read (12.18 megabytes per second)
1158.00 megabytes written (25.73 megabytes per second)

I guess the box is staying this way.
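
To put the four runs side by side, here is a small sketch that turns the total and transaction times reported above into speedups over stock. The run labels and the helper name are mine; the numbers are copied straight from the postmark output.

```python
# (total seconds, transaction seconds) copied from the four runs above
RUNS = {
    "stock": (177, 144),
    "server tuning": (107, 85),
    "with Clevercache": (97, 71),
    "all of the above plus DisablePagingExecutive": (45, 28),
}


def speedups(runs, baseline="stock"):
    """Return {run: (total speedup, transaction speedup)} relative to the baseline run."""
    base_total, base_tx = runs[baseline]
    return {
        name: (base_total / total, base_tx / tx)
        for name, (total, tx) in runs.items()
    }


if __name__ == "__main__":
    for name, (total_x, tx_x) in speedups(RUNS).items():
        print(f"{name:46s} total {total_x:.2f}x  transactions {tx_x:.2f}x")
```

The last run works out to roughly 3.9x faster overall and a bit over 5x faster in the transaction phase compared to stock.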

More info on the registry tweak:

http://technet2.microsoft.com/windowsserver/en/library/3d3b3c16-c901-46de-8485-166a819af3ad1033.mspx?mfr=true

In a nutshell, it disables the paging of kernel and driver code, so it’s always memory-resident. Makes sense in some cases, as you can see above :)
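
If you want to apply it without clicking through regedit, here is a minimal sketch that only generates the text of a .reg file; it doesn’t touch the registry itself. The value is a REG_DWORD named DisablePagingExecutive under the Session Manager\Memory Management key (1 turns the tweak on, 0 turns it off). The helper name is hypothetical, and as far as I know the change only takes effect after a reboot.

```python
# Registry key that holds the DisablePagingExecutive value
KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"


def reg_file(enabled: bool = True) -> str:
    """Build the contents of a .reg file that sets DisablePagingExecutive to 1 or 0."""
    value = 1 if enabled else 0
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{KEY}]",
        f'"DisablePagingExecutive"=dword:{value:08x}',
        "",
    ]
    # .reg files conventionally use Windows line endings
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(reg_file(True))
```

Save the output to a file, review it, and import it on a test box first; calling reg_file(False) produces the file that reverts the setting.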

It’s so unusual that it gave me that much of a boost, though. I’d tried it a long time ago and it wasn’t quite as dramatic, but that was on a much older system.

One could argue that postmark lied, but timing it with a stopwatch and just eyeballing the sucker, it was way quicker doing the transactions.

On servers I just didn’t normally set it because I figured they had enough RAM. Maybe I should start doing it on boxes that do a lot of transactional I/O. Damn, I need to try this with Supercache.

Obviously, your mileage may vary.

WARNING: DO NOT DO THIS ON ANY MACHINE THAT NEEDS TO SUSPEND!!!

Which is why I just didn’t do it on the laptop.

D

Wed 2 May '07

Cisco WAAS benchmarks, and WAN optimizers in general

Lately I’ve been dealing with WAN accelerators a lot, with the emphasis on Cisco’s WAAS (some other, smaller players are Riverbed, Juniper, Bluecoat, Tacit/Packeteer and Silverpeak). The premise is simple and compelling: Instead of having all those servers at your edge locations, move your users’ data to the core and make accessing the data feel almost as fast as having it locally, by deploying appliances that act as proxies. At the same time, you will actually decrease the WAN utilization, enabling you to use cheaper pipes, or at least not have to upgrade, where in the past you were planning to anyway.

There are significant other benefits (massive MAPI acceleration, HTTP, ftp, and indeed any TCP-based application will be optimized). Many Microsoft protocols are especially chatty, and the WAN accelerators pretty much remove the chattiness, optimize the TCP connection (automatically resizing Send/Receive windows based on latency, for instance), LZ-compress the data, and to top it all will not transfer data blocks that have already been transferred.

At this point I need to point out that there is a lot of similarity with deduplication technologies - for example, Cisco’s DRE (Data Redundancy Elimination) is, at heart, a dedup algorithm not unlike Avamar’s or Data Domain’s. So, if a Powerpoint file has gone through the DRE cache already, and someone modifies the file and sends it over the WAN again, only the modified parts will really go through. It really works and it’s really fast (and I’m about the most jaded technophile you’re likely to meet).

The reason I’m not opposed to this use of dedup (see previous posts) is that the datasets are kept at a reasonable size. For instance, at the edge you’re typically talking about under 200GB of cache, not several TB. Doing the hash calculations is not as time-consuming with a smaller dataset and, indeed, it’s set up so that the hashes are kept in-memory. You see, the whole point of this appliance is to reduce latency, not increase it with unnecessary calculations. Compare this to the multi-TB deals of the “proper” dedup solutions used for backups…
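
To make the idea concrete, here is a toy sketch of block-level dedup between two ends of a link. This is NOT Cisco’s DRE (I have no visibility into their algorithm); it simply assumes fixed 4K chunks, SHA-256 digests and an in-memory dictionary on each side, and every name in it is made up for illustration.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size; real products may chunk differently


class ChunkCache:
    """In-memory map of chunk digest -> chunk bytes, one instance per appliance."""

    def __init__(self):
        self.seen = {}

    def encode(self, data: bytes):
        """Sender side: replace chunks seen before with a short reference."""
        tokens = []
        for i in range(0, len(data), CHUNK):
            chunk = data[i:i + CHUNK]
            digest = hashlib.sha256(chunk).digest()
            if digest in self.seen:
                tokens.append(("ref", digest))
            else:
                self.seen[digest] = chunk
                tokens.append(("raw", chunk))
        return tokens

    def decode(self, tokens) -> bytes:
        """Receiver side: rebuild the data, learning any new raw chunks."""
        parts = []
        for kind, payload in tokens:
            if kind == "raw":
                self.seen[hashlib.sha256(payload).digest()] = payload
                parts.append(payload)
            else:
                parts.append(self.seen[payload])
        return b"".join(parts)


def wire_bytes(tokens) -> int:
    """Payload bytes that would cross the WAN (ignores framing overhead)."""
    return sum(len(payload) for _, payload in tokens)


if __name__ == "__main__":
    sender, receiver = ChunkCache(), ChunkCache()
    original = b"".join(bytes([i]) * CHUNK for i in range(8))
    modified = original[:3 * CHUNK] + bytes([99]) * CHUNK + original[4 * CHUNK:]
    first = sender.encode(original)
    assert receiver.decode(first) == original
    second = sender.encode(modified)
    assert receiver.decode(second) == modified
    print(wire_bytes(first), wire_bytes(second))
```

The second transfer only ships the one changed chunk plus 32-byte references for the rest. Fixed-size chunks fall apart as soon as data is inserted and everything shifts, which is one reason real implementations are more sophisticated than this.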

Indeed, why the hell would you need dedup-based backup solutions if you deploy a WAN accelerator? Chances are there won’t be anything at the edge sites to back up, so the whole argument behind dedup-based backups for remote sites sort of evaporates. Dedup now only makes sense in VTLs, just so you can store a bit more.

On Dedup VTLs: Refreshingly, Quantum doesn’t quote crazy compression ratios - I’ve seen figures of about 9:1 as an average, which is still pretty good (and totally dependent on what kind of data you have). I just cringe when I see the 100:1, 1000:1 or whatever insanity Data Domain typically states. I’m still worried about the effect on restore times, but I digress. See previous posts.

Anyway, back to WAN accelerators. So how do these boxes work? All fairly similarly. Cisco’s, for instance, does 3 main kinds of optimizations: TFO, DRE and LZ. TFO stands for Transport Flow Optimization; it takes care of snd/rcv window scaling, enables large initial windows, and enables SACK and BIC TCP (the latter 2 help with packet loss).

DRE is the dedup part of the equation, as mentioned before.

LZ is simply LZ compression of data, in addition to everything else mentioned above.

Other vendors may call their features something else, but at the end there aren’t too many ways to do this. It all boils down to:

  1. Who has the best implementation speed-wise

  2. Who is the best administration-wise

  3. Who is the most stable in an enterprise setting

  4. What company has the highest chance of staying alive (like it or not, Cisco destroys the other players here)

  5. What company is committed to the product the most

  6. As a corollary to #5, what company does the most R&D for the product

Since Cisco is, by far, the largest company of any that provide WAN accelerators (indeed, they probably spend more on light bulbs per year than the net worth of the other companies combined), in my opinion they’re the obvious force to be reckoned with, not someone like Riverbed (as cool as Riverbed is, they’re too small, and will either fizzle out or get bought - though Cisco didn’t buy them, which is food for thought. If Riverbed is so great, why would Cisco not simply acquire them?).

Case in point: When Cisco bought Actona (which is the progenitor of the current WAAS product) they only really had the Windows file-caching part shipping (WAFS). It was great for CIFS but not much else. Back then, they were actually lagging compared to the other players when it came to complete application acceleration. Fast forward a mere few months: They now accelerate anything going over TCP, their WAFS portion is still there but it’s even better and more transparent, the product works with WCCP and inline cards (making deployment at the low-end easy) and is now significantly faster than the competitors. Helps to have deep pockets.

For an enterprise, here are the main benefits of going with Cisco the way I see them:

  1. Your switches and routers are probably already Cisco so you have a relationship.

  2. WAAS interfaces seamlessly with the other Cisco gear.

  3. The best way to interface a WAN accelerator is WCCP. And it was actually developed by Cisco.

  4. The Cisco appliances are tunnel-less and totally transparent (I met someone that had Riverbed everywhere - a software glitch rendered ALL WAN traffic inoperable, instead of having it go through unaccelerated which is the way it is supposed to work. He’s now looking at Cisco).

  5. WAAS appliances don’t mess with QoS you may have already set.

  6. The WAAS boxes are actually faster in almost anything compared to the competition.

And now for the inevitable benchmarks:

Depending on the latency, you can get more or less of a speed-up. For a comprehensive test see this: http://www.cisco.com/application/pdf/en/us/guest/products/ps6870/c1031/cdccont_0900aecd8054f827.pdf

Another, longer rev: http://www.cisco.com/web/CA/channels/pdf/Miercom-on-Cisco-WAAS-Riverbed-Juniper-competitive.pdf

Yes, this is on Cisco’s website but it’s kinda hard to find any performance statistics on the other players’ sites showing Cisco’s WAAS (any references to WAFS are for an obsolete product). At least this one compares truly recent codebases of Cisco, Riverbed and Juniper. For me, the most telling numbers were the ones showing how much traffic the server at the datacenter actually sees. Cisco was almost 100x better than the competition - where the other products passed several Mbits through to the server, Cisco only needed to pass 50Kbits or so.

It is kinda weird that the other vendors don’t have any public-facing benchmarks like this, don’t you think?

However, since I tend to not completely believe vendor-sponsored benchmark numbers as much as I may like the vendor in question, I ran my own.

I used NISTnet (a free WAN simulator, http://www-x.antd.nist.gov/nistnet/) to emulate latency and throughput indicative of standard telco links (i.e. a T1). The fact that the simulator is freely available and can be used by anyone is compelling since it allows testing without disrupting production networks (for the record, I also tested on a few production networks with similar results, though the latency was lower than with the simulator).

The first test scenario is that of the typical T1 connection (approx. 1.5Mbits/s or 170KB/s at best) and 40ms of round-trip delay. I tested with zero packet loss, which is not totally realistic, but it gives the unaccelerated baseline its best case and so makes the benchmarks even more compelling. Usually there is a little packet loss, which makes transfer speeds even worse. This is one of the most common connections to remote sites one will encounter in production environments.

The second scenario is that of a bigger pipe (3Mbit) but much higher latency (300ms), emulating a long-distance link such as a remote site in Asia over which developers do their work. I injected a 0.2% packet loss (a small number, given the distance).

It is important to note that, in the interests of simplicity and expediency, these tests are not comprehensive. A comprehensive WAAS test consists of:

  • Performance without WAAS but with latency

  • Performance with WAAS but data not already in cache (cold cache hits). Such a test shows the real-time efficiency of the TFO, DRE and LZ algorithms.

  • Performance with the data already in the cache (hot cache hits).

  • Performance with pre-positioning of fileserver data. This would be the fastest a WAAS solution would perform, almost like a local fileserver.

  • Performance without WAAS and without latency (local server). This would be the absolute fastest performance in general.

The one cold cache test I performed involved downloading a large ISO file (400MB) using HTTP over the simulated T1 link. The performance ranged from 1.5-1.8MB/s (a full 10 times faster than without WAAS) for a cold cache hit. After the file was transferred (and was therefore in cache) the performance went to 2.5MB/s. The amazing performance might have been due to a highly compressible ISO image but, nevertheless, is quite impressive. The ISO was a full Windows 2000 install CD with SP4 slipstreamed - a realistic test with realistic data, since one might conceivably want to distribute such CD images over a WAN. Frankly this went through so quickly that I keep thinking I did something wrong.
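
For a sense of what those rates mean in wall-clock time, here is a quick sketch. It assumes 1MB = 1024KB and uses the 170KB/s best-case T1 figure from above as the no-WAAS baseline; the function name is made up.

```python
def transfer_seconds(size_mb: float, rate_kb_s: float) -> float:
    """Seconds needed to move size_mb megabytes at rate_kb_s kilobytes per second."""
    return size_mb * 1024 / rate_kb_s


ISO_MB = 400

# Rates quoted in the text, converted to KB/s
RATES_KB_S = {
    "plain T1, best case (170KB/s)": 170,
    "WAAS cold cache, low end (1.5MB/s)": 1.5 * 1024,
    "WAAS cold cache, high end (1.8MB/s)": 1.8 * 1024,
    "WAAS hot cache (2.5MB/s)": 2.5 * 1024,
}

if __name__ == "__main__":
    for label, rate in RATES_KB_S.items():
        secs = transfer_seconds(ISO_MB, rate)
        print(f"{label:38s} {secs / 60:5.1f} min")
```

That is about 40 minutes over the bare T1 versus roughly 4 minutes or less with WAAS.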

T1 results
ftp without WAAS:
ftp: 3367936 bytes received in 19.53Seconds 168.40Kbytes/sec

Very normal T1 behavior with the simulator (for a good-quality T1).

ftp with WAAS:
ftp: 3367936 bytes received in 1.34Seconds 2505.90Kbytes/sec (15x improvement).

Sending data was even faster:
ftp: 3367936 bytes sent in 0.36Seconds 9381.44Kbytes/sec.

[Image: waasT1]

High Latency/High Bandwidth results

The high latency (300ms) link, even though it had double the theoretical throughput of the T1 link, suffered significantly:

ftp without WAAS
ftp: 3367936 bytes received in 125.73Seconds 26.79Kbytes/sec.

I was surprised at how much the high latency hurt the ftp transfers. I ran the test several times with similar results.
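
One plausible explanation, offered as a back-of-the-envelope sketch rather than a diagnosis: a single TCP connection can’t move more than one window of data per round trip, so throughput is capped at window / RTT no matter how fat the pipe is. The code below assumes 1Kbyte = 1000 bytes and ignores the 0.2% packet loss, which would only make things worse; the function names are made up.

```python
def window_limited_kb_s(window_bytes: float, rtt_s: float) -> float:
    """Ceiling on single-connection throughput: one window per round trip."""
    return window_bytes / rtt_s / 1000


def implied_window_bytes(rate_kb_s: float, rtt_s: float) -> float:
    """Effective window that would explain an observed rate at a given RTT."""
    return rate_kb_s * 1000 * rtt_s


def bdp_bytes(link_bit_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product: data that must be in flight to fill the link."""
    return link_bit_s / 8 * rtt_s


if __name__ == "__main__":
    # Window implied by the observed 26.79Kbytes/sec at 300ms
    print(implied_window_bytes(26.79, 0.300))
    # What it would take to fill the 3Mbit link at 300ms
    print(bdp_bytes(3_000_000, 0.300))
    # What it takes to fill a 1.5Mbit T1 at 40ms
    print(bdp_bytes(1_500_000, 0.040))
```

The observed rate implies an effective window of about 8KB. That is roughly enough to fill a T1 at 40ms (about 7.5KB in flight), but nowhere near the 110KB or so needed to fill 3Mbit at 300ms, which fits with why window scaling is one of the things these boxes go after.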

ftp with WAAS
ftp: 3367936 bytes received in 2.16Seconds 1562.12Kbytes/sec (58x improvement).
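
The improvement factors are just the ratio of the rates the ftp client reported. A short sketch for checking them, or for dropping in your own measurements (the labels and names are mine):

```python
# (Kbytes/sec without WAAS, Kbytes/sec with WAAS) as reported by the ftp client
RESULTS_KB_S = {
    "T1, 40ms, no loss": (168.40, 2505.90),
    "3Mbit, 300ms, 0.2% loss": (26.79, 1562.12),
}


def improvement(without_waas: float, with_waas: float) -> float:
    """How many times faster the accelerated transfer was."""
    return with_waas / without_waas


if __name__ == "__main__":
    for link, (before, after) in RESULTS_KB_S.items():
        print(f"{link:26s} {improvement(before, after):.1f}x")
```

This gives just under 15x for the T1 and a bit over 58x for the high-latency link.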

[Image: waaslat]

I have more results with office-type apps but they will make for too big of a blog entry, not that this isn’t big. In any case, the thing works as advertised. I need to build a test Exchange server so I can see how much stuff like attachments is accelerated. Watch this space. Oh, and there’s another set of results at http://www.gotitsolutions.org/2007/05/18/cisco-waas-performance-benchmarks.html

Comments? Complaints? You know what to do.

D