Strategy

Hot New Trend: Linking Clouds Through Cheap IP VPNs Instead of Private Lines

Todd Hoff

30 Jun 2009 — 4 min read

You might think major Internet companies have a latency, availability, and bandwidth advantage because they can afford expensive dedicated point-to-point private line networks between their data centers. And you would be right. It's a great advantage. Or it at least it was a great advantage. Cost is the great equalizer and companies are now scrambling for ways to cut costs. Many of the most recognizable Internet companies are moving to IP VPNs (Virtual Private Networks) as a much cheaper alternative to private lines. This is a strategy you can effectively use too.

This trend has historical precedent in the data center. In the same way leading edge companies moved early to virtualize their data centers, leading edge companies are now virtualizing their networks using IP VPNs to build inexpensive private networks over a shared public network. In kindergarten we learned sharing was polite, it turns out sharing can also save a lot of money in both the data center and on the network.

The line of reasoning for adopting IP VPNs goes something like this:

Major companies are saving 1/4 to 1/2 of their networking costs by moving from private lines to IP VPNs. This does not even include the benefits of lower equipment costs (GigE ports are basically free) and more flexible provisioning (any-to-any connectivity, easy bandwidth dialup).

Cheaper comes with a cost. Private lines are reliable. The Internet is inherently unreliable, especially when two endpoints are linked by potentially dozens of routers in between. In particular Internet connections suffer from: 1) dropped packets 2) out of order packets. Statistically this may happen for only 1% of packets, but when it does the user experience plummets. To get a feel for the impact imagine you have a 200ms latency link to Europe and you're trying to do something interactive. Lose a packet and you'll have to wait for a retransmission which will take at least 1 second. So IP VPNs can provide an order of magnitude more bandwidth for less money, but they often have less actual throughput and reliability.

Since latency and quality are so important to Internet companies, how can they possibly afford to use IP VPNs? They cheat. They fix the IP connection by using WAN accelerators.

WAN accelerators are typically thought to be mostly about caching, but they can also can trick TCP into giving a better connection even over unreliable networks. It's like wearing corrective lenses for your network. And that's what you need when dropping dedicated lines for Internet connections.

Relatively inexpensive WAN accelerators can turn somewhat unreliable Internet connections into a very reliable cost effective connection option. Your customers won't believe it's not butter.

The result: lots of money saved and a quality costumer experience.

We take TCP for granted so to learn it has this unsightly packet loss/delay problem is a bit unsettling. But here's the impact packet loss has on throughput:

Latency: 100ms, Loss: 1%, Throughput: 1.2 Mbps

Latency: 200ms, Loss: 1%, Throughput: .6 Mbps

Latency: 100ms, Loss: .5%, Throughput: 1.7 Mbps

These numbers are independent of your WAN link capacity. You could have an 100Mbps link with 1% loss and 100ms latency and you're limited to 1Mbps!

The reason why we have this bandwidth robbing state of affairs is because when TCP was designed packet loss meant network congestion. The way to deal with congestion is to stop sending data in order to avoid congestion. This drops throughput drastically for a very long time. Over long distance WAN connections packets can be delayed which seems like a packet loss which causes congestion avoidance measures to kick in. Or maybe only a single packet was dropped and that kicks in congestion avoidance.

The trick is convincing TCP that everything is cool so the full connection bandwidth can be used. WAN accelerators have a number of complex features to keep TCP happy. Damon Ennis, VP Product Management and Customer Support for Silver Peak, a WAN accelerator company, talks about why clouds, IP VPNs, and WAN accelerators are a perfect match:

Moving applications into the cloud offers substantial cost savings for enterprises. Unfortunately those savings come at the cost of application performance. Often performance is so hampered that users’ productivity is severely limited. In extreme cases, users refuse to utilize the cloud-based application altogether and resort to old habits like saving files locally without centralized backup or returning to their old “thick” applications.

The cloud limits performance because the applications must be accessed over the WAN. WANs are different from LANs in three ways – WAN bandwidth is a fraction of LAN bandwidth, WAN latency is orders of magnitude higher than LAN latency, and packet loss exists on the WAN where none existed on the LAN. Most IT professionals are familiar with the impacts of bandwidth on transfer times – a 100MB file takes approximately 1 second to transfer on a Gbps LAN and approximately 10 seconds to transfer on a 100Mbps LAN. They then extrapolate this thinking to the WAN and assume that it will take 10 seconds to transfer the same file on a 100Mbps WAN. Unfortunately, this isn’t the case. Introduce 100ms of latency and this transfer now takes almost 3 minutes. Introduce just 1 % packet loss and this transfer now takes over 10 minutes.

There’s a calculator available that will let you figure out the effective throughput of your own WAN if you know its bandwidth, latency, and loss. Once you know your effective throughput simply divide 800Mb (100MB) by your effective throughput to determine how long it would take to transfer the same example file over your WAN.

Latency and loss don’t just impact file transfer times, they also have a dramatic impact on any applications that need to be accessed in real-time over the WAN. In this context a real-time application is one that requires real-time response to users’ keystrokes – think of any application that is served over a thin-client infrastructure or Virtual Desktop Infrastructure (VDI). Not only is the server 100 ms away but any lost packet will result in delays of up to half a second waiting for the loss to be detected and the retransmission to occur. This is the root cause of the frustrated user banging on the enter key looking for a response.

This all seems like a lot effort, doesn't it? Why not just dump TCP and move to a better protocol? Sounds good but everything works on TCP so to change now would be monumental. And as strange as it seems TCP is doing it's job. It's a protocol so there's a separation of what's above it from what's below it which allows innovation at the TCP level without breaking anything. And that's what layering is all about.

The upshot is with a little planning you can take advantage of much cheaper IP VPN costs, improve latency, and maximize bandwidth usage. Just like the big guys.

Cloud Computing Requires Infrastructure 2.0 by Gregory Ness

Myth of Bandwidth and Application Performance by Ameet Dhillon

How Does WAN Optimization Work? by Paul Rubens

SilverPeak Technology Overview