Great Minds Think Alike – Cisco and VMware Agree On Sharing vs. Limiting

Sept. 13th, 2010

While reading a very informative blog post by Aaron Delp regarding VMware’s new NetIOC capability, I realized that Cisco and VMware are on the same page when it comes to server network traffic control. VMware’s NetIOC best practices plainly state exactly what Cisco has been advocating for so long – the user should plan for network contention but should not needlessly limit bandwidth in the absence of contention. 

VMware’s NetIOC Best Practices states the following:  

  • Page 5: [Using shares instead of limits] means that unused capacity will be redistributed to other contending flows and won’t go to waste.
  • Page 24: “Best practice 1: When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution. Partitioning the available network bandwidth among different types of network traffic flows using limits has shortcomings. For instance, allocating 2Gbps bandwidth by using a limit for the virtual machine resource pool provides a maximum of 2Gbps bandwidth for all the virtual machine traffic even if the team is not saturated. In other words, limits impose hard limits on the amount of the bandwidth usage by a traffic flow even when there is network bandwidth available.”

Brad Hedlund wrote an excellent blog post discussing how Cisco uses QoS (shares) vs. HP Virtual Connect only using rate limiting (limits) and he provides some great animated graphics and analogies. I encourage you to read it…and then come back. 🙂

Two questions: 1. What are shares vs. limits? 2. How does Cisco UCS + VMware compare to HP Virtual Connect + VMware?
  

Shares (minimums) vs. Limits (maximums)

  • A “share” is a VMware term that represents a relative value (relative to share values assigned to other flows) used to determine a particular traffic flow’s minimum bandwidth percentage during times of congestion. The greater the ratio of shares a traffic flow is assigned, the greater the amount of bandwidth it will receive during times of congestion. For example, let’s assume Flow A is assigned 10 shares, Flow B is assigned 5 shares, and Flow C is assigned 25 shares. The sum of shares is 40 (10+5+25). Flow A receives a minimum bandwidth share of 25% (10/40), Flow B receives a minimum of 12.5% (5/40), and Flow C receives a minimum of 62.5% (25/40). 25% + 12.5% + 62.5% = 100% of dvUplink bandwidth (whatever speed the link is operating at). During times of no congestion, each flow can consume up to 100% of the link bandwidth. During times of congestion, each flow is guaranteed its minimum ‘share’ of the bandwidth.
  • A “limit” is a VMware term that represents a static value used to define, in absolute units of Mbps, the maximum bandwidth a particular flow is able to consume on the overall vDS. In other words, you define a set bandwidth in Mbps and the flow is not able to exceed that maximum setting even if there is no contention on the dvUplinks. The bandwidth can be available but the flow is not eligible to use it.
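To make the share arithmetic above concrete, here is a minimal Python sketch. The function name and the flow labels are my own for illustration; this is not VMware code or a NetIOC API, just the ratio math from the example.

```python
def min_bandwidth_percent(shares):
    """Turn relative share values into guaranteed minimum percentages.

    `shares` maps a flow name to its share value. The names and this
    helper are hypothetical, used only to illustrate the ratio math.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one flow needs a positive share value")
    # Each flow's minimum is its share value relative to the sum of all shares.
    return {flow: 100.0 * value / total for flow, value in shares.items()}


example = min_bandwidth_percent({"Flow A": 10, "Flow B": 5, "Flow C": 25})
print(example)  # {'Flow A': 25.0, 'Flow B': 12.5, 'Flow C': 62.5}
```

Note that the percentages only matter during congestion. With no congestion, any one flow can still use the whole link.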

To spin Brad’s highway analogy a slightly different way, I think of these two approaches to traffic control as “HOV Lanes” (shares) vs. “metered on-ramps” (limits) on the highway:  

An HOV lane guarantees a minimum of one lane of highway “bandwidth” for high occupancy vehicles. Can high occupancy vehicles use other lanes if those lanes aren’t being used? Absolutely. If there are lots of high occupancy vehicles and very little other traffic, are the high occupancy vehicles forced to crowd into one lane? Absolutely not. What is the maximum number of lanes high occupancy vehicles can use? No maximum. What is the minimum number of lanes they are guaranteed? At least one.

A “metered on-ramp” guarantees a maximum number of cars entering the highway in an attempt to avoid congestion on the highway. The on-ramp light allows only so many cars onto the highway during a given time period, regardless of how busy the highway is. Do most on-ramp metering systems adjust to the actual presence of highway congestion? No. They are programmed to allow a maximum number of cars entering the highway per minute, period. Will you be frustrated sitting idle in line on the on-ramp waiting on the light while seeing that the highway is wide open? Absolutely.  

Highway Analogy: QoS and Rate Limiting
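The HOV lane vs. metered on-ramp difference can also be sketched numerically. The Python below is a simplified model under my own assumptions: a single 10 Gbps link, made-up demand figures, and a weighted max-min fair split standing in for "shares". It is not how any vendor's scheduler is actually implemented; it only shows that shares hand unused capacity to whoever wants it, while limits leave it idle.

```python
def allocate_by_shares(link_mbps, shares, demand_mbps):
    """Weighted max-min fair split: unused capacity is redistributed."""
    alloc = {flow: 0.0 for flow in shares}
    active = {flow for flow in shares if demand_mbps[flow] > 0}
    remaining = float(link_mbps)
    while active and remaining > 1e-9:
        total_shares = sum(shares[flow] for flow in active)
        satisfied, used = set(), 0.0
        for flow in active:
            offer = remaining * shares[flow] / total_shares
            need = demand_mbps[flow] - alloc[flow]
            if need <= offer:
                # Flow wants less than its slice: give it what it needs.
                alloc[flow] += need
                used += need
                satisfied.add(flow)
        if not satisfied:
            # Every remaining flow wants more than its slice, so the
            # leftover capacity is split purely by share ratio.
            for flow in active:
                alloc[flow] += remaining * shares[flow] / total_shares
            break
        remaining -= used
        active -= satisfied
    return alloc


def allocate_by_limits(limits_mbps, demand_mbps):
    """Hard caps: a flow never exceeds its limit, even on an idle link."""
    return {flow: float(min(demand_mbps[flow], limits_mbps[flow]))
            for flow in limits_mbps}


# Hypothetical numbers: Flow A is busy, Flows B and C are nearly idle.
LINK = 10000  # Mbps
SHARES = {"A": 10, "B": 5, "C": 25}
LIMITS = {"A": 2500, "B": 1250, "C": 6250}  # same ratios, as hard caps
DEMAND = {"A": 10000, "B": 500, "C": 1000}

print(allocate_by_shares(LINK, SHARES, DEMAND))
print(allocate_by_limits(LIMITS, DEMAND))
```

With these numbers, shares let Flow A use the 8500 Mbps that B and C are not using, while limits hold it at 2500 Mbps and leave 6000 Mbps of the link idle. When all three flows want the full link, both approaches land on the same 2500/1250/6250 split.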

Questions You May Have:  

  • Which is more important? Metered on-ramps (limits) or HOV lanes (shares)?
    IMHO, if you can only have one, pick HOV lanes (shares). Metered on-ramps do not take into account congestion throughout the highway system. They also don’t guarantee any vehicle access to a highway lane free of congestion. HOV lanes (shares) can guarantee lanes to cars as the cars move through the highway system over great distances.
  • Can metered on-ramps (limits) co-exist with HOV lanes (shares)? Absolutely!
    You may want to set a hard limit on transmit or receive (maximum) while letting different traffic classes (flows) fluctuate bandwidth usage during times of no congestion.
  • Can there be more than one HOV lane?
    Absolutely! You didn’t read Brad’s post above, did you? 😉
  • Can there be limits placed on on-ramps (egress traffic on NIC) and off-ramps (ingress traffic on NIC)?
    Absolutely! (But it depends on whose product you’re using. See below.)

Cisco UCS + VMware vs. HP Virtual Connect + VMware Design Comparison

A proper highway design takes into account both bidirectional ramp metering and highway HOV lanes, not just on-ramp metering. The same goes for network design. Simply rate limiting what the NIC transmits addresses only part of the problem and ignores the bigger picture – server-to-server communication over the fabric.

  • VMware provides HOV lanes for on-ramps (shares per flow per dvUplinks) plus metered on- and off-ramps (vDS-wide egress rate limiting per flow).
  • Cisco UCS provides metered on-ramps (egress rate limiting per Palo logical NIC), metered off-ramps (ingress rate limiting per Palo logical NIC), HOV lanes for on-ramps and off-ramps (shares per priority on Palo/Menlo NIC ingress/egress), and HOV lanes on highways between ramps (fabric-wide shares per flow).
  • HP Virtual Connect provides metered on-ramps (egress rate limiting per Virtual Connect FlexNIC), but no metered off-ramps (no ingress rate limiting per Virtual Connect FlexNIC), no HOV lanes for on-ramps or off-ramps, and no HOV lanes between ramps.

  

In the HP Virtual Connect + VMware example, you don’t have any HOV lanes. Virtual Connect only provides metered on-ramps (FlexNIC Tx speed limit). Once frames leave the HP server NIC and enter the on-ramp towards Virtual Connect, all frames are equal…first in first out. Good luck if a backup or VMotion starves out your VM’s production traffic somewhere inside the Virtual Connect domain (red arrows in graphic below). No amount of configuration in VMware or in the external network can control how Virtual Connect prioritizes traffic during times of internal congestion. In addition, FlexNIC speed limits artificially limit traffic needlessly during times of no contention/congestion – like on-ramp metering at 2 am on a Sunday!?!  

  

In the Cisco UCS + VMware example, the user has full traffic control – rate limiting in and out for VMware dvUplinks, rate limiting ingress and egress for Cisco NICs plus traffic shaping (shares per priority) in the fabric from end to end. In addition, when a rate limit is applied to a Cisco Palo interface (IF), the rate limit speed is reflected in the OS as the speed of the individual Palo interface. For example, with one Palo card, I could have 58 Palo NICs presented to the OS and each could be running at a different speed based on its rate limit as defined in the UCS Manager Service Profile.  

  

In summary, let’s reread VMware’s NetIOC Best Practice #1. “When using bandwidth allocation, use “shares” instead of “limits,” as the former has greater flexibility for unused capacity redistribution. Partitioning the available network bandwidth among different types of network traffic flows using limits has shortcomings.”  

Like Cisco, they ‘get’ server networking. Kudos to them.  

(Special Thanks to Doron Chosnek and Brad TerEick for their input)
