The "Mini-Rack" Approach To Blade Server Design

May 3rd, 2010

I’ve got two questions for you…

Question #1: If you were designing a datacenter full of rack servers, would you deploy two Ethernet switches, two Fibre Channel switches, and two rack managers for every 16 rack servers?  Uh, h$&# no!

Question #2: If you wouldn’t do it for rack servers, why do the legacy blade server vendors want you to deploy that design when you buy their blade servers?
I can tell you why – it’s a result of the “mini rack” approach to blade server design. Allow me to explain…

The Mini-Rack Mindset

I currently work for Cisco, but I spent over a decade at HP/Compaq in their x86 Server Engineering Business Unit.  I can remember the first discussions of blade servers at Compaq in the early 2000s…the discussions that led to the development of their first blade chassis (the Compaq (now HP) e-Class blade enclosure).  My involvement back then was related to the mini-switches that we would eventually build to go inside the e-Class blade enclosure.  The e-Class had two mini layer 2 Ethernet switches used for connecting the 20 blade servers to the external network.  After the e-Class release came the p-Class architecture with its two mini-Ethernet switches and two mini-Fibre Channel modules for its 8/16 blade servers.  HP then re-architected the chassis design by adding lots more I/O bays and released the current blade chassis design, the c-Class. c-Class comes with up to 8 mini-Ethernet and mini-Fibre Channel switches and two mini-rack managers, called Onboard Administrator (OA), for 16 blade servers.

So the progression was:

♦ e-Class had 20 servers, 1 switch, and 1 enclosure manager
♦ p-Class had 8 servers, 4 I/O modules, and no enclosure manager
♦ c-Class has 16 servers, 8 I/O modules, and 2 enclosure managers

Are you seeing the pattern yet? Slowly, the infrastructure overhead – the number of required modules and the number of management interfaces/IP addresses – needed to deploy 16 blade servers has grown over the last three generations of HP blade servers. HP c-Class’s best case scenario has a whopping 6:16 ratio – 4 I/O modules and 2 OA modules for 16 blade servers. The worst case ratio is 10:16 – 8 I/O modules and 2 OAs for 16 blade servers.

The mini-rack mindset originates from the early days of blade server chassis design.  Let me walk you through the thought process that results in this mindset:

<follow along in graphic labeled “How should I architect a blade server”>
“I need to architect “blade servers” for my customer. <phase 1> I guess I should start by looking at a legacy rack full of servers as a point of reference. What components do I see? I see that typical racks have two Ethernet switches, two Fibre Channel switches, and multiple PDUs. In addition, separate central management servers (e.g. HP SIM) must be configured, clustered, and maintained in the datacenter to manage the rack servers and shared PDUs. Well, <phase 2> I guess I’ll make miniature versions of each of these components for every 16 servers, and <phase 3> I’ll put those components inside the sheet metal (blade chassis) with the blade servers themselves. This arrangement worked for racks for years, so it must be the best design. <phase 4> Unfortunately, this creates lots of management overhead and complexity, so I guess I’ll write extra management software to solve the problem my design created.”

The end result of this line of thinking is the “mini-rack” mindset: lots of mini-racks in your data center, along with all the management overhead and complexity that comes with them.

Every time you add another 16 blade servers with Enet and FC connectivity, you need to add the overhead of a minimum of SIX infrastructure modules – 2 mini Enet switches, 2 mini FC switches, and 2 Onboard Administrators. I’ll admit… as a member of the blade server engineering community, I helped perpetuate this mindset. But we didn’t really see an alternative. We were stuck choosing between too many cables (pass-thru modules) and too many mini-switches.
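To put a number on that overhead, here’s a minimal sketch in Python (using only the per-chassis figures above; the helper name is mine) showing how mini-rack infrastructure grows as you add 16-blade chassis:

```python
import math

# Best-case "mini-rack" overhead per 16-blade chassis (figures from above):
# 2 mini Ethernet switches + 2 mini Fibre Channel switches + 2 Onboard Administrators
MODULES_PER_CHASSIS = 2 + 2 + 2
BLADES_PER_CHASSIS = 16

def mini_rack_modules(blade_count: int) -> int:
    """Infrastructure modules required to deploy this many blade servers."""
    chassis = math.ceil(blade_count / BLADES_PER_CHASSIS)
    return chassis * MODULES_PER_CHASSIS

for blades in (16, 32, 80, 160):
    print(f"{blades:>3} blades -> {mini_rack_modules(blades)} infrastructure modules")
# 16 blades -> 6, 80 blades -> 30, 160 blades -> 60:
# the module count grows linearly with every chassis you add.
```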

Side Note: Does HP’s Virtual Connect product really solve the problem of too many switches? I don’t want to rat hole in this post, so I’ll save that for another blog post in the near future. The short answer: unfortunately, no, it doesn’t.

So what’s your alternative to the mini-rack architecture?
Cisco Unified Computing System (UCS)

Cisco didn’t approach blade server architecture design as a “me too” mini-rack. Cisco has never been a “me too, let’s run off the cliff together” company. Cisco has built its reputation as a company that delivers innovative products and solutions that set the example in the industry. Cisco has always been an industry leader, not a follower. So, how did Cisco “lead” with their approach to blade server architecture? Cisco said, “We don’t want a mini-rack. We want a logical, expandable blade chassis that provides all the key benefits of blade servers (reduced power & cooling, reduced footprint, reduced cabling) but with infrastructure design simplicity that’s BETTER than that of the original rack server architecture. When the logical blade chassis needs to be expanded to accommodate more blade servers, there WILL NOT be an increase in management overhead. There will just be an increase in available server hardware for the same management interface to manage.”

Cisco’s blade server architecture is designed around a pair of clustered ‘top of rack’ blade chassis managers that include all the management functionality for blade chassis hardware, blade server hardware, switching hardware, Ethernet and Fibre Channel connectivity, server identity management, and control plane integration with VMware vCenter. Cisco calls these devices “Cisco UCS 6100 Fabric Interconnects”. Since all this functionality is delivered in a device that sits outside of the physical blade chassis, Cisco’s blade chassis has been simplified to provide the original intended benefits of blades – reduced power & cooling, reduced cabling, reduced server footprint – but without the management overhead nightmare forced onto the customer by the legacy mini-rack mindset.

Example of Cisco's Single Logical Blade Chassis with 80 blade servers under one UCS Manager

For example, Cisco’s Fabric Interconnects provide the functionality of HP’s Onboard Administrator, HP Virtual Connect Ethernet, HP Virtual Connect Fibre Channel, HP Virtual Connect Manager, HP Virtual Connect Enterprise Manager, and many aspects of HP SIM, all in a single clustered-for-redundancy pair of devices that can be shared by multiple blade chassis. So, you configure these Fabric Interconnects (and three IP addresses) once. Then you just plug in blade chassis as you need additional blade servers. It’s a modular, expandable blade chassis design. That’s why I call it a “single logical blade chassis”. You can expand it anytime you want without adding more interfaces to manage. You plug in the chassis, the chassis is auto-discovered, and the blade servers show up in the management interface. It really can’t get any easier.

Side Note: So how does Cisco UCS reduce cables without putting little switches in every blade chassis?  Cisco’s distributed switch architecture allows the fabric interconnects to extend their ports inside of each blade chassis. As a result, adding more blade chassis does not add more switches – it only adds more logical ports on the fabric interconnects for each server. Again, I don’t want to rat hole so I’ll save this conversation for another blog post.

To compare the mini-rack architecture with the Cisco UCS blade server architecture, let’s look at it just from an IP address overhead perspective. Here’s an 80-blade example for both HP and Cisco. The examples show the minimum number of (redundant) infrastructure devices and their required management IP addresses:

HP BladeSystem: 5 HP BladeSystem enclosures, each with 2 x VC Enet, 2 x VC FC, and 2 x Onboard Administrators (80 server blades)

♦ 10 x IPs for Onboard Administrators
♦ 10 x IPs for Virtual Connect Ethernet (Flex-10) modules
♦ 5 x IPs for Virtual Connect Manager cluster address (optional but typical)
♦ 10 x IPs for Virtual Connect Fibre Channel modules

Total: 35 Management IP addresses for 80 HP server blades

Cisco UCS: 10 Cisco UCS chassis connected to 2 x Fabric Interconnects (80 server blades)

♦ 2 x IPs for Fabric Interconnects
♦ 1 x IP for Fabric Interconnect cluster address

Total: 3 Management IP addresses for 80 Cisco server blades

HP’s 35 management IP addresses vs. Cisco’s 3 demonstrate the fundamental difference in management philosophy between the two architectures.
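For the curious, here’s a small sketch in Python (the function names are mine; the per-device counts come straight from the lists above) that reproduces the 80-blade math and shows how each approach scales as you add enclosures/chassis:

```python
def hp_mgmt_ips(enclosures: int) -> int:
    """HP BladeSystem management IPs in the configuration above:
    2 OAs + 2 VC Ethernet + 2 VC Fibre Channel + 1 VC Manager cluster
    address per 16-blade enclosure."""
    return enclosures * (2 + 2 + 2 + 1)

def ucs_mgmt_ips(chassis: int) -> int:
    """Cisco UCS management IPs: 2 Fabric Interconnects plus 1 cluster
    address, no matter how many 8-blade chassis are attached."""
    return 3  # independent of the chassis count

print(hp_mgmt_ips(5))     # 35 IPs for 80 HP blades   (5 enclosures x 16)
print(ucs_mgmt_ips(10))   # 3 IPs for 80 Cisco blades (10 chassis x 8)
print(hp_mgmt_ips(10))    # 70 IPs for 160 HP blades
print(ucs_mgmt_ips(20))   # still 3 IPs for 160 Cisco blades
```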

Summary:

Cisco’s outside-the-box engineering has resulted in a brand new blade architecture. Cisco has just developed the first “automobile” class blade server architecture and the legacy “horse-n-buggy” blade server vendors are scrambling. My expectation is that the legacy blade vendors will, eventually, follow Cisco’s lead and come out with new blade architectures that get rid of all the management complexity. In the meantime, they will try to hide the complexity using more layers of management software.

May 3rd, 2010 | Posted in Cisco UCS
  • Pingback: Tweets that mention The “Mini-Rack” Approach To Blade Server Design | M. Sean McGee -- Topsy.com

  • http://blog.aarondelp.com Aaron Delp

    Hey Sean! I do have a question about something I'm unclear on. Don't you all need an IP address for KVM access to the BMC on each UCS blade like you would an iLO2 IP on the HP? Not trying to pick a fight, just trying to make sure I understand everything correctly. I know you can give the IP management in UCS a “dummy” pool, but in order for you to access everything remotely, doesn't it need to be in the same L2 IP domain as the 3 management IPs for the 6100's?

    Thanks!

  • http://twitter.com/mseanmcgee M. Sean McGee

    Hey Aaron! Good to hear from you again. Hope things are going well.

    You are absolutely correct – Cisco Integrated Management Controller (CIMC) addresses are needed just like iLO for HP. In this post, I simply focused on the infrastructure required for every 16 servers – Enet & FC switches and rack managers. Whether you are talking rack servers, legacy blade servers, or Cisco blade servers, OS management (Service Console) and remote KVM require individual IP addresses.

    I hope that helped. Thanks again for raising the question so I could clarify.

    Best regards,
    -sean

  • http://blog.aarondelp.com Aaron Delp

    No problem at all! You had me worried for a second there… I'm doing my first nice solo install for a customer this week and I was thinking I might have missed something! :)

  • Carter A.

    Great post! Spot on. Children often hide under the covers to make the monster go away. As you suggest, hiding under the “management software” covers won’t make the mini-rack monsters go away. Just ask any other partner that has recently tried to deploy HP’s Matrix for a customer. Ugh.

  • Pingback: Virtualization Short Take #40 - blog.scottlowe.org - The weblog of an IT pro specializing in virtualization, storage, and servers

  • http://pl.atyp.us Jeff Darcy

    FYI: it's not accurate to say that the e-class was the first blade chassis. RLX was already selling blade systems before that, and the e-class was a direct response to an existing product (as was IBM's blade offering). The fact that HP later bought RLX doesn't mean their home-grown product was first.

  • http://twitter.com/mseanmcgee M. Sean McGee

    Hi Jeff,
    First, thanks for stopping by to read the blog and post a comment.
    Second, you are absolutely correct. e-Class was a response to RLX. I specifically remember some of the engineers on the team at Compaq leaving to go to RLX. That's what woke Compaq up to the concept of blade servers.

    Basically, my description of “first” above was in the context of Compaq/HP's first blade server, not the industry's first.

    Again, thanks for the feedback.

    Best regards,
    -sean

  • Pingback: The Cisco UCS B230 – the Goldilocks Blade Server | M. Sean McGee

  • Pingback: Think Meta » Links and Whatnot, Take #1

  • Pingback: UCS uitbreiden met een extra Chassis, onze ervaring « CiscoNL – Technology

  • Pingback: UCS uitbreiden met een extra chassis | TJ's Dutch Insights

  • Pingback: Server/Desktop Virtualization–A Best of Breed Band-Aid — Define The Cloud

  • Pingback: UCS 2.0: New Innovation » TJs Thoughts

  • Pingback: Be gentle, its my first time: Setting up a UCS Blade System from scratch | Vallard's Tech Notes

  • http://www.aspensystemsdirect.com/blade_server.asp Xeon Blade Server

    This is a great way to explain blade servers. It helps admins understand the differences in management IP addresses between blade server architectures.

  • Anonymous

    Too bad allegiances and politics in favor of one vendor, specifically one beginning with H and ending in P, trump design and performance superiority. System Engineers these days tend to look down upon the Network Engineers, so how the heck can some network vendor come along and do computing? Mind you, they see no issue with HP dominating in networking, especially with 3Com in the mix (which is a bit of a joke to me). This reminds me of how scared our SAs are of doing iSCSI, so they'll go to their graves with FC.

    Great post.  Cleared up the fundamental differences.  Looking to find posts on perf comparisons next.

    • Inder

      Very informative post for a network engineer. However, my concern is this: if we are adding more chassis to the same ToR, won't there be oversubscription? If yes, we would need a ToR for each chassis.

  • Mark Pullen

    Hi Sean, I was pointed to your mini-rack concept, and I see it is using the old HP Flex-10. Understandable, as this was May 2010, almost 2 years ago. The comparison would be different based on 2012 specs. HP Matrix is a sledgehammer to crack a nut.

  • Pingback: Introduction to the New Cisco UCS 6296UP Fabric Interconnect and 2204XP I/O Module | M. Sean McGee

  • Pingback: A Quick Primer on Cisco Fabric Extension (FEX) | M. Sean McGee

  • Pingback: UCS Networking: Simplicity of Rack Connectivity PLUS All The Benefits of Blade and Virtualization | M. Sean McGee