The "Mini-Rack" Approach To Blade Server Design

May 3rd, 2010

I’ve got two questions for you…

Question #1: If you were designing a datacenter full of rack servers, would you deploy two Ethernet switches, two Fibre Channel switches, and two rack managers for every 16 rack servers?  Uh, h$&# no!

Question #2: If you wouldn’t do it for rack servers, why do the legacy blade server vendors want you to deploy that design when you buy their blade servers?
I can tell you why – it’s a result of the “mini rack” approach to blade server design. Allow me to explain…

The Mini-Rack Mindset

I currently work for Cisco, but I spent over a decade at HP/Compaq in their x86 Server Engineering Business Unit.  I can remember the first discussions of blade servers at Compaq in the early 2000s…the discussions that led to the development of their first blade chassis (the Compaq (now HP) e-Class blade enclosure).  My involvement back then was related to the mini-switches that we would eventually build to go inside the e-Class blade enclosure.  The e-Class had two mini layer 2 Ethernet switches used for connecting the 20 blade servers to the external network.  After the e-Class release came the p-Class architecture with its two mini-Ethernet switches and two mini-Fibre Channel modules for its 8/16 blade servers.  HP then re-architected the chassis design by adding lots more I/O bays and released the current blade chassis design, the c-Class. c-Class comes with up to 8 mini-Ethernet and mini-Fibre Channel switches and two mini-rack managers, called Onboard Administrator (OA), for 16 blade servers.

So the progression was:

♦ e-Class had 20 servers, 1 switch, and 1 enclosure manager
♦ p-Class had 8 servers, 4 I/O modules, and no enclosure manager
♦ c-Class has 16 servers, 8 I/O modules, and 2 enclosure managers

Are you seeing the pattern yet?  Slowly, the infrastructure overhead needed to deploy 16 blade servers – the number of required modules and management interfaces/IP addresses – has grown over the last three generations of HP blade servers. HP c-Class’s best-case scenario has a whopping 6:16 ratio – 4 I/O modules and 2 OA modules for 16 blade servers. The worst-case ratio is 10:16 – 8 I/O modules and 2 OAs for 16 blade servers.

The mini-rack mindset originates from the early days of blade server chassis design.  Let me walk you through the thought process that results in this mindset:

<follow along in graphic labeled “How should I architect a blade server”>
“I need to architect “blade servers” for my customer. <phase 1> I guess I should start by looking at a legacy rack full of servers as a point of reference.  What components do I see?  I see that typical racks have two Ethernet switches, two Fibre Channel switches, and multiple PDUs.  In addition, separate central management servers (e.g. HP SIM) must be configured, clustered, and maintained in the datacenter to manage the rack servers and shared PDUs. Well, <phase 2> I guess I’ll make miniature versions of each of these components for every 16 servers, and <phase 3> I’ll put those components inside the sheet metal (blade chassis) with the blade servers themselves.  This arrangement worked for racks for years, so it must be the best design.  <phase 4> Unfortunately, this creates lots of management overhead and complexity, so I guess I’ll write extra management software to solve the problem my design created.”

The end result of this line of thinking is the “mini-rack” mindset: lots of mini-racks in your data center and all the management overhead and complexity that comes with them.

Every time you add another 16 blade servers with Enet and FC connectivity, you add the overhead of at least SIX infrastructure modules – 2 mini Enet switches, 2 mini FC switches, and 2 Onboard Administrators. I’ll admit… as a member of the blade server engineering community, I helped perpetuate this mindset.  But we didn’t really see an alternative.  We were stuck with a choice between too many cables (pass-thru modules) and too many mini-switches.
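
If you want to see how quickly that overhead piles up, here’s a quick back-of-the-envelope sketch in Python. It’s purely illustrative: the per-chassis module counts come straight from the best-case scenario above, while the 16-blades-per-chassis assumption, the helper name, and the blade counts in the loop are just my own placeholders.

import math

# Minimum redundant infrastructure per 16-blade chassis in the mini-rack
# model, per the best-case scenario described above.
MODULES_PER_CHASSIS = {
    "mini Ethernet switches": 2,
    "mini Fibre Channel switches": 2,
    "Onboard Administrators": 2,
}

def mini_rack_overhead(blade_count, blades_per_chassis=16):
    """Total infrastructure modules needed for blade_count blades."""
    chassis = math.ceil(blade_count / blades_per_chassis)
    return {name: per_chassis * chassis
            for name, per_chassis in MODULES_PER_CHASSIS.items()}

for blades in (16, 32, 80, 160):
    total = sum(mini_rack_overhead(blades).values())
    print(f"{blades:4d} blades -> {total} infrastructure modules to manage")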

Side Note: Does HP’s Virtual Connect product really solve the problem of too many switches? I don’t want to rat hole in this post so I’ll save that for another blog post in the near future.  Short answer is: unfortunately, no it doesn’t.

So what’s your alternative to the mini-rack architecture?
Cisco Unified Computing System (UCS)

Cisco didn’t approach blade server architecture design as a “me too mini-rack”.  Cisco has never been a “me too – let’s run off the cliff together” company.  Cisco has built its reputation as a company that delivers innovative products and solutions that set the example in the industry.  Cisco has always been an industry leader, not a follower.  So, how did Cisco “lead” with their approach to blade server architecture?  Cisco said, “We don’t want a mini-rack.  We want a logical, expandable blade chassis that provides all the key benefits of blade servers (reduced power & cooling, reduced footprint, reduced cabling) but with infrastructure design simplicity that’s BETTER than that of the original rack server architecture.  When the logical blade chassis needs to be expanded to accommodate more blade servers, there WILL NOT be an increase in management overhead. There will just be an increase in available server hardware for the same management interface to manage.”

Cisco’s blade server architecture is designed around a pair of clustered ‘top of rack’ blade chassis managers that includes all the management functionality for blade chassis hardware, blade server hardware, switching hardware, Ethernet and Fibre Channel connectivity, server identity management, and control plane integration with VMware vCenter.  Cisco calls these devices “Cisco UCS 6100 Fabric Interconnects”.  Since all this functionality is delivered in a device that sits outside of the physical blade chassis, Cisco’s blade chassis has been simplified to provide the original intended benefits of blades – reduced power & cooling, reduced cabling, reduced server footprint – but without the management overhead nightmare forced onto the customer by the legacy mini-rack mindset.

Example of Cisco's Single Logical Blade Chassis with 80 blade servers under one UCS Manager

For example, Cisco’s Fabric Interconnects provide the functionality of HP’s Onboard Administrator, HP Virtual Connect Ethernet, HP Virtual Connect Fibre Channel, HP Virtual Connect Manager, HP Virtual Connect Enterprise Manager, and many aspects of HP SIM, all in a single clustered-for-redundancy pair of devices that can be used for multiple blade chassis.  So, you configure these Fabric Interconnects (and three IP addresses) once.  Then you just plug in blade chassis as you need additional blade servers.  It’s a modular, expandable blade chassis design.  That’s why I call it a “single logical blade chassis”.  You can expand it anytime you want without adding more interfaces to manage. You plug in a chassis, the chassis is auto-discovered, and the blade servers show up in the management interface. It really can’t get any easier.
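
To make the “single logical blade chassis” idea concrete, here’s a tiny conceptual model in Python. To be clear, this has nothing to do with the actual UCS Manager interface – the class and method names are my own invention – it only illustrates the point that adding a chassis adds capacity, not management endpoints.

class UcsDomain:
    """Conceptual model of one UCS management domain: a clustered pair of
    Fabric Interconnects plus one cluster address, managing many chassis."""

    MGMT_IPS = 3  # FI-A, FI-B, and the cluster address - fixed by design

    def __init__(self):
        self.chassis = []

    def add_chassis(self, blades=8):
        # Plugging in a chassis just grows the inventory; it is auto-discovered
        # and shows up under the same management interface.
        self.chassis.append(blades)

    @property
    def blade_count(self):
        return sum(self.chassis)

domain = UcsDomain()
for _ in range(10):  # the 80-blade example from this post: 10 chassis x 8 blades
    domain.add_chassis(8)
print(domain.blade_count, "blades,", UcsDomain.MGMT_IPS, "management IPs")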

Side Note: So how does Cisco UCS reduce cables without putting little switches in every blade chassis?  Cisco’s distributed switch architecture allows the fabric interconnects to extend their ports inside of each blade chassis. As a result, adding more blade chassis does not add more switches – it only adds more logical ports on the fabric interconnects for each server. Again, I don’t want to rat hole so I’ll save this conversation for another blog post.

As an example of the mini-rack architecture vs. the Cisco UCS blade server architecture, let’s look at it just from an IP address overhead perspective. Here’s an 80-blade example for both HP and Cisco. Each example shows the minimum number of redundant infrastructure devices and their required management IP addresses:

HP Bladesystem: 5 HP BladeSystem Enclosures with 2 x VC Enet, 2 x VC FC, and 2 x Onboard Administrators (80 server blades)

♦ 10 x IPs for Onboard Administrators
♦ 10 x IPs for Virtual Connect Ethernet (Flex-10) modules
♦ 5 x IPs for Virtual Connect Manager cluster address (optional but typical)
♦ 10 x IPs for Virtual Connect Fibre Channel modules

Total: 35 Management IP addresses for 80 HP server blades

Cisco UCS: 10 Cisco UCS Chassis with 2 x Fabric Interconnects (80 server blades)

♦ 2 x IPs for Fabric Interconnects
♦ 1 x IP for Fabric Interconnect cluster address

Total: 3 Management IP addresses for 80 Cisco server blades

HP’s 35 Management IP addresses vs. Cisco’s 3 Management IP addresses demonstrates the fundamental architectural differences in management philosophy between the two approaches.
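
For the spreadsheet-minded, here’s a small Python sketch that generalizes the arithmetic above. The per-enclosure device counts are taken from the two lists; the function names and the extra blade counts in the loop are just my own illustration.

import math

def hp_bladesystem_ips(blades, blades_per_enclosure=16):
    """Management IPs for the HP BladeSystem layout listed above: 2 OAs,
    2 VC Ethernet modules, and 2 VC FC modules per enclosure, plus one
    Virtual Connect Manager cluster address per enclosure."""
    enclosures = math.ceil(blades / blades_per_enclosure)
    return enclosures * (2 + 2 + 2 + 1)

def cisco_ucs_ips(blades):
    """Management IPs for a single UCS domain: one per Fabric Interconnect
    plus one cluster address. The blade count doesn't change the answer,
    which is the whole point."""
    return 2 + 1

for blades in (16, 48, 80):
    print(f"{blades:3d} blades: HP BladeSystem = {hp_bladesystem_ips(blades):2d} IPs, "
          f"Cisco UCS = {cisco_ucs_ips(blades)} IPs")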

Summary:

Cisco’s outside-the-box engineering has resulted in a brand new blade architecture. Cisco has just developed the first “automobile” class blade server architecture and the legacy “horse-n-buggy” blade server vendors are scrambling. My expectation is that the legacy blade vendors will, eventually, follow Cisco’s lead and come out with new blade architectures that get rid of all the management complexity. In the meantime, they will try to hide the complexity using more layers of management software.
