NextIO’s vCORE: Consolidates and optimizes GPU operations

A boon for scientific and technical computing

By Deni Connor
Principal Analyst, Storage Strategies NOW
May 2010

Introduction

Originally used for 3D gaming acceleration, GPUs have come to the forefront of scientific computing, Monte Carlo simulations in financial services, oil and gas exploration and pattern recognition, among other applications. All of these are embarrassingly parallel workloads that involve very large datasets and floating-point operations, and that can be broken down into parallel processes, saving organizations time and money.

For all the advantages of using GPUs for technical and scientific computing, there is also a wealth of challenges, mostly centered on GPU maintenance and replacement. Workloads are typically split so that the sequential part of the application runs on the CPU and the computationally intensive part runs on the GPU. If the GPU requires replacement, all workloads running on the server containing that GPU must be stopped, the server taken down and the GPU replaced. The same thing happens when new firmware needs to be loaded or when a new version of the GPU needs to be added to the system: operations stop while labor-intensive, high-touch GPU maintenance occurs. This affects every application running on the server, not just the application running on the failed GPU, and delays the organization’s ability to do its work.

As the number of jobs and GPUs increases, so do the dependencies and the potential for resource contention. Jobs that don’t require GPUs for computation may be assigned to servers containing GPUs, while workloads that do require GPUs wait for one to become available.

The result of GPU contention and maintenance factors is that system administrators tend to over-provision GPUs to ensure that one is always available when needed for a planned workload. This approach is clearly problematic – TCO increases, labor intensifies and ROI decreases.

Imagine if you could look at GPUs as a shared pool of resources – a concentrator or consolidator of sorts – that would allow you to assign GPUs to workloads and servers on demand and manage and configure GPUs centrally. Is that a pipe dream or reality?

Enter vCORE

NextIO recently introduced the vCORE consolidation appliance, a shared-resource appliance that aggregates GPUs in a single external enclosure. The vCORE appliance is a 4U (7-inch) high, 20-inch deep enclosure that holds either eight double-wide or 16 single-wide GPUs and connects to servers, consolidating GPUs in an industry standards-based fashion. At this time the appliance accommodates either NVIDIA Tesla or Quadro GPUs, although in the future it will work with GPUs from other vendors. It attaches to any x86-based blade or rack-mounted server via the PCIe bus. Having the GPUs concentrated in a single enclosure also provides investment protection and future-proofing: GPUs can be dynamically added to the system as needed or replaced when failures occur without affecting workloads running on other GPUs.

Figure 1. The vCORE consolidation appliance

The vCORE consolidation appliance has 3+1 fan cooling and a 2400W power and cooling capacity. Carrier cards are incorporated into the appliance for hot-swappable GPU replacement. Further, the appliance can be managed remotely via the included nControl management software, which offers a GUI, a command line interface and APIs for third-party tools. The vCORE appliance also supports the I/O Virtualization specification for Fibre Channel and Gigabit Ethernet, which allows multiple operating systems to run simultaneously within a single computer and natively share PCIe devices.

Finally, the vCORE appliance is available in several models, which vary by the number of GPUs, servers and network connections they support.

Figure 2. vCORE models

Impressive in operation, the appliance delivers 78 to 630 GFLOPS of performance per GPU and supports 240 to 512 server cores per GPU.

vCORE serviceability, configurability and management

The vCORE appliance is also easy to service and configure. When a GPU needs to be replaced or upgraded, applications assigned to other GPUs can continue to run. The affected GPU is taken out of service by removing it from its hot-pluggable sled, the new GPU is inserted without impact to other running applications, and once the new GPU is brought online the application is restarted.

Management of GPUs is also centralized with the vCORE appliance. GPUs can be shut down from the console, removed physically from the enclosure and when newly repositioned in the chassis, they’ll appear on the management console once again, where they can be re-enabled and workloads restarted. The versatility of the nControl management software is also apparent – administrators can interface with the vCORE appliance via a Web-based GUI, a command line interface or open APIs to third-party workload schedulers such as the Moab Cluster Suite or TORQUE Resource Manager.
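The replacement flow described above can be pictured with a small model. This is only an illustrative sketch: the `GpuPool` class, its method names and the server names are hypothetical and are not part of nControl or any NextIO API. It assumes a pool of eight slots and shows that taking one GPU offline touches only the workload assigned to that GPU.

```python
class GpuPool:
    """Hypothetical model of a shared GPU enclosure (not a real vendor API)."""

    def __init__(self, slots):
        # Every slot starts online and unassigned.
        self.state = {slot: "online" for slot in range(slots)}
        self.assigned = {}  # slot number -> server name

    def assign(self, server):
        """Give the first free online GPU to a server and return its slot."""
        for slot, status in self.state.items():
            if status == "online" and slot not in self.assigned:
                self.assigned[slot] = server
                return slot
        raise RuntimeError("no free GPU in the pool")

    def take_offline(self, slot):
        """Mark one GPU offline; return the server that was using it, if any."""
        self.state[slot] = "offline"
        return self.assigned.pop(slot, None)

    def bring_online(self, slot):
        """Mark a replaced GPU as available again."""
        self.state[slot] = "online"


pool = GpuPool(slots=8)
slot_a = pool.assign("server-a")
slot_b = pool.assign("server-b")

# Replace the GPU used by server-a; server-b keeps its assignment throughout.
affected = pool.take_offline(slot_a)
still_running = dict(pool.assigned)
pool.bring_online(slot_a)
new_slot = pool.assign(affected)  # restart the affected workload
```

In this model only `server-a` is interrupted during the swap, which mirrors the claim that applications assigned to other GPUs continue to run.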

Total cost of ownership analysis

In this example, the customer requirement is 309 double-precision TFLOPS, and energy is assumed to cost $0.11 per kWh. Meeting the requirement with x86 servers alone takes a total of 4,217 high-end servers, along with more than 100 48-port switches. That many servers use nearly three million dollars per year in energy costs alone, not counting cooling expense. This assumes a 100% duty cycle.

Configuration | Total servers | Cisco switches | vCORE | Capital expense | TFLOPS | Energy cost per year | Total cost, 3 years | Savings
x86 only | 4,217 | 101 | 0 | $13.358M | 309.7 | $2.873M | $21.979M | 0
x86 + vCORE | 152 | 15 | 76 | $3.600M | 324.4 | $0.248M | $4.345M | $17.633M
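The three-year totals in the table can be checked with a short calculation. The sketch below assumes, as the figures suggest, that total cost is capital expense plus three years of energy cost; the `Config` class and `implied_avg_kw` helper are illustrative names, not anything from NextIO. The recomputed totals differ from the published ones by about $0.001M to $0.002M, presumably because the published inputs are rounded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    name: str
    capital_musd: float          # capital expense in $M, from the table
    energy_musd_per_year: float  # energy cost per year in $M, from the table

    def total_cost(self, years: int = 3) -> float:
        # Assumed model: capital expense plus energy cost for each year.
        return self.capital_musd + years * self.energy_musd_per_year


def implied_avg_kw(energy_musd_per_year, usd_per_kwh=0.11, hours_per_year=8760):
    """Average power draw (kW) implied by a yearly energy bill at 100% duty cycle."""
    return energy_musd_per_year * 1e6 / usd_per_kwh / hours_per_year


x86_only = Config("x86 only", 13.358, 2.873)
x86_vcore = Config("x86 + vCORE", 3.600, 0.248)

for cfg in (x86_only, x86_vcore):
    print(f"{cfg.name}: ${cfg.total_cost():.3f}M over 3 years")

savings = x86_only.total_cost() - x86_vcore.total_cost()
print(f"Savings: ${savings:.3f}M")

# Rough sanity check on the energy figure for the x86-only case.
kw_per_server = implied_avg_kw(x86_only.energy_musd_per_year) / 4217
print(f"Implied draw per x86 server: {kw_per_server:.2f} kW")
```

Under these assumptions the x86-only energy bill implies roughly 0.7 kW per server running continuously, which is consistent with the article's description of high-end servers at a 100% duty cycle.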

SSG-NOW Assessment

The NextIO vCORE brings large-scale GPU computing into the enterprise. With its centralized management and consolidation of 8 or 16 GPUs, the vCORE appliance is a forerunner in making technical and scientific computing easier. Its serviceability and configurability are the answer to GPU maintenance and replacement, and its rack-mounted approach makes it easy to install in existing data center rack configurations. Because the vCORE appliance accommodates GPUs and servers from any vendor, it allows users to avoid vendor lock-in, lower their TCO and optimize performance as the GPU market advances.

Note: The information and recommendations made by Storage Strategies NOW are based upon public information and sources and may also include personal opinions both of Storage Strategies NOW and others, all of which we believe to be accurate and reliable. As market conditions change however, and not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. Storage Strategies NOW, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this document.
