NVIDIA GH200 Superchip Revolutionizes Apache Spark with Unprecedented Efficiency

Terrill Dicki
Aug 21, 2024 08:49

NVIDIA’s GH200 Superchip speeds up Apache Spark with up to 35x faster query responses and up to 22x fewer nodes, significantly reducing energy consumption.

As generative AI adoption surges, IT leaders are looking for ways to get more out of their data center resources. According to the NVIDIA Technical Blog, the NVIDIA GH200 Grace Hopper Superchip gives Apache Spark users a way to do that, promising substantial improvements in energy efficiency and node consolidation.

Tackling Legacy Bottlenecks in CPU-Based Apache Spark Systems

Apache Spark, an open-source, multi-language engine for large-scale data processing, has been instrumental in handling massive volumes of data across industries. Despite these advantages, CPU-only Spark deployments run into significant limitations that make data processing workflows inefficient.

Pioneering a New Era of Converged CPU-GPU Superchips

NVIDIA’s GH200 Superchip addresses these limitations by integrating the Arm-based Grace CPU with a Hopper-architecture GPU, connected via NVLink-C2C. This interconnect offers up to 900 GB/s of bandwidth, significantly outpacing the standard PCIe Gen5 links found in traditional systems.
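To put the bandwidth gap in perspective, here is a back-of-the-envelope sketch. The PCIe figures are assumptions that do not come from the article: it assumes an x16 Gen5 link at 32 GT/s per lane with 128b/130b encoding, and it assumes the 900 GB/s figure is a bidirectional total, so PCIe is counted in both directions as well.

```python
# Rough interconnect comparison. Only the 900 GB/s figure comes from the article;
# every PCIe parameter below is an assumption made for illustration.
NVLINK_C2C_GBPS = 900.0          # GB/s, quoted in the article (assumed bidirectional total)

PCIE_GEN5_GT_PER_LANE = 32.0     # GT/s per lane (assumed PCIe Gen5 signaling rate)
PCIE_LANES = 16                  # assumed x16 link
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding overhead


def pcie_gen5_x16_bidirectional_gbps() -> float:
    """Approximate usable bandwidth of an x16 Gen5 link, both directions, in GB/s."""
    per_direction = PCIE_GEN5_GT_PER_LANE * PCIE_LANES * ENCODING_EFFICIENCY / 8
    return 2 * per_direction


pcie_gbps = pcie_gen5_x16_bidirectional_gbps()
ratio = NVLINK_C2C_GBPS / pcie_gbps
print(f"PCIe Gen5 x16: ~{pcie_gbps:.0f} GB/s; NVLink-C2C advantage: ~{ratio:.1f}x")
```

Under these assumptions the interconnect advantage comes out at roughly 7x. A narrower PCIe link would widen the gap.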

The GH200’s architecture enables seamless memory sharing between CPU and GPU, eliminating the need for data transfers and thus accelerating Apache Spark workloads by up to 35x. For large clusters of over 1,500 nodes, this translates to a reduction of up to 22x in the number of nodes and annual energy savings of up to 14 GWh.

NVIDIA GH200 Sets New Highs in NDS Performance Benchmarks

Tests with the NVIDIA Decision Support (NDS) benchmark showed that Apache Spark runs significantly faster on GH200 than on premium x86 CPUs. Specifically, executing 100+ TPC-DS SQL queries on a 10 TB dataset took only 6 minutes on GH200, versus 42 minutes on x86 CPUs, a 7x end-to-end speedup.

Notable query accelerations include:

Query67: 36x speedup
Query14: 10x speedup
Query87: 9x speedup
Query59: 9x speedup
Query38: 8x speedup
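The runtimes and per-query figures above can be related with some simple arithmetic. The sketch below uses only numbers quoted in the article; the function and variable names are illustrative.

```python
# End-to-end speedup from the quoted runtimes, compared with the per-query figures.
CPU_MINUTES = 42.0    # x86 CPU runtime quoted in the article
GH200_MINUTES = 6.0   # GH200 runtime quoted in the article


def speedup(baseline: float, accelerated: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if accelerated <= 0:
        raise ValueError("accelerated runtime must be positive")
    return baseline / accelerated


end_to_end = speedup(CPU_MINUTES, GH200_MINUTES)

# Per-query speedups as listed in the article.
per_query = {"Query67": 36, "Query14": 10, "Query87": 9, "Query59": 9, "Query38": 8}

# Queries that beat the end-to-end figure.
above_average = [name for name, s in per_query.items() if s > end_to_end]

print(f"End-to-end speedup: {end_to_end:.0f}x")
print(f"Queries above the end-to-end figure: {', '.join(above_average)}")
```

All five highlighted queries sit above the whole-suite figure, which suggests the list shows the best cases rather than typical ones. Many of the remaining queries must see smaller gains.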

Reducing Power Consumption and Cutting Energy Costs

The GH200’s efficiency advantage grows with larger datasets. On a 100 TB dataset, a 16-node GH200 cluster finished in 40 minutes, whereas a traditional setup would need 344 CPUs to achieve the same result. That is roughly a 22x reduction in nodes and a 12x reduction in energy use.
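The consolidation figure can be sanity-checked with a short calculation. This is a sketch under one stated assumption: the article says "344 CPUs", and the code treats that number as a count of CPU nodes, which the article does not confirm. The helper name is hypothetical.

```python
import math

GH200_NODES = 16   # GH200 cluster size quoted in the article
CPU_UNITS = 344    # "344 CPUs" in the article; treated here as a node count (assumption)

# Ratio of CPU units to GH200 nodes for the 100 TB workload.
consolidation = CPU_UNITS / GH200_NODES


def nodes_after_consolidation(cpu_nodes: int, factor: float) -> int:
    """Estimate GH200 nodes needed to replace `cpu_nodes` at a given consolidation factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.ceil(cpu_nodes / factor)


# The article's large-cluster scenario: 1,500 CPU nodes at the quoted "up to 22x".
large_cluster = nodes_after_consolidation(1500, 22.0)

print(f"Consolidation ratio: {consolidation:.1f}x")
print(f"A 1,500-node CPU cluster would shrink to about {large_cluster} GH200 nodes")
```

The computed ratio of 21.5 is consistent with the article's rounded "22x". The large-cluster estimate is a best case, since the article quotes that figure as "up to".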

Exceptional SQL Acceleration and Price Performance

HEAVY.AI benchmarked GH200 against an 8x NVIDIA A100 PCIe-based instance, reporting a 5x speedup and 16x cost savings for a 100 TB dataset. On a larger 200 TB dataset, GH200 still outperformed with a 2x speedup and 6x cost savings.

“Our customers make data-driven, time-sensitive decisions that have a high impact on their business,” said Todd Mostak, CTO and co-founder of HEAVY.AI. “We’re excited about the new business insights and cost savings that GH200 will unlock for our customers.”

Get Started with Your GH200 Apache Spark Migration

Enterprises can leverage the RAPIDS Accelerator for Apache Spark to migrate workloads seamlessly to the GH200. This transition promises significant operational efficiencies, with GH200 already powering nine supercomputers globally and available through various cloud providers. For more details, visit the NVIDIA Technical Blog.
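As a rough illustration of what enabling the RAPIDS Accelerator can look like from PySpark, here is a minimal sketch. It is not taken from the article. The plugin class and the `spark.rapids.sql.enabled` setting are the accelerator's documented configuration keys, but the jar path, the application name, and both helper functions are hypothetical. A real deployment also needs GPU resource scheduling settings suited to the cluster.

```python
# Minimal sketch of enabling the RAPIDS Accelerator for Apache Spark from PySpark.
# The jar path and helper names are hypothetical placeholders.

def rapids_spark_conf(jar_path):
    """Return Spark settings that load the RAPIDS Accelerator plugin."""
    return {
        "spark.jars": jar_path,                         # RAPIDS Accelerator jar (placeholder path)
        "spark.plugins": "com.nvidia.spark.SQLPlugin",  # plugin entry point
        "spark.rapids.sql.enabled": "true",             # turn on GPU SQL acceleration
    }


def build_session(app_name, jar_path):
    """Create a SparkSession with the plugin settings applied (requires pyspark)."""
    from pyspark.sql import SparkSession  # imported lazily so the config helper works without pyspark

    builder = SparkSession.builder.appName(app_name)
    for key, value in rapids_spark_conf(jar_path).items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


# Hypothetical jar location; substitute the path used in your environment.
conf = rapids_spark_conf("/opt/sparkRapidsPlugin/rapids-4-spark.jar")
for key, value in conf.items():
    print(f"{key}={value}")
```

Because the plugin is enabled through configuration, existing Spark SQL and DataFrame code can in principle run unchanged. Operations the accelerator does not support fall back to the CPU.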
