CryptoSpiel.com
No Result
View All Result
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams
No Result
View All Result
CryptoSpiel.com
No Result
View All Result

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

December 6, 2024
in Blockchain
Reading Time: 2 mins read
A A
0
Nvidia Plans to add Innovation in the Metaverse with Software, Marketplace Deals
0
SHARES
9
VIEWS
ShareShareShareShareShare


Terrill Dicki
Dec 06, 2024 04:17

Perplexity AI utilizes NVIDIA’s inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.





Perplexity AI, a leading AI-powered search engine, is successfully managing over 435 million search queries each month, thanks to NVIDIA’s advanced inference stack. The platform has integrated NVIDIA H100 Tensor Core GPUs, Triton Inference Server, and TensorRT-LLM to efficiently deploy large language models (LLMs), according to NVIDIA’s official blog.

Serving Multiple AI Models

To meet diverse user demands, Perplexity AI operates over 20 AI models simultaneously, including variations of the open-source Llama 3.1 models. Each user request is matched with the most suitable model using smaller classifier models that determine user intent. These models are deployed across GPU pods, each managed by an NVIDIA Triton Inference Server, ensuring efficiency under strict service-level agreements (SLAs).

The pods are hosted within a Kubernetes cluster, featuring an in-house front-end scheduler that directs traffic based on load and usage. This ensures consistent SLA adherence, optimizing performance and resource utilization.

Optimizing Performance and Costs

Perplexity AI employs a comprehensive A/B testing strategy to define SLAs for varied use cases. This process aims to maximize GPU utilization while maintaining target SLAs, optimizing inference serving costs. Smaller models focus on minimizing latency, while larger, user-facing models like Llama 8B, 70B, and 405B undergo detailed performance analysis to balance costs and user experience.

Performance is further enhanced by parallelizing model deployment across multiple GPUs, increasing tensor parallelism to achieve lower serving costs for latency-sensitive requests. This strategic approach has enabled Perplexity to save approximately $1 million annually by hosting models on cloud-based NVIDIA GPUs, surpassing third-party LLM API service costs.

Innovative Techniques for Enhanced Throughput

Perplexity AI is collaborating with NVIDIA to implement ‘disaggregating serving,’ a method that separates inference phases onto different GPUs, significantly boosting throughput while adhering to SLAs. This flexibility allows Perplexity to utilize various NVIDIA GPU products to optimize performance and cost-efficiency.

Further improvements are anticipated with the upcoming NVIDIA Blackwell platform, promising substantial performance gains through technological innovations, including a second-generation Transformer Engine and advanced NVLink capabilities.

Perplexity’s strategic use of NVIDIA’s inference stack underscores the potential for AI-powered platforms to manage vast query volumes efficiently, delivering high-quality user experiences while maintaining cost-effectiveness.

Image source: Shutterstock


Credit: Source link

RELATED POSTS

Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

Riot Platforms Sells $289M in Bitcoin as Mining Output Drops 4% in Q1

Exploring Chainlink’s Role Beyond Price Feeds in the Blockchain Ecosystem

Buy JNews
ADVERTISEMENT
ShareTweetSendPinShare
Previous Post

Top 6 Cryptos for Big Gains: December 2024’s Best Picks to Transform Your Investments

Next Post

NVIDIA Expands AI Ventures with New Research Center in Vietnam

Related Posts

Bitcoin Addresses Holding Between 100 and 10,000 BTC Hit a 7-Week High
Blockchain

Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

April 10, 2026
Riot Blockchain Yearly Bitcoin Production Increases by 236%, Accumulates $194M in BTC
Blockchain

Riot Platforms Sells $289M in Bitcoin as Mining Output Drops 4% in Q1

April 2, 2026
Galaxy Digital: Ethereum Developers Discuss Key Upgrades During Latest Consensus Call
Blockchain

Exploring Chainlink’s Role Beyond Price Feeds in the Blockchain Ecosystem

December 9, 2025
Next Post
Nvidia Plans to add Innovation in the Metaverse with Software, Marketplace Deals

NVIDIA Expands AI Ventures with New Research Center in Vietnam

FBI Cracks $8.3M Crypto Fraud: Launderer Sentenced to Prison and $2.3M Repayment

FBI Cracks $8.3M Crypto Fraud: Launderer Sentenced to Prison and $2.3M Repayment

Recommended Stories

Ripple CEO Says CLARITY Act Talks Near Breakthrough as Senate Standoff Eases

Ripple CEO Says CLARITY Act Talks Near Breakthrough as Senate Standoff Eases

April 14, 2026
SEC Opens Proceedings on NYSE Proposal to List Grayscale Crypto ETF Options – Regulation Bitcoin News

SEC Opens Proceedings on NYSE Proposal to List Grayscale Crypto ETF Options – Regulation Bitcoin News

April 11, 2026
Bitcoin Addresses Holding Between 100 and 10,000 BTC Hit a 7-Week High

Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

April 10, 2026

Popular Stories

  • Winklevoss Twins Continue Crypto Donation Spree With Another $1,000,000 in Bitcoin (BTC)

    Trader Says DeFi Altcoin Aave Witnessing Clear Trend Switch, Updates Forecast on Two Low-Cap Coins

    0 shares
    Share 0 Tweet 0
  • Andreessen Horowitz Leads $160M Investment in Harvey at $8B Valuation

    0 shares
    Share 0 Tweet 0
  • DeFi giant Aave plans to launch its own overcollateralized stablecoin GHO

    0 shares
    Share 0 Tweet 0
  • Stabble Urges Users to Pull Liquidity After Alleged North Korean Hacker Link

    0 shares
    Share 0 Tweet 0
  • Ava Labs Introduces Intellectual Property Management to K-Pop

    0 shares
    Share 0 Tweet 0
CryptoSpiel.com

This is an online news portal that aims to provide the latest crypto news, blockchain, regulations and much more stuff like that around the world. Feel free to get in touch with us!

What’s New Here!

  • Ripple CEO Says CLARITY Act Talks Near Breakthrough as Senate Standoff Eases
  • SEC Opens Proceedings on NYSE Proposal to List Grayscale Crypto ETF Options – Regulation Bitcoin News
  • Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

Subscribe Now

Loading
  • Live Crypto Prices
  • Contact Us
  • Privacy Policy
  • Terms of Use
  • DMCA

© 2021 - cryptospiel.com - All rights reserved!

No Result
View All Result
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams

© 2021 - cryptospiel.com - All rights reserved!

Please enter CoinGecko Free Api Key to get this plugin works.