CryptoSpiel.com

NVIDIA Enhances TensorRT Model Optimizer v0.15 with Improved Inference Performance

August 16, 2024
in Blockchain


Zach Anderson
Aug 16, 2024 03:03

NVIDIA releases TensorRT Model Optimizer v0.15, offering enhanced inference performance through new features like cache diffusion and expanded AI model support.





NVIDIA has released v0.15 of the NVIDIA TensorRT Model Optimizer, a toolkit of model optimization techniques such as quantization, sparsity, and pruning. The update aims to reduce model complexity and speed up inference for generative AI models, according to the NVIDIA Technical Blog.

Cache Diffusion

The new version adds support for cache diffusion, building on the previously established 8-bit post-training quantization (PTQ) technique. Cache diffusion accelerates diffusion models at inference time by reusing cached outputs from previous denoising steps, and methods such as DeepCache and block caching do so without additional training. The mechanism relies on the temporal consistency of high-level features between consecutive denoising steps, and it is compatible with models like DiT and UNet.

Developers can enable cache diffusion by using a single ‘cachify’ instance in the Model Optimizer with the diffusion pipeline. For instance, enabling cache diffusion in a Stable Diffusion XL (SDXL) model on an NVIDIA H100 Tensor Core GPU delivers a 1.67x speedup in images per second. The speedup increases further when FP8 is also enabled.
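To make the caching idea concrete, here is a minimal toy sketch in plain Python. It is not the Model Optimizer API and does not use ‘cachify’; the names `CachedBlock`, `deep_features`, `denoise`, and `refresh_every` are hypothetical, and the arithmetic stands in for a real denoiser. It only illustrates the general pattern of recomputing an expensive, slowly varying term every few denoising steps and reusing the cached value in between.

```python
from typing import Callable, Optional


class CachedBlock:
    """Wraps an expensive function and reuses its last output between refreshes."""

    def __init__(self, fn: Callable[[float], float], refresh_every: int) -> None:
        if refresh_every < 1:
            raise ValueError("refresh_every must be >= 1")
        self.fn = fn
        self.refresh_every = refresh_every
        self.cached: Optional[float] = None
        self.calls = 0  # how many times the expensive function actually ran

    def __call__(self, x: float, step: int) -> float:
        # Recompute on refresh steps (and on the very first call); otherwise reuse.
        if self.cached is None or step % self.refresh_every == 0:
            self.cached = self.fn(x)
            self.calls += 1
        return self.cached


def deep_features(x: float) -> float:
    """Hypothetical stand-in for the expensive, slowly varying part of a denoiser."""
    return 0.5 * x


def denoise(x: float, num_steps: int, block: CachedBlock) -> float:
    """Toy denoising loop: a cheap per-step term plus a possibly cached deep term."""
    for step in range(num_steps):
        shallow = 0.1 * x          # cheap part, recomputed every step
        deep = block(x, step)      # expensive part, reused between refreshes
        x = x - 0.1 * (shallow + deep)
    return x


block = CachedBlock(deep_features, refresh_every=3)
result = denoise(1.0, num_steps=9, block=block)
print(f"expensive evaluations: {block.calls} of 9 steps, final value: {result:.4f}")
```

In this sketch the refresh interval trades accuracy for speed: a larger `refresh_every` skips more work but lets the cached term drift further from its exact value.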

Quantization-Aware Training with NVIDIA NeMo

Quantization-aware training (QAT) simulates the effects of quantization during neural network training so that the model can recover accuracy lost to quantization. The process computes scaling factors during training and incorporates the simulated quantization loss into fine-tuning. Model Optimizer uses custom CUDA kernels for simulated quantization, producing lower-precision model weights and activations for efficient hardware deployment.
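As a rough illustration of what "simulated quantization" means, the sketch below computes a scaling factor and applies a quantize-then-dequantize round trip to a handful of weights. It assumes symmetric per-tensor 8-bit integer quantization, which is a choice made for this example and not necessarily the scheme Model Optimizer uses; the function names and sample values are hypothetical, and the real toolkit runs this kind of operation in CUDA kernels inside the training graph.

```python
from typing import List


def compute_scale(values: List[float], num_bits: int = 8) -> float:
    """Symmetric per-tensor scale: map the largest magnitude to the top integer level."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    amax = max(abs(v) for v in values)
    return amax / qmax if amax > 0 else 1.0


def fake_quantize(values: List[float], scale: float, num_bits: int = 8) -> List[float]:
    """Round to the integer grid, clamp to the representable range, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-qmax, min(qmax, q))
        out.append(q * scale)
    return out


# Illustrative weights only; not taken from any real model.
weights = [0.02, -0.5, 0.31, 1.27]
scale = compute_scale(weights)
simulated = fake_quantize(weights, scale)

# The gap between original and simulated values is the quantization error
# that fine-tuning is exposed to during QAT.
max_error = max(abs(w - s) for w, s in zip(weights, simulated))
print(f"scale={scale:.6f}, max round-trip error={max_error:.6f}")
```

Because the values stay in floating point after the round trip, the network can keep training on them while still experiencing the rounding error that deployment at lower precision would introduce.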

Model Optimizer v0.15 expands QAT integration support to include NVIDIA NeMo, an enterprise-grade platform for developing custom generative AI models. This first-class support for NeMo models allows users to fine-tune models directly with the original training pipeline. For more details, see the QAT example in the NeMo GitHub repository.

QLoRA Workflow

Quantized Low-Rank Adaptation (QLoRA) is a fine-tuning technique that reduces memory usage and computational complexity during model training. It combines quantization with Low-Rank Adaptation (LoRA), making large language model (LLM) fine-tuning more accessible. Model Optimizer now supports the QLoRA workflow with NVIDIA NeMo using the NF4 data type. For a Llama 13B model on the Alpaca dataset, QLoRA can reduce peak memory usage by 29-51% while maintaining model accuracy.
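A back-of-the-envelope calculation helps show where the savings come from. The sketch below is an assumption-laden estimate, not a reproduction of NVIDIA's measurement: it counts only weight storage for a 13-billion-parameter model at 16 bits versus 4 bits per parameter, ignores the per-block scaling constants that NF4 also stores, and uses a hypothetical adapter rank of 8 on a 5120 x 5120 projection. Peak training memory also includes activations, gradients, and optimizer state, so this arithmetic is not expected to match the 29-51% figure above.

```python
GIB = 1024 ** 3


def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Storage for the weights alone, ignoring any quantization metadata."""
    return num_params * bits_per_param / 8


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters in a rank-r adapter pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


num_params = 13_000_000_000  # nominal "13B" parameter count
bf16_gib = weight_bytes(num_params, 16) / GIB
nf4_gib = weight_bytes(num_params, 4) / GIB

# Assumed layer shape and rank, chosen only for illustration.
d_model, rank = 5120, 8
full_matrix = d_model * d_model
adapter = lora_params(d_model, d_model, rank)

print(f"16-bit weights: {bf16_gib:.1f} GiB, 4-bit weights: {nf4_gib:.1f} GiB")
print(f"adapter params per matrix: {adapter} vs {full_matrix} in the frozen matrix")
```

The two effects are complementary: quantization shrinks the frozen base weights, while the low-rank adapters keep the number of trainable parameters, and therefore gradients and optimizer state, small.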

Expanded Support for AI Models

The latest release also expands support for a wider suite of AI models, including Stability.ai’s Stable Diffusion 3, Google’s RecurrentGemma, Microsoft’s Phi-3, Snowflake’s Arctic 2, and Databricks’ DBRX. For more details, refer to the example scripts and support matrix available in the Model Optimizer GitHub repository.

Get Started

NVIDIA TensorRT Model Optimizer provides seamless integration with NVIDIA TensorRT-LLM and TensorRT for deployment. It is available for installation on PyPI as nvidia-modelopt. Visit the NVIDIA TensorRT Model Optimizer GitHub page for example scripts and recipes for inference optimization. Comprehensive documentation is also available.

Image source: Shutterstock


© 2021 - cryptospiel.com - All rights reserved!
