CryptoSpiel.com

NVIDIA Unveils Mistral-NeMo-Minitron 8B Model with Superior Accuracy

August 22, 2024
Tony Kim
Aug 22, 2024 05:37

NVIDIA’s new Mistral-NeMo-Minitron 8B model demonstrates superior accuracy across nine benchmarks, utilizing advanced pruning and distillation techniques.

NVIDIA, in collaboration with Mistral AI, has announced the release of the Mistral-NeMo-Minitron 8B model, a highly advanced open-access large language model (LLM). According to the NVIDIA Technical Blog, this model surpasses other models of a similar size in terms of accuracy on nine popular benchmarks.

Advanced Model Pruning and Distillation

The Mistral-NeMo-Minitron 8B model was developed by width-pruning the larger Mistral NeMo 12B model, followed by light retraining using knowledge distillation. NVIDIA originally proposed this methodology in its paper "Compact Language Models via Pruning and Knowledge Distillation" and has since applied it in several models, including the NVIDIA Minitron 8B and 4B models and the Llama-3.1-Minitron 4B model.

Model pruning involves reducing the size and complexity of a model by either dropping layers (depth pruning) or neurons and attention heads (width pruning). This process is often paired with retraining to recover any lost accuracy. Model distillation, on the other hand, transfers knowledge from a large, complex model (the teacher model) to a smaller, simpler model (the student model), aiming to retain much of the predictive power of the original model while being more efficient.
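As a rough illustration of width pruning, the sketch below removes hidden neurons from a toy two-layer network. The importance score (mean absolute activation over a small calibration batch), the function names, and the toy weights are all assumptions made for this example; the article does not specify how NVIDIA ranks neurons.

```python
def relu(x):
    return x if x > 0.0 else 0.0

def hidden_activations(w_in, x):
    # w_in holds one row of input weights per hidden neuron.
    return [relu(sum(w * xi for w, xi in zip(row, x))) for row in w_in]

def neuron_importance(w_in, calibration_batch):
    # Assumed importance score: mean absolute activation per hidden neuron.
    totals = [0.0] * len(w_in)
    for x in calibration_batch:
        for j, a in enumerate(hidden_activations(w_in, x)):
            totals[j] += abs(a)
    return [t / len(calibration_batch) for t in totals]

def width_prune(w_in, w_out, calibration_batch, keep):
    # Keep the `keep` highest-scoring neurons, preserving their original order.
    scores = neuron_importance(w_in, calibration_batch)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    kept = sorted(ranked[:keep])
    pruned_in = [w_in[j] for j in kept]                      # drop rows of layer 1
    pruned_out = [[row[j] for j in kept] for row in w_out]   # drop columns of layer 2
    return pruned_in, pruned_out, kept

# Toy network: 2 inputs, 4 hidden neurons, 1 output (hypothetical values).
w_in = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [0.1, 0.1]]
w_out = [[0.5, -0.5, 2.0, 1.0]]
batch = [[1.0, 2.0], [3.0, 1.0]]

pruned_in, pruned_out, kept = width_prune(w_in, w_out, batch, keep=2)
print(kept, pruned_in, pruned_out)
```

The layer count stays the same; only the hidden dimension shrinks. Depth pruning would instead drop whole layers.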

The combination of pruning and distillation allows for the creation of progressively smaller models from a large pretrained model. This approach significantly reduces the computational cost, as only 100-400 billion tokens are needed for retraining, compared to the much larger datasets required for training from scratch.
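The retraining step depends on a distillation loss that pulls the student's output distribution toward the teacher's. A minimal sketch follows, assuming a KL-divergence loss on temperature-scaled logits. This is one common formulation, not necessarily the exact loss NVIDIA used, and the function names and logit values are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    # Subtract the max before exponentiating for numerical stability.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    # KL(teacher || student) over the output distribution for one token.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi))
               for pi, qi in zip(p, q) if pi > 0.0)

teacher = [2.0, 1.0, 0.1]        # hypothetical teacher logits
student_near = [1.9, 1.1, 0.1]   # student that roughly agrees with the teacher
student_far = [0.1, 1.0, 2.0]    # student that disagrees with the teacher

loss_same = distillation_loss(teacher, teacher)
loss_near = distillation_loss(teacher, student_near)
loss_far = distillation_loss(teacher, student_far)
print(loss_same, loss_near, loss_far)
```

The loss is zero when the student matches the teacher exactly and grows as the two distributions diverge, so minimizing it over the retraining tokens moves the pruned model back toward the original model's behavior.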

Mistral-NeMo-Minitron 8B Performance

The Mistral-NeMo-Minitron 8B model leads its class on all nine benchmarks in the table below, outperforming both the Llama 3.1 8B and Gemma 7B models:


| Model | Training tokens | Winogrande 5-shot | ARC Challenge 25-shot | MMLU 5-shot | HellaSwag 10-shot | GSM8K 5-shot | TruthfulQA 0-shot | XLSum en (20%) 3-shot | MBPP 0-shot | HumanEval 0-shot |
|---|---|---|---|---|---|---|---|---|---|---|
| Llama 3.1 8B | 15T | 77.27 | 57.94 | 65.28 | 81.80 | 48.60 | 45.06 | 30.05 | 42.27 | 24.76 |
| Gemma 7B | 6T | 78 | 61 | 64 | 82 | 50 | 45 | 17 | 39 | 32 |
| Mistral-NeMo-Minitron 8B | 380B | 80.35 | 64.42 | 69.51 | 83.03 | 58.45 | 47.56 | 31.94 | 43.77 | 36.22 |
| Mistral NeMo 12B | N/A | 82.24 | 65.10 | 68.99 | 85.16 | 56.41 | 49.79 | 33.43 | 42.63 | 23.78 |

Table 1. Accuracy of the Mistral-NeMo-Minitron 8B base model compared to the teacher Mistral-NeMo 12B, Gemma 7B, and Llama-3.1 8B base models. Within the 8B model class, Mistral-NeMo-Minitron 8B has the best score on every benchmark.

Implementation and Future Work

Following the best practices of structured weight pruning and knowledge distillation, the Mistral-NeMo 12B model was width-pruned to yield the 8B target model. The process involved fine-tuning the unpruned Mistral NeMo 12B model using 127 billion tokens to correct for distribution shifts, followed by width-only pruning and distillation using 380 billion tokens.
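To put those token counts in perspective, here is a small arithmetic sketch using only figures quoted in this article: the 127B and 380B tokens above, and the 15T and 6T training-token counts from Table 1. The variable names are illustrative.

```python
# All counts are in billions of tokens, as quoted in the article.
teacher_finetune_tokens = 127      # fine-tuning the unpruned 12B teacher
distillation_tokens = 380          # width-only pruning plus distillation
llama_31_8b_tokens = 15_000        # 15T, from Table 1
gemma_7b_tokens = 6_000            # 6T, from Table 1

total_tokens = teacher_finetune_tokens + distillation_tokens

# Distillation budget as a percentage of each from-scratch training run.
pct_of_llama = distillation_tokens / llama_31_8b_tokens * 100
pct_of_gemma = distillation_tokens / gemma_7b_tokens * 100

print(f"Total tokens used: {total_tokens}B")
print(f"Distillation vs Llama 3.1 8B: {pct_of_llama:.1f}%")
print(f"Distillation vs Gemma 7B: {pct_of_gemma:.1f}%")
```

Even counting the teacher fine-tuning pass, the pipeline uses about half a trillion tokens, a small fraction of the training budgets listed for the from-scratch models.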

The Mistral-NeMo-Minitron 8B model combines higher benchmark accuracy than comparable 8B-class models with a much smaller training-token budget. NVIDIA plans to continue refining the distillation process to produce even smaller and more accurate models, and to integrate its implementation of the technique gradually into the NVIDIA NeMo framework for generative AI.

For further details, visit the NVIDIA Technical Blog.
