CryptoSpiel.com

NVIDIA Launches Granary Dataset to Enhance Multilingual Speech AI

August 15, 2025
in Blockchain

Jessie A Ellis
Aug 15, 2025 09:01

NVIDIA introduces the Granary dataset and models designed to improve speech recognition and translation across 25 European languages, addressing data scarcity in AI language models.





NVIDIA has released an open dataset and two models for multilingual speech AI, aiming to close the gaps in language coverage left by existing AI language models. The Granary dataset, together with the NVIDIA Canary and Parakeet models, targets speech recognition and translation for 25 European languages, including underrepresented ones such as Croatian, Estonian, and Maltese, according to NVIDIA’s blog.

Granary Dataset: A New Resource for AI Developers

The Granary dataset is a collection of multilingual speech data comprising approximately one million hours of audio: nearly 650,000 hours for speech recognition and over 350,000 hours for speech translation. The dataset is available on Hugging Face, where developers can use it to build multilingual chatbots, customer service voice agents, and real-time translation services.
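As a rough illustration of how the quoted hours divide between the two tasks, the short Python sketch below computes each task's share. The figures are the article's rounded numbers ("nearly 650,000" and "over 350,000"), so the result is only an approximation, and the function name `split_shares` is purely illustrative.

```python
# Approximate figures quoted in the article (rounded, in hours of audio).
ASR_HOURS = 650_000  # "nearly 650,000" hours for speech recognition
AST_HOURS = 350_000  # "over 350,000" hours for speech translation


def split_shares(asr_hours, ast_hours):
    """Return the total hours and each task's fraction of that total."""
    total = asr_hours + ast_hours
    return total, asr_hours / total, ast_hours / total


total, asr_share, ast_share = split_shares(ASR_HOURS, AST_HOURS)
print(f"total: {total:,} h, recognition: {asr_share:.0%}, translation: {ast_share:.0%}")
```

With these rounded inputs, roughly two thirds of the audio is recognition data and one third is translation data.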

Developed in collaboration with Carnegie Mellon University and Fondazione Bruno Kessler, the Granary dataset was built with NVIDIA’s NeMo Speech Data Processor toolkit, which transforms unlabeled audio into structured, high-quality data. The pipeline enriches public speech data without extensive human annotation, making the dataset a resource for AI training in nearly all of the European Union’s official languages, plus Russian and Ukrainian.

Introducing NVIDIA Canary and Parakeet Models

The NVIDIA Canary-1b-v2 and Parakeet-tdt-0.6b-v3 models, both trained on the Granary dataset, handle transcription and translation. Canary-1b-v2, a billion-parameter model, supports high-quality transcription of European languages and translation between English and 24 other languages. Parakeet-tdt-0.6b-v3, with 600 million parameters, is optimized for real-time or large-volume transcription.

Both models are designed to provide accurate punctuation, capitalization, and word-level timestamps in their outputs. Canary-1b-v2 is particularly notable for its efficiency, offering transcription and translation quality comparable to models three times its size, while running inference up to ten times faster.
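For readers who want to try the word-level timestamps, the following is a minimal sketch rather than an official recipe. It assumes the NeMo toolkit is installed, that the model is published on Hugging Face under the identifier `nvidia/parakeet-tdt-0.6b-v3`, and that each word entry carries `word`, `start`, and `end` keys. The helper names `format_word_timestamps` and `transcribe_with_timestamps` are hypothetical and appear in neither the article nor NeMo.

```python
# Assumed Hugging Face identifier for the Parakeet model named in the article.
MODEL_NAME = "nvidia/parakeet-tdt-0.6b-v3"


def format_word_timestamps(words):
    """Render word-level timestamp dicts as 'start-end: word' strings.

    Assumes each entry has 'word', 'start', and 'end' keys (times in seconds).
    """
    return [f"{w['start']:.2f}s-{w['end']:.2f}s: {w['word']}" for w in words]


def transcribe_with_timestamps(audio_path):
    """Transcribe one audio file and return (text, word-level timestamps).

    NeMo is imported lazily so the formatting helper above can be used
    without the toolkit installed. The first call downloads the model.
    """
    import nemo.collections.asr as nemo_asr

    model = nemo_asr.models.ASRModel.from_pretrained(model_name=MODEL_NAME)
    result = model.transcribe([audio_path], timestamps=True)[0]
    return result.text, result.timestamp["word"]


# Example with hand-written timestamps (no model download required):
sample = [
    {"word": "hello", "start": 0.0, "end": 0.32},
    {"word": "world", "start": 0.4, "end": 0.8},
]
for line in format_word_timestamps(sample):
    print(line)
```

Calling `transcribe_with_timestamps("speech.wav")` on a local recording (the file name is a placeholder) would return the transcript plus the per-word entries, which can then be passed to `format_word_timestamps`.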

Advancing Speech AI Innovation

NVIDIA is also sharing the methodology behind Granary and its associated models, so that speech AI developers can adapt similar data processing workflows to other automatic speech recognition (ASR) or automatic speech translation (AST) models. The models and dataset are publicly available under a permissive license, encouraging widespread use and adaptation.

Together, the Granary dataset and the new models address data scarcity in speech AI, particularly for languages that have historically been underrepresented in AI language models, and they broaden the reach of multilingual speech recognition and translation.

The Granary dataset and models are available for exploration on Hugging Face, and further details can be accessed on NVIDIA’s blog.

© 2021 - cryptospiel.com - All rights reserved!
