CryptoSpiel.com
No Result
View All Result
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams
No Result
View All Result
CryptoSpiel.com
No Result
View All Result

Mistral AI Unveils Pixtral 12B: A Groundbreaking Multimodal Model

September 18, 2024
in Blockchain
Reading Time: 3 mins read
A A
0
Brazilian fintech giant XP Inc Launches Crypto Trading Platform XTAGE
0
SHARES
7
VIEWS
ShareShareShareShareShare


Iris Coleman
Sep 18, 2024 03:29

Mistral AI introduces Pixtral 12B, a state-of-the-art multimodal model excelling in text and image tasks, with notable performance in instruction following and reasoning.





Mistral AI has officially launched Pixtral 12B, the first-ever multimodal model from the company, designed to handle both text and image data seamlessly. The model is licensed under Apache 2.0, according to Mistral AI.

Key Features of Pixtral 12B

Pixtral 12B stands out due to its natively multimodal capabilities, trained with interleaved image and text data. The model incorporates a new 400M parameter vision encoder and a 12B parameter multimodal decoder based on Mistral Nemo. This architecture allows it to support variable image sizes and aspect ratios, and process multiple images within its long context window of 128K tokens.

Performance-wise, Pixtral 12B excels in multimodal tasks and maintains state-of-the-art performance on text-only benchmarks. It has achieved a 52.5% score on the MMMU reasoning benchmark, surpassing several larger models.

Performance and Evaluation

Pixtral 12B was designed as a drop-in replacement for Mistral Nemo 12B, delivering best-in-class multimodal reasoning without compromising on text capabilities like instruction following, coding, and math. The model was evaluated using a consistent evaluation harness across various datasets, and it outperforms both open and closed models such as Claude 3 Haiku. Notably, Pixtral even matches or exceeds the performance of larger models like LLaVa OneVision 72B on multimodal benchmarks.

In instruction following, Pixtral particularly excels, showing a 20% relative improvement in text IF-Eval and MT-Bench over the nearest open-source model. It also performs strongly on multimodal instruction following benchmarks, outperforming models like Qwen2-VL 7B and Phi-3.5 Vision.

Architecture and Capabilities

The architecture of Pixtral 12B is designed to optimize for both speed and performance. The vision encoder tokenizes images at their native resolution and aspect ratio, converting them into image tokens for each 16×16 patch in the image. These tokens are then flattened to create a sequence, with [IMG BREAK] and [IMG END] tokens added between rows and at the end of the image. This allows the model to accurately understand complex diagrams and documents while providing fast inference speeds for smaller images.

Pixtral’s final architecture comprises two components: the Vision Encoder and the Multimodal Transformer Decoder. The model is trained to predict the next text token on interleaved image and text data, allowing it to process any number of images with arbitrary sizes in its large context window of 128K tokens.

Practical Applications

Pixtral 12B has shown exceptional performance in various practical applications, including reasoning over complex figures, chart understanding, and multi-image instruction following. For example, it can combine information from multiple tables into a single markdown table or generate HTML code to create a website based on an image prompt.

How to Access Pixtral

Users can easily try Pixtral via Le Chat, Mistral AI’s conversational chat interface, or through La Plateforme, which allows integration via API calls. Detailed documentation is available for those interested in leveraging Pixtral’s capabilities in their applications.

For those who prefer running Pixtral locally, the model can be accessed through the mistral-inference library or the vLLM library, which offers higher serving throughput. Detailed instructions for setup and usage are provided in the documentation.

Image source: Shutterstock


Credit: Source link

RELATED POSTS

Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

Riot Platforms Sells $289M in Bitcoin as Mining Output Drops 4% in Q1

Exploring Chainlink’s Role Beyond Price Feeds in the Blockchain Ecosystem

Buy JNews
ADVERTISEMENT
ShareTweetSendPinShare
Previous Post

Figure Markets Announces Global Launch, 8% Yield Opportunity

Next Post

Experts: Defi Thrives Where Banks Falter, Fragmentation a Hurdle

Related Posts

Bitcoin Addresses Holding Between 100 and 10,000 BTC Hit a 7-Week High
Blockchain

Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

April 10, 2026
Riot Blockchain Yearly Bitcoin Production Increases by 236%, Accumulates $194M in BTC
Blockchain

Riot Platforms Sells $289M in Bitcoin as Mining Output Drops 4% in Q1

April 2, 2026
Galaxy Digital: Ethereum Developers Discuss Key Upgrades During Latest Consensus Call
Blockchain

Exploring Chainlink’s Role Beyond Price Feeds in the Blockchain Ecosystem

December 9, 2025
Next Post
Experts: Defi Thrives Where Banks Falter, Fragmentation a Hurdle

Experts: Defi Thrives Where Banks Falter, Fragmentation a Hurdle

Brazilian fintech giant XP Inc Launches Crypto Trading Platform XTAGE

Filament Raises $1.1 Million in Seed Funding to Enhance Derivatives DEX on Sei Network

Recommended Stories

No Content Available

Popular Stories

  • Hong Kong’s LEAP toward digital asset dominance

    Hong Kong’s LEAP toward digital asset dominance

    0 shares
    Share 0 Tweet 0
  • Trader Says DeFi Altcoin Aave Witnessing Clear Trend Switch, Updates Forecast on Two Low-Cap Coins

    0 shares
    Share 0 Tweet 0
  • Worldcoin faces regulatory setback in Indonesia over compliance issues

    0 shares
    Share 0 Tweet 0
  • NVIDIA’s AI Platform Enhances ASL Learning Experience

    0 shares
    Share 0 Tweet 0
  • Terra Virtua Joins Williams Racing as Official Metaverse Partner

    0 shares
    Share 0 Tweet 0
CryptoSpiel.com

This is an online news portal that aims to provide the latest crypto news, blockchain, regulations and much more stuff like that around the world. Feel free to get in touch with us!

What’s New Here!

  • Ripple CEO Says CLARITY Act Talks Near Breakthrough as Senate Standoff Eases
  • SEC Opens Proceedings on NYSE Proposal to List Grayscale Crypto ETF Options – Regulation Bitcoin News
  • Anthropic Reveals Claude Code Tool Design Philosophy Behind AI Agent Development

Subscribe Now

Loading
  • Live Crypto Prices
  • Contact Us
  • Privacy Policy
  • Terms of Use
  • DMCA

© 2021 - cryptospiel.com - All rights reserved!

No Result
View All Result
  • Home
  • Live Crypto Prices
  • Live ICO
  • Exchange
  • Crypto News
  • Bitcoin
  • Altcoins
  • Blockchain
  • Regulations
  • Trading
  • Scams

© 2021 - cryptospiel.com - All rights reserved!

Please enter CoinGecko Free Api Key to get this plugin works.