CryptoSpiel.com

Anyscale Introduces New Replica Compaction to Optimize Resource Usage

July 15, 2024
in Blockchain


Felix Pinkston
Jul 15, 2024 18:56

Anyscale launches Replica Compaction to address resource fragmentation, enhancing resource utilization and reducing costs for Ray Serve deployments.





Companies embracing AI increasingly face the twin problems of resource utilization and cost management. Model serving and inference in particular need to scale up and down over time in response to traffic. Ray Serve is a scalable model serving library built on Ray to help handle these dynamics. But while open source systems like Ray Serve help manage increased traffic, even sophisticated systems struggle to scale down once traffic abates. Replicas end up spread thinly across nodes, and this resource fragmentation leads to underutilized resources and higher costs.

Anyscale’s new Replica Compaction feature helps to solve resource fragmentation by optimizing resource usage for online inference and model serving. Take a look at how this feature works, as well as how you can use it in practice.

Background: What is Ray Serve?

Ray Serve has several key concepts:

  • Deployment: A deployment contains business logic or an ML model to handle incoming requests.

  • Replica: A replica is an instance of a deployment that can handle requests. These are implemented with Ray Actors. The number of replicas can be scaled up or down (or even autoscaled) to match the incoming request load.

  • Application: An application is the unit of upgrade in a Ray Serve cluster. An application consists of one or more deployments.

  • Service: A Service is a Ray Serve cluster that can consist of one or more applications.

Deployments handle incoming requests independently, which allows for parallel processing and efficient resource utilization in most cases. For example, Ray Serve makes it possible to create deployments for Llama-3-8B and Llama-3-70B on the same Service with different resource requirements (1 GPU and 4 GPUs per replica, respectively). Both of these deployments would scale independently in response to their respective traffic.
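To make the vocabulary concrete, here is a minimal sketch in plain Python that models the four concepts and the Llama example. It is not the Ray Serve API: the class names, fields, and replica counts are illustrative assumptions, and only the per-replica GPU figures (1 and 4) come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Deployment:
    """Business logic or a model, plus what each replica needs (hypothetical model)."""
    name: str
    gpus_per_replica: int
    num_replicas: int = 1

    def total_gpus(self) -> int:
        return self.gpus_per_replica * self.num_replicas


@dataclass
class Application:
    """Unit of upgrade: one or more deployments."""
    name: str
    deployments: List[Deployment] = field(default_factory=list)


@dataclass
class Service:
    """A cluster hosting one or more applications."""
    applications: List[Application] = field(default_factory=list)

    def gpu_demand(self) -> Dict[str, int]:
        # GPU demand per deployment, across every application in the service.
        return {
            d.name: d.total_gpus()
            for app in self.applications
            for d in app.deployments
        }


# Replica counts below are made up for illustration.
small = Deployment("Llama-3-8B", gpus_per_replica=1, num_replicas=3)
large = Deployment("Llama-3-70B", gpus_per_replica=4, num_replicas=2)
service = Service([Application("llm", [small, large])])

# Scaling one deployment leaves the other untouched.
small.num_replicas = 5
demand = service.gpu_demand()
print(demand)
```

Changing the replica count of one deployment changes only that deployment's GPU demand, which is the independence the paragraph above describes.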

The Problem of Resource Fragmentation

Resource fragmentation occurs when scaling activities lead to uneven resource utilization across nodes. As replicas increase, the autoscaler will start new nodes to handle the increased deployment load. But then, when traffic decreases and models scale down, the same nodes that were needed to handle the increased load become underutilized. This is one of the most common reasons for increased costs and reduced cluster performance.

Essentially, when scaling a specific deployment or model (e.g. Model A), Ray Serve takes into account the traffic and resource requirements for that particular deployment alone. The state, replicas, and traffic of any other deployments (e.g. Models B and C) are not taken into account during the scaling process. Because scaling only considers a single deployment at a time, resource fragmentation is inevitable as traffic changes and the cluster scales up and down.
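A toy simulation can show how per-deployment scaling leaves nodes partly empty. This is a sketch under stated assumptions, not Ray Serve's scheduler: nodes are assumed to have 4 GPUs, every replica is assumed to need 1 GPU, placement is simple first-fit, and the replica counts are invented.

```python
from typing import List

NODE_GPUS = 4  # assumed node size; each replica is assumed to use 1 GPU

Node = List[str]  # a node is the list of deployment names of its replicas


def scale_up(nodes: List[Node], deployment: str, count: int) -> None:
    """Place `count` replicas first-fit, starting a new node when all are full."""
    for _ in range(count):
        target = next((n for n in nodes if len(n) < NODE_GPUS), None)
        if target is None:
            target = []
            nodes.append(target)
        target.append(deployment)


def scale_down(nodes: List[Node], deployment: str, count: int) -> None:
    """Remove `count` replicas of one deployment, ignoring all other deployments."""
    removed = 0
    for node in nodes:
        while deployment in node and removed < count:
            node.remove(deployment)
            removed += 1


def utilization(nodes: List[Node]) -> float:
    """Fraction of GPUs in use on nodes that still host at least one replica."""
    live = [n for n in nodes if n]
    return sum(len(n) for n in live) / (len(live) * NODE_GPUS)


nodes: List[Node] = []
scale_up(nodes, "A", 3)
scale_up(nodes, "B", 1)
scale_up(nodes, "C", 1)
scale_up(nodes, "A", 3)
scale_up(nodes, "A", 2)
scale_up(nodes, "C", 2)
full = utilization(nodes)      # all three nodes are full at peak

scale_down(nodes, "A", 7)      # traffic for Model A drops
fragmented = utilization(nodes)
print(nodes, full, fragmented)
```

After Model A scales down, five replicas remain spread over three nodes even though two nodes would hold them, because the scale-down step never looked at Models B and C.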

image1.png

Solving the Resource Fragmentation Issue with Anyscale’s Replica Compaction

Anyscale introduces Replica Compaction to address resource fragmentation. With Replica Compaction, Anyscale will automatically migrate replicas into fewer nodes in order to optimize resource use and reduce costs. There are three main components to the Replica Compaction feature:

  • Replica Migration: Compaction monitors the cluster for opportunities to migrate replicas. If a node is minimally used, Anyscale’s Replica Compaction will automatically move its replicas to other nodes with sufficient capacity. Every node in the cluster is checked, and nodes that hold fewer replicas and can be released are prioritized.

  • Zero Downtime: Migration does not interrupt traffic. Anyscale Services spins up a new replica, monitors its health, reroutes traffic to it, and then removes the old replica.

  • Autoscaler Integration: The Anyscale Autoscaler continuously searches for idle nodes post-migration and spins them down as needed, reducing node count—and costs.
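The migration and node-release steps in the list above can be sketched as a greedy pass over the cluster. This is an illustration only, not Anyscale's algorithm, which the article does not describe in detail. It assumes 1 GPU per replica and a uniform node size, and the function name `compact` and the node names are hypothetical.

```python
from typing import Dict, List


def compact(nodes: Dict[str, List[str]], node_gpus: int) -> Dict[str, List[str]]:
    """Greedy compaction sketch: drain lightly loaded nodes into busier ones.

    Assumes every replica needs exactly one GPU. Returns the placement after
    migration, with fully drained nodes dropped (ready to be released).
    """
    placement = {name: list(replicas) for name, replicas in nodes.items()}
    # Visit the least-loaded nodes first: they are the cheapest to drain.
    for source in sorted(placement, key=lambda n: len(placement[n])):
        replicas = placement[source]
        if not replicas:
            continue
        # Only migrate onto nodes that are already in use.
        targets = [n for n in placement if n != source and placement[n]]
        free = {n: node_gpus - len(placement[n]) for n in targets}
        if sum(free.values()) < len(replicas):
            continue  # a partial drain would not free the node, so skip it
        for replica in list(replicas):
            # Prefer the fullest target that still has room.
            dest = max(
                (n for n in targets if free[n] > 0),
                key=lambda n: len(placement[n]),
            )
            placement[dest].append(replica)
            free[dest] -= 1
            replicas.remove(replica)
    return {name: reps for name, reps in placement.items() if reps}


before = {"node-0": ["B"], "node-1": ["C"], "node-2": ["A", "C", "C"]}
after = compact(before, node_gpus=4)
released = sorted(set(before) - set(after))
print(after, released)
```

In this example one of three nodes becomes idle and could be handed back by an autoscaler. A production algorithm would also have to account for mixed resource shapes, such as the 4-GPU replicas mentioned earlier.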

Let’s take a look at our same example from above, now with Anyscale’s Replica Compaction. With Replica Compaction, Anyscale is able to detect when Model A is downscaled, and it automatically migrates the excess Model C replicas into a single node.

image2.png

Example of Anyscale Replica Compaction. Anyscale Replica Compaction detects that resource fragmentation is causing unnecessary resource usage. The replicas are automatically shifted (without interrupting production traffic) to a single node, thereby reducing costs and boosting utilization.
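The zero-downtime sequence described earlier (start a new replica, check its health, reroute traffic, remove the old replica) can be sketched as follows. The `Router` class, the `migrate` function, the replica names, and the retry count are all hypothetical; this only illustrates the ordering of the steps, not how Anyscale Services implements them.

```python
from typing import Callable, List


class Router:
    """Hypothetical stand-in for whatever routes requests to replicas."""

    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)


def migrate(
    router: Router,
    old: str,
    new: str,
    is_healthy: Callable[[str], bool],
    max_checks: int = 3,
) -> bool:
    """Move traffic from `old` to `new`; return True if the migration happened."""
    # 1. The new replica is assumed to have been started on the target node.
    # 2. It must pass a health check before it receives any traffic.
    if not any(is_healthy(new) for _ in range(max_checks)):
        return False  # the old replica keeps serving; nothing has changed
    # 3. Add the new replica before removing the old one, so serving
    #    capacity never drops below what it was before the migration.
    router.replicas.append(new)
    # 4. Only now is the old replica taken out of rotation.
    router.replicas.remove(old)
    return True


router = Router(["model-c@node-1"])
ok = migrate(router, "model-c@node-1", "model-c@node-2", lambda r: True)

stuck = Router(["model-c@node-1"])
failed = migrate(stuck, "model-c@node-1", "model-c@node-2", lambda r: False)
print(ok, router.replicas, failed, stuck.replicas)
```

The ordering is the point: if the new replica never becomes healthy, the old one is left in place and traffic is unaffected.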

Replica Compaction in Action: Practical Results

To test the new Replica Compaction feature, Anyscale ran a live production workload for several months. Take a look at what was run—and how Replica Compaction decreased cost and increased efficiency.

Case Study:

Anyscale offers a serverless API to prompt LLMs including Mistral, Mixtral, Llama3, and more. These models are deployed as replicas in an Anyscale Service. This service has been running for several months, serving 10+ models to users at scale with widely varying traffic patterns.

After Anyscale Replica Compaction was released, significant savings and efficiency improvements showed up in tokens per GPU second. With no other changes (for example, to tensor parallelism, the models being served, or the hardware used), the overall efficiency improvement after enabling Replica Compaction was ~10% on average. In the first day after it was enabled, instance seconds declined 3.7%, even though traffic, measured by number of tokens, increased by 11.2% in the same period. Since high-end GPUs like A100s and H100s are used for serving models, this translates to substantial cost savings.
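As a quick check on the first-day figures, the two percentages can be combined into a single change in tokens served per instance second. Note that this is a different metric and time window from the ~10% average quoted for tokens per GPU second, so the two numbers are not directly comparable. The variable names are illustrative.

```python
# First-day figures quoted in the article.
instance_seconds_change = -0.037  # instance seconds declined 3.7%
token_change = 0.112              # tokens served increased 11.2%

# Tokens per instance second after, relative to before.
ratio = (1 + token_change) / (1 + instance_seconds_change)
improvement_pct = (ratio - 1) * 100

print(f"{improvement_pct:.1f}% more tokens per instance second")
```

Serving 11.2% more tokens on 3.7% fewer instance seconds works out to roughly 15.5% more tokens per instance second for that one day.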

The impact and savings from Replica Compaction vary widely depending on the distribution of traffic, the number of deployments, and the underlying instances. In smaller-scale scenarios, costs can be reduced by 50% (or more!).

What’s Next for Replica Compaction

The team is continuing to improve the Replica Compaction algorithm, including work to factor in node costs and resource types to better optimize usage and overall costs. Stay tuned for more exciting updates in the coming months.

Get Started with Anyscale

Anyscale’s new Replica Compaction feature significantly improves resource management in distributed clusters by addressing resource fragmentation. This ensures an efficient, cost-effective infrastructure for Ray Serve deployments, with ongoing enhancements promising even smarter resource management. Anyscale Replica Compaction is configured by default for Ray Serve applications deployed on the Anyscale Platform.

Get started today!
