Study shows the strongest results when consumer GPU AI clusters complement enterprise-grade GPUs, delivering comparable compute power at a fraction of the cost.
Delaware, US | 24th November 2025 | A peer-reviewed study by io.net has confirmed that consumer GPUs, such as Nvidia’s RTX 4090, can play a pivotal role in scaling large language model (LLM) inference. The paper, Idle Consumer GPUs as a Complement to Enterprise Hardware for LLM Inference, was accepted by the 6th International Artificial Intelligence and Blockchain Conference (AIBC 2025) and provides the first open benchmarks of heterogeneous GPU clusters deployed on io.net’s decentralized cloud.
The analysis finds that RTX 4090 clusters can deliver 62–78% of enterprise H100 throughput at approximately half the cost, with token costs up to 75% lower for batch or latency-tolerant workloads. The study also shows that while H100s remain 3.1× more energy-efficient per token, leveraging idle consumer GPUs can reduce embodied carbon emissions by extending hardware lifetimes and tapping renewable-rich grids.
Aline Almeida, Head of Research at IOG Foundation and lead author of the study, said: “Our findings demonstrate that hybrid routing across enterprise and consumer GPUs offers a pragmatic balance between performance, cost and sustainability. Rather than a binary choice, heterogeneous infrastructure allows organizations to optimize for their specific latency and budget requirements while reducing carbon impact.”
The research highlights practical pathways for AI developers and MLOps teams to build more economical LLM deployments. By matching workloads to their latency requirements, reserving enterprise GPUs for real-time applications and routing development, batch processing, and overflow capacity to consumer GPUs, organizations can achieve near-H100 performance at a fraction of the cost.
Gaurav Sharma, CEO of io.net, said: “This peer-reviewed analysis validates the core thesis behind io.net: that the future of compute will be distributed, heterogeneous, and accessible. By harnessing both datacenter-grade and consumer hardware, we can democratize access to advanced AI infrastructure while making it more sustainable.”
The paper reinforces io.net’s mission to expand global compute capacity through decentralized networks, offering developers programmable access to the world’s largest pool of distributed GPUs.
● Cost-performance sweet spots:
4× RTX 4090 configurations achieve 62–78% of H100 throughput at approximately half the operational cost, with the lowest cost per million tokens ($0.111–0.149).
● Latency boundaries:
H100 maintains sub-55 ms P99 time-to-first-token even at high loads, while consumer GPU clusters can handle traffic that tolerates 200–500 ms tail latencies (examples: research, dev/test environments, streaming chat where such latency is acceptable, batch jobs, embedding and eval sweeps).
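The hybrid routing described above can be sketched in a few lines. The tier names, threshold, and request fields below are illustrative assumptions based on the study's published latency figures, not io.net's actual scheduler or API:

```python
# Hypothetical sketch of latency-aware routing between GPU tiers.
# The 200 ms cutoff reflects the study's figures: H100s hold sub-55 ms
# P99 time-to-first-token, while consumer clusters suit workloads that
# tolerate 200-500 ms tails. Tier names are invented for illustration.
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    max_tail_latency_ms: float  # P99 latency the caller can tolerate


def route(req: Request) -> str:
    """Pick a GPU tier by the request's latency tolerance."""
    if req.max_tail_latency_ms < 200:
        return "enterprise-h100"   # real-time, latency-sensitive traffic
    return "consumer-4090"         # batch, dev/test, overflow capacity


# Example: an interactive chat turn vs. a batch embedding job.
print(route(Request("chat-turn", max_tail_latency_ms=55)))
print(route(Request("embedding-sweep", max_tail_latency_ms=500)))
```

Under this kind of policy, latency-tolerant jobs drain to the cheaper consumer tier, which is how the study's headline cost savings are realized without degrading real-time workloads.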
For more details on the research, please visit the GitHub repository.

END
Press Contact:
Ed Doljanin
press@ecologymedia.co.uk
+44 7591 559007
With the world’s largest network of distributed GPUs and high-performance, on-demand compute, io.net is the only platform developers and organizations need to train models, run agents and scale LLM infrastructure. Combining the cost-effective and builder-friendly programmable infrastructure of io.cloud with the unified and API-accessible toolkit of io.intelligence, io.net is the full stack for large-scale AI startups. To learn more, visit: https://io.net/about-us