Artificial intelligence computing startup D-Matrix Corp. said today it has developed a new implementation of 3D dynamic random-access memory technology that promises to accelerate inference workloads by “orders of magnitude.”
The new technology was announced at the Hot Chips 2025 conference, where the company is showcasing how it will enable new frontier models to scale with substantial gains in efficiency and affordability.
D-Matrix designs specialized processors and compute platforms that target AI inference workloads. Inference is the stage at which trained AI models make predictions or generate images and text from data they’ve never seen before, providing assistance and insights to end users.
The company’s main products include its memory-efficient, chiplet-based D-Matrix Corsair platform, which it describes as the world’s first digital in-memory compute inference accelerator, and a high-bandwidth peripheral interconnect card that links clusters of high-powered graphics processing units.
D-Matrix says its compute infrastructure is designed to solve some of the challenges in the economics of running AI at large scale, arguing that existing data center infrastructure is unsustainable. As more companies race to adopt increasingly powerful AI models and build services atop them, they’re putting enormous strain on the available capacity.
Some of the biggest cloud infrastructure providers, such as Microsoft Corp. and Google Cloud, have admitted that they’re struggling with capacity constraints from this demand. But as they rush to build out new data centers, they’re raising prices to help pay for the expansion, while also throttling usage to try to serve more customers.
The chip memory bottleneck
D-Matrix says that memory has emerged as the biggest bottleneck in scaling AI, and argues that simply throwing more GPUs at data centers won’t fix the problem. In a blog post, D-Matrix co-founder and Chief Technology Officer Sudeep Bhoja refers to this issue as the “memory wall,” pointing out that while compute performance has increased by roughly three times every two years, memory bandwidth has grown by only about 1.6 times over the same period.
“The result is a widening gap where pricey processors sit idle, waiting for data to arrive,” Bhoja says. “This matters because inference, not training, is quickly becoming the dominant AI workload.”
Bhoja cited the example of GPU cloud infrastructure provider CoreWeave Inc., which recently revealed that more than half of its workloads are inference. He said most analysts expect inference demand to keep growing, accounting for more than 85% of all AI workloads within the next two to three years.
“Every query, chatbot response and recommendation is an inference task repeated at massive scale, and each one is constrained by memory throughput,” Bhoja said. “Today’s AI applications demand better memory.”
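To put those growth rates in perspective, here’s a minimal back-of-the-envelope sketch that simply compounds the figures Bhoja cites: roughly 3x compute growth versus 1.6x memory bandwidth growth every two years. The starting values and the 10-year horizon are arbitrary assumptions chosen purely for illustration.

```python
# Illustrative only: compound the growth rates cited above and watch the
# compute/memory gap widen. Starting values and horizon are arbitrary.
compute, bandwidth = 1.0, 1.0   # normalized starting points (assumed)

for year in range(0, 12, 2):
    gap = compute / bandwidth   # how far compute has pulled ahead of memory
    print(f"year {year:2d}: compute {compute:6.1f}x  bandwidth {bandwidth:5.1f}x  gap {gap:4.1f}x")
    compute *= 3.0              # ~3x every two years
    bandwidth *= 1.6            # ~1.6x every two years
```

After a decade at those rates, compute has grown roughly 243-fold while memory bandwidth has grown only about 10-fold, which is the widening gap Bhoja describes.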
Smashing through the wall
D-Matrix wants to help the industry overcome this memory wall, and to do so it has decided to integrate higher-throughput 3D DRAM into Raptor, its next-generation chip architecture. 3D DRAM stacks multiple layers of memory cells vertically, allowing for higher storage density and better performance than traditional 2D DRAM. It reduces footprint and power consumption while increasing data access speeds, making it well-suited to high-performance applications at scale.
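As a rough illustration of why vertical stacking raises density, consider the toy calculation below. The layer count, per-layer capacity and die footprint are hypothetical placeholders, not D-Matrix or JEDEC specifications.

```python
# Hypothetical numbers only: how stacking DRAM layers multiplies capacity
# within the same silicon footprint.
per_layer_capacity_gb = 2.0   # capacity of a single DRAM layer (assumed)
die_footprint_mm2 = 50.0      # area occupied by the stack (assumed)

for layers in (1, 4, 8, 16):
    capacity_gb = per_layer_capacity_gb * layers
    density = capacity_gb / die_footprint_mm2   # GB per mm^2 of footprint
    print(f"{layers:2d} layers: {capacity_gb:5.1f} GB in the same footprint "
          f"({density:.2f} GB/mm^2)")
```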
According to Bhoja, by combining 3D DRAM with its specialized interconnects, Raptor will be able to smash through the memory wall and unlock significant gains in terms of AI performance and cost-efficiency. He said the company is targeting an ambitious 10-times improvement in memory bandwidth and 10-times better energy efficiency with Raptor when running inference workloads, compared with existing HBM4 memory technology.
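For a sense of why a bandwidth gain of that size would matter, here is a simplified, roofline-style estimate for a memory-bound inference step. The model size, data type and baseline bandwidth are hypothetical placeholders rather than D-Matrix figures, and the sketch assumes each generated token requires streaming the full set of model weights from memory once.

```python
# Simplified roofline-style estimate for memory-bound token generation.
# All figures are hypothetical placeholders, not D-Matrix or HBM4 specs.
model_bytes = 70e9 * 2        # assumed 70B-parameter model at 2 bytes per weight
baseline_bw_bytes = 2e12      # assumed baseline memory bandwidth: 2 TB/s

for speedup in (1, 2, 5, 10):
    bw = baseline_bw_bytes * speedup
    tokens_per_sec = bw / model_bytes   # weights streamed once per token
    print(f"{speedup:2d}x bandwidth -> ~{tokens_per_sec:5.1f} tokens/s per stream")
```

Under those assumptions, token throughput for a memory-bound workload scales almost linearly with bandwidth, which is why a 10x bandwidth target would translate directly into inference performance.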
“These are not incremental gains — they are step-function improvements that redefine what’s possible for inference at scale,” Bhoja said.
Bhoja conceded that the memory wall, decades in the making, is not easily overcome, but he pointed out that his company has been working on this very challenge since its inception, and he believes it’s finally on the verge of cracking it.
“With our commitment to memory-centric technology, D-Matrix is blazing the trail beyond it and building a sustainable path for the future of AI,” he said.
Image: SiliconANGLE/Dreamina AI