Startup Gimlet Labs is solving the AI inference bottleneck in a surprisingly elegant way

Photo: TechCrunch Startups
Gimlet Labs, a startup founded by Stanford professor Zain Asgar, has raised $80 million in a Series A funding round to eliminate the bottleneck in AI inference for good. Its answer to the computing power shortage is what it calls the world's first multi-silicon inference cloud: software that lets AI workloads run simultaneously on diverse hardware and breaks down existing technological barriers. The Gimlet Labs system can intelligently distribute the tasks of an AI application between traditional CPUs, specialized GPUs, and high-memory systems. For the global market, this means a drastic reduction in operating costs and independence from the scarcest, most expensive chips on the market. Instead of waiting in queues for access to the latest infrastructure, model creators and developers will be able to make efficient use of existing, distributed hardware resources. The investment, led by Menlo Ventures, confirms that the key to scaling AI is not solely manufacturing new chips, but above all the intelligent orchestration of the computing power that already exists. Such democratization of access to infrastructure could significantly accelerate the adoption of advanced language models in everyday digital services.
In a world dominated by the arms race in silicon, where access to NVIDIA H100 chips has become the new currency of the global economy, the startup Gimlet Labs proposes a solution that could turn the tables. Instead of waiting in line for scarce GPUs, the company led by Zain Asgar, a Stanford professor and serial entrepreneur, has raised $80 million in a Series A funding round to develop technology that makes the type of processor secondary. This is not another attempt to build a "better chip," but a radical change in the way software communicates with infrastructure.
The funding round, led by Menlo Ventures, confirms that the industry is desperately seeking a way out of the impasse created by the AI inference bottleneck. Tech giants and smaller companies alike currently struggle with enormous costs and logistical constraints as they try to scale their models. Gimlet Labs enters the market with the promise of the "first and only multi-silicon inference cloud," which lets artificial intelligence workloads run simultaneously on vastly different hardware architectures.
Architecture without borders, or how to reconcile fire and water
The key to Gimlet Labs' innovation is a software layer that can intelligently divide the computational tasks of an AI model between units that previously did not work together in real-time. The system allows for the distribution of AI application workloads not only across traditional CPUs and specialized GPUs, but also onto high-memory systems and niche accelerators. Most impressively, this technology enables seamless operation on chips from NVIDIA, AMD, Intel, ARM, and even exotic solutions from Cerebras or d-Matrix.
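How such a dispatcher works under the hood is not public, but the core idea, matching each stage of an inference pipeline to whichever silicon suits it best, can be sketched in a few lines. The following Python toy is purely illustrative: the device names, throughput figures, and greedy heuristic are assumptions made for the sake of the example, not Gimlet Labs' actual scheduler.

```python
# Hypothetical sketch of heterogeneous placement: assign inference stages to
# whatever silicon fits them best. All names and numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Device:
    name: str               # e.g. "nvidia-h100", "amd-mi300", "arm-cpu-node"
    free_memory_gb: float   # memory still available on this unit
    tokens_per_sec: float   # rough throughput estimate for this workload

@dataclass
class Stage:
    name: str               # e.g. "prefill", "decode", "kv-cache"
    memory_gb: float        # working-set size of the stage
    compute_bound: bool     # True if it benefits from GPU-class throughput

def place(stages: list[Stage], devices: list[Device]) -> dict[str, str]:
    """Greedy placement: compute-bound stages go to the fastest device that
    still has room; memory-bound stages go to the device with the most free
    memory. A real system would also model interconnect cost and batching."""
    placement = {}
    for stage in stages:
        candidates = [d for d in devices if d.free_memory_gb >= stage.memory_gb]
        if not candidates:
            raise RuntimeError(f"no device can hold stage {stage.name}")
        key = (lambda d: d.tokens_per_sec) if stage.compute_bound \
              else (lambda d: d.free_memory_gb)
        best = max(candidates, key=key)
        best.free_memory_gb -= stage.memory_gb
        placement[stage.name] = best.name
    return placement

if __name__ == "__main__":
    devices = [
        Device("nvidia-h100", free_memory_gb=80, tokens_per_sec=2500),
        Device("amd-mi300", free_memory_gb=192, tokens_per_sec=2100),
        Device("arm-cpu-node", free_memory_gb=512, tokens_per_sec=120),
    ]
    stages = [
        Stage("prefill", memory_gb=60, compute_bound=True),
        Stage("decode", memory_gb=40, compute_bound=True),
        Stage("kv-cache", memory_gb=300, compute_bound=False),
    ]
    print(place(stages, devices))
```

In a real deployment the decision would also weigh interconnect bandwidth, batching, and current load, but the principle is the same: the software decides where each piece of work runs.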
In practice, this means that developers no longer have to optimize their models for a specific architecture (e.g., CUDA for NVIDIA). Gimlet Labs takes this burden off engineers' shoulders by offering an abstraction over the hardware layer (a minimal illustration of the pattern follows the list below). This approach addresses market fragmentation: instead of relying on a single provider, companies can build computing clusters from whatever is currently available on the market or already sitting in their data centers. The ability to use ARM cores alongside powerful Cerebras units within a single task is a technological feat that significantly lowers the barrier to entry for advanced AI deployments.
- Full interoperability: Support for giants (NVIDIA, AMD) and innovators (Cerebras, d-Matrix).
- Cost efficiency: Utilizing cheaper CPU units for less demanding parts of the inference process.
- Scalability: The ability to build hybrid computing clouds without the risk of vendor lock-in.
- Memory optimization: Intelligent resource management in high-bandwidth systems.
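Gimlet Labs has not published its interface, but the abstraction pattern described above is familiar from existing frameworks: model code targets a logical device, and the runtime decides which physical silicon executes it. A minimal PyTorch snippet illustrates the principle (PyTorch stands in for the idea here; it is not Gimlet's software):

```python
import torch

# Model code is written once against a logical device; which physical chip
# backs it is resolved at runtime. This is PyTorch's own device abstraction,
# shown only to illustrate the pattern described above.
def pick_device() -> torch.device:
    if torch.cuda.is_available():   # NVIDIA GPUs (and AMD GPUs on ROCm builds)
        return torch.device("cuda")
    return torch.device("cpu")      # x86 or ARM CPUs

device = pick_device()
model = torch.nn.Linear(4096, 4096).to(device)
x = torch.randn(1, 4096, device=device)
with torch.no_grad():
    y = model(x)
print(f"ran on {device}, output shape {tuple(y.shape)}")
```

A universal inference layer pushes the same idea much further, down to splitting a single request across vendors, but the developer-facing contract is similar: write against the model, not against the chip.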
The end of the era of single-vendor dictatorship
The investment by Menlo Ventures in Gimlet Labs is a clear signal to the market: the era of hardware monoculture in AI is coming to an end. Although NVIDIA still possesses the best software ecosystem, Zain Asgar's proposal hits the leader's weakest point — limited supply and high price. If Gimlet Labs' software indeed allows for high inference performance using a mix of Intel and AMD chips, the power dynamics in the data center sector will change rapidly. This is an opportunity for smaller players, like d-Matrix, to have their specialized chips enter the mainstream without having to fight for every developer individually.
The founder himself deserves attention. Zain Asgar, a "successfully exited founder," combines deep academic expertise from Stanford with business instinct. His vision of a "multi-silicon inference cloud" is not a theoretical concept but a ready tool for a real business problem: how to serve responses from large language models (LLMs) more cheaply and faster. In a world where the cost of a single query to a model determines the profitability of entire products, Gimlet Labs' technology becomes a key element of a modern AI company's tech stack.
This is not just a matter of convenience. It is a matter of survival in a world where access to computing power determines the success or failure of a startup. The ability to run a model on any available silicon is the ultimate liberation from the supply chain.
Democratization of inference through intelligent software
The biggest challenge for Gimlet Labs will be maintaining low latency with such diverse infrastructure. Transferring data between units from different manufacturers usually involves enormous time overheads. However, if the startup has managed to minimize these losses, we are dealing with a breakthrough on the scale of server virtualization from the early 2000s. Just as VMware allowed us to forget about the physical limitations of servers, Gimlet Labs may allow us to forget about which chip is currently crunching our data.
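Whether those losses can be hidden is ultimately a bandwidth question. A rough back-of-envelope calculation, with entirely assumed tensor sizes and link speeds, shows why the interconnect matters so much once a layer's activations have to hop between chips from different vendors:

```python
# Back-of-envelope transfer-time estimate for handing activations from one
# device to another mid-inference. All numbers are illustrative assumptions.
def transfer_ms(tensor_mb: float, link_gb_per_s: float) -> float:
    """Time to move a tensor over a link, ignoring latency and protocol overhead."""
    return tensor_mb / 1000 / link_gb_per_s * 1000  # MB -> GB, then s -> ms

activations_mb = 64        # assumed size of one layer's activations for a large batch
links = [
    ("PCIe 5.0 x16", 63),            # ~GB/s usable per direction (assumed)
    ("GPU-to-GPU fabric", 450),      # modern intra-node fabric (assumed)
    ("100 GbE between servers", 12.5),
]
for name, bw in links:
    print(f"{name:>24}: {transfer_ms(activations_mb, bw):.2f} ms per hop")
```

Multiplied over many hops per request, the difference between a tenth of a millisecond and several milliseconds per hop is the difference between an interactive product and an unusable one, which is exactly the engineering problem a multi-silicon scheduler has to solve.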
The prospect of pairing high-memory systems with traditional processors to handle massive contexts in AI models opens up entirely new possibilities for the medical, legal, and scientific industries. Where it is not only speed that counts but also the ability to process giant datasets "on the fly," the flexibility Gimlet Labs offers will prove invaluable. The $80 million will allow the company to aggressively scale its engineering team and likely move quickly toward pilot deployments with major cloud providers looking to diversify their hardware offerings.
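The appeal of high-memory systems for long contexts is easy to quantify. Using textbook sizing for a transformer's key-value cache, with model dimensions assumed purely for illustration, memory demand grows linearly with context length and quickly outgrows a single accelerator:

```python
# Rough KV-cache sizing for a decoder-only transformer. The model dimensions
# are assumed for illustration (roughly a 70B-class dense model with full
# multi-head attention); real deployments often shrink this with grouped-query
# attention or quantization.
def kv_cache_gb(layers: int, heads: int, head_dim: int,
                context_len: int, bytes_per_value: int = 2) -> float:
    """2x for keys and values; bf16/fp16 assumed (2 bytes per value)."""
    return 2 * layers * heads * head_dim * context_len * bytes_per_value / 1e9

layers, heads, head_dim = 80, 64, 128
for context in (8_192, 128_000, 1_000_000):
    gb = kv_cache_gb(layers, heads, head_dim, context)
    print(f"{context:>9} tokens -> {gb:,.0f} GB of KV cache")
```

At million-token contexts the cache alone runs into terabytes, which is precisely where a scheduler that can spill onto high-memory CPU nodes, instead of demanding ever more GPU HBM, starts to pay off.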
It can be assumed with a high degree of certainty that multi-silicon technology will become the standard within the next few years. The market will not accept long-term dependence on a single manufacturer, and the success of Gimlet Labs will show that the system's intelligence resides in the management software, not just in the silicon itself. If the business model based on a universal inference cloud proves successful, we will witness the birth of a new infrastructure giant that connects the fragmented AI accelerator market into one coherent computing organism.








