🔮 The complicated costs of generative AI


AI chip startup Groq (not to be confused with Musk’s Grok!) has put all its chips on the table. The team released a demo of Mixtral, an open-source LLM, running through their API and generating responses four times faster than other services at highly competitive rates.

Unlike NVIDIA’s GPUs, which are versatile across a wide range of tasks, Groq’s processors are tailored for specific, high-performance AI computations, potentially offering more specialised efficiency in these areas.

Groq addresses one problem that both startups and large enterprises face when looking to scale their AI products: the cost of running models. As one anonymous founder said, they only make money “if people don’t use the product.” I’m hearing that enterprises are facing sticker shock when they move from a proof-of-concept for a few users to widespread deployment across their orgs.

To add to the complications, we’re now in a phase where LLMs are one part of increasingly complex systems. Building valuable applications involves linking AI models with databases and other applications, and even sampling multiple LLMs. These systems yield more robust results, but they also multiply costs and create a maintenance headache, because components must be constantly updated within a rapidly evolving ecosystem.

For any firm, this volatility in innovation makes planning hard. Should you buy now? Or wait a bit longer, when prices come down? Should you build for scale straight out of the gate or risk unsustainable economics later down the line?

AI software startups need to figure out when and whether their unit economics can work, especially in an environment where capital is scarce. The FTC, concerned about Big Tech’s computing dominance, is watching its acquisitions like a hawk. Consequently, many startups may find themselves without a viable exit strategy.

And that brings us back to Groq. They aren’t an AI software company, yet they face similar questions about their unit economics. Their eye-catching release is likely a fundraising tactic, and the broader economics of their hardware costs raise questions about long-term viability. SemiAnalysis breaks down their economics and questions the hardware’s ability to handle larger models and context windows:

The question that really matters though, is if low latency small model inference is a large enough market on its own, and if it is, is it worth having specialised infrastructure when flexible GPU infrastructure can get close to the same cost and be redeployed for throughput or large model applications fairly easily.

Groq’s gone all in, but unfortunately, their niche combined with high costs may prevent them from scaling effectively – a complicated reality for many AI startups.
