LTM-2-Mini is our first model with a 100M token context window. That’s 10 million lines of code, or 750 novels.

Full blog: https://lnkd.in/g5-pKWvi

Our LTM (Long Term Memory) mechanism needs >1,000x less compute and memory than Llama 3.1 405B’s attention. Llama 3.1 would need 638 H100s *per user* to store a 100M token KV cache. LTM needs a small fraction of one.

SSMs, RNNs, and RAG all exploit weaknesses in evals like Needle In a Haystack, so we made a new eval, HashHop:
1) Incompressible
2) Multi-hop
3) No semantic hints
4) No recency bias

With context solved, we now focus on unbounded inference-time compute as the next (and potentially last) breakthrough we believe is needed to build reliable AGI.

Imagine if you could spend $100 and 10 minutes on one task and reliably get a great pull request for an entire feature. That’s our goal.

We are 23 people (+ 8000 H100s) working on a single project: co-designing for long context, inference-time compute, and end-to-end RL to automate coding and research.

Ben Chess (fmr. OpenAI supercomputing lead) just joined to help us scale and we’re hiring more engineers and researchers across ML, CUDA, infra, security, and more: https://magic.dev/careers
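
For readers who want to sanity-check the KV cache figure quoted in the post, here is a rough back-of-the-envelope sketch. The model dimensions (126 layers, 8 KV heads, head dimension 128), the 16-bit cache, and the 80 GB of memory per H100 are assumptions taken from publicly available specs, not from the post itself, so the result only lands in the same ballpark as the 638 H100s stated above rather than matching it exactly.

```python
# Back-of-the-envelope KV cache size for a dense transformer with
# grouped-query attention. All constants below are assumptions for
# illustration; they are not stated in the post.
NUM_LAYERS = 126                 # assumed Llama 3.1 405B layer count
NUM_KV_HEADS = 8                 # assumed number of key/value heads (GQA)
HEAD_DIM = 128                   # assumed per-head dimension
BYTES_PER_VALUE = 2              # assumed 16-bit (bf16/fp16) cache entries
H100_MEMORY_BYTES = 80 * 10**9   # assumed 80 GB per H100, all used for cache


def kv_cache_bytes(num_tokens: int) -> int:
    """Bytes needed to store keys and values for num_tokens tokens."""
    # Factor of 2: one key vector and one value vector per layer and KV head.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return num_tokens * per_token


def h100s_needed(num_tokens: int) -> float:
    """Number of H100s whose memory the cache alone would fill."""
    return kv_cache_bytes(num_tokens) / H100_MEMORY_BYTES


if __name__ == "__main__":
    tokens = 100_000_000
    print(f"KV cache: {kv_cache_bytes(tokens) / 1e12:.1f} TB")
    print(f"H100s (cache only): {h100s_needed(tokens):.0f}")
```

Under these assumptions the cache costs about 0.5 MB per token, so a 100M token context needs on the order of 50 TB and several hundred GPUs per user before counting weights or activations. Different choices (GiB versus GB, reserved memory, cache precision) shift the exact count.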
Wow - this is category defining in both context and scalability!
Congrats Magic! Do you already have some insights on quality decay with increasing context length?
Awesome news! Congrats and way to go!!
Nice:)
Amazing! Congratulations team!
Looks cool, any way to test this?