Microsoft Research’s Post

Microsoft Research is excited to introduce Q-Sparse: a breakthrough in training fully sparsely-activated LLMs. Q-Sparse supports both full-precision and 1-bit LLMs. Its synergy with BitNet b1.58 advances LLM efficiency, including cost and energy use. https://msft.it/6040lumcK

  • Figure: Q-Sparse achieves a superior inference-optimal scaling law compared with dense models, saving significant matrix-multiplication compute through top-K sparsification of the activations.
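For readers unfamiliar with the idea: top-K sparsification keeps only the largest-magnitude activations entering each matrix multiplication and zeroes the rest. A minimal NumPy sketch of that general idea (illustrative only, not the paper's exact formulation; the function name and toy shapes are made up here):

```python
import numpy as np

def topk_sparsify(x: np.ndarray, k: int) -> np.ndarray:
    """Zero all but the k largest-magnitude activations in each row."""
    # Indices of the k largest entries by absolute value, per row.
    idx = np.argsort(np.abs(x), axis=-1)[..., -k:]
    out = np.zeros_like(x)
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out

# Toy usage: keep 2 of 8 activations per row (75% sparsity) before the matmul.
x = np.random.randn(2, 8)
print(topk_sparsify(x, k=2))
```

The downstream matrix multiply then only needs the columns matching the surviving activations, which is where the compute saving comes from.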
Allan M.

JavaScript Developer, DeepRL, Prompt Engineering, Model Coercion

5mo

Microsoft Research You can sparsify by using Q(Q in inference. Apply recompiling for hidden weights with << and cluster again. There is a post doing that with deflect(<< in my timeline. Just flush the context all as feed-forward, forever doing the same, and cluster the hidden weights in inference a single time. Focus on your flushback, fix it, and your results will skyrocket. Hire me. I do know how to take advantage of the way transformers interpret things: on real-time inference, on train inference, on eval inference. I read the result of the study in this paper as non-conclusive, with still-huge activation even with sparsity. MoE will help with your side effects on the feed-forward, but you are being bombarded at flushback. The encoder window is flushing half of your traversal because you are still using a long range on encoding. Even with sparsification, you are still far from recompiling in inference. YOCO is huge for caching KV; I would definitely go with that.

Aditya Jaiswal

Aspiring AI Developer | SIH'23 Finalist | Former DeepSoft Intern | Specializing in LLMs, Deep Learning, AI & ML | Developing Innovative AI Solutions Across Various Domains

5mo

Q-Sparse is truly a game-changer in the realm of LLM efficiency! The combination of full-precision and 1-bit LLMs, alongside BitNet b1.58, paves the way for significant advancements in both cost and energy efficiency. This breakthrough has the potential to revolutionize how we approach large-scale language models, making high-performance AI more accessible and sustainable. Kudos to Microsoft Research for pushing the boundaries of AI innovation!


1-bit LLMs are a big thing. When both training and inference are built natively to run on addition instead of multiplication, compute and energy costs drop dramatically without a meaningful sacrifice in perplexity.
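To make the "addition instead of multiplication" point concrete: with ternary weights in {-1, 0, +1} (the BitNet b1.58 setting), each output element of a matrix-vector product reduces to a signed sum of inputs. A toy NumPy sketch of that idea (illustrative only, not the actual low-bit kernel):

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Matrix-vector product where W has entries in {-1, 0, +1}:
    each output element is a sum of additions and subtractions of x."""
    out = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        out[i] = x[W[i] == 1].sum() - x[W[i] == -1].sum()
    return out

# Toy check against an ordinary matmul.
W = np.random.choice([-1, 0, 1], size=(4, 8))
x = np.random.randn(8)
assert np.allclose(ternary_matvec(W, x), W @ x)
```

No multiplications are needed for the weight matrix itself, which is where the cost and energy savings come from.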

Awais Mukhtar

Development Team Lead at Future Connect Training & Recruitment Ltd.

5mo

Impressive innovation, Microsoft Research! Excited to see the advancements Q-Sparse brings.

James Pustorino

Manager - AI and Technology at PwC | Leading AI integration in Deals and Tax practices | CPA

5mo

Exciting stuff

Mohamed Amine Ferrag, PhD

Associate Professor of AI & Cybersecurity | BSc, MSc, PhD and HDR degrees

5mo

Inspiring!
