There are many AI news digests around, but Papers of the Month from Graphcore Research gives the experts’ view on the most important advances in Artificial Intelligence. The latest edition goes big on LLMs, including the discovery of ‘super weights’ that prove crucial to language model performance. https://lnkd.in/gPjuEsam
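The super-weight finding is easy to appreciate with a toy ablation. Below is a minimal sketch, not the paper’s actual procedure: it zeroes the single largest-magnitude weight in a stand-in linear layer and measures how much the output shifts. The layer, the input, and the choice of PyTorch are all illustrative assumptions.

```python
# Minimal sketch (not the paper's method): ablate one large-magnitude weight
# and measure how much the layer output changes. The super-weight result says
# that in real LLMs a handful of such scalars matter far more than magnitude
# alone would predict; the model and data here are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(512, 512, bias=False)   # stand-in for one projection in an LLM
x = torch.randn(8, 512)                    # stand-in activations

with torch.no_grad():
    baseline = layer(x)

    # Locate the single largest-magnitude weight and zero it out.
    flat_idx = layer.weight.abs().argmax().item()
    row, col = divmod(flat_idx, layer.weight.shape[1])
    original_value = layer.weight[row, col].item()
    layer.weight[row, col] = 0.0

    ablated = layer(x)

# Relative change in the layer output caused by removing one scalar weight.
rel_change = ((ablated - baseline).norm() / baseline.norm()).item()
print(f"zeroed w[{row},{col}] = {original_value:.4f}, relative output change = {rel_change:.4f}")
```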
-
𝗬𝗼𝘂 𝗖𝗮𝗻’𝘁 𝗤𝘂𝗮𝗻𝘁𝗶𝘇𝗲 𝗡𝗲𝘂𝗿𝗮𝗹 𝗡𝗲𝘁𝘄𝗼𝗿𝗸𝘀 𝗙𝗼𝗿𝗲𝘃𝗲𝗿! 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗟𝗮𝘄𝘀 𝗶𝗻 𝗔𝗜 🤔 Scaling laws are a hot topic these days, and a new paper attempts to formulate scaling laws for different levels of arithmetic precision. We can manipulate the precision of a neural network in two ways: post-training quantization, or training the network from the start in low precision such as 8-bit or 16-bit. We do this to get a smaller, cheaper network for inference. The paper makes two significant claims. First, simply increasing the training dataset size doesn’t necessarily mean you end up with a better model after quantization; that’s why Llama 3, which was trained on a huge amount of data relative to its size, is hard to quantize effectively. Second, even training models in lower precision isn’t cost-free: for example, a 1-billion-parameter model trained in FP4 performs comparably to a 250-million-parameter model trained in BF16. https://lnkd.in/dv7s3ieu
Scaling Laws for Precision
arxiv.org
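To make the post-training-quantization side of this concrete, here is a minimal sketch of symmetric per-tensor quantization and the reconstruction error it introduces at different bit widths. The tensor, the bit widths, and the per-tensor scaling scheme are assumptions for illustration; the paper’s scaling laws are fit over full training runs, not a toy like this.

```python
# Minimal sketch of symmetric post-training quantization (per-tensor), just to
# show the mechanics; bit widths and the random weight tensor are assumptions.
import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Round weights to a symmetric integer grid and map them back to floats."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

for bits in (8, 4, 2):
    w_hat = quantize_dequantize(w, bits)
    mse = float(np.mean((w - w_hat) ** 2))
    print(f"{bits}-bit: reconstruction MSE = {mse:.6f}")
```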
-
𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬: 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧 This course is aimed at developers, data scientists, and engineers looking to build LLM-centric applications with the latest and most popular frameworks. By the end of this course, you will have built an end-to-end LLM workflow that is ready for production!
𝐓𝐡𝐢𝐬 𝐜𝐨𝐮𝐫𝐬𝐞 𝐜𝐨𝐯𝐞𝐫𝐬
LLM 00 - Introduction
LLM 01 - Applications with LLMs
LLM 02 - Embeddings, Vector Databases, and Search
LLM 03 - Multi-stage Reasoning
LLM 04 - Fine-tuning and Evaluating LLMs
LLM 05 - Society and LLMs
LLM 06 - LLMOps
Playlist details in the comments. #llms #generativeai #ai #nlproc #deeplearning #onlinecourse #aibeginners
-
#TheSequence published A Summary Of Our Series About LLM Reasoning https://lnkd.in/eHbErwRU #artificialintelligence #generativeai #LLMs
Edge 379: A Summary Of Our Series About LLM Reasoning
thesequence.substack.com
-
The recent release of OpenAI o1 has brought great attention to large reasoning models (LRMs), and is inspiring new models aimed at solving complex problems classic language models often struggle with. Building on the success of o1 and the concept of LRMs, researchers at Alibaba have introduced Marco-o1, which enhances reasoning capabilities and tackles problems with open-ended solutions where clear standards and quantifiable rewards are absent. OpenAI o1 uses “inference-time scaling” to improve the model’s reasoning ability by giving it “time to think.” Basically, the model uses more compute cycles during inference to generate more tokens and review its responses, which improves its performance on tasks that require reasoning. o1 is renowned for its impressive reasoning capabilities, especially in tasks with standard answers such as mathematics, physics and coding.
Alibaba researchers unveil Marco-o1, an LLM with advanced reasoning capabilities
https://venturebeat.com
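One simple, public flavor of inference-time scaling is best-of-n sampling with a self-consistency vote: spend more compute at inference by drawing several candidate answers and keeping the most common one. The sketch below illustrates only that generic idea; `generate_answer` is a hypothetical stand-in for an LLM call, and o1’s actual procedure has not been disclosed.

```python
# Minimal sketch of best-of-n sampling with a self-consistency vote.
# `generate_answer` is a hypothetical stand-in for an LLM call (here a noisy toy
# that is right most of the time); this is not OpenAI o1's actual mechanism.
from collections import Counter
import random

def generate_answer(question: str) -> str:
    """Hypothetical LLM call: returns the right answer 70% of the time."""
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])

def best_of_n(question: str, n: int = 16) -> str:
    # Spending more samples (larger n) buys reliability at extra inference cost.
    candidates = [generate_answer(question) for _ in range(n)]
    answer, votes = Counter(candidates).most_common(1)[0]
    return answer

print(best_of_n("What is 6 * 7?"))
```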
-
Great paper discussing Questionable Practices in ML research. However, I think it is important to note that this is just as true of #ML in industry and in products, which is why good experiment tracking and monitoring practices are important. Thanks to @Daniel A. for his post (https://lnkd.in/eQXDsCu8) paper: https://lnkd.in/eNpVEtzk
2407.12220
arxiv.org
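On the experiment-tracking point, even something very lightweight beats nothing. A minimal sketch, assuming a plain-files setup rather than a real tracker such as MLflow or Weights & Biases: record the config, code version, and metrics of every run so results stay auditable.

```python
# Minimal sketch of lightweight experiment tracking: write the exact config,
# git commit, and metrics of each run to a JSON file. Real projects typically
# use a dedicated tracker; this only illustrates the idea.
import json
import subprocess
import time
from pathlib import Path

def log_run(config: dict, metrics: dict, log_dir: str = "runs") -> Path:
    try:
        commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        commit = "unknown"   # e.g. not inside a git repository
    record = {"timestamp": time.time(), "git_commit": commit,
              "config": config, "metrics": metrics}
    Path(log_dir).mkdir(exist_ok=True)
    out = Path(log_dir) / f"run_{int(record['timestamp'])}.json"
    out.write_text(json.dumps(record, indent=2))
    return out

log_run({"lr": 3e-4, "seed": 0, "model": "baseline"}, {"val_accuracy": 0.87})
```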
-
Yes, AI can do many things, but NOT everything. Can AI make love? Or can any of the artificial intelligence (AI) products you know answer the following questions? "Who, in England in the UK, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" "Who, in the Jiangsu (江蘇) province of China, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" With our intellectual property (IP), we can. Do you or any of your contacts need our expertise and our IP? We're selling, NOT just talking. Thanks. Here is our pitch deck: "Experiment results showed that with our intellectual property (IP), a copyrighted multilingual metadata set, we are doing what AI like ChatGPT can't do in data analytics. Without metadata, NO data can be found or retrieved, even by the most advanced technologies, like AI, supercomputers, etc. https://lnkd.in/g-aJFnXR Our IP can also make your information service UNIQUE in the world." P.S. We wrote this post ourselves, without using any AI tool.
"RAFT: A new way to teach LLMs to be better at RAG" New research from UC Berkeley researchers: https://lnkd.in/gN2wgKAn A TLDR from Cédric Vidal: - RAFT teaches the model how to distinguish useful from irrelevant documents for a given question - RAFT teaches the model how to properly answer a question, which documents are interesting to answer as well as the chain of thoughts that leads to the answer including snippets from the documents used for reasoning - You don’t have to fine tune again every time you add a document to your knowledge base / vector db, it should generalize as long as the domain is the same (legal, medical or something else) - It doesn’t change anything to the retrieval part, you still use semantic vector search or hybrid search. And it doesn’t change anything to the embedding model
RAFT: A new way to teach LLMs to be better at RAG
techcommunity.microsoft.com
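To see what a RAFT-style training example might look like, here is a minimal sketch of assembling one: a question, the "oracle" document mixed in with distractors, and a chain-of-thought completion that cites the oracle. The field names and JSON layout are assumptions for illustration, not the paper's exact schema.

```python
# Minimal sketch of assembling a RAFT-style fine-tuning example: the oracle
# document is shuffled in with distractors, and the target is a chain-of-thought
# answer that quotes the oracle. Field names are assumptions, not the paper's schema.
import json
import random

def make_raft_example(question: str, oracle_doc: str, distractors: list[str],
                      cot_answer: str, k: int = 3) -> dict:
    docs = random.sample(distractors, k=min(k, len(distractors))) + [oracle_doc]
    random.shuffle(docs)   # the model must learn to find the oracle itself
    prompt = "\n\n".join(f"[DOC {i}] {d}" for i, d in enumerate(docs))
    prompt += f"\n\nQuestion: {question}"
    return {"prompt": prompt, "completion": cot_answer}

example = make_raft_example(
    question="What year was the transformer architecture introduced?",
    oracle_doc="The transformer was introduced in 'Attention Is All You Need' (2017).",
    distractors=["LSTMs were proposed in 1997.", "BERT was released in 2018."],
    cot_answer="The relevant document says the transformer paper appeared in 2017, so the answer is 2017.",
)
print(json.dumps(example, indent=2))
```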
-
NuMind (YC S22)'s first ML paper is out! 🎉 "NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data" https://lnkd.in/dnbNfEbf, in collaboration with Univ Paris Diderot. Congrats Sergei Bogdanov 💫, Alexandre Constantin, Timothée Bernard, and Benoit Crabbé! This research is a continuation of an R&D project we started last year https://lnkd.in/ePAAiuXY, in which we created a NER-specific foundation model using GPT-3.5-annotated data. Given the novelty of the approach and the impressive performance we were getting, we decided to turn it into actual academic research... which was not the easiest endeavor... 😬 (I had forgotten how draining academic research could be!) But we now have a better understanding of why our model works, which will be key to creating the next versions (all available here: https://lnkd.in/drx4UA3n). As eye candy, here is an updated "concept map" obtained by our concept encoder during the training of NuNER:
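For readers curious what training on LLM-annotated data involves mechanically, here is a generic sketch of one common step: converting entity spans proposed by an LLM into token-level BIO labels that an encoder can be trained on. This is an illustration only, not NuMind's actual pipeline, data format, or concept-encoder training objective.

```python
# Generic sketch: turn LLM-proposed (entity_text, concept) annotations into
# token-level BIO labels for encoder training. Not NuMind's actual code.
def spans_to_bio(tokens: list[str], annotations: list[tuple[str, str]]) -> list[str]:
    labels = ["O"] * len(tokens)
    for entity_text, concept in annotations:
        entity_tokens = entity_text.split()
        for start in range(len(tokens) - len(entity_tokens) + 1):
            # Label the first matching occurrence of the entity span.
            if tokens[start:start + len(entity_tokens)] == entity_tokens:
                labels[start] = f"B-{concept}"
                for i in range(start + 1, start + len(entity_tokens)):
                    labels[i] = f"I-{concept}"
                break
    return labels

tokens = "Sergei joined NuMind in Paris".split()
annotations = [("Sergei", "PERSON"), ("NuMind", "COMPANY"), ("Paris", "CITY")]
print(list(zip(tokens, spans_to_bio(tokens, annotations))))
```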
-
OpenAI's GPT-4, and supposedly other top-notch LLMs, appear to be less powerful than promised. A study published in the Artificial Intelligence and Law journal highlights numerous methodological flaws that could compromise the integrity of the reported scores. #openai #llms #llm #gpt4 #gpt4o #artificialintelligence #ai
Study Finds GPT-4's Bar Exam Scores Overinflated and Methodologically Flawed - WinBuzzer
https://winbuzzer.com
-
The Sequence is a reliable source of technology information on innovations in Generative AI. In this post, they provide a summary of the 13 recent issues on LLM reasoning. Really nice reference resource. https://lnkd.in/gRHriPQx
Edge 379: A Summary Of Our Series About LLM Reasoning
thesequence.substack.com
-
📚 "LLMs Will Always Hallucinate, and We Need to Live With This" by Sourav Banerjee and the team. 🧠 This is a foundational paper - if you're an AI / LLM practitioner or champion (or naysayer), this is worth reading - many of you will know this but the evidence is vital. 🔍 Summary: As Large Language Models become more ubiquitous across domains, it becomes important to examine their inherent limitations critically (hence my work on the AI Trust / Verisimilitude Paradox). 🤖 This work argues that hallucinations in language models are not just occasional errors but an inevitable feature of these systems. 🎭 The researchers demonstrate that hallucinations stem from the fundamental mathematical and logical structure of LLMs. It is, therefore, impossible to eliminate them through architectural improvements, dataset enhancements, or fact-checking mechanisms. 🧮 As I have said before - we are playing a game of error minimization, so we need to understand risk and risk mitigation. 🎯 There is still utility in LLMs, but they need to be handled and managed with care. ⚠️ We can save you time, money and help you safely navigate the ‘Age of AI’ #AI Risk Guy Digital Human Assistants Paul Edginton Ricky Sydney https://lnkd.in/gRPDET6x
LLMs Will Always Hallucinate, and We Need to Live With This
arxiv.org