There are many AI news digests around, but Papers of the Month from Graphcore Research gives the experts’ view on the most important advances in Artificial Intelligence. The latest edition goes big on LLMs, including the discovery of ‘super weights’ that prove crucial to language model performance. https://lnkd.in/gPjuEsam
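The super-weight finding is easy to appreciate with a toy ablation. Below is a minimal sketch, not the paper’s actual procedure: it zeroes the single largest-magnitude weight in a stand-in linear layer and measures how much the output shifts. The layer, the input, and the choice of PyTorch are all illustrative assumptions.

```python
# Minimal sketch (not the paper's method): ablate one large-magnitude weight
# and measure how much the layer output changes. The super-weight result says
# that in real LLMs a handful of such scalars matter far more than magnitude
# alone would predict; the model and data here are toy stand-ins.
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(512, 512, bias=False)   # stand-in for one projection in an LLM
x = torch.randn(8, 512)                    # stand-in activations

with torch.no_grad():
    baseline = layer(x)

    # Locate the single largest-magnitude weight and zero it out.
    flat_idx = layer.weight.abs().argmax().item()
    row, col = divmod(flat_idx, layer.weight.shape[1])
    original_value = layer.weight[row, col].item()
    layer.weight[row, col] = 0.0

    ablated = layer(x)

# Relative change in the layer output caused by removing one scalar weight.
rel_change = ((ablated - baseline).norm() / baseline.norm()).item()
print(f"zeroed w[{row},{col}] = {original_value:.4f}, relative output change = {rel_change:.4f}")
```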
-
𝗬𝗼𝘂 𝗖𝗮𝗻’𝘁 𝗤𝘂𝗮𝗻𝘁𝗶𝘇𝗲 𝗡𝗲𝘂𝗿𝗮𝗹 𝗡𝗲𝘁𝘄𝗼𝗿𝗸𝘀 𝗙𝗼𝗿𝗲𝘃𝗲𝗿! 𝗨𝗻𝗱𝗲𝗿𝘀𝘁𝗮𝗻𝗱𝗶𝗻𝗴 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗟𝗮𝘄𝘀 𝗶𝗻 𝗔𝗜 🤔 Scaling laws are a hot topic these days, and a new paper attempts to formulate scaling laws for different levels of arithmetic precision. We can manipulate the precision of a neural network in two ways: post-training quantization, or training the network from the start in low precision such as 8-bit or 16-bit. We do this to get a smaller, cheaper network for inference. The paper makes two significant claims. First, simply increasing the training dataset size doesn’t necessarily mean you end up with a better model after quantization; that’s why Llama 3, which was trained on a huge amount of data relative to its size, is hard to quantize effectively. Second, even training models in lower precision isn’t cost-free: for example, a 1-billion-parameter model trained in FP4 performs comparably to a 250-million-parameter model trained in BF16. https://lnkd.in/dv7s3ieu
Scaling Laws for Precision
arxiv.org
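To make the post-training-quantization side of this concrete, here is a minimal sketch of symmetric per-tensor quantization and the reconstruction error it introduces at different bit widths. The tensor, the bit widths, and the per-tensor scaling scheme are assumptions for illustration; the paper’s scaling laws are fit over full training runs, not a toy like this.

```python
# Minimal sketch of symmetric post-training quantization (per-tensor), just to
# show the mechanics; bit widths and the random weight tensor are assumptions.
import numpy as np

def quantize_dequantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Round weights to a symmetric integer grid and map them back to floats."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax      # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(1024, 1024)).astype(np.float32)

for bits in (8, 4, 2):
    w_hat = quantize_dequantize(w, bits)
    mse = float(np.mean((w - w_hat) ** 2))
    print(f"{bits}-bit: reconstruction MSE = {mse:.6f}")
```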
-
𝐋𝐚𝐫𝐠𝐞 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬: 𝐀𝐩𝐩𝐥𝐢𝐜𝐚𝐭𝐢𝐨𝐧 𝐭𝐡𝐫𝐨𝐮𝐠𝐡 𝐏𝐫𝐨𝐝𝐮𝐜𝐭𝐢𝐨𝐧 This course is aimed at developers, data scientists, and engineers looking to build LLM-centric applications with the latest and most popular frameworks. By the end of this course, you will have built an end-to-end LLM workflow that is ready for production!
𝐓𝐡𝐢𝐬 𝐜𝐨𝐮𝐫𝐬𝐞 𝐜𝐨𝐯𝐞𝐫𝐬
LLM 00 - Introduction
LLM 01 - Applications with LLMs
LLM 02 - Embeddings, Vector Databases, and Search
LLM 03 - Multi-stage Reasoning
LLM 04 - Fine-tuning and Evaluating LLMs
LLM 05 - Society and LLMs
LLM 06 - LLMOps
Playlist details in the comments. #llms #generativeai #ai #nlproc #deeplearning #onlinecourse #aibeginners
-
#TheSequence published A Summary Of Our Series About LLM Reasoning https://lnkd.in/eHbErwRU #artificialintelligence #generativeai #LLMs
Edge 379: A Summary Of Our Series About LLM Reasoning
thesequence.substack.com
-
The recent release of OpenAI o1 has brought great attention to large reasoning models (LRMs), and is inspiring new models aimed at solving complex problems classic language models often struggle with. Building on the success of o1 and the concept of LRMs, researchers at Alibaba have introduced Marco-o1, which enhances reasoning capabilities and tackles problems with open-ended solutions where clear standards and quantifiable rewards are absent. OpenAI o1 uses “inference-time scaling” to improve the model’s reasoning ability by giving it “time to think.” Basically, the model uses more compute cycles during inference to generate more tokens and review its responses, which improves its performance on tasks that require reasoning. o1 is renowned for its impressive reasoning capabilities, especially in tasks with standard answers such as mathematics, physics and coding.
Alibaba researchers unveil Marco-o1, an LLM with advanced reasoning capabilities
https://venturebeat.com
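One simple, public flavor of inference-time scaling is best-of-n sampling with a self-consistency vote: spend more compute at inference by drawing several candidate answers and keeping the most common one. The sketch below illustrates only that generic idea; `generate_answer` is a hypothetical stand-in for an LLM call, and o1’s actual procedure has not been disclosed.

```python
# Minimal sketch of best-of-n sampling with a self-consistency vote.
# `generate_answer` is a hypothetical stand-in for an LLM call (here a noisy toy
# that is right most of the time); this is not OpenAI o1's actual mechanism.
from collections import Counter
import random

def generate_answer(question: str) -> str:
    """Hypothetical LLM call: returns the right answer 70% of the time."""
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])

def best_of_n(question: str, n: int = 16) -> str:
    # Spending more samples (larger n) buys reliability at extra inference cost.
    candidates = [generate_answer(question) for _ in range(n)]
    answer, votes = Counter(candidates).most_common(1)[0]
    return answer

print(best_of_n("What is 6 * 7?"))
```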
-
Great paper discussing Questionable Practices in ML research. However, I think it is important to note that this is just as true of #ML in industry and in products, which is why good experiment tracking and monitoring practices are important. Thanks to @Daniel A. for his post (https://lnkd.in/eQXDsCu8) paper: https://lnkd.in/eNpVEtzk
2407.12220
arxiv.org
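On the experiment-tracking point, even something very lightweight beats nothing. A minimal sketch, assuming a plain-files setup rather than a real tracker such as MLflow or Weights & Biases: record the config, code version, and metrics of every run so results stay auditable.

```python
# Minimal sketch of lightweight experiment tracking: write the exact config,
# git commit, and metrics of each run to a JSON file. Real projects typically
# use a dedicated tracker; this only illustrates the idea.
import json
import subprocess
import time
from pathlib import Path

def log_run(config: dict, metrics: dict, log_dir: str = "runs") -> Path:
    try:
        commit = subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except Exception:
        commit = "unknown"   # e.g. not inside a git repository
    record = {"timestamp": time.time(), "git_commit": commit,
              "config": config, "metrics": metrics}
    Path(log_dir).mkdir(exist_ok=True)
    out = Path(log_dir) / f"run_{int(record['timestamp'])}.json"
    out.write_text(json.dumps(record, indent=2))
    return out

log_run({"lr": 3e-4, "seed": 0, "model": "baseline"}, {"val_accuracy": 0.87})
```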
-
Yes, AI can do many things, but NOT everything. Can AI make love? Or can any of the artificial intelligence (AI) products you know answer the following questions? "Who, in England in the UK, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" "Who, in the Jiangsu (江蘇) province of China, has new US patents granted on the nearest Tuesday, when the USPTO releases the newly granted US patents on a weekly basis?" With our intellectual property (IP), we can. Do you or any of your contacts need our expertise and our IP? We're selling, NOT just talking. Thanks. Here is our pitch deck: "Experiment results showed that with our intellectual property (IP), a copyrighted multilingual metadata set, we are doing what AI like ChatGPT can't do in data analytics. Without metadata, NO data can be found or retrieved, even by the most advanced technologies, like AI, supercomputers, etc. https://lnkd.in/g-aJFnXR Our IP can also make your information service UNIQUE in the world." P.S. We wrote this post ourselves, without using any AI tool.
"RAFT: A new way to teach LLMs to be better at RAG" New research from UC Berkeley researchers: https://lnkd.in/gN2wgKAn A TLDR from Cédric Vidal: - RAFT teaches the model how to distinguish useful from irrelevant documents for a given question - RAFT teaches the model how to properly answer a question, which documents are interesting to answer as well as the chain of thoughts that leads to the answer including snippets from the documents used for reasoning - You don’t have to fine tune again every time you add a document to your knowledge base / vector db, it should generalize as long as the domain is the same (legal, medical or something else) - It doesn’t change anything to the retrieval part, you still use semantic vector search or hybrid search. And it doesn’t change anything to the embedding model
RAFT: A new way to teach LLMs to be better at RAG
techcommunity.microsoft.com
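To see what a RAFT-style training example might look like, here is a minimal sketch of assembling one: a question, the "oracle" document mixed in with distractors, and a chain-of-thought completion that cites the oracle. The field names and JSON layout are assumptions for illustration, not the paper's exact schema.

```python
# Minimal sketch of assembling a RAFT-style fine-tuning example: the oracle
# document is shuffled in with distractors, and the target is a chain-of-thought
# answer that quotes the oracle. Field names are assumptions, not the paper's schema.
import json
import random

def make_raft_example(question: str, oracle_doc: str, distractors: list[str],
                      cot_answer: str, k: int = 3) -> dict:
    docs = random.sample(distractors, k=min(k, len(distractors))) + [oracle_doc]
    random.shuffle(docs)   # the model must learn to find the oracle itself
    prompt = "\n\n".join(f"[DOC {i}] {d}" for i, d in enumerate(docs))
    prompt += f"\n\nQuestion: {question}"
    return {"prompt": prompt, "completion": cot_answer}

example = make_raft_example(
    question="What year was the transformer architecture introduced?",
    oracle_doc="The transformer was introduced in 'Attention Is All You Need' (2017).",
    distractors=["LSTMs were proposed in 1997.", "BERT was released in 2018."],
    cot_answer="The relevant document says the transformer paper appeared in 2017, so the answer is 2017.",
)
print(json.dumps(example, indent=2))
```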
-
NuMind (YC S22)'s first ML paper is out! 🎉 "NuNER: Entity Recognition Encoder Pre-training via LLM-Annotated Data" https://lnkd.in/dnbNfEbf, in collaboration with Univ Paris Diderot. Congrats Sergei Bogdanov 💫, Alexandre Constantin, Timothée Bernard, and Benoit Crabbé! This research is a continuation of an R&D project we started last year https://lnkd.in/ePAAiuXY, in which we created a NER-specific foundation model using GPT-3.5-annotated data. Given the novelty of the approach and the impressive performance we were getting, we decided to turn it into actual academic research... which was not the easiest endeavor... 😬 (I had forgotten how draining academic research could be!) But we now have a better understanding of why our model works, which will be key to creating the next versions (all available here: https://lnkd.in/drx4UA3n). As eye candy, here is an updated "concept map" obtained by our concept encoder during the training of NuNER:
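For readers curious what training on LLM-annotated data involves mechanically, here is a generic sketch of one common step: converting entity spans proposed by an LLM into token-level BIO labels that an encoder can be trained on. This is an illustration only, not NuMind's actual pipeline, data format, or concept-encoder training objective.

```python
# Generic sketch: turn LLM-proposed (entity_text, concept) annotations into
# token-level BIO labels for encoder training. Not NuMind's actual code.
def spans_to_bio(tokens: list[str], annotations: list[tuple[str, str]]) -> list[str]:
    labels = ["O"] * len(tokens)
    for entity_text, concept in annotations:
        entity_tokens = entity_text.split()
        for start in range(len(tokens) - len(entity_tokens) + 1):
            # Label the first matching occurrence of the entity span.
            if tokens[start:start + len(entity_tokens)] == entity_tokens:
                labels[start] = f"B-{concept}"
                for i in range(start + 1, start + len(entity_tokens)):
                    labels[i] = f"I-{concept}"
                break
    return labels

tokens = "Sergei joined NuMind in Paris".split()
annotations = [("Sergei", "PERSON"), ("NuMind", "COMPANY"), ("Paris", "CITY")]
print(list(zip(tokens, spans_to_bio(tokens, annotations))))
```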
-
OpenAI's GPT-4, and supposedly other top-notch LLMs, appear to be less powerful than promised. A study published in the Artificial Intelligence and Law journal highlights numerous methodological flaws that could compromise the integrity of the reported scores. #openai #llms #llm #gpt4 #gpt4o #artificialintelligence #ai
Study Finds GPT-4's Bar Exam Scores Overinflated and Methodologically Flawed - WinBuzzer
https://winbuzzer.com
-
The Sequence is a reliable source of technology information on innovations in Generative AI. In this post, they provide a summary of the 13 recent issues on LLM reasoning. Really nice reference resource. https://lnkd.in/gRHriPQx
Edge 379: A Summary Of Our Series About LLM Reasoning
thesequence.substack.com
-
📚 "LLMs Will Always Hallucinate, and We Need to Live With This" by Sourav Banerjee and the team. 🧠 This is a foundational paper - if you're an AI / LLM practitioner or champion (or naysayer), this is worth reading - many of you will know this but the evidence is vital. 🔍 Summary: As Large Language Models become more ubiquitous across domains, it becomes important to examine their inherent limitations critically (hence my work on the AI Trust / Verisimilitude Paradox). 🤖 This work argues that hallucinations in language models are not just occasional errors but an inevitable feature of these systems. 🎭 The researchers demonstrate that hallucinations stem from the fundamental mathematical and logical structure of LLMs. It is, therefore, impossible to eliminate them through architectural improvements, dataset enhancements, or fact-checking mechanisms. 🧮 As I have said before - we are playing a game of error minimization, so we need to understand risk and risk mitigation. 🎯 There is still utility in LLMs, but they need to be handled and managed with care. ⚠️ We can save you time, money and help you safely navigate the ‘Age of AI’ #AI Risk Guy Digital Human Assistants Paul Edginton Ricky Sydney https://lnkd.in/gRPDET6x
LLMs Will Always Hallucinate, and We Need to Live With This
arxiv.org