Our mission is to advance humanity's understanding of AI by examining the inner workings of advanced AI models (or “AI Interpretability”). As a research-driven product organization, we bridge the gap between theoretical science and practical applications of interpretability.
We're building critical infrastructure that empowers developers to understand, edit, and debug AI models at scale, ensuring the creation of safer and more reliable systems.
Goodfire is a public benefit corporation headquartered in San Francisco.
I’m incredibly excited to announce Goodfire Ember — the first hosted mechanistic interpretability API, with inference support for generative models like Llama 3.3 70B. This makes large-scale interpretability work accessible to the broader community and is already being used by partners like Rakuten, Haize Labs, and Apollo Research to improve model performance, increase security, and extract new understanding from models.
We think this is the start of building a set of tools to accelerate alignment research, as well as unlocking a new development paradigm that harnesses the latent intelligence already present inside models.
Try it yourself: https://lnkd.in/eVc34XD2.
Read more about our launch: https://lnkd.in/erNbcTmG
X thread: https://lnkd.in/ecPu5Mrt
If you think aligning AGI is the most important problem in the world, we’re hiring at https://lnkd.in/gapupaYQ.
AI is hard to control and engineer. I wrote a post about how using feature steering, an interpretability technique, can change this dynamic.
https://lnkd.in/grUp6szS
If you're interested in what we're building at Goodfire, we're hiring! https://lnkd.in/gapupaYQ
Incredibly excited to announce our research preview, which is now live. You can access it at https://lnkd.in/eDUXHftC, and read more about it on our blog (https://lnkd.in/eAK5VNEC).
In this preview, we've created a desktop interface that helps you understand and control Llama 3's behavior. You can see Llama 3's internal features (the internal building blocks of its responses) and precisely adjust these features to create new Llama variants.
Check it out, and let us know what you think! Shoutout to the team at Goodfire (Myra Deng, Daniel Balsam, and Tom McGrath) for the incredible work.
Y'arr mateys! In honor of International Talk Like a Pirate Day, we're releasing an on-theme sneak peek of our research preview. We show that "feature" steering may enable more persistent and robust modifications to a language model's behavior compared to traditional inference-time techniques like prompting.
Basically, the model can't stop talking like a pirate.
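As a rough illustration of what "feature" steering can mean mechanically, here is a minimal sketch of one common formulation: adding a scaled feature direction to a model's hidden activations. It is a generic toy, not Goodfire's implementation or API. It assumes a feature is represented by a direction vector in the hidden space (for example, a decoder vector learned by a sparse autoencoder). The helper names `normalize`, `dot`, and `steer`, the `strength` parameter, and the 3-dimensional "pirate" direction are all hypothetical.

```python
import math


def normalize(vector):
    """Return a unit-length copy of `vector`."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def steer(hidden_states, feature_direction, strength):
    """Add `strength` times the unit feature direction to every token's hidden state.

    hidden_states: one vector per token position (hypothetical toy layer output).
    feature_direction: vector for the feature being steered (hypothetical).
    """
    unit = normalize(feature_direction)
    return [[h + strength * u for h, u in zip(state, unit)] for state in hidden_states]


# Two token positions in a toy 3-dimensional hidden space.
hidden = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
pirate_feature = [0.0, 0.0, 2.0]  # hypothetical "talk like a pirate" direction

steered = steer(hidden, pirate_feature, strength=3.0)
print(steered)  # [[1.0, 0.0, 3.0], [0.0, 1.0, 3.0]]

# Each token's activation along the feature rises by exactly `strength`.
unit = normalize(pirate_feature)
print([dot(s, unit) - dot(h, unit) for s, h in zip(steered, hidden)])  # [3.0, 3.0]
```

Because the shift is applied to the activations at every token position rather than written into the prompt, one plausible intuition for the persistence described above is that it is not diluted as the conversation grows. Real steering setups choose the layer, the feature, and the strength empirically.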
Shoutout to Myra Deng for the great video and to the rest of the Goodfire team for the awesome work. Sign up for our waitlist here: https://lnkd.in/ezFJ4hKS.
We're releasing our research preview soon - here's a sneak peek of what to expect!
Follow us on X (https://x.com/GoodfireAI) or sign up for our waitlist (https://goodfire.ai/) to stay up to date.
How sure are you that you can tell when social media accounts are bots? What about as AI improves?
I've been slow to share about this on LinkedIn, but Nicholas Thompson's post is a nice occasion:
Introducing "personhood credentials"
In a new paper, co-authored with researchers from ~20 orgs and my OpenAI teammates Zoë Hitzig and David Schnurr, we ask: "What are AI-proof ways to tell who’s real online?"
As AI becomes more realistic, photos and even videos of someone might not be enough to trust that they aren't just a fake account trying to scam you.
Current solutions won’t be enough: We can’t rely anymore on AI lacking certain abilities, like typing in the letters of a CAPTCHA puzzle.
What we want is a way to access AI's transformative benefits (like helping to regenerate a person’s lost voice) without these abilities being leveraged for deception at scale. Further, people shouldn't have to give up privacy or inclusivity in the process.
To that end, we propose personhood credentials: a privacy-preserving tool that shows you’re a person, but doesn’t reveal which.
Importantly, these are backed by two things AI can’t fake, no matter how good it gets: passing as a person in the real world, and secure cryptography.
Personhood credentials can be issued by a range of trusted entities, like governments or foundations; you enroll by showing you’re a real person who hasn’t yet gotten one. Then, you can validate this with websites without revealing your identity.
The core requirements are that these credentials must be limited (so people can’t get many and give them to AI) and highly private—ensuring anonymity and unlinkable activity, even if websites or issuers collude. People and sites then have an optional tool to show there’s a real person behind an account, without showing anything more.
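The "limited but unlinkable" combination described above can be illustrated with a classic cryptographic tool, Chaum-style RSA blind signatures. This is a minimal toy sketch, not the construction proposed in the paper. The `Issuer` class, the one-credential-per-`person_id` check, and the tiny textbook RSA key (p = 61, q = 53) are hypothetical and far too weak for real use. The issuer verifies a person once and signs only a blinded value, so the signature a website later checks cannot be matched to the enrollment that produced it. The code needs Python 3.8+ for `pow(x, -1, m)`.

```python
import hashlib
import secrets
from math import gcd

# Toy RSA key for the issuer (textbook parameters; NOT secure).
P, Q = 61, 53
N = P * Q                          # public modulus, 3233
E = 17                             # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent, 2753


class Issuer:
    """Hypothetical issuer: enrolls each real person at most once and signs blindly."""

    def __init__(self):
        self.enrolled = set()

    def enroll_and_sign(self, person_id, blinded):
        # "Limited": a person who already holds a credential cannot get another.
        if person_id in self.enrolled:
            raise ValueError("this person already holds a credential")
        self.enrolled.add(person_id)
        # The issuer only ever sees the blinded value, never the token itself.
        return pow(blinded, D, N)


def token_to_int(token):
    """Map a token to an integer modulo N via SHA-256."""
    return int.from_bytes(hashlib.sha256(token).digest(), "big") % N


def random_blinding_factor():
    """Pick a random r in [2, N-1] that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if gcd(r, N) == 1:
            return r


def obtain_credential(issuer, person_id):
    token = secrets.token_bytes(16)         # secret serial chosen by the user
    m = token_to_int(token)
    r = random_blinding_factor()
    blinded = (m * pow(r, E, N)) % N        # hide m from the issuer
    blind_sig = issuer.enroll_and_sign(person_id, blinded)
    sig = (blind_sig * pow(r, -1, N)) % N   # unblind: sig == m^D mod N
    return token, sig


def site_verifies(token, sig):
    """A website checks the issuer's signature using only the public key (N, E)."""
    return pow(sig, E, N) == token_to_int(token)


issuer = Issuer()
token, sig = obtain_credential(issuer, "alice")
print(site_verifies(token, sig))  # True
try:
    obtain_credential(issuer, "alice")
except ValueError as err:
    print(err)  # this person already holds a credential
```

Showing the same token to several sites would let those sites link it to one another, so fuller designs layer per-site pseudonyms or zero-knowledge proofs on top of a primitive like this.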
In the paper, we discuss a number of factors that must be carefully managed to maximize the benefits of these systems—like equitable access and checks on power—as well as a range of recommendations for preparing the Internet for AI's impacts and for making personhood credentials a viable option.
I'll include the paper below; would be grateful for any feedback!
The most interesting thing in tech: a smart idea for proving that you are real in an age of AI, called personhood credentials. There's a smart way, using cryptography, to verify that you are a real human without having to reveal personal information you don't want to reveal. Like PGP, it's based on a system of public and private keys, and it's the best idea I've heard yet for solving a problem that is becoming ever more important.
founder & AI R&D @Waymark, Host of The Cognitive Revolution podcast, AI Advisor
Most mechanistic interpretability work has been motivated by AI safety concerns, but could interpretability also drive scientific discovery?
Goodfire's CTO Daniel Balsam & CSO Tom McGrath think so! 🔥
"We've got scientific foundation models like AlphaFold, ESM3, models of quantum chemistry, weather prediction – it's a huge list and it's growing all the time.
When models are better at scientific predictions than our best theories, they probably know something we don't.
If we're capable of explaining models, anything you can train a model to do, you can explain. So I think this has the potential to be a core transformative scientific technology, to go into these scientific models and pull out the new science and then bring it back to the rest of the world."
(Link to full episode in comments👇)
Excited to announce that we have raised a $7M seed round led by Lightspeed, with participation from Menlo Ventures, Work-Bench, South Park Commons, Juniper Ventures, Mythos Ventures, and Bluebirds Capital.
At Goodfire, we're building tools to demystify generative AI models. Our product incorporates interpretability-based features that allow developers to gain deeper insights into their models' internal decision-making processes and precisely control and steer model behavior.
Our team brings together experts in AI products, interpretability and startup scaling, led by:
Eric Ho, CEO (previously cofounded RippleMatch, a Series B AI recruiting startup)
Tom McGrath, Chief Scientist (previously cofounded the interpretability team at Google DeepMind)
Daniel Balsam, CTO (previously Head of AI Eng at RippleMatch)
We're actively hiring mission-driven individuals passionate about making AI safer and more reliable. Join us in shaping the future of AI interpretability!
Read more about us:
VentureBeat: https://lnkd.in/gKNBQins
Press release: https://lnkd.in/g4Snqmcm