New research reveals whether GPT-4 has become ‘dumber,’ plus other AI and tech news this week
OpenAI CEO Sam Altman; Image Credit: Getty Images

Welcome back to LinkedIn News Tech Stack, which brings you news, insights and trends involving the founders, investors and companies on the cutting edge of emerging technology.

As always, let me know if you have ideas or feedback on something that could be a fit by sending me an InMail. Follow me on LinkedIn for other tech updates.


Zooming In

A deep dive into one big theme or news story every week.

Dumb and dumber?

A new research paper about OpenAI’s GPT-4 confirms what some users have been complaining about for a few weeks: The AI large language model’s performance on some tasks has degraded between its March and June versions, while its predecessor, GPT-3.5, improved on at least one of the same tasks.

The researchers evaluated the two models using a dataset of 500 math problems, asking each model to determine whether a given integer was prime. They found that GPT-4’s accuracy dropped from 97.6% in March to 2.4% in June, whereas GPT-3.5’s accuracy actually increased. They also found that GPT-4’s responses became more compact.
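To make the percentages concrete, here is a minimal sketch of how accuracy on a 500-question yes/no test could be scored. The function name and the answer lists are hypothetical placeholders, not the researchers' code or data. The counts of 488 and 12 correct answers are simply what 97.6% and 2.4% of 500 work out to.

```python
def accuracy(predictions: list[bool], labels: list[bool]) -> float:
    """Return the fraction of predictions that match the ground-truth labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot score an empty dataset")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Synthetic placeholder data (an assumption for illustration only):
# 500 ground-truth labels and two made-up sets of model answers.
labels = [True] * 500
march_answers = [True] * 488 + [False] * 12   # 488 of 500 correct
june_answers = [True] * 12 + [False] * 488    # 12 of 500 correct

print(f"March: {accuracy(march_answers, labels):.1%}")
print(f"June:  {accuracy(june_answers, labels):.1%}")
```

Scoring the same fixed question set against each model version is what makes the March and June numbers directly comparable.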

“We find that the performance and behavior of both GPT-3.5 and GPT-4 vary significantly across these two releases and that their performance on some tasks have gotten substantially worse over time,” researchers from Stanford and UC Berkeley, including Databricks CTO Matei Zaharia, wrote in the paper released this week.

One explanation for the vast shift in performance, according to the paper, could be that GPT-4 has drifted from chain-of-thought, a prompting approach where LLMs tackle multi-step problems by breaking them down into intermediate steps. That deviation resulted in the model giving two different answers to the same question in March and June.

When asked if 17,077 is a prime number, GPT-4’s March version followed the chain-of-thought instruction, decomposing the task into four steps. It first checked if 17,077 is even, then found the number’s square root, then obtained all prime numbers less than that square root, and finally checked if 17,077 is divisible by any of those primes, before arriving at the correct answer that 17,077 is a prime number. In June, the model didn’t follow any of those steps and instead answered, incorrectly, “No.”
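The four steps described above can be sketched in a few lines of Python. This is an illustrative reconstruction of the reasoning, not the model's output or the researchers' code. The helper names are hypothetical, and the sieve used to list the candidate primes is an assumption, since the article does not say how the primes were obtained.

```python
import math


def primes_up_to(limit: int) -> list[int]:
    """List all primes <= limit using a simple Sieve of Eratosthenes."""
    if limit < 2:
        return []
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, math.isqrt(limit) + 1):
        if sieve[i]:
            for multiple in range(i * i, limit + 1, i):
                sieve[multiple] = False
    return [i for i, is_p in enumerate(sieve) if is_p]


def is_prime(n: int) -> bool:
    """Check primality by following the four steps from the article."""
    if n < 2:
        return False
    if n == 2:
        return True
    # Step 1: an even number greater than 2 cannot be prime.
    if n % 2 == 0:
        return False
    # Step 2: take the integer square root of n.
    root = math.isqrt(n)
    # Step 3: collect every prime up to and including that square root.
    candidates = primes_up_to(root)
    # Step 4: n is prime only if none of those primes divides it.
    return all(n % p != 0 for p in candidates)


print(is_prime(17077))  # True
```

For 17,077 the integer square root is 130, so the check only has to try the 31 primes up to 127 before concluding that the number is prime.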

OpenAI’s ChatGPT has been widely credited with spurring an AI revolution since its launch late last year. But in recent weeks, the internet has been abuzz with people complaining about GPT-4, including on OpenAI’s own forums.

Users have called out everything from having to repeat information to the model generating more erroneous responses, so much so that the company’s VP of product, Peter Welinder, responded to the claims. OpenAI had not responded to LinkedIn News’ request for comment on the latest paper at the time of publication. But it released a blog post shortly after, saying that it looks at a number of evaluation metrics before releasing a new model, and while the majority show improvement, "there may be some tasks where the performance gets worse." It also said it was constantly improving its evaluation methodology.

“This is a red flag for anyone building applications that rely on GPT-4,” Santiago Valdarrama, a machine learning engineer who runs his own AI consultancy, wrote on LinkedIn. “Having the behavior of an LLM change over time is not acceptable.”

The findings may also impact OpenAI’s enterprise customers, pointed out Vijay Vijayasankar, managing partner of financial services at IBM Consulting. Through its API, OpenAI lets companies like Shopify and Snap Inc. access its AI technology and LLMs and integrate those with their own software.

“Software is generally built in layers,” wrote Vijayasankar. “So if the LLM gets worse over time — products built on top of it have quite a problem on their hands.”

For PhD student Romain Ilbert, the fact that OpenAI’s models are not open-source could also be a factor, “making it challenging to pinpoint why their quality diminishes over time.”

Above all, the paper challenges the common assumption that AI models continually improve, with the authors themselves posing a question around “whether an LLM service like GPT-4 is consistently getting ‘better’ over time.”

What issues have you noticed while using GPT-4 or ChatGPT? How should OpenAI tackle them? Leave your thoughts in the comments.


This Week in AI

Here’s where we bring you up to speed with the latest advancements from the world of AI.


TechTok 

Catch up on the tech headlines you may have missed this week and what our members are saying about them on LinkedIn.

  • Antitrust officials from the Federal Trade Commission and the Justice Department have released guidelines that, for the first time, focus on “digital platforms and how dominant companies can use their scale to harm future rivals.” The guidelines aren’t law, and follow losses by the FTC in court on efforts to block deals, including Meta buying app maker Within and Microsoft, LinkedIn’s parent company, purchasing Activision Blizzard. 

  • Speaking of Microsoft, Activision has agreed to give Microsoft three more months to finalize details of their planned $69 billion merger. The agreement to push the original July 18 deadline to Oct. 18 also raises the breakup fee that Microsoft needs to pay Activision if the deal is terminated after Aug. 29 to $3.5 billion, from $3 billion, and to $4.5 billion after Sept. 15.

  • Despite reporting a record $24.9 billion in revenue for its second quarter, Tesla's stock took a tumble early Thursday, following remarks by CEO Elon Musk that output may slow in the third quarter while the company improves some factories. Musk's electric vehicle company also posted $2.7 billion in profit, a 20% bump over last year. 

  • Netflix’s shares were also down about 6% in early trading Thursday, after the company reported $8.2 billion in revenue, which was still lower than the company had forecast. It did, however, beat Wall Street estimates of new customers, picking up 5.9 million new paid subscribers in the second quarter, suggesting that its password-sharing crackdown is working. 

  • The current social media favorite Threads is following rival Twitter in introducing rate limits, with Instagram CEO Adam Mosseri citing a growing number of spam attacks. Mosseri warned that the limits may unintentionally limit active users and said they should "let us know" if that happens. Twitter received significant backlash for limiting post views earlier this month amid an extended outage, and owner Elon Musk said that the platform still has a negative cash flow, following a 50% decline in advertising revenue and a "heavy debt load."


Movers and Shakers

Here’s where we keep tabs on key executives on the move and other big pivots in the tech industry. Please send me personnel moves within emerging tech.

Michael Moritz is leaving Sequoia Capital after 38 years, and will shift his focus to Sequoia Heritage, the firm's wealth management unit. In addition, four other Sequoia partners have exited the VC firm, The Information reported.

Anjali Sud, the CEO of video hosting platform Vimeo, is leaving the company to join video-on-demand service Tubi.

After two years at the helm, OnlyFans CEO Ami Gan has stepped down, with chief strategy and operations officer Keily Blair taking over the role of CEO.

Ernst & Young has hired Reuven Cohen as the CTO of Generative AI in the Americas. 

Boston Venture Studio has hired Michael Burke, previously senior director of AI & machine learning at Reltio, as a venture partner.

Calibrate Ventures has hired two senior advisors: Dr. Henrik Christensen, cofounder at Robust.AI and director of the Contextual Robotics Institute at UC San Diego, and Don Barnett, formerly the CEO of meal delivery service Sunbasket.

Thanks for reading. Please share Tech Stack and forward it around if you like it! And if you have any news tips, find me on InMail.

