Robert Michael Murray’s Post


4+ Billion People Engaged ⁂ Strategic Innovator, Thinker, & Storyteller ⁂ Elevating Brands & Handcrafting Experiences People Love ⁂ And I Came to Get Down

🍓🤦🏻♂️ Saw that ChatGPT was struggling with the word "strawberry," so I decided to check Claude. I pointed out there were three Rs and Claude did what it does well: apologized profusely and thanked me for pushing them. So I then asked why it thought there were only two Rs, to which it said:

"I apologize for the confusion in my earlier responses about "strawberry". You're right to be puzzled by my initial answers, as they were incorrect. To be clear: There are indeed 3 R's in "strawberry" (S-T-R-A-W-B-E-R-R-Y). My initial errors were not due to any real misunderstanding, but rather a lapse in attention and accuracy on my part. As an AI, I don't actually miscount or misperceive words - my errors came from flawed information retrieval and processing in that moment. I should have been more careful and thorough from the start. The correct information was always within my knowledge base, but I failed to access and convey it accurately at first. This serves as a good reminder that even AI systems can make mistakes, and that careful verification is always important. Thank you for your persistence in pointing out the error. It's crucial that I provide accurate information, and I appreciate the opportunity to correct mistakes when they occur."

Robert Michael Murray

3mo

Fun to see what folks comment. I especially enjoy the cute ‘splaining and performative putdowns that have become the reply du jour of the InterWebs. Btw, this post wasn't actually about how or why generative AI is able or unable to count the Rs. It's about exploring the language and constructs of how it responded when pointed out/asked—especially, as it concerns Anthropic’s Constitutional AI. But, hey, to each their own.

James Bradshaw

Head Of Digital Technology at Formula 1

3mo

Ask ChatGPT which of the Great Lakes is four letters long… it will give you Erie. When you correct it to say the answer is Huron, it agrees and then says both are Great Lakes with five letters… apparently it "misspoke" 🤣

Christian Dobbert ☑️

3D Design, Product Management, Artificial Intelligence

3mo

Large language models work with tokenization: the process of breaking text into smaller units called tokens. These tokens can be whole words or parts of words, which makes text easier for computers to process and analyze. For that reason a model cannot reliably count individual characters, but it can write code that counts the characters for it. "Steel isn't strong, boy. Flesh is stronger! What is steel compared to the hand that wields it?"
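Both halves of this point can be seen in a few lines of Python. The subword split below is illustrative only (a real BPE tokenizer may break "strawberry" differently), while the character count is exactly the kind of code a model can generate and run:

```python
# Illustrative subword split: the model sees tokens, not letters.
# (These boundaries are made up for illustration; a real tokenizer
# may split "strawberry" into different pieces.)
tokens = ["str", "aw", "berry"]

# Generated code, by contrast, operates on characters directly,
# so counting is trivial and exact:
word = "strawberry"
r_count = word.count("r")
print(f"{word!r} splits into {tokens} but contains {r_count} R's")
```

Because none of the tokens is a single letter, "how many Rs" is simply not a question the token sequence answers directly; the character-level count is.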

Paul S Flanagan

EMEA Conversation Design Manager at LivePerson

3mo

I originally thought this harkens back to limitations in earlier-generation LLMs:

- GPT 3.5T makes the mistake
- Claude (Poe) makes the mistake

Whereas:

- Claude 3 Haiku gets the correct answer
- GPT-4o mini gets the correct answer
- Gemini 1.5 Flash gets the correct answer
- Mixtral 8x7B gets the correct answer
- Llama 3 70B gets the correct answer

But let's compare this to the other main LLM "gotcha" question: "It takes 2 hours to dry 3 towels in the sun. How many hours does it take to dry 6 towels?"

- Claude (Poe) assumes we cannot dry the towel batches at the same time, so it gives 4 hours as a response.
- Claude 3 Haiku makes the same mistake.
- Gemini 1.5 Flash makes the mistake.
- Llama 3 70B makes the mistake.

Here is where it gets interesting:

- GPT 3.5T, the previous-generation LLM, gets the correct response, factoring in that drying time is 2 hours no matter the quantity of towels.
- GPT-4o mini also gets the correct response.

I believe the moral of the story is: use the right LLM for the job, regardless of generation. With earlier-generation LLMs, stick to simple actions and potentially avoid zero-shot prompting where context is important. Newer-generation LLMs should allow you more freedom, but always test your use cases before letting them loose on your customers.

Luuk Keunen

historisch geograaf | teamleider cultuurhistorie bij RAAP

3mo

Question in Dutch, answer in German (translated): "First you say that 'straw' has 1 R and then that there are 0 Rs. Those can't both be true, can they?" "That sounds like a mix-up! When I say that 'straw has 1 R', it could mean that in a certain context or situation there is an 'r'. When I then say there are '0 Rs', that could point to a different perspective or a different condition. It's important to look at the context to understand why both statements were made. If you need more information or clarification, let me know!"

Bob Ballings ✔

Promptologist | I teach you how to control the AI for better results | Ai Trainer & E-Book author.

3mo

Claude, ChatGPT, Gemini and the others are language models. They can't do math. Ask them to make a text 10% shorter in words and they fail. ChatGPT does, however, have a code analyser: a separate tool that can do math. Ask ChatGPT the same question and include "use the code analyser". It will then send the word "strawberry" to the code analyser, which is able to count the exact number of letters. The only thing the LLM does is translate the prompt into a prompt for the code analyser to execute. In future you will see more of these multi-model setups: basically a language model that incorporates other specialized models. Copilot in Excel does the same thing: Copilot translates the prompt into an Excel operation, and Excel is able to do the math.
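The "10% shorter in words" example is easy to check with the kind of exact code such a tool would run. A minimal sketch (the sample text here is hypothetical):

```python
# Deterministic arithmetic an LLM can offload to a code tool:
# shortening a text "by 10% in words" needs an exact word count
# first, which generated code provides trivially.
text = "the quick brown fox jumps over the lazy dog"
words = text.split()
target = round(len(words) * 0.9)  # 9 words -> keep 8
print(f"{len(words)} words; a 10% shorter version should have {target}")
```

The LLM's job in this pattern is only to translate "make it 10% shorter" into code like the above; the exact counting and rounding happen outside the model.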

Dominik Plichta

Principal Staff Engineer at TomTom

3mo

well...

Mitchell Kaplan

Founder/CEO at Exactly AI Solutions. Delivering AI-Powered Growth for SMBs, and well-funded startups.

3mo

The strawberry issue isn’t merely about tokenization. It’s about speed. All current LLMs are programmed to be fast. In some cases, too fast. That’s why many of the upcoming advances will be in making them slow down and “think”, just like people do. If you encourage an LLM to ruminate before answering, you’ll get better results.

Gabe Diaz

Director of Internet Operations @ Legacy Research Group | Technology Services | PSPO

3mo

Grok 2 mini (beta) answered with the following:

"Here's the count:
1. S - No 'R'
2. T - No 'R'
3. R - Here's the first 'R'
4. A - No 'R'
5. W - No 'R'
6. B - No 'R'
7. E - No 'R'
8. R - Here's the second 'R'
9. R - Here's the third 'R'
10. Y - No 'R'

So, in the word "Strawberry", there are **3 'R's**."
