😺 Apple says AI can’t reason

PLUS: Parents sued a school so their kid can use AI...
October 17, 2024

Welcome, humans.

It looks like small businesses are getting deepfaked now: Meet "Ethos," Austin's fake #1 restaurant. All AI-generated... even the croissant hippo:

The most criminal part about this is the fact that croissant Hippo isn’t real... would 100% buy.

Reddit sleuths investigated and found that the only part of the website that works is the merch store. We’d buy a Hippo Croissant shirt... accidental genius business idea?

Even though the restaurant has 72K followers and tons of engagement, Redditors are thinking those are probably AI, too.

Here’s what you need to know about AI today:

  • Apple researchers published a paper debunking AI’s ability to reason.
  • 40% of the $79B in “cloud” funding went to genAI startups.
  • Boston Dynamics and Toyota teamed up on robot AI.
  • Parents sued a school over AI cheating accusation.

Advertise in The Neuron here.

Apple researchers want you to know that language models can’t actually “reason.”

Apple's AI team just dropped a truth bomb on the world of large language models. In a new paper, they've shown that popular AI chatbots like ChatGPT aren't the math whizzes we thought they were.

On the contrary, they struggle with basic reasoning that most humans take for granted.

The researchers developed a new benchmark called GSM-Symbolic to put these models through their paces. Here's what they found:

  • Changing numerical values in questions caused significant performance drops across all models tested.
  • As questions got more intricate (adding more clauses), accuracy plummeted and variance skyrocketed.
  • Adding a single irrelevant sentence to a math problem caused accuracy to nosedive by up to 65%.

To illustrate this, consider their “kiwi” problem:

When asked to count kiwis, many top models stumbled (including GPT and Claude), subtracting five “smaller” kiwis when their size shouldn't have mattered at all. It was a random detail that needed to be filtered out.
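Here's a minimal sketch of how a test like this could be built, in Python. To be clear, this is our own illustration and not the researchers' code: the template wording, names, and number ranges are all assumptions. The idea is just to keep the logic of the problem fixed, swap in fresh numbers each time, and optionally tack on a detail that shouldn't change the answer.

```python
import random

# Hypothetical template in the spirit of the "kiwi" problem. The wording and
# number ranges are illustrative assumptions, not the paper's exact text.
TEMPLATE = (
    "{name} picks {fri} kiwis on Friday and {sat} kiwis on Saturday. "
    "On Sunday, {name} picks double the number picked on Friday{distractor}. "
    "How many kiwis does {name} have?"
)
DISTRACTOR = ", but five of them were a bit smaller than average"


def make_variant(rng, with_distractor):
    """Return (question, correct_answer) for one randomized variant."""
    name = rng.choice(["Oliver", "Mia", "Sam"])
    fri = rng.randint(10, 60)
    sat = rng.randint(10, 60)
    question = TEMPLATE.format(
        name=name,
        fri=fri,
        sat=sat,
        distractor=DISTRACTOR if with_distractor else "",
    )
    # Size does not change the count, so the distractor never affects the answer.
    answer = fri + sat + 2 * fri
    return question, answer


def fooled_answer(correct_answer):
    """The mistaken answer: subtracting the five 'smaller' kiwis."""
    return correct_answer - 5


rng = random.Random(0)
question, answer = make_variant(rng, with_distractor=True)
print(question)
print("correct:", answer, "| fooled by distractor:", fooled_answer(answer))
```

A model that actually reasons should give the same answer with or without the extra clause, no matter which numbers get plugged in.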

Therefore, the study concluded that current language models aren't truly reasoning. Instead, they’re performing “sophisticated pattern matching” that falls apart under scrutiny. Duh, that’s why they’re called “predictive models,” Apple!

For his part, Ilya Sutskever, formerly of OpenAI, says predicting the next word does lead to understanding. So could more predictions lead to more understanding, like reasoning? That’s the idea behind o1, anyway (full Ilya interview here btw, great listen).

Our take: If Apple doesn’t think AI can reason, why is it about to go all in on AI on its devices? Aha! HERE’s why


FROM OUR PARTNERS

Automate your sales follow-up emails with AI

Did you know you can automate your follow-up game and close more deals with Attention?

While you’re drowning in follow-ups and data entry, your competitors are leveraging Attention's AI to save a ton of time.

Attention lets you:

  • Send personalized follow-up emails within minutes of ending a call.
  • Get cross-call insights to understand why you're winning (or losing) deals.
  • Receive live coaching during calls to handle objections like a pro.

The result? Attention’s users are LOVING the time saved:

  • “Game-changer for our productivity. Thanks to Attention, our reps save 20-30 minutes per call with automated CRM data entry” - Yan Kessler, VP Sales at Aspire Technologies.

Book a demo here.

Around the Horn.

Good day for hippo content, TBH. Try it here on iOS + read more here.

  • Accel released a new report that found 40% of all VC funding for cloud startups (around $79B) went to those that focused on genAI.
  • LatticeFlow's new LLM Checker tool tested the top AI models for how well they comply with the EU AI Act—it found most did well overall, but some struggled with bias and cybersecurity issues.
  • Boston Dynamics and Toyota Research Institute announced a collaboration to develop AI for the electric Atlas humanoid robot.
  • Parents sued a school after their son was accused of cheating by using AI on his assignment. Their argument? There were no “established rules, policies or procedures” about how AI can or can’t be used, or how staff should handle it.

Treats To Try.

  1. *You need actionable, real-time insights to close more deals and boost CX. Gladia just made that dream come true with Gladia Real-Time. Specialized in state-of-the-art speech AI APIs, Gladia Real-Time helps you unlock instant insights from any voice conversation. Click here to try for free.
  2. Dropbox launched Dash for Business, which lets you find and manage content across all your work apps, tabs, and files.
  3. Focus Buddy updates your to-do list in real-time, checks in to help you overcome procrastination, and provides weekly insights on your work habits.
  4. FAQ Widget creates FAQ popups for your website that answer visitor questions to help increase sales.
  5. Tattoon is an app that lets you demo what a tattoo would look like on your body to see if you like it.
  6. Check out this person who self-hosted Llama 3.2 on their home server, and their step-by-step guide for you to replicate.
  7. Mistral released a new model called Les Ministraux that’s built to run locally on phones and laptops—the 8B version is available to try today for research purposes, with a 128K context window.
  8. Just for fun: Political Debate Simulator lets you pick any model you want to have U.S. Presidential candidates Kamala Harris and Donald Trump debate each other on any topic—ask them to debate who ate the last slice of pizza!

See our top 51 AI Tools for Business here!

*This is sponsored content. Advertise in The Neuron here.

Is what you’re building future-proof? Find out at Spectra 2024 on October 23rd

Hear from edge computing leaders like TensorFlow pioneer Pete Warden, Edge Impulse co-founders Zach Shelby and Jan Jongboom, Qualcomm AI Hub head Siddhika Nevrekar, and Particle CEO Zach Supalla to gain insider insights into tomorrow’s smart tech and equip yourself with the necessary tools to create the next wave of intelligent products.

Register now – 100% free and virtual!

Thursday Trivia.

One is a real snapshot from the mall in 1989, and one is AI. Which is which?

A.

B.

The answer is below, but place your vote to see how your guess compares to everyone else (no cheating now!)

A Cat's Commentary.

cat caricature

See you cool cats on X!

Get your brand in front of 450,000+ professionals here
www.theneuron.ai/newsletter/apple-says-ai-cant-reason
