😺 How to pick the best AI model for what you ACTUALLY need...

PLUS: AI's are doing AI research now?!

Welcome, humans.

People are getting early access to Google Search’s AI Mode, and it’s really interesting to watch in action.

Some are saying this is basically Google’s Perplexity killer. And if you add this to the success of Gemini 2.5 Pro, which Google is giving away for free rn, it looks like Google is finally becoming the threat OpenAI was created to prevent…

Here’s what you need to know about AI today:

We break down how to pick the best AI model.
OpenAI launched PaperBench to test AI research replication.
Google released AGI Safety report predicting AGI by 2030.
Wikimedia traffic rose 50% since January from AI scraping.

Advertise in The Neuron here.

Here's how to pick the best AI model for what you actually need

Tired of playing AI model musical chairs? One week Claude's the best, then it's ChatGPT, then suddenly Gemini's crushing benchmarks (welcome to the wild world of AI, folks—or as we call it, “Tuesday”).

With all the constant changes, how do you know which AI to use when? We actually just watched Tina Huang's hour long interview with Louie Peters (CEO of Towards AI) where they tackled this exact question, and the advice was pretty solid.

First things first—it all depends on what you're trying to accomplish:

Solving complex reasoning problems that need high accuracy?
Processing massive documents with over 700K words?
Just chatting casually and need something fast and cheap?
Building enterprise solutions that need self-hosting?

These all require different AI strengths. Here's a few of Louie's pro tips on how to pick the right model for the job:

Match functionality to your needs: Choose models with capabilities (images, audio, etc.) that fit your specific tasks—for beginners, start with ChatGPT 4o.
Check context window size: For long documents or complex instructions, models like Gemini 2.5 Pro offer up to 1M tokens.
Use benchmarks wisely: Check the metrics most relevant to your use case (math, coding, writing)—more on that below.
Calculate your ROI: Expensive models (like o1 Pro) are worth it only when reliability saves more time than they cost.
Experiment regularly: Build your own intuition by testing multiple models—Louie personally uses 5-6 different models 20-30 times a day.

Louie also shared his own breakdown of which models are his favorite atm:

So what tools can help you actually implement this advice?

You could test every model individually (time-consuming but thorough)—or you can use OpenRouter, which lets you test multiple models with the same prompt at once. Just sign up, add funds for premium models, and start comparing results side-by-side.

Another option is checking benchmarks like Live Bench, but remember that AI companies know how to game these tests.

Our favorite approach? Use the site Artificial Analysis, which puts every AI model through standardized tests covering intelligence, speed, cost, and specialized skills.

Their latest rankings show:

Overall Intelligence: Gemini 2.5 Pro Experimental.
Speed Champion: Nova Micro (322 tokens per second).
Cost-Effective King: Gemini 2.0 Flash ($0.2 per million tokens).
Coding All-Star: o3-mini (high).
Math Reasoning: Gemini 2.5 Pro Experimental (94/100).
Best Open-Weight Model: DeepSeek R1.

Fun fact: they also rank speech to text, image, and video models, too!

Now, the above ranking could change. Like, tomorrow. So ultimately, we recommend you go with whichever one consistently works the best for you.

You don’t always need the smartest model—you just need the one that gets the job done.

After all, choosing an AI is surprisingly personal—it’s not unlike choosing your friends (or more appropriately, your cybernetic coworker). After all, if you're going to spend a good chunk of your day “chatting” with something, the vibes do kinda matter.

FROM OUR PARTNERS

When your AI needs the best ears in the business... 👂

Frustrated when voice AI constantly misunderstands you? Speechmatics fixed that.

While others rush to make AI talk, Speechmatics has solved what matters first: making it truly listen.

Their real-time speech tech delivers 90%+ accuracy in under one second across 55+ languages, diverse accents, and dialects – a full 25% more accurate than competitors, even in noisy environments.

Whether it's AI assistants, customer service, or medical transcription, Speechmatics ensures AI catches every word the first time.

No more “can you repeat that?”—just AI that keeps up, not catches up.

Try Speechmatics at no cost and hear the difference.

Prompt Tip of the Day

When you’re trying to condense something, try this prompt: “First, give me a shortened version, in <short version>, keeping all the same specificity and context of the original. When that’s done, write an even shorter version, in <even shorter>.”

It’s sort of like adding a built-in editor for your AI writing (demo).

Another helpful tip? If you want the AI to write more visually, try: “make it more concrete (show, don’t tell).” Also, you can ask it to use more “image words”—but you might want to add something like: “Don't use metaphors, just use picture words that the user can see.” (demo…maybe I probably should’ve used that version, huh?).

Treats To Try.

Claude for Education is a new resource for schools that helps you enhance teaching and learning with specialized features for Claude like Learning mode that guides student reasoning rather than giving answers outright (more).
Actively AI researches, understands, and reasons about potential customers to maximize revenue quality and pipeline growth (raised $22M).
GenSpark is a new agent out of China that completes tasks for you through a mixture-of-agents system with fewer hallucinations than competitors—demo.
DeepSite is a totally free vibe-coding app you can use to help code a website (powered by DeepSeek)—we used it to make this.
Subscription Day tracks all your subscription payments in your menu bar, showing upcoming charges on a calendar and alerting you before payments are due (Mac only rn).
Recall connects what you're currently reading with content you've previously saved, instantly showing you where you've seen similar information before.
ElevenLabs now has a text to bark model for dogs… where’s the cat one, huh??

See our top 51 AI Tools for Business here!

Around the Horn.

Google replaced the current leader of its consumer AI apps with the leader of Google Labs and helped launch the viral AI research tool NotebookLM.
OpenAI released PaperBench, a benchmark that evaluates AI agents' ability to replicate state-of-the-art AI research papers—so far, the best agent tested only achieved 21% replication accuracy.
Google published a 145 page report on the company’s approach to “AGI Safety” and predicts AGI could arrive by 2030.
Wikimedia traffic surged 50% since January 2024 due to AI crawlers scraping content.
Researchers from Hong Kong introduced Dream 7B, “the most powerful” open diffusion model (which means it generates text sorta like painting).

FROM OUR PARTNERS

On-device AI. No cloud. No GPUs.

Mirai is building the infrastructure for on-device AI, enabling dev teams to run small language models directly on iOS.

Locally. Fast. Private.

Their engine supports a wide range of architectures, including Llama, Gemma, Qwen, VLMs, and RL over LLMs—making advanced AI capabilities accessible on mobile devices. Yeah, pretty cool.

Ok, let’s try

Thursday Trivia

One is real, and one is AI. Which is which? (vote below!)

Which is AI?

The answer is below, but place your vote to see how your guess compares to everyone else (no cheating now!)

Here are the results from last week’s trivia (A was AI):

Here’s what you said:

L.G. chose A: “A. is AI - it's almost perfect - but the logo isn't 100% on point.”
T.R. chose B: “The gibberish letters on the cap lead me to think B is AI-generated, given its historical difficulty with text in images.”
D.F chose A: “Shallow depth of field gives it away... It's very common in AI image generation.”