Naamche

What the hell is happening at Meta?

+ DeepSeek's new paper

Barun Pandey
April 08, 2025

GM! Welcome to Get Into AI.

I’m your favorite barista. I serve you a cup of news just the way you like it.

Here’s what I have for today:

  1. Meta's Llama 4 Launch: More Drama Than Performance

  2. AI Agents Beating Humans at Phishing (Yikes!)

  3. Self-Principled Critique Tuning: DeepSeek's New Approach


First, a word from our sponsors:

Start learning AI in 2025

Keeping up with AI is hard – we get it!

That’s why over 1M professionals read Superhuman AI to stay ahead.

  • Get daily AI news, tools, and tutorials

  • Learn new AI skills you can use at work in 3 mins a day

  • Become 10X more productive

Alright, let’s dive in!

Three major headlines

1/ Meta's Llama 4 Launch: More Drama Than Performance

Meta finally dropped Llama 4 models (Scout and Maverick) over the weekend, but the reception has been...lukewarm at best.

Despite claims of a massive 10M token context window and impressive benchmarks, users are finding the real-world performance disappointing, especially for coding tasks.

The plot thickened when rumors emerged that Meta might have "gamed the benchmarks" by including test data in training.

Their head of AI research also stepped down right before the launch, and anonymous posts claim company leadership pushed for blending test sets into training data to boost scores.

Meta has denied these allegations, but the timing is certainly suspicious.

The cherry on top?

The smallest model (Scout) still weighs in at 109B total parameters, making it too large for consumer GPUs. Looks like Meta spent all those H100s only to end up trailing DeepSeek, Qwen, and others.

Zuckerberg might want to reconsider his statement about AI replacing mid-level engineers—Llama 4 doesn't seem up to the job yet!

2/ AI Agents Now Better at Phishing Than Humans

Research from Hoxhunt shows that AI agents have officially surpassed human red teams at creating effective phishing campaigns.

Their AI phishing simulations are now 24% more effective than human-crafted ones.

This is a classic "good news, bad news" situation.

The good news: we can use these AI agents to build better defenses.

The bad news: well, the bad guys can use them too.

Time to be extra suspicious of that "urgent email from your CEO" asking for gift cards!

3/ DeepSeek's Self-Principled Critique Tuning

DeepSeek released a paper on "Self-Principled Critique Tuning" (SPCT), a new approach to improving the reasoning of large language models.

Instead of relying on human-crafted reward rules, SPCT helps models develop their own reasoning principles and assess their outputs in a more structured way.

This could be a breakthrough for getting LLMs to "think better" without needing constant human supervision.
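To make the idea concrete, here's a minimal toy sketch of that loop: the model writes its own evaluation principles for a query, critiques each candidate answer against them, and keeps the top scorer. The function names are mine and the LLM calls are stubbed with simple heuristics; this is an illustration of the concept, not DeepSeek's actual implementation.

```python
# Toy SPCT-style loop: self-generated principles -> per-principle
# critiques -> a single reward used to pick the best candidate.
# All names and heuristics here are illustrative stand-ins.

def propose_principles(query: str) -> list[str]:
    # In SPCT the reward model writes a fresh rubric per query;
    # here we just return a fixed toy rubric.
    return ["directly answers the question", "explains the reasoning"]

def critique(response: str, principle: str) -> int:
    # Stand-in critic: a real system would prompt an LLM to judge
    # the response against the principle and emit a score.
    if "reasoning" in principle:
        return int("because" in response.lower())
    return int(len(response) > 10)

def reward(query: str, response: str) -> int:
    # Aggregate per-principle critiques into one reward signal.
    return sum(critique(response, p) for p in propose_principles(query))

def best_of_n(query: str, candidates: list[str]) -> str:
    # Sample several candidates, keep the one the critic scores highest.
    return max(candidates, key=lambda r: reward(query, r))

answers = ["42.", "It is 42 because 6 times 7 equals 42."]
print(best_of_n("What is 6 x 7?", answers))
```

The key point is that the rubric and the critique both come from the model itself, so the reward signal scales with inference compute instead of with human labeling effort.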

If successful, we might see models that can check their own work and actually catch their mistakes. Imagine that, an AI that knows when it's wrong! My ex could never.

That's it for today, folks! If you want more, be sure to follow our Twitter (@BarunBuilds).

Catch you tomorrow! ✌️

🤝 Share Get Into AI with your friends!
