
GM! Welcome to Get Into AI.
I’m your AI news sherpa, guiding you through the mountains of news to get to the good bits.
Here’s what I have for today:
Grok 3 & Mini Now Available via API
Gemini 2.5 Pro & Flash Enter the Ring
Google’s QAT Makes Gemma 3 Models Skinnier
Alright, let’s dive in!
Three main stories for the day.
It’s xAI vs. Google today.
1/ Grok 3 & Mini Now Available via API
xAI has finally made Grok 3 available via API!
The real surprise is the addition of “baby brother” Grok 3 Mini, which at $0.50 per million output tokens claims to compete with much larger frontier models.
Early discussions suggest it’s surprisingly competitive with Gemini 2.5 Pro, especially in tool use (though perhaps a bit trigger-happy with those tools).
At roughly 1/7th of Gemini 2.5 Flash’s output token price, Grok 3 Mini could be a serious contender for developers on a budget.
API docs are at https://docs.x.ai/docs/overview if you want to jump in.
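
If you want to kick the tires right now: xAI’s API is OpenAI-compatible, so the standard openai client works out of the box. Here’s a minimal sketch, assuming the model ID is “grok-3-mini” and your key lives in XAI_API_KEY (check the docs above for the current names):

```python
import os

from openai import OpenAI

# xAI's endpoint speaks the OpenAI chat completions protocol,
# so we just point the standard client at api.x.ai.
client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],
    base_url="https://api.x.ai/v1",
)

response = client.chat.completions.create(
    model="grok-3-mini",  # assumed model ID; use "grok-3" for the full model
    messages=[{"role": "user", "content": "Give me one underrated AI paper to read."}],
)
print(response.choices[0].message.content)
```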

2/ Gemini 2.5 Pro & Flash Enter the Ring
Google has countered with its latest models: Gemini 2.5 Pro and Gemini 2.5 Flash.
The Flash variant uses a “hybrid reasoning” approach, adjusting how much “thinking” it does based on the prompt.
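
For the curious, here’s a rough sketch of what that knob looks like through the google-genai Python SDK. The preview model ID and the budget value are my assumptions, so check Google’s docs for the current names:

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-04-17",  # assumed preview model ID
    contents="Plan a three-step approach to debugging a flaky test.",
    config=types.GenerateContentConfig(
        # A budget of 0 turns thinking off entirely; larger budgets let
        # the model spend more reasoning tokens before answering.
        thinking_config=types.ThinkingConfig(thinking_budget=1024),
    ),
)
print(response.text)
```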
This flexibility seems to be working – it’s currently topping the Chatbot Arena leaderboard. Meanwhile, Android users are fuming about inconsistent rollout of Gemini Advanced features, with some planning full device wipes just to fix the UI.
In the wings, Google employees have confirmed rumors of a “Gemini Coder” model, while Gemini Ultra is expected to scale improvements over Pro (but potentially at higher cost).
So what? Google’s keeping pace in the race, but their rollout process might need a reasoning upgrade too.

3/ Google’s QAT Makes Gemma 3 Models Skinnier
Google released Quantization-Aware Training (QAT) optimized Gemma 3 models, dramatically slashing VRAM requirements.
The 27B parameter model drops from a chonky 54GB to just 14.1GB while maintaining quality.
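Quick napkin math: 27 billion parameters at 2 bytes each (bf16) is about 54GB; at int4 (half a byte each) it’s about 13.5GB, with the remaining ~0.6GB presumably going to the bits kept at higher precision (my guess, not something Google has broken down).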
This isn’t just post-training quantization – it’s models specifically trained to perform well at int4 precision.
The QAT models are available in multiple formats (MLX, llama.cpp, Ollama, LM Studio, Hugging Face), though some users report inconsistencies between platforms.
Still, this could be a game-changer for running powerful models on consumer hardware.
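
Want to try it? A minimal sketch using the ollama Python package, assuming the QAT build is published under the “gemma3:27b-it-qat” tag (double-check the tag on Ollama’s library page):

```python
import ollama  # pip install ollama; requires a running Ollama server

# "gemma3:27b-it-qat" is an assumed tag for the QAT build (~14GB vs ~54GB).
response = ollama.chat(
    model="gemma3:27b-it-qat",
    messages=[{"role": "user", "content": "Explain int4 quantization in one paragraph."}],
)
print(response["message"]["content"])
```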

That’s it for today, folks! If you want more, be sure to follow our Twitter (@BarunBuilds).
🤝 Share Get Into AI with your friends!
Catch you tomorrow! ✌️
Did you like today's issue?