April 21, 2023

  1. ✨Google’s new MAGI trick

  2. 🖇️Image models: MiniGPT & LLaVA

  3. 🧏‍♂️ 1 man, 3 countries, and a $1B company

Two Headlines

The two main stories of last week, if you have only ~2 minutes 10 seconds to spare.

1/ ✨Google’s new MAGI trick

Google is cooking up a fresh project called Magi.

Could India's beloved Maggi noodles inspire it? We'll leave that to your imagination.

This project aims to be a full-fledged app, unlike Google's "experimental" Bard. The New York Times suggests we can expect it to launch in May. So, what makes Magi stand out?

1/ It uses Google Earth's mapping tech.

2/ GIFI, an AI image generator, is part of the deal.

3/ It connects to payment systems.

Since November 2022, Google's been feeling the heat from ChatGPT, which led them to kick things up a notch.

Now they have sprint rooms and spaces where engineering, design, and other departments test Magi's various settings.

Many think Magi may have been a planned project that shifted gears.

Bing is the gear stick here.

Samsung is reportedly considering switching its default search engine to Bing. And Google's default-search deal with Apple is reportedly up for renewal as well.

Google reportedly pays Apple around $15B a year to remain the default search engine.

To tackle their rivals, Magi aims to:

1. Be an improved version of Bard.

2. Outperform Bing.

3. Utilize Google's search data effectively.

If Magi can meet at least two of these goals, it might be more successful than ChatGPT.

However, not everyone on Twitter is convinced.

Some argue that Google has made big promises before but only delivered Bard, which remains experimental.

On the other hand, some optimistic folks believe Magi could change the SEO game by replacing the traditional 10-result format with a conversational follow-up feature.

Marketers are already prepping SEO hacks for this yet-to-be-released technology. We wonder whether Google can get through each week's challenges, only to face new ones the next.

Yesterday, Sundar Pichai announced that the Google Brain and DeepMind teams are merging into a single unit, Google DeepMind.

It's highly likely they'll work on Project Magi. This merger could be a game-changer for Google.

2/ 🖇️ Image models: MiniGPT & LLaVA

OpenAI's GPT-4 announcement had everyone buzzing about models that can understand images as well as text. But before we could even catch our breath, MiniGPT (officially MiniGPT-4) swooped in and grabbed the spotlight.

MiniGPT has two components: BLIP-2 and Vicuna.

BLIP-2 is a multimodal pre-training method that handles the vision side.

Vicuna is the open-source language model running in the back. It's what lets you run MiniGPT on your own computer.

But it also means it's meant for open-source research use, not commercial products.

People want AI that's easy to use and set up. MiniGPT delivers just that, making it perfect for research.

As experts in AI say, the stronger the research, the better the implementations.

And there's more to come: they're already working on a 7B model.

Now let's talk about LLaVA.

LLaVA is a language and vision assistant with better OCR skills.

Some data entry jobs pay people to log when sponsors appear in videos by going through them frame by frame.

Once LLaVA is good enough, it could replace those jobs completely.

For a first attempt at generating training data with language-only GPT-4, it's pretty impressive. And the best part? That data was produced without human labeling!

People are loving these alternatives to GPT-4. It just shows that innovation doesn't always come from the expected source.

And what's up with the names?

One Trend

1 trend you can pounce on. Reading time: ~1 minute 10 seconds

🧏‍♂️1 man, 3 countries, and a $1B company

A boy born in Jordan in 1983 moved to Bangladesh and later to the UK with his family.

That boy grew up and became a hedge fund manager.

Today, he's the CEO of a top text-to-image model company.

This is the story of Emad Mostaque, founder of Stability AI, the company behind Stable Diffusion.

Emad started Stability AI in 2020 with the goal of generating realistic and creative images.

Back then, they had:

  • 2 employees

  • 10,000 images generated per year

  • 1,000 users

  • $0 in revenue

Fast forward to today, the company now has a valuation of $1 billion.

Big investors, including some from the Y Combinator board, are showing interest.

This week, they released their new StableLM language model.

It may seem like a step back, going from text-to-image to a text-only model.

But let me explain why it's a big step forward.

About 70% of AI companies start with building text models before moving to multimodal models.

Why is Emad doing it?

Emad aims to develop a large model, training it on 3 trillion tokens.

This gives them a higher chance of making progress than a company that started with a text-only model.

A report from PwC shows that companies using multiple AI models have a 30% higher success rate than those relying on a single model.

Image and video data is more complex than text, so it gives models richer training data than text alone.

In the end, it's all about collaboration and growth.

Emad Mostaque's story demonstrates the power of innovative thinking and collaboration.

His company has achieved impressive growth and attracted significant investor interest.

Everyone is calling for AI to be accessible and open-source.

If this makes you love open-source even more, check out this petition.

