
Here’s how Sam underpromised and overdelivered on GPT-4

We discuss Conformer-1, Alpaca, and GPT-4


Barun Pandey & Barun Sharma
March 17, 2023

GM! Welcome to the Status Code.

We're like a pod of whales, communicating the most important AI news and trends to you!

Here’s what we have for today:

  1. 🎤Is Assembly AI’s new model the next big thing?

  2. 💣Alpaca’s Diffusion

  3. 🤹‍♂️GPT-4 and its quest for multimodality

(Estimated reading time: 3 minutes 30 seconds)

Not subscribed yet? Stay in the loop on AI each week by reading the Status Code for five minutes.

Two Headlines

The two main stories of last week, if you have only ~2 minutes to spare.

1/ 🎤 Is Assembly AI’s new model the next big thing?

AI progress is about moving the needle in three core areas.

Models, Data, and Compute.

For a long time, we only had the model. Researchers introduced CNNs (Convolutional Neural Networks) back in the 1980s.

But we didn’t have appropriate data or computing power.

Then came NVIDIA with CUDA, a platform that let us use GPUs for general-purpose compute. Compute problem: solved.

Then, in 2009, researchers led by Fei-Fei Li released ImageNet, a collection of labeled images we could train CNNs on. Data problem: solved.

So the field of computer vision flourished. But by 2013, NLP was still weak.

Because even though we had data (thanks to the internet) and compute, the models still sucked for NLP.

They had short reference windows. A reference window determines how much previous text the model can use as context when generating new text.

So when you're writing a long essay on why inflation is bad, you want the sixth paragraph to keep the first in context. But that was difficult with models such as RNNs or LSTMs.
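Here's a toy sketch of the problem (illustrative Python, not any particular model's API):

```python
# Toy illustration: a model with a short "reference window" only sees
# the tail of the text, so the opening falls out of context.
essay = [f"Paragraph {i}: ..." for i in range(1, 7)]
window = 2  # the model effectively "remembers" only the last 2 chunks

context = essay[-window:]   # what an RNN-era model keeps when writing on
print(context)              # ['Paragraph 5: ...', 'Paragraph 6: ...']
# Paragraph 1 (where you defined inflation) is gone by paragraph 6.
```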

So, we needed a better model.

Enter Google Brain. And Transformers.

Their "Attention Is All You Need" paper changed the game. The "attention mechanism" lets every token look directly at every other token in the context, instead of squeezing history through a fixed-size state, so nothing in the window falls out of view.
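Here's a minimal NumPy sketch of the idea, self-attention boiled down to a few lines (the paper's full multi-head version adds learned projections and masking):

```python
import numpy as np

def attention(Q, K, V):
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)        # (T, T): every token scored against every token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sequence
    return weights @ V                   # blend values by relevance

T, d = 6, 8                              # 6 tokens, 8-dim embeddings
x = np.random.randn(T, d)
out = attention(x, x, x)                 # self-attention: Q = K = V = x
print(out.shape)                         # (6, 8)
```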

So a young AI startup, OpenAI, decided to go all in. They believed Transformers completed the trifecta.

And it paid off: the Generative Pre-trained Transformer (GPT), then GPT-2, GPT-3, and now GPT-4. Many consider OpenAI the leader of the space.

And that brings us to today.

Compute is cheaper than ever.

There's no shortage of training data (unless governments act).

So what's the bottleneck now? The model itself. We need better ones.

That’s why Assembly AI’s Conformer-1 is so promising.

It combines the best of two worlds. Transformers and CNNs (No, not the news channel!).

Transformers are good at capturing the interaction between all the data they process.

Convolutional Neural Networks are good at exploiting local features in the data.

So say you run your favorite podcast through AssemblyAI's playground. The CNN picks out local patterns: individual sounds and syllables. And the Transformer tracks context across the whole recording, so a mumbled word at minute 20 can be resolved by something said at minute 2.
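To make that concrete, here's a heavily simplified sketch of a Conformer-style block in PyTorch. It's illustrative only; the real Conformer adds "macaron" feed-forward layers, relative positional encoding, and gating:

```python
import torch
import torch.nn as nn

class ConformerishBlock(nn.Module):
    """Sketch: self-attention for global context + conv for local features."""
    def __init__(self, dim=144, heads=4, kernel_size=31):
        super().__init__()
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        # Depthwise conv: each channel looks at a small neighborhood of frames.
        self.conv = nn.Conv1d(dim, dim, kernel_size,
                              padding=kernel_size // 2, groups=dim)

    def forward(self, x):                    # x: (batch, time, dim) audio features
        # Global: every frame attends to every other frame.
        h = self.attn_norm(x)
        h, _ = self.attn(h, h, h)
        x = x + h
        # Local: convolution over nearby frames (sounds, syllables).
        h = self.conv(self.conv_norm(x).transpose(1, 2)).transpose(1, 2)
        return x + h

feats = torch.randn(1, 200, 144)             # ~2s of audio features
print(ConformerishBlock()(feats).shape)      # torch.Size([1, 200, 144])
```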

But, it’s not a new idea.

Google had thought of it in 2020. They named it Conformer.

Now, in its rebirth, AssemblyAI claims it's the state of the art for speech recognition. Their release post says it makes 43% fewer errors than Whisper and is 29% faster than comparable models.

There’s one caveat though. It’s expensive as hell.

They have priced it at ~$1 per audio hour. Microsoft's Azure Cognitive Services (ACS) costs ~$0.30 per audio hour, so transcribing 1,000 hours of audio runs about $1,000 on Conformer-1 versus about $300 on Azure.

Azure's ACS also supports more languages. Maybe that's where AssemblyAI could start.

2/ 💣Alpaca’s Diffusion


Our folks at Stanford had an exciting announcement.

They released Alpaca, a cheap alternative that they claim behaves comparably to OpenAI's GPT-3 (text-davinci-003) model.

They claim it's light and efficient, and you can run it on your own machine. Under the hood, it's Meta's 7B-parameter LLaMA model fine-tuned on 52K instruction-following examples. And each instruction is unique.

But the key is how transparent Stanford was with Alpaca. They released their demo, data, data-generation process, and training code.

Anyway, Stanford says you can reproduce Alpaca for less than $600: roughly $500 to generate the training data via OpenAI's API, plus about $100 of fine-tuning compute. The aim is to help the community study large language models.
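For a feel of what those 52K examples look like, here's a sketch based on the data format Stanford released (field names from the Alpaca repo; the prompt preamble is lightly paraphrased and the example values are invented for illustration):

```python
# One training example in the released dataset's shape (illustrative values).
example = {
    "instruction": "Explain why inflation is bad.",
    "input": "",          # optional extra context; empty for many examples
    "output": "Inflation erodes purchasing power...",
}

# The fine-tuning prompt template (no-input variant), paraphrased.
PROMPT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

print(PROMPT.format(**example) + example["output"])
```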

Alpaca exemplifies this flywheel:

Low cost => high use => more R&D => better, faster models

Don’t we love this trend of making language models accessible?

One Trend

1 trend you can pounce on. Reading time: ~1 minute 30 seconds

🤹‍♂️GPT-4 and its quest for multimodality

A year ago, Sam Altman held a Q&A session.

It went as you'd expect: the usual questions about why AI matters and what to expect next.

But he also remarked on their direction in building GPT-4. He said GPT-4 would be a text-only model, not a multimodal one.

Why? Because they wanted to perfect their text-based models before moving on to multimodal ones.

Fast forward to today: OpenAI has released GPT-4.

And the big headline? It's multimodal. It accepts image and text inputs and emits text outputs.

Was Sam deliberately misleading everyone? Why the sudden change in direction? Nobody knows for certain. But perhaps they learned a lesson from other companies' troubles.

Teams at Microsoft and NVIDIA released the Megatron-Turing NLG model in 2021 with huge hopes. It had 530B parameters (GPT-3 has 175 billion). But they found that model size isn't everything.

There are rumors to support that theory: some reports say GPT-4 could be smaller than GPT-3. If that's true, they're moving in the direction of Meta (with LLaMA) and AI21 Labs (with Jurassic-2).

Anyway, the possibilities extend beyond text with GPT-4.

The first application OpenAI showcased for the new tech was Be My Eyes' Virtual Volunteer, a bot that functions as 'eyes' for the visually impaired.

Send it a picture, and it could answer any question about it. Voila!

There could be other open-ended use cases.

Like finding anomalies in a security feed.

Or a bot where you send pictures of your broken car and get instructions on how to fix it?

So maybe multimodality is the way.

We are not far from having a Jarvis-like assistant. Maybe patching together Whisper, ChatGPT, and a text-to-speech model already gets you most of the way there, as the sketch below shows. But it won't dazzle you.
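Here's a minimal sketch of that patchwork, assuming the OpenAI Python SDK as it looked in early 2023 (GPT-4 API access was still waitlisted at the time) and the pyttsx3 library for offline speech output:

```python
import openai
import pyttsx3

openai.api_key = "sk-..."  # your key here

def jarvis(audio_path: str) -> str:
    # 1. Ears: Whisper transcribes the spoken question.
    with open(audio_path, "rb") as f:
        question = openai.Audio.transcribe("whisper-1", f)["text"]

    # 2. Brain: a chat model answers it.
    reply = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "user", "content": question}],
    )["choices"][0]["message"]["content"]

    # 3. Mouth: an offline TTS engine speaks the answer.
    engine = pyttsx3.init()
    engine.say(reply)
    engine.runAndWait()
    return reply

jarvis("question.wav")  # assumes a local recording of your question
```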

But with an all-in-one model, it might. The wheel has started to turn. And images are the next area of focus.

Images, videos, and text go hand in hand in the modern world.

I keep referring to Sequoia's timeline for generative AI, and we've already made significant progress on text.

Now, it’s time for images and videos.

🦾What happened this week?

  • Microsoft cut its AI Ethics and Society team

  • Anthropic introduced Claude, “a safer AI assistant”

  • A better, more realistic version of Midjourney (v5) is here

  • Microsoft is adding GPT-4 to Office

  • Google is also bringing AI chat to Gmail and Docs

  • Fintech startup Stripe integrated GPT-4 into payment processing

  • Intercom announced Fin, a GPT-4-powered customer service chatbot

  • Read this book from Reid Hoffman, who wrote it with GPT-4

  • AssemblyAI released its speech recognition model: Conformer-1

  • Glaze is here to add a protective layer to your digital art

  • Applications are open for HF0, a 12-week residency for the world’s top tech founders

  • Tsinghua's KEG group released ChatGLM, an open-source model trained on 1T Chinese and English tokens

  • 🎁Bonus: Compare ChatGPT & ChatGLM-6B here!

💰Funding Roundup

  • AdeptAI raises $350M in Series B funding

  • Stripe raises $6.5B in funding and partners with OpenAI

  • Healthcare AI startup HealthPlix raised $22M in Series C funding

  • Spanish startup BurgerIndex raised $1.2M in seed funding

  • French startup IktosAI raised €15.5M in Series A funding

  • MLOps platform Seldon raised $20M in Series B funding

  • PrescientAI, an automation AI company, raised $4.5M in seed funding

  • Fairmatic, an AI-powered insurance company, raised $46M for auto insurance

  • Indent raised $8.1M in funding for an AI-powered customer review tool

🐤Tweet of the week

I’m still confused as to how a non-profit to which I donated ~$100M somehow became a $30B market cap for-profit. If this is legal, why doesn’t everyone do it?

— Elon Musk (@elonmusk)
Mar 15, 2023

😂 Meme of the week

That’s it for this week, folks! If you want more, be sure to follow our Twitter (@CallMeAIGuy)

🤝 Share The Status Code with your friends!

