Google is reclaiming its throne

And, more on Meta's ImageBind & 3D-AI space

Barun Sharma & Aasma Aryal
May 12, 2023

GM! Welcome to The Status Code.

We’re like rugged rangers, protecting precious AI content for you to explore.

Here’s what we have for today:

🛋 The Metaverse in your living room
👑 Google is reclaiming its throne
🧊 A peek into 3D-AI space

(Estimated reading time: 4 minutes)

Not subscribed yet? Stay in the loop with AI every week by spending five minutes with The Status Code.


Two Headlines

The two main stories of last week, for when you only have ~2 minutes 45 seconds to spare.

1/ 🛋 The Metaverse in your living room

Ever since the Metaverse's incubation, Zuckerberg has claimed that it would be the future of the internet.

In 2016, he wanted a billion people on Facebook in virtual reality as soon as possible.

Here we are, seven years later, and it's generative AI that has taken over.

This week, Big M open-sourced a multisensory AI model that combines six types of data. It's called ImageBind.

Another exciting open-source project? Absolutely.

Meta claims that ImageBind is the first model to combine six data types into a single embedding space.

The core idea is similar to what powers AI image generators like DALL-E, Stable Diffusion, and Midjourney.

Those systems link words with images during training, which lets them generate visual scenes from a text description.

But here’s what’s different about ImageBind.

It links six modalities: images/video, text, 3D measurements (depth), audio, temperature data (thermal), and motion data (from inertial measurement units, or IMUs).
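To make “a single embedding space” a bit more concrete, here is a toy sketch in Python. It assumes each input (a photo, a sound clip, a thermal reading) has already been turned into a vector by some encoder. The tiny 3-number vectors below are invented for illustration, and none of these names come from Meta's ImageBind code. The point is that once every modality lives in the same space, cross-modal search is just “find the nearest vector.”

```python
import math

# Hand-picked toy vectors standing in for embeddings. A real model like ImageBind
# uses learned encoders and much larger vectors; these numbers are made up.
EMBEDDINGS = {
    ("image", "pirate ship photo"): [0.9, 0.1, 0.0],
    ("audio", "waves splashing"): [0.8, 0.3, 0.1],
    ("text", "a cat sleeping"): [0.0, 0.2, 0.9],
    ("thermal", "warm campfire"): [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_key, embeddings=EMBEDDINGS):
    """Return the key of the most similar item from a *different* modality."""
    query_modality, _ = query_key
    query_vec = embeddings[query_key]
    candidates = [k for k in embeddings if k[0] != query_modality]
    return max(candidates, key=lambda k: cosine_similarity(query_vec, embeddings[k]))


if __name__ == "__main__":
    # The image query lands closest to a sound, because both vectors
    # sit near each other in the same shared space.
    print(nearest(("image", "pirate ship photo")))  # ('audio', 'waves splashing')
```

In this toy, the “pirate ship” photo finds the splashing-waves audio clip. That is the kind of cross-modal link that makes the pirate example below possible.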

On Midjourney, you can use a prompt like “a pirate wearing a dress voyaging around the sea in a beach ball” and get a rather realistic photo of this absurd scene.

With multimodal AI tools like ImageBind, there’s more potential.

In principle, you could create a video of the pirate, complete with ‘Aye aye!’ sounds and the splashing of the water.

It’s a glimpse of a future where AI connects sight, sound, depth, and motion to better perceive the physical environment.

However, it’s still in the research phase.

But it promises a future where generative AI systems can create immersive, multisensory experiences.

For humans, combining data from our primary senses, like vision, comes naturally.

Multimodal learning systems are data-hungry, but they’re on their way to creating the ‘ultimate algorithm.’ With six modalities, ImageBind is a true multimodal AI, with more modalities likely to be added in the future.

P.S. Meta also dropped its ad tool this Thursday. It’s called AI Sandbox.

It lets you edit, test, and design ads for Facebook and Instagram.

2/ 👑 Google is reclaiming its throne

Google is back for... everything!

Its I/O event is the talk of the town this week, and Alphabet's stock jumped 5% on Wednesday.

Last year's event focused on devices and features, but this year it's all about AI.

So, what's new?

1/ Help Me Write

No more Chrome extensions needed: Google is bringing a "Help Me Write" feature to Gmail to draft emails for you.

And it's coming to Docs, Slides, and Sheets too!

2/ Magic Editor

Think of it as an upgraded version of Apple's cutout feature. You can now change lighting and remove objects from photos.

3/ BARD

Rumors were flying about Big BARD, but guess what? We got an even better BARD!

What's new?

Share to Gmail & Docs from results
No more waitlist: open to everyone
Multimodality
Coding upgrades and citation features
Dark mode

Big BARD is still in progress, but Google announced something cooler: Gemini.

Gemini is a GPT competitor and a Google DeepMind project. Google said BARD will slowly transition into Gemini.

This suggests BARD was always meant to be experimental.

4/ PaLM 2

PaLM 2 is the successor to Google's PaLM model. It's reportedly trained on 5.4 trillion words (10x more than PaLM).

And it comes in four sizes:

Gecko (smallest)
Otter
Bison
Unicorn (largest)

Google only shared details about Gecko, the smallest of the bunch. It's light enough to work on phones, even offline.

This could be a sign that the next wave of AI will run natively on devices, even offline.

However, there's a catch. Amid recent privacy concerns, Google hasn't said which data it used to train PaLM 2. It also hasn't shared the hardware setup used to train PaLM 2.

5/ Search

Google is adding multimodality to Search. And since Search is its top revenue source, it will also show an ad right after the AI-generated summary for a topic.

Question → search summary → ad → more results and traditional links.

Google is also introducing a conversational mode. It looks similar to Bing's, but more powerful.

SEO will take a big hit from this. If you have a blog or a website, it may be wise to wait and see how this rolls out before making big changes.

And we also got new Workspace features. The “Help Me Write” email assistant is pretty solid.

Did you join the waitlist yet?

One Trend

1 trend you can pounce on. Reading time: ~1 minute 20 seconds

🧊 A peek into 3D-AI space

Japan is famous for unique creative designs in both 3D and 2D, like origami and anime.

Its 3D story goes back to the '80s, when Hideo Kodama developed one of the first 3D printing methods, using a process now known as stereolithography (SLA).

Today, 3D printers start at around $200, but the real cost is in the design work: planning the framework, filling in the colors, and predicting how the object will move.

But we’re here to tell you that this is changing.

This week, Google partnered with Adobe to create 3D canvases using its Geospatial API. By combining the power of mapping and photography, they'll bring something unique.

OpenAI released Shap-E, the successor to its Point-E model. It creates 3D objects with fine textures and complex shapes from text inputs.

This simplifies rendering and processing. Shap-E's outputs can be rendered as textured meshes or as neural radiance fields (NeRFs), a technique that represents a 3D scene with a neural network and renders photorealistic views of it from any angle.
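For the curious, here is roughly how a NeRF turns a 3D scene into a single pixel. It samples points along a camera ray, then blends their colors based on how opaque each point is (the volume-rendering step from the original NeRF work). This is a minimal Python sketch. In a real NeRF, the densities and colors come from a trained neural network, while here they are hand-written inputs. `render_ray` is a hypothetical helper name, not part of Shap-E's code.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray into a single RGB pixel.

    densities: sigma_i >= 0 at each sample (how "solid" space is there)
    colors:    (r, g, b) emitted at each sample
    deltas:    distance between consecutive samples along the ray
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # T_i: fraction of light that reaches sample i unblocked
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha          # this sample's share of the pixel
        for c in range(3):
            pixel[c] += weight * color[c]
        transmittance *= 1.0 - alpha            # same as multiplying by exp(-sigma * delta)
    return pixel


if __name__ == "__main__":
    # Empty space (density 0), then a dense red surface that hides the green behind it.
    print(render_ray(
        densities=[0.0, 0.0, 100.0, 100.0],
        colors=[(0, 0, 1), (0, 0, 1), (1, 0, 0), (0, 1, 0)],
        deltas=[0.1, 0.1, 0.1, 0.1],
    ))
```

The empty blue samples contribute nothing, and the red surface blocks almost all light from the green sample behind it, so the pixel comes out almost pure red.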

But how can 3D AI development help humans?

1/ Medical diagnosis

AI models can analyze scans of the human body and turn them into 3D models.

Imagine 3D models of a tumor created before surgery so doctors can run a test operation. A team of researchers did just that, and reported a precise operation.

2/ Manufacturing

Design flaws are the enemy of car manufacturing. AI can help create 3D models of vehicles and test them in various scenarios. Feed these models enough crash-test footage, and we could enter a new era of vehicle safety.

3/ VR/AR

3D AI models can also help bring better AR experiences. Google, Apple, and Microsoft each have a project aimed at just that:

Google: “Project Tango” - used 3D sensors to create maps of real-world environments (it has since been succeeded by ARCore)
Apple: “ARKit” - uses the iPhone's camera and motion sensors to create AR experiences
Microsoft: “HoloLens” - a holographic headset for AR experiences
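Depth-sensing projects like Project Tango start from a simple geometric step: turning each pixel of a depth image into a 3D point. The Python sketch below assumes an idealized pinhole camera with known focal lengths (`fx`, `fy`) and image center (`cx`, `cy`). The function name and example numbers are hypothetical, and a real mapping system would also track how the device moves and stitch many frames into one map.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image into 3D points in the camera's own frame.

    depth:  2-D list of distances in meters (0 means "no reading")
    fx, fy: focal lengths in pixels
    cx, cy: principal point (image center) in pixels
    """
    points = []
    for v, row in enumerate(depth):      # v = pixel row
        for u, z in enumerate(row):      # u = pixel column
            if z <= 0:
                continue                 # skip pixels where the sensor got no depth
            x = (u - cx) * z / fx        # pinhole model: shift from center, scale by depth
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    # A tiny 2x2 "depth image" with one missing reading.
    tiny_depth = [[1.0, 0.0],
                  [2.0, 2.0]]
    for p in depth_to_points(tiny_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(p)
```

Each valid pixel becomes an (x, y, z) point, and pixels farther from the image center land farther to the side as depth grows, which is how a flat depth image turns into a 3D point cloud.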

Blockade Labs is also working on its project Skybox. It generates virtual environments from text prompts. It's not perfect, but it works.