Mark Zuckerberg wants you to know what the metaverse smells like
Meta is building generative AI that combines sight, sound, touch, and other senses.
AI-generated dancing robot, doing the funky chicken in the heart of Times Square. Source: Meta.
Remember Facebook?
You know, that online space where you were able to cyberstalk your old high school sweetheart while engaging in stupid political arguments with your racist relatives?
Despite still having nearly 3 billion monthly active users, the social network has ceased to have the same kind of impact on our daily lives that it had, say, six or seven years ago, when it was a major reason why a certain Florida Man became the most powerful biped on earth. [1]
At this point, Hollywood is unlikely to make any new movies about Facebook. It had its moment in the zeitgeist, and now it's in its post-menopausal/pre-senescent AOL phase.
But Meta, the parent company of the social network, is very much alive, and poised to be a huge player in the AI space, even if Mark Zuckerberg didn't get an invitation to the White House's recent AI Summit. [2]
Choose your own meta-adventure
Last fall, Meta introduced "Make-A-Video," a tool that allows you to generate short videos from a simple text prompt. Like this one, for example:
It turns out that you can lead a horse to water and make it drink, if you throw enough AI at it.
Make-A-Video can also take a still image and animate it, or blend two images together into a mini video.
I have no idea what this is, but I find it absolutely terrifying.
Earlier this year, the company introduced a new version of its open-source Large Language Model Meta AI (LLaMA), which is supposed to be smarter-faster-and-better-looking than the models powering ChatGPT, Google's Bard, and Microsoft's Bing Chat.
Last week, Meta revealed that it's working on an AI model that combines text, audio, images, depth perception, temperature, and movement data to generate artificial media. In other words, if you told such a model to "draw an image of a pigeon," it could generate a video of a pigeon flying, strutting, cooing, and crapping all over everything.
Or you could generate the same video by providing audio of a pigeon cooing. Feed the model the sound of a train horn or a baby crying, and get a video of the Orient Express pulling into the station or a squalling infant. Put differently, this "multimodal model" automatically binds together different sensory characteristics of physical objects, not unlike the way our brains do. When you hear a pigeon cooing, your brain automatically associates that with the image of the bird (and possibly also a shit-streaked statue). [3]
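For the technically curious, here's roughly how that "binding" works, at least as far as public descriptions of models like this go: each modality (image, audio, text, and so on) gets its own encoder, but all of them map into one shared embedding space, so the sound of a coo and the picture of a pigeon land near each other. The sketch below only illustrates that retrieval step; the encoders are random stand-ins I made up for the example, not Meta's actual models.

```python
# Minimal sketch of cross-modal retrieval in a shared embedding space.
# The "encoders" here are random projections, NOT trained models -- they
# just show the mechanism: embed everything into one space, then compare
# with cosine similarity.
import numpy as np

EMBED_DIM = 64
rng = np.random.default_rng(0)

def fake_encoder(seed: int):
    """Stand-in for a trained modality encoder (audio, image, etc.)."""
    projection = np.random.default_rng(seed).normal(size=(128, EMBED_DIM))
    def encode(features: np.ndarray) -> np.ndarray:
        vec = features @ projection          # project raw features into the shared space
        return vec / np.linalg.norm(vec)     # normalize so a dot product = cosine similarity
    return encode

encode_audio = fake_encoder(seed=1)
encode_image = fake_encoder(seed=2)

# Pretend raw features: one audio clip and a small gallery of images.
coo_clip = rng.normal(size=128)
image_gallery = {name: rng.normal(size=128) for name in ["pigeon", "train", "baby"]}

# Cross-modal retrieval: which image does this sound "bind" to?
audio_vec = encode_audio(coo_clip)
scores = {name: float(encode_image(feats) @ audio_vec)
          for name, feats in image_gallery.items()}
print(max(scores, key=scores.get), scores)
```

With real, trained encoders the coo would score highest against the pigeon; with these random stand-ins the ranking is arbitrary, but the plumbing (one space, many senses) is the same idea.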
And Meta has only just begun. From the blog post where they announced this stuff:
While we explored six modalities in our current research, we believe that introducing new modalities that link as many senses as possible — like touch, speech, smell, and brain fMRI signals — will enable richer human-centric AI models.
Never metaverse I didn't like
You are probably now wondering: What has this got to do with me, my former high school sweetheart, or my racist uncle? Why should I care how good Meta's AI is at blending all these things together?
The answer is that this level of AI is necessary to make the metaverse possible. Per The Verge's James Vincent:
Imagine, for example, a futuristic virtual reality device that not only generates audio and visual input but also your environment and movement on a physical stage. You might ask it to emulate a long sea voyage, and it would not only place you on a ship with the noise of the waves in the background but also the rocking of the deck under your feet and the cool breeze of the ocean air.
In theory, at least, teaching AI how to associate different sensory experiences could also allow machines to reproduce these experiences on demand.
Until now, attempts at creating "the metaverse" have been pretty lame. Creating an "immersive experience" typically involves putting on a headset and stumbling around a room grasping at cartoon-like objects floating in virtual space around you. [4] Even then, you're limited to audio and video. That's one of the reasons why virtual reality has never gone mainstream, and why the metaverse in its current form amounts to multiplayer games like Roblox, World of Warcraft, or Fortnite.
Professor Moriarty, Dixon Hill, and his android friend. Source: CBR.com
But instead of strapping on geeky electronics to enter a cartoon universe, imagine stepping into a holodeck where you can touch, taste, and smell things that look and feel real. Imagine being able to relive your favorite memories from childhood, as often as you like. Or to re-experience childhood traumas in order to overcome them. Or to recreate what it was like for your immigrant grandparents to come to the US for the first time. Or to visit an exotic locale, like the surface of Mars, without having to buy a ticket from Elon Musk. And so on.
AI-powered virtual reality probably isn't going to be a thing for some time to come. But it will likely get here at some point. And you might have Mark Zuckerberg to thank and/or blame.
If you had a holodeck, what would you use it for? Share your virtual visions in the comments below.
[1] You can debate exactly how much impact FB had on the 2016 election results (Cambridge Analytica, Russian micro-targeting, blah blah blah), but it was an absolute fire hose of mis- and disinformation, aka "Fake news."
[2] I'm sure it was just lost in the mail.
[3] I seem to have pigeons on the brain this morning, alas. Don't ask me why.
[4] If you've never watched someone else do this, it's really quite amusing. Like a drunk groping for a light switch in the dark.