A deep dive into deep fakes
Can you believe your own eyes and ears? AI-generated synthetic media is going to make it much harder.
A deep fake created by Eliot Higgins using Midjourney. No, I don’t know why one sleeve is missing.
Thanks to the (alleged) impending indictment of a certain (alleged) white collar criminal, deep fakes are all over the Interwebs today.
Eliot Higgins, the brains behind Bellingcat, a very accomplished group of investigative cyber journalists, decided to have a little fun with news of the hotly anticipated indictment.
He asked an AI tool called Midjourney to gin up a series of photo-realistic images showing the former Leader of the Free World running from the police, being apprehended, cuffed, wearing an orange jumpsuit [1], cleaning prison toilets, and sitting in a jail cell reading a book. [2]
These images went viral. Nearly 5 million people saw the pix Higgins posted on Twitter; lord only knows how and where else they were shared. Shortly thereafter, Higgins apparently found himself banned from using Midjourney.
This news demonstrates two things: How good generative AI tools have gotten at creating deep fakes, and the potential for such fakes to add to our current tsunami of disinformation.
But before we get into that, please allow me to nerd out on you with this FAQ about fakes.
What is a deep fake?
Exactly what it sounds like. It's media that purports to be of someone real, doing something real, that isn't real. FYI, they're called deep fakes because they're generated by "deep learning" AI models -- multilayered neural networks -- not because they're part of the "deep state." [3]
Multilayered neural what?
Neural networks are the secret sauce of AI models. They're a series of sequential processes (aka algorithms) that handle data in ways designed to mimic the human brain.
A neural net is kind of like a sandwich. [4] You've got two slices of bread on the outside, then all the stuff in the middle. The top slice is where you input data, and the bottom slice is where the results come out.
Some neural networks are like peanut butter and jelly sandwiches. They're thin, easier to design, and more transparent in how they operate. (But still delicious.) These are known in the biz as "shallow neural networks."
Some are like quadruple-decker club sandwiches. Between those slices of bread you've got lettuce, tomato, cheese, meat, maybe another slice of bread, more cheese, more meat, etc. You almost have to unhinge your jaw just to take a bite out of it.
A data sandwich, as drawn by Midjourney.
Each time the data passes through one of these layers, it gets a little more refined. If your sandwich is designed for image recognition, for example, the lettuce might determine contrast, identifying which parts of the image are lighter or darker. The cheese might start to identify basic geometric shapes, like triangles or squares. The meat might refine those shapes into more complex objects. The next slice of bread might start to identify colors, and so on.
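For the code-curious, here's a minimal sketch of that club sandwich in Python. Everything in it is invented for illustration -- the layer sizes, the random weights, the pretend image -- because a real model would learn its weights from millions of examples.

```python
import numpy as np

rng = np.random.default_rng(42)

def layer(x, out_size):
    """One layer of sandwich filling: a linear transform plus a ReLU.
    The weights are random here; training would tune them."""
    w = rng.normal(size=(x.shape[0], out_size))
    return np.maximum(0, x @ w)  # ReLU: keep only positive activations

# Top slice of bread: a pretend 28x28 grayscale image, flattened to 784 numbers.
image = rng.random(784)

# The fillings: each layer refines the data a little more
# (contrast -> simple shapes -> complex objects, per the analogy above).
x = layer(image, 256)  # "lettuce": low-level features like edges and contrast
x = layer(x, 128)      # "cheese": basic geometric shapes
x = layer(x, 64)       # "meat": more complex objects

# Bottom slice of bread: scores for, say, 10 possible labels.
scores = layer(x, 10)
print(scores.round(2))
```

Stack more fillings between the bread and you've got a deeper network; that's really all the "deep" in deep learning means.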
OK, now I'm hungry. What are these sandwiches good for?
A peanut-butter-and-jelly neural network is good at quickly doing simple inferences. Are you a good candidate for a loan, based on your prior credit history, age, and zip code?
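Here's a hedged sketch of what that PB&J might look like in Python. The features and weights are entirely made up for illustration -- this is the shape of the idea, not an actual credit model.

```python
import numpy as np

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1 / (1 + np.exp(-z))

# Hypothetical applicant features: [prior defaults, age, years at same address]
applicant = np.array([0.0, 42.0, 7.0])

# A PB&J network is thin: one linear layer, then a squash.
# These weights are invented; a real model would learn them from past loans.
weights = np.array([-2.5, 0.01, 0.1])
bias = -0.5

approval_odds = sigmoid(applicant @ weights + bias)
print(f"Approval probability: {approval_odds:.0%}")  # ~65% for this applicant
```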
A massive, I-can't-believe-I-ate-the-whole-thing club sandwich neural network is good for doing really difficult things like creating fake audio, video, and images that would fool a human being.
Deep neural networks have gotten really good at this sort of mimicry, really fast. And that has all kinds of implications, most of them bad.
Like...?
The most obvious bad-faith application of deep fakes is political propaganda. Last March, a fake video of Ukrainian President Volodymyr Zelensky asking his soldiers to lay down their arms circulated on Facebook and YouTube. It got taken down pretty quickly, but it also wasn't that convincing.
Even the most infamous resident of Mar-a-Lago-By-The-Sea has gotten into the deep fake game, releasing an AI-generated photo of himself in the act of praying.
Pump-and-dump traders could use fake videos of CEOs making false statements to drive the price of their companies' stock up or down. You could defame people you don't much care for by making them seem foolish, or by showing them doing or saying something illegal, immoral, or misleading.
This is more of a ‘shallow fake’ I generated using a ‘digital people’ app called D-ID. Took me maybe five minutes.
Deep fakes are also yet another way for scammers to separate organizations from their money by impersonating people in positions of authority. We've seen a couple of deep audio fakes that fooled people into wiring cash to a scammer's account by sounding like their boss. They won't be the last.
The more dangerous implication, which I've not heard anyone talking about yet, is that deep fakes give people yet another excuse to disbelieve their own eyes. Just as anything negative about a certain politician becomes "fake news," any actual video or audio evidence of bad acts is automatically a "deep fake." [5]
Even people who wouldn't ordinarily crawl under the warm comforting blanket of denial will start to question everything they see and hear.
The score so far: Alternate Facts 1, Reality 0.
Isn't there a way to tell it's fake?
Yes and no. Wired has a guide on how you can tell a deep fake from the real McCoy. It involves things like analyzing how the hands are drawn and how background text is rendered. So far, publicly available Gen-AI engines like Midjourney and DALL-E tend to mangle any text they render, as in this image of a Florida Man outside his favorite restaurant.
Researchers are using AI tools to detect deep fakes. Intel's FakeCatcher works by analyzing video for "subtle signs of blood flow," and Intel claims it's accurate 96 percent of the time. Facebook is also working on AI that can identify synthetic media.
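Under the hood, most of these detectors are just classifiers trained to tell real from fake. Here's a toy illustration of the general idea -- emphatically not Intel's actual method -- with made-up numbers standing in for the real "blood flow" signal processing:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Pretend per-video features (e.g., strength of pulse-like color changes in
# skin regions). Real videos score high, fakes score low -- all synthetic here.
real_videos = rng.normal(loc=1.0, scale=0.3, size=(100, 4))
fake_videos = rng.normal(loc=0.2, scale=0.3, size=(100, 4))

X = np.vstack([real_videos, fake_videos])
y = np.array([1] * 100 + [0] * 100)  # 1 = real, 0 = fake

detector = LogisticRegression().fit(X, y)
print(f"Training accuracy: {detector.score(X, y):.0%}")
```

The hard part in the real world is extracting signals that fakers can't easily imitate -- which is exactly the arms race described below.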
There are a couple of problems with these schemes. One: How many people are willing to go to the effort of closely analyzing photos online when they can barely be moved to read more than the headline of most stories? Two: Trusting social networks to auto-detect and remove fake images and video is a bit like trusting them to automatically delete fake news. How has that worked out so far?
We're in a new arms race between the fakes and the fake detectors. So far, the detectors appear to be winning. But those tools have yet to be tested against this new crop of supersmart Gen-AI engines. And, as with cybersecurity, defenses have to be perfect; fakers only need to succeed once.
What’s in that sandwich?
Perhaps the biggest problem with deep neural networks is that nobody -- not even the data scientists who built them -- understands how they arrive at their predictions. Neural networks are designed to teach themselves how to solve problems, and they aren’t sharing any of their recipes.
You could think you're making a really nice Monterey Club, only to have the algorithm slip fish paste or something equally disgusting in there. You won't know until you take a bite.
Now. Who’s ready for lunch?
Have you been fooled by a fake, even for a few seconds? Share your tale of shame and embarrassment below in the comments.
[1] It is his color, after all.
[2] A telltale sign these are fakes.
[3] Pro tip: If you want to make something sound truly sinister, preface it with 'deep.'
[4] Apologies to any actual data scientists in the audience.
[5] To quote the dystopia OG, George Orwell: "The party told you to reject the evidence of your eyes and ears. It was their final, most essential command."