OpenAI’s latest text-to-video tool raises troubling questions

Peter Griffin


The most charming video that’s come across my newsfeed in recent days is of a fluffy cat wearing a pirate hat and riding around someone’s lounge on a robotic vacuum cleaner. The video has the familiar, slightly shaky look of a candid clip recorded on someone’s smartphone.

Only, the cat and the vacuum cleaner aren't real, and neither is the house. The 12-second clip was created by Sora, a new AI-driven text-to-video generator from OpenAI, the company behind the ChatGPT chatbot.

Sora uses a so-called diffusion model, trained on a vast number of videos to learn what objects and actions look like. It builds a completely new video in response to a text prompt, starting from what looks like random static and gradually removing that noise over many steps until a coherent clip emerges. OpenAI says Sora understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.

For instance, the prompt for that cat video was: “An adorable kitten pirate riding a robot vacuum around the house.” That’s exactly what Sora delivered.

Text-to-video AI tools are not new, but Sora is winning rave reviews for the remarkably realistic and smooth short clips it can create. It isn't limited to Finding Nemo-style animation; it can also depict lifelike characters, settings and objects.

Vague on the details: OpenAI’s Chief Technology Officer Mira Murati.

It really dawned on me as I scrolled through these early Sora clips that our perception of what is real and what is manufactured will be upended in the next couple of years as AI video is fine-tuned.

If, like me, you've been shooting your own movie script in your head for years, Sora could eventually let you bring it to life with the finesse of Sir Peter Jackson, minus the massive visual-effects budget. Its clips are currently limited to 60 seconds, given the intensive computing power required to generate them. But video is just a sequence of still images, and with access to enough processing power, Sora could one day piece together feature-length films.

Sora is still a bit glitchy. In one video of a New York street, a yellow taxi disappears behind a pedestrian and re-emerges painted grey. But you may soon be able to simply prompt Sora to "fix the taxi cabs in the background".

For now, Sora is only in limited release while OpenAI "red teams" the application, probing it for bugs and for ways it could be misused.

Although Sora is set to revolutionise DIY film-making, its potential as a tool for spreading misinformation is also very real. "This is the reason we are not deploying the system yet," Mira Murati, OpenAI's chief technology officer, told the Wall Street Journal.