Sora (text-to-video model) Facts for Kids

Quick facts for kids
Sora
Developer(s)	OpenAI
Initial release	15 February 2024
Platform	OpenAI
Type	Text-to-video model

Sora is a text-to-video model made by OpenAI, an artificial intelligence (AI) research organization based in the United States. It can generate videos from descriptive text prompts and can also extend existing videos forwards or backwards in time.

History

Before Sora's release, several other, less realistic text-to-video models had been created, including Meta's Make-A-Video, Runway's Gen-2, and Google's Lumiere. As of February 2024, Lumiere was still in its research phase. OpenAI, the company behind Sora, had released DALL-E 3, the third of its DALL-E text-to-image models, in September 2023.

The team that developed Sora named it after the Japanese word for sky to signify its "limitless creative potential". On February 15, 2024, OpenAI first previewed Sora by releasing several high-definition video clips that the model had created. These included an SUV driving down a mountain road, an animation of a "short fluffy monster" next to a candle, two people walking through Tokyo in the snow, and fake historical footage of the California gold rush. OpenAI stated that Sora was able to generate videos up to one minute long. The company then shared a technical report, which highlighted the methods used to train the model. OpenAI CEO Sam Altman also posted a series of tweets in which he answered Twitter users' prompts with videos that Sora generated from them.

OpenAI has stated that it plans to make Sora available to the public, but not soon, and it has not specified when. The company gave limited access to a small "red team", including experts in misinformation and bias, to perform adversarial testing on the model. It also shared Sora with a small group of creative professionals, including video makers and artists, to get feedback on its usefulness in creative fields.

Capabilities and limitations

The technology behind Sora is an adaptation of the technology behind DALL-E 3. According to OpenAI, Sora is a denoising latent diffusion model that uses a single Transformer as the denoiser. A video is generated in latent space by denoising 3D "patches", and is then transformed into standard space by a video decompressor. During training, re-captioning is used to create good captions for videos that do not have them.
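To make the idea of 3D "patches" more concrete, here is a minimal sketch of how a latent video could be cut into small blocks that span both time and space, and then put back together. This is only an illustration, not OpenAI's code: the function names `patchify` and `unpatchify`, the tensor layout (frames, height, width, channels), and the patch sizes are all assumptions made for this example.

```python
import numpy as np


def patchify(latent, pt, ph, pw):
    """Split a latent video of shape (T, H, W, C) into flattened 3D patches.

    Hypothetical helper for illustration. Returns an array of shape
    (num_patches, pt * ph * pw * C), one row per spacetime patch.
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Split each axis into (number of patches, size of one patch).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the patch-index axes to the front, the within-patch axes to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)


def unpatchify(patches, shape, pt, ph, pw):
    """Inverse of patchify: rebuild the (T, H, W, C) latent from its patches."""
    T, H, W, C = shape
    x = patches.reshape(T // pt, H // ph, W // pw, pt, ph, pw, C)
    # Interleave patch-index and within-patch axes back into T, H, W order.
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


# Toy latent video: 8 frames, 4x4 spatial grid, 2 channels (made-up sizes).
latent = np.arange(8 * 4 * 4 * 2, dtype=np.float32).reshape(8, 4, 4, 2)
patches = patchify(latent, pt=2, ph=2, pw=2)
restored = unpatchify(patches, latent.shape, pt=2, ph=2, pw=2)
```

In this toy example, each row of `patches` is one small block covering 2 frames and a 2 by 2 area. A Transformer denoiser would treat each row as one token, much like a language model treats a word.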

OpenAI trained the model using publicly available videos as well as copyrighted videos licensed for the purpose, but did not reveal the number or the exact source of the videos. When it previewed Sora, OpenAI acknowledged some of the model's shortcomings, including that it struggles to simulate complex physics, to understand cause and effect, and to tell left from right. OpenAI also stated that, in keeping with the company's existing safety practices, Sora will restrict text prompts for violent, hateful, or celebrity imagery, as well as content featuring pre-existing intellectual property. Tim Brooks, a researcher on Sora, stated that the model figured out how to create 3D graphics from its dataset alone. Bill Peebles, also a Sora researcher, said that the model automatically created different video angles without being prompted. According to OpenAI, Sora-generated videos are tagged with C2PA metadata to indicate that they were AI-generated.
