Meet Sora: The Future of Text-To-Video Technology by OpenAI

If we told you that the image on the right is a snapshot from an AI-generated video, would you believe it? No, your eyes aren’t fooling you. Yes, it does look very realistic for a clip a computer created completely out of thin air! And that’s the kind of spectacle that Sora, OpenAI’s new AI video tool, can whip up in seconds.

Just about a year ago, AI models were generating monstrous hands and eerie human faces. But today we see photorealistic designs and near-accurate renditions, even in designs featuring text. Video generation is the next big trend, and it has been expanding explosively in recent times. In fact, based on The Animation Guild’s estimation, about 21.4% of U.S. film, television, and animation jobs are likely to be partially or fully replaced by AI tools by 2026.

But have we really come that far in AI video, you ask? Well, take a look at the below video. If a random YouTube channel posted this as an upcoming movie trailer, wouldn’t you fall for it? Yet this scarily good video was generated by Sora too! So, yes, AI video has made big progress recently.

If that video made your jaw drop, then we’re sure you’re curious to find out more about Sora. This blog will give you a brief overview of what this AI video model is, where and how to access it, and where it stands today in the competitive generative AI realm.

OpenAI’s Sora: A Quick Introduction 

Simply put, Sora is a generative AI model that creates videos from simple text prompts. Right now, the model can generate videos up to 60 seconds long. So, what are the current capabilities of Sora?

  • Generate simple to complex scenes – both realistic and surreal concepts.
  • Render layered details in the backgrounds.
  • Handle single and multiple subjects in a video.
  • Mimic particular types of real-world motion.
  • Convey emotional details in the subject or subjects of the video.
  • Create videos from static image input.
  • Add details to or extend existing videos, looping them or adding extra frames.
  • Connect separate videos.
  • Apply edits to specific elements, like the visual style.
  • Generate images.
  • Maintain three-dimensional consistency so that subjects remain realistic even with dynamic camera movement.
  • Maintain long-range coherence and object permanence.
  • Simulate virtual worlds.
Source 

(Prompt: The camera directly faces colorful buildings in Burano Italy. An adorable Dalmatian looks through a window on a building on the ground floor. Many people are walking and cycling along the canal streets in front of the buildings.)

Other than the above overview from OpenAI, there is no news about the interface or the particular features to explore. With DALL·E, for example, OpenAI initially launched a web app, and then with the DALL·E 3 update, image generation moved into the ChatGPT interface. ChatGPT itself now has a mobile version along with the web version. So, we do not yet know where Sora will be accessible.

OpenAI has released API access for its other models so far, so an API for Sora seems likely as well, though nothing has been confirmed.

How To Access Sora 

Right now, OpenAI has opened up Sora only to red teamers and a select number of visual artists, filmmakers, and designers. This gives them time to rigorously test the ins and outs of the model and perfect it before a public launch. All the videos currently available are therefore the ones released by the OpenAI team themselves, along with a few from the creators who have access to the trial run.

Sam Altman, CEO of OpenAI, has also been responding to requests from curious users and sharing videos based on prompts suggested by users on X. He shared the below video, for example, based on the suggested prompt of “a monkey playing chess in a park.”

Currently, OpenAI has not announced an official release date for Sora’s public availability. So, despite how intriguing Sora might appear, there’s no way to access this AI video model right now.

An Overview of the Tech Behind Sora

Built on a transformer architecture like most other OpenAI models, Sora represents videos as collections of spacetime patches, analogous to the tokens used in GPT.
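To make the patch idea concrete, here’s a minimal sketch of how a video tensor can be carved into flattened spacetime patches. The function and the patch sizes are illustrative assumptions on our part; OpenAI hasn’t published Sora’s actual parameters.

```python
import numpy as np

def patchify(video: np.ndarray, pt: int = 2, ph: int = 16, pw: int = 16) -> np.ndarray:
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Each patch spans `pt` frames and a `ph` x `pw` pixel region and is
    flattened into one vector – the video analogue of a text token.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    # Carve the tensor into a grid of (pt, ph, pw, C) blocks.
    blocks = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5, 6)
    # One row per patch; a transformer would attend over these rows.
    return blocks.reshape(-1, pt * ph * pw * C)

video = np.random.rand(16, 256, 256, 3)  # 16 frames of 256x256 RGB
patches = patchify(video)
print(patches.shape)                     # (2048, 1536)
```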

One of the most prominent capabilities of Sora is how closely it sticks to the user’s prompt. This comes from OpenAI’s use of the “recaptioning” technique that became popular with DALL·E 3: a captioning model generates highly descriptive captions for the training videos, and at generation time, GPT expands the user’s short prompt into a longer, more detailed one before it is passed to the video model.
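As a rough illustration of the prompt-expansion half of that pipeline, the sketch below uses OpenAI’s chat API to rewrite a short idea as a detailed caption. There is no public Sora API, so the final video-generation step is omitted; the system prompt and model choice here are our own assumptions.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def expand_prompt(user_prompt: str) -> str:
    """Expand a short video idea into a richly detailed caption,
    mirroring the recaptioning idea DALL-E 3 popularized."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Rewrite the user's video idea as one detailed, "
                        "descriptive caption covering subjects, setting, "
                        "lighting, camera movement, and mood."},
            {"role": "user", "content": user_prompt},
        ],
    )
    return response.choices[0].message.content

detailed = expand_prompt("a monkey playing chess in a park")
print(detailed)  # the detailed caption a video model would consume
```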

OpenAI also brings the data pipelines, infrastructure, and lessons from its work on models like ChatGPT and DALL·E to the training and refinement of Sora. That experience should help it avoid some of the setbacks and backlash those tools faced at the time of their release.

Unlike the situation with ChatGPT upon its release, where it stood as the sole major generative AI chatbot, setting its own standards, Sora enters a more established AI video landscape. This offers both challenges and opportunities: while competition calls for staying ahead of the curve, OpenAI also benefits from a wealth of existing solutions and learnings from previous AI video models’ successes and shortcomings. This allows them to refine their approach and avoid potential pitfalls identified by others.

So, let’s talk about the other players in the generative AI video space to understand Sora’s current standing among its peers.

The Current State of the AI Video Market: Where Does Sora Stand? 

How has the AI video market evolved? Well, let the below videos speak for themselves. The first one, believe it or not, is the kind of AI video that could be generated just about a year ago. The second one, though, was generated by Sora.

Judging from all the videos OpenAI has shared so far, Sora looks pretty good. However, it’s up against some solid competitors. Let’s quickly look at a few of them.

Meta’s Emu Video 
Source 

Emu Video builds on Meta’s Emu model and can generate videos from text-only, image-only, or combined text-and-image prompts. It is a diffusion model that uses a factorized approach: it first generates an image from the text and then generates a video conditioned on both the image and the text. The results from Emu Video are pretty good. But a solid difference between Emu Video and Sora is that the former can currently generate 4-second videos while the latter can generate 60-second videos.

Stable Video Diffusion 

Stability AI’s Stable Video Diffusion is an AI video tool that recently created a lot of buzz on social media for the realistic output it generates. One notable difference is that the currently available model only supports image input, whereas Sora supports text, image, and video input.

Stable Video Diffusion also lags behind Sora in length: you can only generate videos of 14 or 25 frames, with customizable frame rates. Moreover, Stable Video Diffusion is not yet licensed for commercial applications.
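Since Stable Video Diffusion is openly available through Hugging Face’s diffusers library, here is a minimal image-to-video sketch with the publicly released checkpoint. It assumes a CUDA GPU and a local input.jpg; parameters follow the model’s documented defaults.

```python
import torch
from diffusers import StableVideoDiffusionPipeline
from diffusers.utils import load_image, export_to_video

# Load the released image-to-video model (the 25-frame "xt" variant).
pipe = StableVideoDiffusionPipeline.from_pretrained(
    "stabilityai/stable-video-diffusion-img2vid-xt", torch_dtype=torch.float16
)
pipe.to("cuda")

# Condition on a single still image, resized to the resolution SVD expects.
image = load_image("input.jpg").resize((1024, 576))
frames = pipe(image, decode_chunk_size=8).frames[0]

export_to_video(frames, "output.mp4", fps=7)
```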

Google Lumiere 

Google’s Lumiere also uses a diffusion model, built on a Space-Time U-Net (STUNet) architecture. The model reportedly generates the entire temporal duration of the video in one pass rather than creating keyframes and then animating between them. While the generation strategy is different, Lumiere is also pretty versatile in terms of usability since it accepts multimodal inputs like Sora does. So yes, Google’s Lumiere is one of the closest contenders Sora will have to keep up with!

Runway’s Motion Brush 

Many AI video tools available right now offer little control over how a static image gets animated. Runway’s Gen-2 Motion Brush is a tool built for exactly that. With it, you simply add a static image and brush over the areas or elements that need to be animated. In various videos, the Runway team has showcased how this feature can be used to add expressions and bring a subject to life.

To better understand how these tools stack up against each other, take a look at the results shared in the below post. In it, the creator shares the results obtained for the same prompt on four popular AI video platforms – Sora, Runway ML, Stable Video Diffusion, and Pika.

In addition to these standalone AI video tools, popular design tools like Adobe Express and Canva also have AI video generation capabilities. But yes, these are pretty basic and there’s still a lot of room for improvement. 

With Sora’s current standing covered, let’s move on to the most crucial question – where can you use Sora and the videos it generates? Because no matter how realistic AI-generated videos can be, they cannot replace the real thing, at least for now! Some visual glitches, some inaccuracies in prompt interpretation, and the lack of a human touch might still be visible in several videos. We’ll talk about the most discussed weaknesses of Sora in a minute. But before that, let’s quickly look at some applications of the videos generated using Sora.

Applications of Sora 

Cut back on video production costs

Running on a tight budget but need videos that communicate your message in a unique and memorable way? Then AI video tools like Sora might be the answer. The below video, which Sam Altman posted in response to a user’s request, is a good example.

Shooting actual footage of two dogs at a scenic mountain location, with all the gear transported on site, would not have been easy in terms of either effort or cost.

Create 3D graphics without spending a big budget 

Let’s face it, hiring an animator to create immersive animated characters like this one is not within budget for many marketing teams. So, Sora can help save the day! And since you can also use the tool to animate existing static images, you can animate designs featuring your mascot to add a creative twist.

Imagine the kind of budget a brand would have to assign to create immersive graphics like the one below! 

Generate animal scenes 

Several Hollywood production companies use CGI in animal scenes to avoid harming the animals and the actors involved, among other reasons. In such applications, tools like Sora can come in handy, especially when the situation only calls for a few seconds of animal footage. The time otherwise spent on CGI production can be cut down by using advancements like Sora.

The below post, shared by OpenAI research scientist Bill Peebles, is a good example of how Sora can tackle animal scenes efficiently.

Generate stock videos for cliched themes 

If all the stock videos you find for your design seem to be cliched and repetitive, then your ad might get lost in the digital clutter as well. You need unique visuals for your content to appear more authentic. 

Take the below video generated on Sora, for example. It would make a great stock video for, say, a travel company.

Source 

(Prompt: Aerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.)

You might find several stock videos of aerial shots of Santorini but again, many other brands in your niche might be using them as well. So, you can use AI video tools like Sora to generate something unique for your brand. 

Generate videos for ideas where stock videos are hard to come by 

Sometimes, you have too few options when browsing for stock videos. And the ones you find might not have everything you are looking for. In such cases too, Sora can help. You can get as imaginative as you like and AI video tools like Sora will be able to bring your ideas to life. 

The video here looks stunning, as you can see. However, to achieve results like this one, it is important to use detailed, clearly worded prompts that spell out every little detail in the scene.

While Sora can help with all these applications, the videos generated right now are not perfect. OpenAI has spoken about the flaws in the tool on their page to help users understand the current strengths and limitations of Sora. Let’s take a quick look at a few of these weaknesses discussed by OpenAI. 

Limitations of Sora 

Unrealistic motion 

Sora can sometimes create “physically implausible motion”. In other words, some of the videos created can have the subjects moving in an odd and unnatural manner. These might be movements that are not practical in the real world. 

OpenAI shared the below video of an athlete running backward on a treadmill as an example.

The backward-flying dragon in the video that Sam Altman shared on X is another example. 

Objects appearing/disappearing randomly 

OpenAI shared the below video to show how some videos generated on Sora might display moving objects randomly appearing or disappearing from the scene. Notice the wolf pups randomly appearing amidst the pack of three. 

Source 

If the video requires a specific number of objects or characters, this spontaneous generation can throw things off, making the output inaccurate or inconsistent with the prompt. If the unexpected objects contradict the intended content, it could misrepresent the message. 

Unnatural object morphing 
Source 

As can be seen in the above example, Sora sometimes produces unnatural object morphing. Such incorrect morphing reduces the realism of the video and can make it appear fake, which is a problem when you are generating videos for your marketing campaigns.

Poor understanding of some objects and concepts 

Despite the extensive training, Sora’s understanding of some objects and concepts can be inaccurate, as you can see in the rendition of the chair in the below video.

This issue can again result in videos that look uncanny and unusable. While subtle imperfections can often be refined with other design tools, when a prominent object appears unrealistic across multiple frames like this, the video might not be of any use. It might therefore take multiple iterations to get the intended results.

Anomalies in interactions between objects

At first glance, the below video looks like a normal birthday scene but look closely at the people in the background. Oh yeah, the very familiar hand anomalies! And some unnatural movements and interactions between the subjects too. 

Source 

As can be seen in this example, this anomaly can be a problem when there are multiple subjects, especially multiple moving subjects, or when there is a complicated background or interaction between multiple textures!

A Final Word 

So, that was a quick refresher on everything we know about Sora so far! As it appears right now, Sora looks very promising. It sure looks like it could be a disruptive force in the world of generative video and video design as a whole. But we’ll know more once the model is released to the public and a more diverse audience gets to experiment with it. Until then, keep an eye on Sam Altman’s X page for more exciting video samples that show off Sora’s capabilities.

Got some AI-generated videos that need a professional makeover before you use them in your marketing campaigns? The KIMP Video subscription covers video editing too! Register now for a free trial of KIMP Video!