With Sora, OpenAI highlights the mystery and clarity of its mission

Last Thursday, OpenAI released a demo of its new text-to-video model Sora, which “can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt.”

Perhaps you’ve seen one, two or 20 examples of the video clips OpenAI provided, from the litter of golden retriever puppies popping their heads out of the snow to the couple walking through the bustling Tokyo street. Maybe your reaction was wonder and awe, or anger and disgust, or worry and concern — depending on your view of generative AI overall.

Personally, my reaction was a mix of amazement, uncertainty and good old-fashioned curiosity. Ultimately I, and many others, want to know — what is the Sora release really about?

Here’s my take: With Sora, OpenAI offers what I think is a perfect example of the air of mystery that pervades the company’s constant releases, an air that feels especially thick just three months after CEO Sam Altman’s firing and quick comeback. That enigmatic aura feeds the hype around each of its announcements.

Of course, OpenAI is not “open.” It offers closed, proprietary models, which makes its offerings mysterious by design. But think about it: millions of us are now trying to parse every word said about the Sora release, by Altman and many others. We wonder or opine on how the black-box model really works, what data it was trained on, why it was suddenly released now, what it will really be used for, and the consequences of its future development on the industry, the global workforce, society at large, and the environment. All for a demo that will not be released as a product anytime soon. It’s AI hype on steroids.

At the same time, Sora also exemplifies the very un-mysterious, transparent clarity OpenAI has around its mission to develop artificial general intelligence (AGI) and ensure that it “benefits all of humanity.”

After all, OpenAI said it is sharing Sora’s research progress early “to start working with and getting feedback from people outside of OpenAI and to give the public a sense of what AI capabilities are on the horizon.” The title of the Sora technical report, “Video generation models as world simulators,” shows that this is not a company looking to simply release a text-to-video model for creatives to work with. Instead, this is clearly AI researchers doing what AI researchers do — pushing against the edges of the frontier. In OpenAI’s case, that push is towards AGI, even if there is no agreed-upon definition of what that means.

The strange duality behind OpenAI’s Sora

That strange duality — the mysterious alchemy of OpenAI’s current efforts, and unwavering clarity of its long-term mission — often gets overlooked and under-analyzed, I believe, as more of the general public becomes aware of its technology and more businesses sign on to use its products.

The OpenAI researchers working on Sora are certainly concerned about the present impact and are being careful about deployment for creative use. For example, Aditya Ramesh, an OpenAI scientist who co-created DALL-E and is on the Sora team, told MIT Technology Review that OpenAI is worried about misuses of fake but photorealistic video. “We’re being careful about deployment here and making sure we have all our bases covered before we put this in the hands of the general public,” he said.

But Ramesh also considers Sora a stepping stone. “We’re excited about making this step toward AI that can reason about the world like we do,” he posted on X.

Ramesh spoke about video goals over a year ago

In January 2023, I spoke to Ramesh for a look back at the evolution of DALL-E on the second anniversary of the original DALL-E paper.

I dug up my transcript of that conversation and it turns out that Ramesh was already talking about video. When I asked him what interested him most about working on DALL-E, he said that the aspects of intelligence that are “bespoke” to vision and what can be done in vision were what he found the most interesting.

“Especially with video,” he added. “You can imagine how a model that would be capable of generating a video could plan across long-time horizons, think about cause and effect, and then reason about things that have happened in the past.”

Ramesh also talked, I felt, from the heart about the OpenAI duality. On the one hand, he felt good about exposing more people to what DALL-E could do. “I hope that over time, more and more people get to learn about and explore what can be done with AI and that sort of open up this platform where people who want to do things with our technology can easily access it through our website and find ways to use it to build things that they’d like to see.”

On the other hand, he said that his main interest in DALL-E as a researcher was “to push this as far as possible.” That is, the team started the DALL-E research project because “we had success with GPT-2 and we knew that there was potential in applying the same technology to other modalities — and we felt like text-to-image generation was interesting because…we wanted to see if we trained a model to generate images from text well enough, whether it could do the same kinds of things that humans can in regard to extrapolation and so on.”

Ultimately, Sora is not about video at all

In the short term, we can look at Sora as a potential creative tool with lots of problems to be solved. But don’t be fooled — to OpenAI, Sora is not really about video at all.

Whether you think Sora is a “data-driven physics” engine that is a “simulation of many worlds, real or fantastical,” as Nvidia’s Jim Fan does, or you think “modeling the world for action by generating pixel is as wasteful and doomed to failure as the largely-abandoned idea of ‘analysis by synthesis,’” as Yann LeCun does, I think it’s clear that looking at Sora simply as a jaw-dropping, powerful video application (one that plays into all the anger, fear and excitement around today’s generative AI) misses the duality of OpenAI.

OpenAI is certainly running the current generative AI playbook, with its consumer products, enterprise sales, and developer community-building. But it’s also using all of that as a stepping stone toward developing, and holding power over, whatever it believes AGI is, could be, or should be defined as.

So for everyone out there who wonders what Sora is good for, make sure you keep that duality in mind: OpenAI may currently be playing the video game, but it has its eye on a much bigger prize.
