Walking into World Labs—the San Francisco startup cofounded by Fei-Fei Li, known in tech circles as the “godmother of AI”—visitors are met with a wall tiled in rainbow hues. A grand piano beckons at the end of the deep, sunlit lobby. Chair-size iridescent spheres rest against the walls, as if a giant forgot to pick up his toys. But around the corner, the walls and floor look scrambled, like a cubist painting left out in the rain. Luckily, this “world” is just a simulation of the World Labs lobby I’ve built using Marble, the company’s generative AI app.
My real tour had taken place a week earlier at the company’s actual offices, where reality—including the rainbow wall, spheres, and piano—stayed mercifully intact. “Marble is the first part of [our] journey—it’s not a fully matured model,” CEO Li had told me, wearing a baggy maroon World Labs hoodie and settling her 5-foot-3 frame into a chair in a cozy wood-lined conference room.
A world-renowned AI scientist known for her self-effacing demeanor, Li is being characteristically modest about Marble’s capabilities. The app, launched last November, uses a new type of AI called a “world model” to generate interactive 3D replicas of any space—real or imaginary—from visual or written prompts. These “worlds” can be anything, from floating castles and sci-fi vistas to, well, a certain startup’s entryway.
My results were a little off, but Marble still conjured the whole thing in a matter of minutes from a few photos I haphazardly snapped with my iPhone. If you actually know what you’re doing, Marble’s world model can produce 3D environments rich and expansive enough to use in virtual film production, architectural design, and robotics training—all domains in which World Labs already has paying customers.
Li won’t comment on current revenue or engagement metrics; one of the company’s cofounders, Ben Mildenhall, says that Marble is still in a “postlaunch discovery phase” after coming out of beta last November. World Labs offers tiered subscriptions to individuals, but Li says she envisions its primary business model as “heavily B2B,” with companies paying to pipe Marble’s capabilities into their own products.
Yet it’s the world model underneath that truly excites Li—and World Labs’ investors, which include Andreessen Horowitz, AMD, Nvidia, and Autodesk.
The market for world models
World Labs emerged from stealth in September 2024 with no product, $230 million in funding, and a billion-dollar valuation. Last February, four months after announcing Marble, which Li still refers to as “a proto-product,” it raised a billion dollars more—$200 million from Autodesk alone.
Why? Because AI, just a few years after LLMs such as ChatGPT and Claude began remaking the global economy, is already seeking its next act. Li believes it will center on what she calls “spatial intelligence”: AI that can go beyond next-word prediction and learn how the real world works. Consider all the knowledge you leverage just to traverse a crowded sidewalk: distance and movement, time and space, cause and effect. It’s stuff that a 3-year-old can master but that still stumps your average trillion-parameter chatbot.
World models (so the thinking goes) will use “multimodal” training data drawn from videos, images, industrial sensors, and virtual simulations to unlock this more expansive intelligence, enabling everything from fully self-driving cars and autonomous robots to AI-powered factories and laboratories. Rev Lebaredian, Nvidia’s vice president of Omniverse and simulation technology, told the Financial Times last year that the potential market for world models could be $100 trillion, or roughly the size of the entire global economy.
If that sounds like the equivalent of saying “infinity dollars,” PwC issued a slightly less galaxy-brained estimate in March, pegging the total market for “physical AI” (its synonym for “spatial intelligence”) at $503 billion by 2030, a large portion of which could be addressed by world models.
“Spatial intelligence, to me, is the North Star,” Li says. “And a world model is a means [to get there].”
Competition in the spatial intelligence field
She’s hardly alone in her quest. The same week I visited World Labs, Nvidia held its annual GTC conference in San Jose, where CEO Jensen Huang proclaimed that eventually, via its own latest world models—Cosmos 3 and GR00T N2, announced in March—“every industrial company will become a robotics company.” Google DeepMind has developed a video-based world model, too, called Genie 3, which it uses as a training tool for Waymo’s self-driving cars and sees as “a stepping stone” toward AGI, or artificial general intelligence. Elon Musk claims that world models are “essential” to training Tesla’s Optimus humanoid robots; xAI poached world-model specialists from Nvidia last summer.
Meanwhile, smaller players from disparate industries are elbowing in. Runway, an AI video-generation startup that has a partnership with Hollywood studio Lionsgate, hit a $5 billion valuation in 2026 by pivoting to world models. General Intuition, which raised $134 million last year from investors such as Khosla Ventures (one of OpenAI’s early backers), is using billions of video game highlight reels to build a world model for AI agents.
And Yann LeCun—one of three other so-called AI “godparents” (along with Nobel Prize winner Geoffrey Hinton and Yoshua Bengio, a University of Montreal computer scientist)—quit his post as Meta’s chief AI scientist last year to launch his own world-model startup, AMI Labs. He, too, has raised a billion dollars in funding.
In a landscape this hot and crowded, what sets World Labs apart, besides having an actual product, as Li dryly points out, is the company’s commitment to what she calls “human-centered AI.” While other world-model companies like General Intuition and Tesla appear to chase automation über alles—“autonomous control of every spatial workload,” as described to me by Nicole Fraenkel, a partner at VC firm Khosla Ventures, which invests in General Intuition—World Labs wants its AI to be a copilot, not an autopilot.
“A couple of years ago, I kept hearing this phrase ‘infinite productivity,’” Li says, referring to the ever-inflating expectations around AI. “I don’t know how to react to [that]. We’re trying to build a rational business. We believe Marble can answer a pain point.”
That would be the persistent high cost—not to mention tediousness—of creating 3D simulations, particularly in entertainment, media, and robotics. But rather than having its world model swallow the task whole, World Labs has made a Photoshop-like tool aimed at creatives, builders, and researchers: professionals who want to do their jobs better with AI, but still, you know, do them.
Maybe that sounds quixotic as a strategy, but Li’s most prominent backers are buying in. “It’s our job to create AI that is a partner—this is an important part of our ethos. Fei-Fei understands this intimately,” says Autodesk CEO Andrew Anagnost. Even Martin Casado, a general partner at Andreessen Horowitz (the venture firm whose cofounder Marc Andreessen blithely claimed last year that one of the only jobs safe from AI automation was his own), agrees. “The vision is so clear. We understand how you build a business on top of this,” he says.
Li recognizes the moment that World Labs finds itself in. Three years ago, LLMs redefined what AI could do; now world models look poised to do it again. But whose version—and vision—of them?
“I’m keenly aware of multiple clocks,” Li says, in her mild but steely voice. “They’re all ticking.”
World Labs’ consistency theory
LLMs, as we all know by now, are “book smart” at best—they ingest and manipulate words, hallucinating as they go. World models, as explained to me by Bengio (yes, he is developing a world model of his own), aim to capture something both larger and more solid: “coherence with the real world.”
For a self-driving car, this might mean understanding that a stop sign still means “stop” even when it’s attached to the side of a school bus. (Waymo has trouble with this.) For robots, it might mean reasoning about cause and effect before acting. Stepping over a ditch, for example, usually makes more sense than stepping into it. A world model means not having to find out the hard way.
At World Labs, a coherent world model starts with something simpler, but arguably more important for AI to get right: the 3D consistency of the world itself. Iridescent spheres that maintain their size and shape as you move around them. Pianos that stay put. This basic spatial consistency “is so fundamental—it’s the linchpin that connects perception with action,” Li says. It’s the perfect foundation, in other words, for World Labs to start building an AI that goes beyond book smarts.
Li had already been an AI-research veteran for nearly two decades—with stints at Princeton, Stanford, and Google—when she got the idea to found a world-model company in the summer of 2023.
ChatGPT was barely six months old, and Silicon Valley was swooning for LLMs. Marc Andreessen was fully convinced that language-based AI would “save the world” (as he put it in his 7,000-word manifesto on the subject). At a lunch he had convened for “AI luminaries” at his home—where, as Li recalls, “everyone was talking about LLMs”—she found herself in a corner whispering to Casado, a network-security expert who oversees Andreessen Horowitz’s infrastructure practice.
“At the time, it was a very, very contrarian position to believe that language wasn’t all you needed,” Casado recalls. “Fei-Fei leans over to me and says, ‘You know what somebody needs to work on? A world model.’ I had come to the same conclusion, but not nearly [with] the depth that she [did].”
Li already had a cofounder in mind: her former student Justin Johnson, who she says was “getting headwind” on spatial AI as a researcher at Meta. Casado introduced her to Mildenhall and Christoph Lassner, both experts in using AI for 3D graphics. In November 2023, World Labs was born.
For Li, the company is the result of “a career-long conviction” that vision lies at the heart of artificial intelligence. Back in 2012, early in her tenure as a researcher at what’s now known as Stanford’s Vision and Learning Lab, her ImageNet dataset helped neural networks successfully learn to recognize millions of pictures of objects, kicking off the “deep learning revolution” (as AI was called before ChatGPT came along). But Li also believes that seeing is key to intelligence, period.
“Just close your eyes,” she says. “Think about the average day and how much you rely on your spatial intelligence”—the fact that your visual reality stays intact, instead of dissolving into hallucinations. We take it for granted, but this cohesion is exactly what Marble’s world model is trained to produce.
And it’s harder than it sounds. DeepMind’s Genie 3, for instance, can’t maintain spatial consistency for longer than a few minutes. But my humble replica of the World Labs lobby will hang together as long as the company’s servers do.
Going for “Gaussian splats”
Both Marble and Genie rely on the same underlying technology as LLMs, known as the “transformer architecture.” (Marble also uses other methods that Li declined to discuss in detail.) But they take vastly different approaches.
Genie and other video-based world models use AI to produce a stream of new frames based on the ones that came before, and they quickly run out of steam because of cost; creating those images is the computational equivalent of setting money on fire. Sora, OpenAI’s brief foray into world modeling, was reportedly burning through $15 million a day before the company killed off the app in March.
Marble’s primary output, however, isn’t even pictures. It’s “Gaussian splats” (yes, you read that right): overlapping blobs of math that encode how a scene looks from any angle. The simulation I made in Marble contains about 300,000 splats. Once the world model squirts them all into place, no more have to be generated, regardless of how many times the view changes. They stay put—just like that piano.
“Everything Marble does is because we have [this] permanence,” says Li. For now, it’s an advantage that video-based world models—i.e., most of them—can’t match.
Video game engines and digital twins have offered high-fidelity, persistent 3D simulations for years. But they have to be painstakingly set up by hand, at significant expense—typically between $30,000 and $40,000 for a single scene, according to Casado. Marble can create a 3D scene for less than a dollar, in minutes rather than days. “It was easy to show that specific use case as having value,” he says.
World Labs’ customers agree. Marble “genuinely accelerates a part of our process that used to be much slower or purely abstract,” says Hugues Bruyère, cofounder and chief technologist of Dpt., a Montreal-based creative studio that designs interactive and immersive experiences for clients like Bentley and Radisson Hotels. “That’s a real shift.”
Bruyère currently subscribes to Marble’s highest pricing tier: $95 per month. World Labs won’t specify how much it charges for customized enterprise subscriptions, but given that high-end enterprise plans from Anthropic and Salesforce can cost upwards of $400 per month, World Labs may be able to charge business customers hundreds of dollars above its off-the-rack subscription prices. And Li’s vision for World Labs as an enterprise operation is already underway.
OpenArt, a generative-art platform launched by ex-Googlers in 2022, began offering 3D world-generation (via World Labs’ API) to its community of 8 million monthly active users earlier this year. Preview.io, a Sequoia Capital–backed startup that provides AI-powered film production tools to Fortune 100 companies, uses Marble’s spatial control to help producers meet demanding briefs from clients in the automotive industry. And Lightwheel, which designs virtual “rooms” to train AI-powered robots, credits World Labs for changing its workflow overnight.
“We were using the same kitchen, the same living room, the same warehouse over and over,” says John Stephens, Lightwheel’s chief evangelist. With World Labs’ API, “we can generate a thousand rooms a day.”
For a frontier research lab, World Labs seems well on its way to product-market fit. However, Li can’t help but mention another reason she chose “pixel lovers”—her name for the kind of talent she hires and sells to—as Marble’s primary clientele.
“The creative community,” she says, “is more tolerant [of] the model not being great yet.” That’s nothing against artists or designers, “and it’s not for lack of ambition” at World Labs, she adds. Robotics, healthcare, and even scientific discovery are all on her road map. But in the content-creation business, “nobody gets killed when your model is not good enough.”

The “messy middle”
Li describes herself as a “techno-optimist.” However, she avoids using the term “AGI” (Why? “Because I’m a scientist,” she says flatly), and she worries about people: Her school-age kids. Her staff. Her academic colleagues. It’s not a matter of what AI could “do” to them, but what they may or may not be able to do with AI—because of an increasingly polarized global conversation that relentlessly centers on technology instead of the humans it’s supposedly for.
She’s not even fond of the “godmother” moniker she earned from the pioneering computer-vision research she started at Princeton in 2006 and continued at Stanford in 2009.
Raised in Chengdu, the capital of China’s Sichuan province, Li was encouraged by her unconventional parents to chase butterflies and pore over English literature, as she wrote in her 2023 memoir, The Worlds I See. At 16 she immigrated to Parsippany, New Jersey, where she entered public high school knowing little English—and left with a scholarship to Princeton, which she attended while running her parents’ dry-cleaning business on weekends.
After ImageNet later cemented her reputation as a researcher, Li took a 21-month sabbatical to lead AI at Google Cloud before returning to Stanford in 2019 to cofound its Institute for Human-Centered AI (HAI), which she says she continues to be “strategically involved in.”
It was there that she and her collaborators coined the term “foundation model” to describe the already-looming social consequences of generative AI. Under her guidance, HAI also worked with Congress to pass the National AI Initiative Act in 2021, which led to a pilot program (supported by both the Biden and Trump administrations) that deployed $100 million in AI computing resources across 14 federal agencies.
Nowadays, Li may find herself closing billion-dollar investment deals, but her views on AI haven’t wavered. Last year, Anthropic CEO Dario Amodei made headlines by issuing doomerish warnings about AI wiping out knowledge work, while Elon Musk assured credulous investors that AI and robots would “eliminate poverty.” Li, in contrast, used her keynote address at the AI Action Summit in Paris to urge the global policymakers in attendance (including Emmanuel Macron, JD Vance, and India’s Narendra Modi) to steer their decisions using “science, not science fiction.” It made far fewer headlines.
“She says it’s about being in the messy middle” of the is-AI-good-or-bad debate, says HAI executive director Russell Wald. “The middle isn’t necessarily fun, because it’s easier to be on these polarizing sides.”
It appeals to investors, though. Li’s principles are a big part of what attracted Autodesk’s $200 million investment in World Labs. “She has this brand around caring deeply about the human-centricity of AI,” Autodesk CEO Anagnost says. “There’s lots of hyperscalers out there that are looking at world models. You want other forces in the ecosystem. You want someone passionate about the technology and its implications. Those things together led me to keep chasing, chasing, chasing her.”
(Li acknowledges that her own stance on AI doesn’t necessarily extend to everyone who may use World Labs’ technology, or world models in general. “Humanity is messy,” she says. “We have to live with hope as well as our dark underbelly, and try to do the best [we can].” When asked if World Labs would ever potentially reject a customer or partner on ethical grounds, she declined to answer directly, stating only that “human-centered AI is a framework, not a means to an end.”)
Anagnost looks forward to the day when spatially intelligent AI can solve the “blank slate problem” much like ChatGPT does, except for designing buildings instead of writing term papers. He can envision a “hyperautomated factory” where a next-gen world model could let “you understand full utilization of the space, reconfigure it rapidly, and monitor it over time.” In the meantime, Autodesk plans for “interoperability” between Marble and Flow Studio, its 3D modeling suite for entertainment and visual effects.
Fei-Fei Li’s advantage
This is exactly the kind of talk that other world-model investors have no patience for. Khosla Ventures’ Fraenkel is enthusiastic about the hyperautomated part, but skeptical bordering on dismissive about Gaussian splats being a way to get there. In her view, products like Marble are “just an expensive camera” and “ultimately economically useless”—a distraction from world models’ true endgame.
“Agents and robots, they don’t need a prettier picture of the world,” Fraenkel asserts. “They need to understand what happens in it when they act. This is the prize.” In the world-model race, companies that go “straight for the jugular, straight for autopilot”—like the Khosla-backed General Intuition, or Tesla—will have the advantage, she says.
Even those playing on World Labs’ team recognize that its approach has risks. The company’s world model needs more training to move beyond basic spatial coherence—compute and data are part of what the billion dollars is earmarked for—but “we don’t even know if the architecture is right,” says Casado, referring to the underlying technical design of the world model. “Literally, nobody’s built one before.”
Li won’t disclose exactly what mix of data World Labs intends to use, or where it’ll come from, but she acknowledges that world-model training data isn’t exactly easy to come by: “There’s no internet of text [for this]. There is a large amount of data we have to procure.” As a small startup, World Labs could also struggle to attract “good talent” to its idiosyncratic mission, she says. She even concedes that competitors might simply end up moving faster.
Still, Li has faced long odds before. Her ImageNet project is lauded as the cradle of modern AI now, but when she started, it was seen as so audacious it could have jeopardized her pathway to tenure, according to HAI’s Wald.
Li cites another source of what she considers her “entrepreneurial shamelessness”: “I’m an immigrant,” she says. “My parents and I built a life here. I’m used to being humbled. I’m very comfortable on ground zero.” If every tech startup needs an “unfair advantage”—Silicon Valley’s go-to phrase for a unique competitive asset—World Labs’ might just be its own diminutive, determined CEO.
Of course, Li would never say so herself. “Everybody tells me I’m too quiet,” she says. “My friends, my investors, my colleagues.” At first she didn’t even want to give an interview for this story, but her team twisted her arm. “I don’t walk around thinking, ‘I’m Fei-Fei Li,’” she says. “I always build for a mission.” Her vision for World Labs—and its human-centered future—is both consistent and persistent. It’s like a simulation in her world model. Once in place, it stays put.
