I think this is very interesting because, if I'm understanding it correctly, you seem to have reinvented something in the spirit of NeRF. I've only done one pass through, but at first glance it looks like an entirely different approach.
More interesting is that you made an easy-to-use environment authoring tool that seems really slick (I haven't tried it yet).
Both of those are impressive alone but together that’s very exciting.
tehsauce 2 hours ago [-]
I love this! Your results seem comparable to the Counter-Strike or Minecraft models from a while ago, with massively less compute and data. It's particularly cool that it uses real-world data. I've been wanting to do something like this for a while, like capturing a large dataset while backpacking in the Cascades :)
I didn't see it in an obvious place on your GitHub; do you have any plans to open-source the training code?
alain94040 4 hours ago [-]
Appreciate this article showing some failures on the way to a great result. Too many times, people only show the polished end result: look, I trained this AI and it produces these great results. The world dissolving was very interesting to see, even if I'm not sure I understand how it got fixed.
ollin 4 hours ago [-]
Thanks! My favorite failure mode (not mentioned in the post - I think it was during the first round of upgrades?) was a "dry" form of soupification where the texture detail didn't fully disappear https://imgur.com/c7gVRG0
udia 3 hours ago [-]
Very nice work. Seems very similar to the Oasis Minecraft simulator.
Yup, definitely similar! There are a lot of video-game-emulation World Models floating around now; https://worldarcade.gg has a list. In the self-driving & robotics literature there have also been many WMs created for policy training and evaluation. I don't remember a prior WM built on first-person cell-phone video, but it's a simple enough concept that someone has probably done it for a student project or something :)
puchatek 4 hours ago [-]
This is great but I think I'll stick to mushrooms.
LoganDark 6 minutes ago [-]
For some reason, psilocybin causes me to randomly just lose consciousness, and LSD doesn't. Weird stuff.
bongodongobob 3 hours ago [-]
Yeah, the similarities to psychedelics with some of this stuff is remarkable.
ilaksh 13 minutes ago [-]
It makes me think that maybe our visual perception is similar to what this program is doing in some ways.
I wonder if there are any computer vision projects that take a similar world emulation approach?
Imagine you collected the depth data also.
ilaksh 2 hours ago [-]
This seems incredibly powerful.
Imagine a similar technique but with productivity software.
And a pre-trained network that adapts quickly.
quantumHazer 5 hours ago [-]
Is this a solo/personal project? If it is, that is indeed very cool.
Is OP the blog's author? In the post the author said the purpose of the project is to show why NNs are truly special, and I wanted a more articulate view of why they think that.
Good work anyway!
ollin 4 hours ago [-]
Yes! This was a solo project done in my free time :) to learn about WMs and get more practice training GANs.
The special aspect of NNs (in the context of simulating worlds) is that NNs can mimic entire worlds from videos alone, without access to the source code (in the case of pokemon) or even without the source code having existed (as is the case for the real-world forest trail mimicked in this post). They mimic the entire interactive behavior of the world, not just the geometry (note e.g. the not-programmed-in autoexposure that appears when you look at the sky).
Although the neural world in the post is a toy project, and quite far from generating photorealistic frames with "trees that bend in the wind, lilypads that bob in the rain, birds that sing to each other", I think getting better results is mostly a matter of scale. See e.g. the GAIA-2 results (https://wayve.ai/wp-content/uploads/2025/03/generalisation_0..., https://wayve.ai/wp-content/uploads/2025/03/unsafe_ego_01_le...) for an example of what WMs can do without the realtime-rendering-in-a-browser constraints :)
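For readers less familiar with WMs, the basic interactive loop is roughly "predict the next frame from the current frame plus the player's input, then feed the prediction back in." Here's a toy sketch of that idea (made-up names, shapes, and a stand-in network; not the code used for the post):

    import torch

    class TinyWorldModel(torch.nn.Module):
        def __init__(self, action_dim: int = 4):
            super().__init__()
            # A real world model would be a proper generator network;
            # a single conv layer stands in for it here.
            self.net = torch.nn.Conv2d(3 + action_dim, 3, kernel_size=3, padding=1)

        def forward(self, frame: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
            # Broadcast the action vector into extra input channels.
            b, _, h, w = frame.shape
            a = action.view(b, -1, 1, 1).expand(b, action.shape[-1], h, w)
            return self.net(torch.cat([frame, a], dim=1))

    model = TinyWorldModel()
    frame = torch.zeros(1, 3, 64, 64)                   # current view of the "world"
    for _ in range(10):                                  # interactive rollout
        action = torch.tensor([[1.0, 0.0, 0.0, 0.0]])    # e.g. "walk forward"
        frame = model(frame, action)                     # next frame depends on the input

The point is that nothing in this loop ever sees source code or geometry: behavior like autoexposure only shows up because it was in the training videos.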
janalsncm 3 hours ago [-]
You mentioned it took 100 GPU-hours; what GPU did you train on?
ollin 2 hours ago [-]
Mostly 1xA10 (though I switched to 1xGH200 briefly at the end; Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time with a large batch to get somewhat-stable results.
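One common way to get a large effective batch on a single A10 is gradient accumulation over many small micro-batches. A toy sketch of that pattern (placeholder networks, made-up numbers, and a simplified GAN loss; not the actual training code):

    import torch

    # Tiny stand-ins for the generator/discriminator; the real networks are bigger.
    generator = torch.nn.Conv2d(3, 3, 3, padding=1)
    discriminator = torch.nn.Sequential(torch.nn.Conv2d(3, 1, 3, padding=1),
                                        torch.nn.AdaptiveAvgPool2d(1))

    opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.0, 0.99))

    micro_batch = 8     # what fits in GPU memory at once
    accum_steps = 32    # 8 * 32 = effective batch of 256

    for step in range(100):                                # the real run is far longer
        opt_g.zero_grad()
        for _ in range(accum_steps):
            frames = torch.randn(micro_batch, 3, 64, 64)   # placeholder data
            fake = generator(frames)
            g_loss = -discriminator(fake).mean()           # simplified generator loss
            (g_loss / accum_steps).backward()              # average over micro-batches
        opt_g.step()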
Really cool. How much compute did you require to successfully train these models? Is it in the ballpark of something you could do with a single gaming GPU? Or did you spin up something fancier?
edit: I see now that you mention a price point of 100 GPU-hours / roughly $100. My mistake.