Unsurprisingly, it turns out that other people had already thought of applying the multi-pass technique on GPU, but the idea is not very widely known.
The demoscene is particularly insular, but even within the field of computing in general it seems that there is not a lot of knowledge diffusion between all the different areas, leading to some reinventions (often with distinct terminology).
Tomte 30 days ago [-]
For example, the requirements for a CPU instruction set to be properly virtualizable had been known in the mainframe computing world for many, many years when Intel and AMD came up with their unvirtualizable (except via VMware's heroic tricks) 32-bit instruction sets.
Those requirements, and the mainframe world's different jargon for them, were rediscovered from the literature when virtualization became a selling point in the PC world.
(Edouard Bugnion et al., Hardware and Software Support for Virtualization)
linolevan 31 days ago [-]
Played around with the code and implemented a little bit of SIMD. Was able to squeeze out a decent improvement: ~250 fps avg, ~140 low, ~333 high (on an M4). Looks pretty straightforward to add threading as well. Cool stuff! Could help bring more GPU work back down to the CPU.
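A rough sketch of the batched-evaluation idea (not the commenter's actual code; numpy's array operations stand in for explicit SIMD intrinsics, and the single-sphere scene is an assumption for illustration):

```python
import numpy as np

def sphere_sdf(px, py, pz, cx=0.0, cy=0.0, cz=3.0, r=1.0):
    """Signed distance from points (px, py, pz) to a sphere.
    All arguments may be arrays, so one call evaluates many rays at once."""
    return np.sqrt((px - cx)**2 + (py - cy)**2 + (pz - cz)**2) - r

def march_rows(width=64, height=64, max_steps=32, eps=1e-3):
    """Sphere-trace a whole image at once: every ray advances in lockstep,
    which is exactly the access pattern SIMD (or numpy) rewards."""
    xs = np.linspace(-1.0, 1.0, width)
    ys = np.linspace(-1.0, 1.0, height)
    dx, dy = np.meshgrid(xs, ys)
    # Pinhole camera at the origin looking down +z; normalize directions.
    dz = np.ones_like(dx)
    inv_len = 1.0 / np.sqrt(dx*dx + dy*dy + dz*dz)
    dx, dy, dz = dx * inv_len, dy * inv_len, dz * inv_len
    t = np.zeros_like(dx)            # per-ray distance travelled
    hit = np.zeros(dx.shape, bool)
    for _ in range(max_steps):
        d = sphere_sdf(dx * t, dy * t, dz * t)
        hit |= d < eps
        t += np.where(hit, 0.0, d)   # converged rays stop advancing
    return hit, t

hit, t = march_rows()
```

The point is the shape of the loop: one distance evaluation over the whole pixel grid per step, rather than a per-pixel inner loop, which is the same restructuring needed before hand-written SIMD pays off.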
riggsdk 29 days ago [-]
I do this in a more crude fashion (like the article mentions) on the GPU in an old personal project.
I just run a low-res (256x256) pre-pass and store the distances to a floating point texture. I then use that pre-pass texture as the starting point (minus some delta) when drawing full-screen. This makes it really nice and performant, even for complex SDF shapes.
I think a common misconception in GPU programming is that branching is slow. It is only really slow when neighboring fragments diverge on those logic branches.
The quick pre-pass step gets close enough to the SDF surface that more fragments stay in lockstep with each other and terminate at the same time, eliminating the expensive divergent work the GPU otherwise has to do. More experimentation is needed on my end. I do this in the browser with WebGL, so accurate profiling is sometimes difficult.
I experimented with different resolutions and numbers of pre-pass steps, but found a single pre-pass run sufficient on most GPUs (subject to change the more I test).
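A minimal sketch of this two-pass scheme (illustrative only, assuming a toy single-sphere scene and numpy in place of actual shader passes; the commenter's real code is WebGL):

```python
import numpy as np

def scene_sdf(px, py, pz):
    # Stand-in for a complex SDF scene: one sphere at z = 3, radius 1.
    return np.sqrt(px**2 + py**2 + (pz - 3.0)**2) - 1.0

def march(start_t, width, height, steps, eps=1e-3):
    """Sphere-trace a width x height grid of rays, starting each ray
    at the per-pixel distance given in start_t."""
    xs = np.linspace(-1.0, 1.0, width)
    ys = np.linspace(-1.0, 1.0, height)
    dx, dy = np.meshgrid(xs, ys)
    dz = np.ones_like(dx)
    n = np.sqrt(dx*dx + dy*dy + dz*dz)
    dx, dy, dz = dx/n, dy/n, dz/n
    t = start_t.copy()
    for _ in range(steps):
        d = scene_sdf(dx*t, dy*t, dz*t)
        t += np.where(d < eps, 0.0, d)   # converged rays stop
    return t

# Pre-pass: march a coarse 16x16 grid from t=0 and keep the distances
# (the "floating point texture" of the comment).
coarse = march(np.zeros((16, 16)), 16, 16, steps=24)

# Full-res pass: upsample the coarse distances, back off by a safety
# delta so rays restart just in front of the surface, not at the camera.
delta = 0.2
start = np.maximum(np.repeat(np.repeat(coarse, 8, 0), 8, 1) - delta, 0.0)
fine = march(start, 128, 128, steps=8)   # far fewer steps needed now
```

The delta matters: the coarse ray is not the fine ray, so starting exactly at the coarse distance can skip silhouettes; backing off (or taking a conservative minimum over coarse neighbors) keeps the restart safe.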
refulgentis 31 days ago [-]
Tl;dr: SDFs are really slow but cool because they can compactly define complex stuff; the demoscene uses them. Sort of the functional programming to traditional rendering's OOP. Would be cool if they were faster. The article optimizes an algorithm for CPU rendering using recursive divide and conquer: one core with one object gets 50 fps, or 100 fps if you lerp a 10x10 pixel patch instead of shading every pixel. The algorithm isn't fully optimized. Also, it turns out the author's idea was previously known but somewhat obscure: it is referred to as "cone marching".
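For readers unfamiliar with the term, cone marching can be sketched in a few lines (a toy scalar version under an assumed single-sphere scene, not the article's implementation): march one cone per pixel block instead of one ray per pixel, stopping once the cone's cross-section could touch the surface; the resulting distance is a safe starting point for every ray inside the block.

```python
import math

def sdf(x, y, z):
    # Toy scene: sphere of radius 1 at (0, 0, 3).
    return math.sqrt(x*x + y*y + (z - 3.0)**2) - 1.0

def cone_march(dirx, diry, dirz, half_angle, max_steps=64, eps=1e-3):
    """March a cone from the origin along (dirx, diry, dirz).
    The cone's radius at distance t is t * tan(half_angle); stop as soon
    as the surface is closer than that radius, since some ray inside the
    cone may be about to hit. The returned t is safe for all those rays."""
    tan_a = math.tan(half_angle)
    t = 0.0
    for _ in range(max_steps):
        d = sdf(dirx * t, diry * t, dirz * t)
        if d < eps + t * tan_a:   # cone cross-section reaches the surface
            break
        t += d                    # ordinary sphere-tracing step
    return t

# A narrow cone aimed straight at the sphere stops at its front face (t = 2);
# per-pixel rays would then refine from there instead of from the camera.
t_front = cone_march(0.0, 0.0, 1.0, math.radians(2.0))
```

This is also why the 10x10-patch trick in the article works: the coarse pass answers "how far can this whole block of rays safely travel?" in one traversal.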
kg 31 days ago [-]
SDFs can be pretty fast if you do the work to optimize around them. Unreal Engine has lots of features based on SDFs that are used to great effect in games that run on consumer hardware.
You don't need bleeding edge hardware or software either. The game I'm working on generates a new SDF every frame for the scene (using the GPU's fragment units to rasterize the distance data for the objects in the scene into a scratch buffer) and then does cone traces through the generated SDF per-pixel to do realtime soft shadow casting and lighting, and that performs just fine even on an old laptop from 2015.
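The soft-shadow part of that pipeline is a well-known SDF trick (popularized by Inigo Quilez); a scalar sketch, assuming a toy sphere occluder rather than the game's generated SDF:

```python
import math

def sdf(x, y, z):
    # Toy occluder: sphere of radius 0.5 at (0, 2, 0), between point and light.
    return math.sqrt(x*x + (y - 2.0)**2 + z*z) - 0.5

def soft_shadow(px, py, pz, lx, ly, lz, k=8.0, steps=64, eps=1e-3):
    """March from the shaded point toward the light. Darken by how closely
    the ray grazes geometry relative to how far it has travelled: k*d/t
    approximates penumbra width, so near misses give partial shadow."""
    dx, dy, dz = lx - px, ly - py, lz - pz
    dist = math.sqrt(dx*dx + dy*dy + dz*dz)
    dx, dy, dz = dx/dist, dy/dist, dz/dist
    shadow = 1.0
    t = eps * 10                      # start a little off the surface
    for _ in range(steps):
        if t >= dist:
            break                     # reached the light unobstructed
        d = sdf(px + dx*t, py + dy*t, pz + dz*t)
        if d < eps:
            return 0.0                # hit geometry: fully occluded
        shadow = min(shadow, k * d / t)
        t += d
    return shadow

light = (0.0, 5.0, 0.0)
blocked  = soft_shadow(0.0, 0.0, 0.0, *light)  # ray goes through the sphere
penumbra = soft_shadow(0.9, 0.0, 0.0, *light)  # ray grazes the sphere's edge
lit      = soft_shadow(3.0, 0.0, 0.0, *light)  # ray passes well clear
```

One march per pixel yields an area-light look without any shadow maps, which is why it pairs so naturally with a per-frame regenerated SDF.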
"I'm making a game engine based on dynamic signed distance fields (SDFs)"
That project is for GPU and it works by caching SDFs as marching cubes, with high resolution near the camera and low resolution far away, to build huge worlds out of arbitrary numbers of SDF edits.
So it probably wouldn't stack at all with these CPU optimizations, which render the SDF directly.