GPU-Driven Clustered Forward Renderer (logdahl.net)
unclad5968 8 hours ago [-]
This is awesome! At the end you mention the 27k dragons and 10k lights just barely fits in 16ms. Do you see any paths to improve performance? I've seen some demos on with tens/hundreds of thousands of moving lights, but hard to tell if they're legit or highly constrained. I'm not a graphics programmer by trade.

I need a renderer for a personal project and after some research decided I'll implement a forward clustered renderer as well.

logdahl 8 hours ago [-]
Well, the core issue is still drawing. I took another look at some profiles and it seems it's not the renderer limiting this to 27k! I still had some stupid scene-graph traversal... Clustering and culling are 53us and 33us respectively, but the draw is 7ms. So a frame (on the GPU side) is about 7ms, plus some 100-200us on the CPU side.

Should really dive deeper and update the measurements for final results...
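For anyone curious what the clustering pass being timed here actually computes: a common scheme bins lights into view-space "froxels" using logarithmic depth slicing. A minimal sketch of the slice computation (the standard formula from the clustered-shading literature; the constants here are illustrative, not necessarily what the post uses):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Map a view-space depth to one of `slices` cluster slices using a
// logarithmic distribution between the near and far planes, so nearby
// slices are thin and far slices are thick.
uint32_t depthSlice(float viewZ, float zNear, float zFar, uint32_t slices) {
    float s = std::log(viewZ / zNear) / std::log(zFar / zNear) * float(slices);
    if (s < 0.0f) s = 0.0f;                 // clamp in front of the near plane
    uint32_t i = uint32_t(s);
    return i >= slices ? slices - 1 : i;    // clamp at the far plane
}
```

Each light's bounding volume is then tested against the froxels it can overlap, and the per-cluster light lists are what the forward pass reads back.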

godelski 1 hour ago [-]
I haven't looked at the post in the detail it deserves, but given your graphs the workload looks pretty bursty. I'd suspect there are some good I/O optimizations or some predication to be had; that last void main block definitely looks ripe for it. But I'd listen to Knuth, premature optimization and all, so grab a profiler. I wouldn't be surprised if you're nearing peak performance. Also, NVIDIA GPUs have a lot of special tricks that can be exploited but are buried in documentation... if you haven't already seen it (I suspect you have), you'd be interested in "GPU Gems". Gems 2 has some good stuff on predication.

But also, really good work! You should be proud of this! Squeezing that much out of that hardware is no easy feat.

gmueckl 8 hours ago [-]
This seems fairly well optimized. There's probably room to squeeze out some more perf, but not dramatic improvements. Maybe preventing overdraw of shaded pixels by doing a depth prepass would help.
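To make the depth-prepass argument concrete, here is a toy software model (full-screen "layers" at fixed depths, nothing like a real GPU) comparing how many fragments get shaded with and without a prepass. With a prefilled z-buffer, the shading pass tests EQUAL and only the finally visible fragment per pixel runs the expensive shader:

```cpp
#include <cassert>
#include <limits>
#include <vector>

// Without a prepass: every fragment that passes the LESS depth test at the
// time it is drawn gets shaded, so back-to-front geometry causes overdraw.
int shadedNoPrepass(const std::vector<float>& layerDepths, int pixels) {
    int shaded = 0;
    std::vector<float> zbuf(pixels, std::numeric_limits<float>::max());
    for (float z : layerDepths)
        for (int p = 0; p < pixels; ++p)
            if (z < zbuf[p]) { zbuf[p] = z; ++shaded; }
    return shaded;
}

// With a prepass: pass 1 writes depth only (cheap), pass 2 shades only
// fragments whose depth EQUALs the resolved z-buffer value.
int shadedWithPrepass(const std::vector<float>& layerDepths, int pixels) {
    std::vector<float> zbuf(pixels, std::numeric_limits<float>::max());
    for (float z : layerDepths)            // pass 1: depth only
        for (int p = 0; p < pixels; ++p)
            if (z < zbuf[p]) zbuf[p] = z;
    int shaded = 0;
    for (float z : layerDepths)            // pass 2: shade on EQUAL
        for (int p = 0; p < pixels; ++p)
            if (z == zbuf[p]) ++shaded;
    return shaded;
}
```

Three full-screen layers drawn back-to-front shade 3x the pixels without a prepass, but exactly 1x with one (assuming distinct depths; a real prepass also pays the extra geometry pass, which is why it only wins when shading is expensive).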

Without digging into the detailed breakdown, I would assume that the sheer number of teeny tiny triangles is the main bottleneck in this benchmark scene. When triangles become smaller than about 4x4 pixels, GPU utilization during rasterization starts to diminish. And with the scaled-down dragons, there are a lot of them in the frame.
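The utilization loss comes from the fact that GPUs shade fragments in 2x2 quads (needed for derivative computation), so a partially covered quad still launches all four invocations as helper lanes. A back-of-envelope model for an axis-aligned w x h pixel footprint aligned to the quad grid (an illustrative simplification, not a hardware simulation):

```cpp
#include <cassert>

// Fraction of launched fragment invocations that do useful work when the
// rasterizer shades whole 2x2 quads for a w x h pixel footprint.
double quadEfficiency(int w, int h) {
    int quadsX = (w + 1) / 2;           // quads touched horizontally
    int quadsY = (h + 1) / 2;           // quads touched vertically
    int shaded = 4 * quadsX * quadsY;   // invocations actually launched
    return double(w * h) / shaded;      // covered pixels / launched lanes
}
```

A 1x1-pixel triangle footprint runs at 25% efficiency under this model, which is why pixel-sized dragons hurt so much more than their vertex counts alone suggest.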

spookie 5 hours ago [-]
This is by far the biggest culprit, OP; look into this.

You can try to come up with imposters representing these far-away dragons, or simple LoD levels. Some games use particles to represent far-away, repeated "meshes" (Ghost of Tsushima does this for distant soldiers).

Lots of techniques in this area, ranging from simple to bananas. LoD levels alone can get you pretty far! Of course, this comes at the cost of more distinct draw calls, so it is a balancing game.

Think about the topology too; hope these old gems help you get a grasp of the cost of this:

https://www.humus.name/index.php?page=Comments&ID=228

https://www.g-truc.net/post-0662.html
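As a sketch of that balancing game: LOD selection is often driven by an object's projected screen size, via the standard bounding-sphere projection. The pixel thresholds below are made up for illustration; real engines tune them per asset:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Pick a discrete LOD from an object's approximate projected screen height:
// projectedPixels ~ (radius / distance) * viewportHeight / (2 * tan(fovY/2)).
// LOD0 is used above 256 px, LOD1 above 128 px, halving per step.
uint32_t selectLod(float worldRadius, float distance, float viewportHeight,
                   float fovY, uint32_t lodCount) {
    float pixels = worldRadius / distance * viewportHeight /
                   (2.0f * std::tan(fovY * 0.5f));
    float threshold = 256.0f;
    for (uint32_t lod = 0; lod + 1 < lodCount; ++lod) {
        if (pixels >= threshold) return lod;
        threshold *= 0.5f;
    }
    return lodCount - 1;  // coarsest mesh (or an imposter) for tiny objects
}
```

Past the last threshold is exactly where an imposter or particle would take over from the coarsest mesh.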

logdahl 4 hours ago [-]
Yeah, I use LODs already, but as you say, even my lowest LOD is too many vertices when far away. Imposter rendering seems very interesting but also completely bonkers (viewing angle, lighting)!
zokier 6 hours ago [-]
Worth noting that the GTX 1070 is a nearly 10-year-old "mainstream" GPU. I'd imagine a 5090 or something could push the numbers a fair bit higher.
rezmason 7 hours ago [-]
Ten thousand lights! Your utility bill must be enormous
Flex247A 7 hours ago [-]
Lights in games use real electricity :)
amelius 7 hours ago [-]
Even the stars use real electricity.
cluckindan 4 hours ago [-]
Not really, nuclear fusion doesn’t run on electrons.
DiabloD3 40 minutes ago [-]
So where does the magnetic field come from? ;) ;) ;)
fabiensanglard 8 hours ago [-]
This website has a beautiful layout ;) !
logdahl 8 hours ago [-]
Fun to see you ;) Love your site!
zeristor 9 hours ago [-]
Apostrophe as a number separator?

Where’s that from?

dahart 8 hours ago [-]
Switzerland and Italy for two. https://en.wikipedia.org/wiki/Decimal_separator#

Also note C++14 introduced the apostrophe in numeric literals! https://en.cppreference.com/w/cpp/language/integer_literal
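Since C++14, the apostrophe can group digits in any numeric literal (decimal, hex, binary, octal); the compiler simply ignores it, so the value is unchanged. For example:

```cpp
#include <cstdint>

// C++14 digit separators: purely cosmetic grouping, any spacing you like.
constexpr int64_t  million       = 1'000'000;
constexpr uint32_t rgbaMask      = 0xFF'FF'00'00u;  // bytes grouped in hex
constexpr int      binaryNibbles = 0b1010'0101;     // nibbles grouped in binary
```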

qingcharles 7 hours ago [-]
I've started using the underscore in my code, since that is becoming the trendy (non-localized) standard:

https://en.wikipedia.org/wiki/Integer_literal#Digit_separato...

logdahl 8 hours ago [-]
Interesting that Sweden explicitly does NOT use it... Not sure where I picked it up! :-)
lacoolj 8 hours ago [-]
Learn something new every day.

And I would never have known this existed without hackernews

m-schuetz 2 hours ago [-]
Apostrophes are nice because they are not ambiguous. I started using them myself after getting used to them in C++ and learning that they are used in Switzerland.