Back in the day on the Mac, the order of source files in your project would determine locality in the binary.
If memory serves, this was with MPW C or maybe CodeWarrior.
You could see the jump (jmp) instructions use short jumps rather than long ones.
fsflyer 2 hours ago [-]
The Metrowerks profiler and linker worked together to optimize locality in the binary; the focus was on PowerPC code. The linker could generate the static call tree, but the profiler could generate a dynamic call tree of what was actually called. The goal was to separate the cold portions of the call tree into portions of the executable that didn't get paged in.
I worked on the Profiler and I seem to remember that Microsoft was one of the developers that put a bunch of effort into using this to optimize the Office suite on Mac. I remember the release of Word that used it was snappier.
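To make the hot/cold split concrete, here is a minimal sketch of the idea in Python. It is not the Metrowerks algorithm, just an illustration under simple assumptions: the static call tree is reduced to a flat list of function names, the profile is a flat list of observed calls, and the names `layout_order`, `static_functions`, and `dynamic_calls` are all made up for the example.

```python
from collections import Counter


def layout_order(static_functions, dynamic_calls):
    """Split functions into hot and cold groups for binary layout.

    static_functions: every function the linker knows about (hypothetical input).
    dynamic_calls: one entry per call the profiler observed (hypothetical input).
    """
    counts = Counter(dynamic_calls)
    # Hot: called at least once in the profile, most frequently called first.
    hot = sorted(
        (f for f in static_functions if counts[f] > 0),
        key=lambda f: (-counts[f], f),
    )
    # Cold: present in the static tree but never seen in the profile.
    cold = sorted(f for f in static_functions if counts[f] == 0)
    return hot, cold


static_functions = ["main", "parse", "render", "print_help", "crash_report"]
dynamic_calls = ["main", "parse", "render", "render", "render", "parse"]

hot, cold = layout_order(static_functions, dynamic_calls)
# Hot code is laid out first; cold code goes at the end so its pages
# ideally never need to be brought into memory.
link_order = hot + cold
print(link_order)
```

A real tool would also weigh caller/callee adjacency and function sizes; this only shows the partition into "was called" and "never called".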
rurban 3 hours ago [-]
This is still relevant. I had big success in writing an order optimizer for perl5.
kardos 6 hours ago [-]
Does it work with Intel Fortran-compiled code?
kijiki 2 hours ago [-]
As long as you relink with relocations preserved in the final ELF binary, it should.
https://vondra.me/posts/playing-with-bolt-and-postgres/
"results are unexpectedly good, in some cases up to 40%"
https://cachyos.org/blog/2411-kernel-autofdo/