So essentially the code got faster by switching to a smarter regexp engine that can (hopefully) transform the regexp into an FSM.
Cool, but what about replacing the regexp with straightforward parsing code written manually? That way it is statically known that this function is fast all of the time.
At least that’s what I would do in a natively compiled language with access to SIMD. I'm not sure whether Ruby has access to it and can detect the best variant at runtime.
And I’d argue plain code is simpler to understand, debug, and troubleshoot than a complex regexp.
The regexp engine part reminds me of writing scalar code and relying on the compiler's autovectorization to make it fast. What very often happens is that a new compiler version changes its heuristics, autovectorization no longer kicks in, and performance falls off a cliff. It’s a risky bet.
eregon 14 hours ago
Thanks for the comment.
> Cool, but what about replacing the regexp with straightforward parsing code written manually?
If you take a look at the linked snippets of C code, I think it's clear that it's anything but straightforward. The regexps, OTOH, are really short and expressive.
Ruby has no access to SIMD. And writing SIMD is basically writing inline assembly, so it's really tedious and messy.
> relying on the compiler doing autovectorization to make it fast.
I can relate to that, but regexps are a much smaller domain, and there it's clear SIMD is always a win, so if the regexp engine uses SIMD, it's very unlikely it would ever stop using it (unless something faster comes up).
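To make the trade-off in this thread concrete, here is a minimal Ruby sketch of the same small task done both ways: recognising an optionally signed decimal integer at the start of a string. The task, the constant `INT_RE`, and both method names are hypothetical and chosen only for illustration; they are not taken from the linked code, and the sketch says nothing about which version is faster on a given Ruby implementation.

```ruby
# Hypothetical example, not from the linked code: return the leading
# optionally signed decimal integer of a string, or nil if there is none.

# Regexp version: one short, declarative pattern.
INT_RE = /\A-?\d+/

def int_prefix_regexp(str)
  str[INT_RE] # matched substring, or nil when the pattern does not match
end

# Hand-written version: an explicit byte-by-byte scan.
def int_prefix_manual(str)
  i = 0
  i += 1 if str.getbyte(0) == 45 # 45 is the byte value of "-"
  first_digit = i
  # 48..57 are the byte values of "0".."9"; getbyte returns nil past the end.
  while (b = str.getbyte(i)) && b >= 48 && b <= 57
    i += 1
  end
  return nil if i == first_digit # a sign with no digits, or no digits at all
  str.byteslice(0, i)
end
```

Both methods should agree on inputs such as `"42abc"`, `"-7"`, `"-"`, and `""`. The manual version has predictable linear cost but needs explicit handling of each edge case, while the regexp version leaves performance up to the engine.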