NHacker Next
Python: The Optimization Ladder (cemrehancavdar.com)
__mharrison__ 3 hours ago [-]
Great writeup.

I've been in the pandas (and now polars) world for the past 15 years. Staying in the sandbox gets most folks good enough performance. (That's why Python is the language of data science and ML.)

I generally teach my clients to reach for numba first. Potentially lots of bang for little buck.

One overlooked area in the article is running on GPUs. Some numpy and pandas (and polars) code can get a big speedup from GPUs (same code with an import change).

bloaf 1 hours ago [-]
Taichi, benchmarked in the article, claims to be able to outperform CUDA at some GPU tasks, although their benchmarks look to be a few years old:

https://github.com/taichi-dev/taichi_benchmark

pjmlp 59 minutes ago [-]
And doesn't account for cuTile, NVIDIA's new API infrastructure that supports writing CUDA directly in Python via a JIT that is based on MLIR.
Ralfp 3 hours ago [-]

    CPython 3.13 went further with an experimental copy-and-patch JIT compiler -- a lightweight JIT that stitches together pre-compiled machine code templates instead of generating code from scratch. It's not a full optimizing JIT like V8's TurboFan or a tracing JIT like PyPy's;
Good news. Python 3.15 adapts PyPy's tracing approach for its JIT, and there are real performance gains now:

https://github.com/python/cpython/issues/139109

https://doesjitgobrrr.com/?goals=5,10
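
For anyone who wants to check whether the interpreter they are running has the JIT at all, here is a minimal sketch. It assumes the `sys._jit` introspection module that CPython 3.14+ exposes (an implementation detail, so it may change); the `jit_status` helper is a made-up name, and on older or non-CPython interpreters it simply returns `None`.

```python
import sys


def jit_status():
    """Describe the JIT of the running interpreter, or return None if unknown."""
    # sys._jit is a CPython implementation detail (3.14+); absent elsewhere.
    jit = getattr(sys, "_jit", None)
    if jit is None:
        return None
    return {
        "available": jit.is_available(),  # was this build compiled with the JIT?
        "enabled": jit.is_enabled(),      # is it switched on for this process?
    }


status = jit_status()
print(sys.version_info[:2], status)
```

On builds that include the JIT, it is typically toggled with the `PYTHON_JIT` environment variable.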

josalhor 3 hours ago [-]
While this is great, I expected Faster CPython to eventually culminate in what YJIT is for Ruby. I'm not sure the approaches they are currently trying will get the ecosystem there.
repple 44 minutes ago [-]
Significant AI smell in this write-up. As a result, my current reflex is to immediately stop reading. That's not a judgement on the actual analysis or the human effort that went in. It's just that the other context is missing.
jb_hn 6 minutes ago [-]
I didn't notice any signs of AI writing until seeing this comment and re-reading (though I did notice it on the second pass).

That said, I think this article demonstrates that focusing on whether or not an article used AI might be focusing on the wrong "problem." I appreciate being sensitive to the "smell" (the number of low-effort AI posts flying around these days has made me sensitive too), but personally, I found this article both (1) easy to read and (2) insightful. I think the amount of AI-written content lacking (2) is the problem.

MonkeyClub 17 minutes ago [-]
I got the same sense, but nowadays I can't be sure whether a text is AI or the writer's style has absorbed LLM tropes.
rusakov-field 2 hours ago [-]
Python is perfect as a "glue" language. "Inner loops" that have to run efficiently are not where it shines, and I would write them in C or C++ and patch them with Python for access to the huge library base.

This is the "two language problem". (I would like to hear from people who have used Julia extensively, by the way. It claims to solve this problem; does it really?)
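
A rough illustration of the inner-loop point, using only the standard library. The builtin `sum` is implemented in C, so it stands in here for a hand-written C extension; `python_sum` is a made-up helper, and the timings will vary by machine and interpreter version.

```python
import timeit


def python_sum(values):
    """Inner loop executed by the bytecode interpreter, one item at a time."""
    total = 0
    for v in values:
        total += v
    return total


data = list(range(100_000))

py_result = python_sum(data)  # loop runs in Python
c_result = sum(data)          # same loop, but inside C-implemented code

py_time = timeit.timeit(lambda: python_sum(data), number=20)
c_time = timeit.timeit(lambda: sum(data), number=20)
print(f"pure Python loop: {py_time:.4f}s, builtin sum: {c_time:.4f}s")
```

Both calls return the same value; only where the loop runs differs.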

pjmlp 1 hours ago [-]
This problem has been solved already by Lisp, Scheme, Java, .NET, Eiffel, among others, with their pick and choose mix of JIT and AOT compiler toolchains and runtimes.
blt 1 hours ago [-]
Surprised Python is only 21x slower than C for tree traversal stuff. In my experience that's one of the most painful places to use Python. But maybe that's because I use numpy automatically when simple arrays are involved, and there's no easy path for trees.
tweakimp 37 minutes ago [-]
Be careful with that: numpy arrays can be slower than Python tuples for some operations. Creation is always slower, and the overhead has to be worth it.
seanwilson 3 hours ago [-]
> The real story is that Python is designed to be maximally dynamic -- you can monkey-patch methods at runtime, replace builtins, change a class's inheritance chain while instances exist -- and that design makes it fundamentally hard to optimize. ...

> 4 bytes of number, 24 bytes of machinery to support dynamism. a + b means: dereference two heap pointers, look up type slots, dispatch to int.__add__, allocate a new PyObject for the result (unless it hits the small-integer cache), update reference counts.
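
The numbers in that quote can be checked from the standard library. This is a sketch that assumes a typical 64-bit CPython build; the exact size and the small-integer cache are implementation details and may differ elsewhere.

```python
import sys

# Size of a small int object: header plus payload (28 on typical 64-bit CPython).
int_size = sys.getsizeof(1)
print("bytes per small int:", int_size)

# `a + b` dispatches through the type's slot; this is the same operation spelled out.
explicit_add = type(2).__add__(2, 3)

# int("...") avoids compile-time constant folding, so identity is decided at runtime.
cached_same = int("256") is int("256")           # inside CPython's small-integer cache
fresh_objects = int("100000") is int("100000")   # outside it: two separate allocations
print("cached:", cached_same, "large ints shared:", fresh_objects)
```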

Would Python be a lot less useful without being maximally dynamic everywhere? Are there domains/frameworks/packages that benefit from this where this is a good trade-off?

I can't think of cases in strong statically typed languages where I've wanted something like monkey patching, and when I see monkey patching elsewhere there's often some reasonable alternative or it only needs to be used very rarely.
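
For reference, the three kinds of dynamism named in the quoted passage look roughly like this. All class names are made up for illustration, and the builtin is restored right away because replacing it affects the whole process.

```python
import builtins


class Base:
    def greet(self):
        return "base"


class Other:
    def greet(self):
        return "other"


class Child(Base):
    pass


obj = Child()
before = obj.greet()

# 1. Monkey-patch a method at runtime; the existing instance sees the change.
Base.greet = lambda self: "patched"
patched = obj.greet()

# 2. Change the inheritance chain while an instance exists.
Child.__bases__ = (Other,)
rebased = obj.greet()

# 3. Replace a builtin, then put the original back.
original_len = builtins.len
builtins.len = lambda _obj: 42
try:
    fake_len = len([1, 2, 3])
finally:
    builtins.len = original_len

print(before, patched, rebased, fake_len)
```

Because any of these can happen between two calls, the interpreter cannot assume that `obj.greet` or `len` still means what it meant a moment ago.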

bloaf 58 minutes ago [-]
I've always thought the flexibility should allow python to consume things like gRPC proto files or OpenAPI docs and auto-generate the classes/methods at runtime as opposed to using codegen tools. But as far as I know, there aren't any libraries out there actually doing that.
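
A minimal sketch of that idea, assuming a heavily simplified schema. `SPEC` is a made-up stand-in for a parsed proto or OpenAPI document, and `build_class` is a hypothetical helper, not any library's API.

```python
# Hypothetical, simplified stand-in for a parsed schema document.
SPEC = {
    "User": {"fields": ["id", "name"]},
    "Order": {"fields": ["id", "total"]},
}


def build_class(name, fields):
    """Create a plain data class at runtime from a list of field names."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(fields)
        if unknown:
            raise TypeError(f"unexpected fields: {sorted(unknown)}")
        for field in fields:
            setattr(self, field, kwargs.get(field))

    def __repr__(self):
        values = ", ".join(f"{f}={getattr(self, f)!r}" for f in fields)
        return f"{name}({values})"

    # type(name, bases, namespace) builds the class object without any codegen step.
    return type(name, (), {"__init__": __init__, "__repr__": __repr__})


models = {name: build_class(name, spec["fields"]) for name, spec in SPEC.items()}
user = models["User"](id=1, name="Ada")
print(user)
```

The trade-off is that classes built this way are invisible to static type checkers and editor autocompletion, which is one reason generated source files remain common.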
NeutralForest 1 hours ago [-]
There are some use cases for very dynamic code, like ORMs; with descriptors you can add attributes + behavior at runtime and it's quite useful. Anyway, breaking metaprogramming and the more dynamic features would mean Python 4, and we know how 2 -> 3 went. I also don't think that's where the core developers are going. Also also, there are other things I'd change before going after monkey patching, like some scoping rules, mutable defaults in function arguments, better async ergonomics, etc.
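
To make the descriptor point concrete, here is a small sketch of an ORM-style field. `Column` and `User` are made-up names, not taken from any real ORM.

```python
class Column:
    """Hypothetical ORM-style field: a data descriptor that type-checks on assignment."""

    def __init__(self, python_type):
        self.python_type = python_type

    def __set_name__(self, owner, name):
        # Called automatically for attributes defined in the class body.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return instance.__dict__.get(self.name)

    def __set__(self, instance, value):
        if not isinstance(value, self.python_type):
            raise TypeError(f"{self.name} expects {self.python_type.__name__}")
        instance.__dict__[self.name] = value


class User:
    name = Column(str)


# Attach another column after the class exists; __set_name__ is not called
# automatically in this case, so it is invoked by hand.
age = Column(int)
age.__set_name__(User, "age")
User.age = age

u = User()
u.name = "Ada"
u.age = 36
print(u.name, u.age)
```

Assigning a value of the wrong type, such as `u.age = "old"`, raises `TypeError` even though `age` was added after the class was defined.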
LtWorf 2 hours ago [-]
I've used a library that patches the zipfile module to add support for zstd compression in zipfiles.

In Python 3.14 the support is built in, but 2 years ago you could just import this library and it would work normally.

pjmlp 1 hours ago [-]
Kudos for going through all the existing JIT approaches, instead of reaching for rewrite into X straight away.

However, if Rust with PyO3 is part of the alternatives, then Boost.Python, cppyy, and pybind11 should also be accounted for, given their use in HPC and HFT integrations.

superlopuh 1 hours ago [-]
Missing Muna[0][1], I'm curious how it would compare on these benchmarks.

[0]: https://www.muna.ai/ [1]: https://docs.muna.ai/predictors/create

Mawr 26 minutes ago [-]
Shockingly good article — correct identification of the root cause of performance issues being excessive dynamism and ranking of the solutions based on the value/effort ratio. Excellent taste. Will keep this in my back pocket as a quick Python optimization reference.

It's just somewhat unfortunate that I have to question every number and fact presented since the writing was clearly at least somewhat AI-assisted with the author seemingly not being upfront about that at all.

kelvinjps10 2 hours ago [-]
Great post, saved it for when I need to optimize my Python code.
retsibsi 2 hours ago [-]
A personal opinion: I would much prefer to read the rough, human version of this article than this AI-polished version. I'm interested in the content and the author clearly put thought and effort into it, but I'm constantly thrown out of it by the LLM smell. (I'm also a bit mad that `--` is now on the em dash treadmill and will soon be unusable.)

I'm not just saying this to vent. I honestly wonder if we could eventually move to a norm where people publish two versions of their writing and allow the reader to choose between them. Even when the original is just a set of notes, I would personally choose to make my own way through them.

arlattimore 2 hours ago [-]
What a great article!