Another problem brought about by their design being backwards-compatible with tuples is that you get wonky equality rules where two namedtuples of different types and with differently-named attributes can compare as equal:
This also happens with the "new-style" namedtuples (typing.NamedTuple).
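A minimal sketch of that equality behaviour, using hypothetical types (`Point`, `Size`, `Celsius`, `Fahrenheit`) made up for illustration. Equality is simply inherited from tuple, so only the values are compared:

```python
from collections import namedtuple
from typing import NamedTuple

# Two unrelated namedtuple types with different attribute names.
Point = namedtuple("Point", ["x", "y"])
Size = namedtuple("Size", ["width", "height"])

# tuple.__eq__ compares values only; type and field names are ignored.
print(Point(1, 2) == Size(1, 2))   # True
print(Point(1, 2) == (1, 2))       # True

# The class-based typing.NamedTuple syntax behaves the same way.
class Celsius(NamedTuple):
    value: float

class Fahrenheit(NamedTuple):
    value: float

print(Celsius(50.0) == Fahrenheit(50.0))  # True
```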
I like the convenience of namedtuples but I agree with the author: there are enough footguns to prefer other approaches.
mont_tag 51 days ago [-]
This seems like an invented problem that never comes up in practice. It is no more interesting than numpy arrays evaluating as equal even when they are conceptually not comparable (say, one array of Celsius readings and one of Fahrenheit readings).
I've usually seen it come up when people try to hash the objects to use as dictionary keys or in sets, and then encounter very hard-to-troubleshoot issues later on. Obviously it's a bit weird to hash a bunch of objects of different types, but it's just one example of the footguns that namedtuples have and why I prefer other approaches.
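To make the hashing problem concrete, here is a sketch with two made-up namedtuple types used as dict keys and set members. Because equal tuples must hash equally, the second key silently replaces the first entry's value:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Size = namedtuple("Size", ["width", "height"])

lookup = {}
lookup[Point(1, 2)] = "a point"
lookup[Size(1, 2)] = "a size"   # equal and same hash: overwrites the value

print(len(lookup))              # 1
print(lookup[Point(1, 2)])      # a size
print(next(iter(lookup)))       # Point(x=1, y=2): the original key object is kept

# Sets deduplicate across the two types in the same way.
print(len({Point(1, 2), Size(1, 2)}))  # 1
```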
quotemstr 51 days ago [-]
The numpy equality thing is actually an enormous footgun, especially for people new to numeric Python. Equality for numpy and its derivatives should have had the traditional Python meaning (yielding a bool), and the current operation (yielding a mask) should have been put under a named method.
beng-nl 51 days ago [-]
Not that I find your argument invalid, but you’re not arguing against GP.
At the risk of belaboring the point or being redundant: GP is making the point that the Celsius vs Fahrenheit meaning of the arrays makes them semantically different, and therefore the equality could be taken to be misleading when this is taken into account. GP thinks this is nonsense and draws a parallel with named tuples.
xg15 51 days ago [-]
Counterpoint: Named tuples are immutable, while dataclasses are mutable by default.
You can use frozen=True to "simulate" immutability, but that just replaces __setattr__ with an implementation that raises FrozenInstanceError, something you (or your very clever coworker) can circumvent by using object.__setattr__().
So you neither get the performance benefits nor the invariants of actual immutability.
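A quick sketch of the difference being described, assuming CPython (class names are made up). The frozen dataclass blocks normal assignment but not `object.__setattr__`; the namedtuple has no instance `__dict__` and its fields are read-only descriptors, so the same bypass raises AttributeError:

```python
from collections import namedtuple
from dataclasses import dataclass, FrozenInstanceError

@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int

PointNT = namedtuple("PointNT", ["x", "y"])

fp = FrozenPoint(1, 2)
try:
    fp.x = 10
except FrozenInstanceError:
    print("normal assignment is blocked")

# Calling object.__setattr__ directly skips the generated __setattr__.
object.__setattr__(fp, "x", 10)
print(fp.x)  # 10

nt = PointNT(1, 2)
try:
    # The field descriptor refuses the write and there is no __dict__ to fall back on.
    object.__setattr__(nt, "x", 10)
except AttributeError as exc:
    print("namedtuple refused:", exc)
print(nt.x)  # 1
```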
hansvm 51 days ago [-]
Counter-counterpoint:
- Everything in Python is mutable, including the definitions of constants like `3` and `True`. It's much like "unsafe" in Rust; you can do stupid things, but when you see somebody reaching for `__setattr__` or `ctypes` then you know to take out your magnifying glass on the PR, find a better solution, ban them from the repo, or start searching for a new job.
- Performance-wise, named tuples are sometimes better because more work happens in C for every line of Python, not because of any magic immutability benefits. It's similar to how you should prefer comprehensions to loops (most of the time) if you're stuck with Python but performance still matters a little bit. Yes, maybe still use named tuples for performance reasons, but don't give the credit to immutability.
notpushkin 51 days ago [-]
> something you (or your very clever coworker) can circumvent by using object.__setattr__()
This fits pretty well with a lot of other stuff in Python (e.g. there are no truly private members in classes). There are a bunch of escape hatches that you should avoid (but that can still be useful sometimes), and those are usually pretty obvious (e.g. if you see code using object.__setattr__, something is definitely not right).
Can’t tell whether this is good design or not, but personally I like it.
jfktrey 51 days ago [-]
Counterpoint: I've used `object.__setattr__` pretty often when setting values in the `__post_init__` of frozen dataclasses
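For reference, a sketch of the pattern being described: a frozen dataclass with a derived field computed in `__post_init__` (the `Vector` class and `length` field are hypothetical):

```python
import math
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vector:
    x: float
    y: float
    # Derived field: not an __init__ parameter, filled in by __post_init__.
    length: float = field(init=False)

    def __post_init__(self):
        # `self.length = ...` would raise FrozenInstanceError here,
        # so the value is written through object.__setattr__ instead.
        object.__setattr__(self, "length", math.hypot(self.x, self.y))

v = Vector(3.0, 4.0)
print(v.length)  # 5.0
```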
xg15 51 days ago [-]
I'd argue that's about the only correct usage of this stuff.
eyegor 51 days ago [-]
Is there a difference between the global setattr(object, v) and object.__setattr__(v)? I've seen setattr() all over in the wild, but I've never encountered the dunder one.
notpushkin 51 days ago [-]
Note that `object` here is not a placeholder variable but actually refers to the built-in `object` type (the base class of every other class in Python). Calling its __setattr__ directly bypasses the class's own __setattr__ and sets the value regardless (the setattr() function can’t do that, since it goes through the class's __setattr__):
In [1]: from dataclasses import dataclass
In [2]: @dataclass(frozen=True)
   ...: class Foo:
   ...:     a: int
   ...:
In [3]: foo = Foo(5)
In [4]: foo.a = 10
FrozenInstanceError: cannot assign to field 'a'
In [5]: setattr(foo, "a", 10)
FrozenInstanceError: cannot assign to field 'a'
In [6]: object.__setattr__(foo, "a", 10)
In [7]: foo.a
Out[7]: 10
quotemstr 51 days ago [-]
It's Python. You can override practically any behavior. Hell, use ctypes and mutate immutable tuples! Doing so is well-defined in the C API!
What bugs me more about frozen dataclasses is how post-init methods have to use the setattr hack.
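One way to avoid the hack altogether, sketched here as an assumption rather than an official idiom: compute derived values in a hypothetical `from_xy` classmethod before calling the generated `__init__`, so no field ever needs to be set after construction:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    x: float
    y: float
    length: float

    @classmethod
    def from_xy(cls, x: float, y: float) -> "Vector":
        # Derived values are computed up front; the generated __init__
        # then sets every field, so no post-init mutation is needed.
        return cls(x, y, math.hypot(x, y))

v = Vector.from_xy(3.0, 4.0)
print(v)  # Vector(x=3.0, y=4.0, length=5.0)
```

The trade-off is that `length` becomes a regular constructor argument, so nothing stops a caller from passing an inconsistent value directly.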
webprofusion 51 days ago [-]
Oh, you mean Python library APIs. I totally thought this was going to be a generic article about APIs delivered over HTTP, the first thing I think of when someone says API.
tomrod 51 days ago [-]
Yeah, having spent the last few years in REST world I sort of thought the same thing.
8n4vidtmkvmk 51 days ago [-]
My manager called the class I was about to define an API about 6 years ago and I couldn't refute it. It is a form of API. My definition was suddenly expanded.
fingerlocks 51 days ago [-]
Depending on the context, it’s probably more accurate to call that an ABI
8n4vidtmkvmk 50 days ago [-]
ABI is the binary interface, no? This was within the same compilation unit.
heavyset_go 51 days ago [-]
Author could have used NamedTuple instead of dataclass or TypedDict:
from typing import NamedTuple
class Point(NamedTuple):
    x: int
    y: int
    z: int
I don't see "don't use namedtuples in APIs" as a useful rule of thumb, to be honest. Ordered and iterable return-types make sense for a lot of APIs. Use them where it makes sense.
rtpg 51 days ago [-]
I feel like "consider dataclasses as a useful default" is decent advice.
You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).
heavyset_go 51 days ago [-]
> I feel like "consider dataclasses as a useful default" is decent advice.
I agree.
> You get the stuff you get from `NamedTuple`, but you can also easily add helper methods to the class as needed. And there are other dataclass goodies (though some things I find to be a bit anti-feature-y).
I've seen examples where dataclasses were used when order matters, however, which is why I'm not comfortable with a general rule against namedtuples. Sometimes order and iterability matter, and dogmatically reaching for a different data type that doesn't preserve that information might be the wrong choice.
david2ndaccount 51 days ago [-]
You can have methods with NamedTuple
rtpg 51 days ago [-]
Oh, with the class declaration version? I never considered that, but feels obvious now.
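A sketch of the class-declaration form with methods (the `norm` and `scaled` helpers are made up for illustration). It keeps the tuple properties mentioned upthread, such as positional unpacking and field order:

```python
import math
from typing import NamedTuple

class Point(NamedTuple):
    x: int
    y: int
    z: int

    # Ordinary methods can be defined in the class body.
    def norm(self) -> float:
        return math.sqrt(self.x ** 2 + self.y ** 2 + self.z ** 2)

    def scaled(self, k: int) -> "Point":
        # _replace returns a new instance; the original is unchanged.
        return self._replace(x=self.x * k, y=self.y * k, z=self.z * k)

p = Point(1, 2, 2)
print(p.norm())      # 3.0
print(p.scaled(2))   # Point(x=2, y=4, z=4)
x, y, z = p          # field order is preserved for unpacking
print(list(p))       # [1, 2, 2]
```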
d0mine 51 days ago [-]
The point of the article is: don't return objects that are also sequences when a struct would suffice.
Though the suggested alternatives are not immutable, and immutability matters much more than whether you leak o[0] access. dataclass(frozen=True) is the only acceptable alternative.
notpushkin 51 days ago [-]
Author argues that named tuples are bad (it’s literally the article title) so I think you miss the point?
heavyset_go 51 days ago [-]
The author claims that the reason people reach for namedtuples is brevity, but I'd argue otherwise and point to the modern class syntax for defining a namedtuple, which is nearly identical to that of TypedDict and dataclass.
There are other reasons to reach for namedtuples, for example when order or iterability matter. I don't see a general rule of not using namedtuples in APIs to be that useful. Use the right tool when the need calls for it.
mont_tag 51 days ago [-]
Right. It takes equal effort to define a named tuple, typed dict or dataclass.
Mostly people just reach for the tool that does what they want, named tuples for tuply stuff, typed dicts for mapping applications, and dataclasses when you actually want a class.
PittleyDunkin 51 days ago [-]
The author also argues for readability over semantics, so I'm not sure they got the point to begin with.
the__alchemist 51 days ago [-]
I think the best option for this, which is one listed in the article, is the dataclass. It's like a struct in C or Rust. It's ideal for structured data, which is, I believe, what a named tuple is intended for.
o11c 51 days ago [-]
The annoyance of dataclasses, of course, is that they interact very awkwardly with immutability, which much of the Python ecosystem mandates (due to lacking value semantics).
But yes, they're still the least-bad choice.
the__alchemist 51 days ago [-]
Valid. One of my biggest faults (perhaps my #1) with Python is its sloppy mutability and pass-by-value/reference rules.
Spivak 51 days ago [-]
Is there a situation where Python ever passes by value? You can sort of pretend it does for primitive types, but I can't think of a case where it's actually by value.
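A small sketch of the usual answer: CPython passes references to objects (often called "call by sharing"), so in-place mutation is visible to the caller while rebinding the parameter is not. The function names here are hypothetical:

```python
def append_item(items):
    # `items` refers to the caller's list object, so in-place
    # mutation is visible to the caller.
    items.append(4)

def rebind_items(items):
    # Rebinding only changes the local name; the caller's list is untouched.
    items = [0]
    return items

def id_inside(obj):
    # No copy is made for the call: the callee sees the very same object.
    return id(obj)

data = [1, 2, 3]
append_item(data)
print(data)                         # [1, 2, 3, 4]
rebind_items(data)
print(data)                         # [1, 2, 3, 4]
print(id_inside(data) == id(data))  # True
```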
o11c 51 days ago [-]
Non-CPython implementations may pass by value as an optimization for certain immutable builtin types. This is visible to code using `is`.
(It's surprisingly difficult to implement a rigorous way to detect this vs compile-time constant evaluation though; note that identical objects of certain types are pooled already when generating/loading bytecode files. I don't think any current implementation is smart enough to optimize the following though)
$ python3 -c 'o = object(); print(id(o) is id(o))'
False
$ pypy3 -c 'o = object(); print(id(o) is id(o))'
True
$ jython -c 'o = object(); print(id(o) is id(o))'
True
ericvsmith 51 days ago [-]
What you're seeing here is an integer optimization, not pass by value. CPython only does this for small integers:
$ python -c "print(int('3') is int('3'))"
True
$ python -c "print(int('300') is int('300'))"
False
Other implementations make different choices.
cwalv 51 days ago [-]
Are you saying `o` is passed by value? I think this behavior is due to the return from `id()` being interned, or not. `id(o) == id(o)` will be true in all cases
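A sketch of the distinction being discussed: comparing the two `id()` results with `==` is always true for a live object, while `is` on the results depends on how the implementation boxes or caches ints, so no particular output is assumed for it:

```python
o = object()

# Both calls see the same live object, so the returned values are equal
# on every implementation.
print(id(o) == id(o))  # True

# Whether the two int results are the same *object* is an implementation
# detail (small-int caching, unboxing, etc.); compare values with ==.
print(id(o) is id(o))
```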
o11c 51 days ago [-]
I mean that the `id` function returns by value. It's not interning since that explicitly refers to something allocated, which isn't the case here.
ericvsmith 51 days ago [-]
This is incorrect. The returned integer is a regular Python object, not some "unboxed" integer value.
nomel 51 days ago [-]
> is that they interact very awkwardly with immutability
How so?
Tuples are awkward with immutability, if you put mutable things inside them.
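To illustrate that point with a plain tuple: its slots can't be reassigned, but a mutable element can still change in place, and such a tuple is not hashable:

```python
t = (1, [2, 3])

# The tuple's own slots cannot be reassigned...
try:
    t[1] = [4]
except TypeError:
    print("tuple item assignment refused")

# ...but the list inside it can still be mutated in place.
t[1].append(4)
print(t)  # (1, [2, 3, 4])

# Hashing a tuple hashes its elements, and lists are unhashable.
try:
    hash(t)
except TypeError as exc:
    print("unhashable:", exc)
```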
o11c 51 days ago [-]
By default, dataclasses can't be used as keys in a `dict`. You have to either use `frozen` (in which case the generated `__init__` becomes an abomination) or use `unsafe_hash` (in which case you have no guardrails).
In languages with value semantics, nothing about this problem even makes sense, since obviously a dict's key is taken by value if it needs to be stored.
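A sketch of the three options described above, with made-up class names. By default `eq=True` without `frozen` sets `__hash__` to None; `unsafe_hash=True` adds a hash but does nothing to stop you mutating a key after it has been stored:

```python
from dataclasses import dataclass

@dataclass
class Plain:
    x: int

@dataclass(frozen=True)
class Frozen:
    x: int

@dataclass(unsafe_hash=True)
class UnsafeHash:
    x: int

# Default dataclass: __hash__ is None, so instances can't be dict keys.
print(Plain.__hash__ is None)  # True

# Frozen instances hash by their field values.
d = {Frozen(1): "ok"}
print(d[Frozen(1)])            # ok

# unsafe_hash: hashing works, but mutation of a stored key is not prevented.
key = UnsafeHash(1)
d2 = {key: "stored"}
key.x = 2
print(UnsafeHash(1) in d2)     # False: the stored key no longer equals x=1
```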
--
Tuple behavior is sensible if you are familiar with discussion of reference semantics (though not as much as if you also support value semantics).
Still, at least we aren't Javascript where the question of using composite keys is answered with "screw you".
porridgeraisin 51 days ago [-]
Haha
obj[JSON.stringify(x)] = y
* Ducks *
o11c 51 days ago [-]
x = [Symbol('geese')]
math_dandy 51 days ago [-]
One advantage of (Named)Tuples over dataclasses or SimpleNamespaces is that they can be used as indices into numpy arrays, which is very useful when your API is returning a point or screen coordinates or similar.
mont_tag 51 days ago [-]
This article seems vacuous to me. It misses the point that tuples are fundamental to the language, with C-speed native support for packing, unpacking, hashing, pickling, slicing and equality tests. Tuples appear everywhere, from the output of doctest, to time tuples, the result of divmod, the output of a csv reader and the output of a sqlite3 query.
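For a concrete sense of how common tuple results are in the standard library, a short sketch covering three of the examples mentioned (divmod, time tuples, and sqlite3 rows with the default row factory):

```python
import sqlite3
import time

print(divmod(17, 5))  # (3, 2)

# time.struct_time is a tuple subclass whose items also have names.
t = time.gmtime(0)
print(isinstance(t, tuple), t.tm_year, t[0])  # True 1970 1970

# With the default row factory, sqlite3 returns each row as a plain tuple.
conn = sqlite3.connect(":memory:")
row = conn.execute("SELECT 1, 'a'").fetchone()
print(row, type(row) is tuple)  # (1, 'a') True
conn.close()
```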
Tuples are a core concept and fundamental data aggregation tool for Python. However, this post uses a trivial `Point()` class strawman to try to shoot down the idea of using tuples at all. IMO that is fighting the language and every existing API that either accepts tuple inputs or returns tuple outputs. That is a vast ecosystem.
According to the glossary, a named tuple is "any type or class that inherits from tuple and whose indexable elements are also accessible using named attributes." Presumably no one disputes that having names improves readability, so really this weak post argues against tuples themselves.
quotemstr 51 days ago [-]
The beauty of Python is that it's so slow that you can relax and use what's clearest and most expensive. Finding yourself micro-optimizing things like tuple allocation time is a signal that you should be writing an extension or a numba snippet or something.
CaliforniaKarl 51 days ago [-]
I think the core issue is about trust.
I trust that the maintainers of the Python language & the Python Standard Library are not going to change their tuple-using APIs in a breaking way, without a clear signal (like a major-version bump).
I do not extend that same trust to other Python projects. Maybe I extend that same trust to projects that demonstrate proper use of Semantic Versioning, but not to others.
Using something other than tuples trades some performance for some stability, which is a trade I’m OK with.
Spivak 51 days ago [-]
I think for the same reason you should avoid TypedDicts for new APIs as well. Dataclasses are the natural replacement for both.
mont_tag 51 days ago [-]
Not really. A lot of tooling, JSON for example, naturally works with dictionaries. A TypedDict naturally connects with all those tools. In contrast, dataclasses are hostile to the enormous ecosystem of tools that work with dictionaries.
If you store all your data in dataclasses, you end up having to either convert back to dictionaries or rebuild all that tooling. Python's abstract syntax trees are an example. If nodes had been represented with native dictionaries, then pprint would work right out of the box. But with every node being its own class, a custom pretty printer is needed.
Dataclasses are cool but people should have a strong preference for Python's native types: list, tuple, dict, and set. Those work with just about everything. In contrast, a new dataclass is opaque and doesn't work with any existing tooling.
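A sketch of the JSON point, with hypothetical `PointDict`/`PointDC` types: a TypedDict instance is an ordinary dict at runtime, while a dataclass instance has to be converted before the json module will encode it:

```python
import json
from dataclasses import dataclass, asdict
from typing import TypedDict

class PointDict(TypedDict):
    x: int
    y: int

@dataclass
class PointDC:
    x: int
    y: int

# A TypedDict is only a static type; the runtime object is a plain dict.
pd: PointDict = {"x": 1, "y": 2}
print(type(pd) is dict, json.dumps(pd))  # True {"x": 1, "y": 2}

# The json module does not know how to encode dataclass instances...
try:
    json.dumps(PointDC(1, 2))
except TypeError as exc:
    print("dataclass:", exc)

# ...so they have to be converted first.
print(json.dumps(asdict(PointDC(1, 2))))  # {"x": 1, "y": 2}
```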
designed 51 days ago [-]
An advantage of dataclasses over dicts is that you can add methods and properties.
Also you can easily convert a dataclass to a dict with dataclasses.asdict. Not so easy to go from dict to dataclass though
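A sketch of both directions with hypothetical `Point`/`Segment` classes: `asdict` recurses into nested dataclasses, but going back needs each level rebuilt by hand, since keyword unpacking alone only handles the flat case:

```python
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: int
    y: int

@dataclass
class Segment:
    start: Point
    end: Point

seg = Segment(Point(0, 0), Point(3, 4))
d = asdict(seg)  # recursively converts nested dataclasses to dicts
print(d)         # {'start': {'x': 0, 'y': 0}, 'end': {'x': 3, 'y': 4}}

# Flat case: keyword unpacking is enough.
p = Point(**d["start"])

# Nested case: Segment(**d) would leave plain dicts in start/end,
# so each level is rebuilt explicitly.
seg2 = Segment(Point(**d["start"]), Point(**d["end"]))
print(seg2 == seg)  # True
```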
ReflectedImage 51 days ago [-]
That's what a class is for.
Spivak 51 days ago [-]
Right, but that's @dataclass. Being a replacement for classes in commonly used situations is one of its design goals.
Joker_vD 51 days ago [-]
> This leads to writing tests for both ways of accessing your data, not just one of them. And you shouldn't skimp on this
Or you can just keep returning a namedtuple instead of something else, because then you absolutely can skimp on testing whether what you return does, in fact, satisfy the namedtuple's interface.
ReflectedImage 51 days ago [-]
namedtuple is preferable as it's the more Pythonic solution. Simpler is better.
eesmith 51 days ago [-]
namedtuple brings in likely inappropriate complexity, so it is not always simpler.
Consider the object returned by os.stat. It allows getting fields by index.
But that list of 10 values became locked because people index and unpack it positionally, which means new fields are not accessible by indexing, only by attribute lookup:
>>> os.stat("/dev/null").st_blksize
65536
Assuming there wasn't the historical baggage which made os.stat the way it is, why would namedtuple still be the more Pythonic solution, and simpler than a frozen dataclass?
(Historically, os.stat originally returned a tuple, then migrated to os.stat_result and named attributes, both for readability and to allow new fields, but keeping indexing for backwards compatibility support.)
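A quick way to check the os.stat behaviour described above (actual values depend on the file system, so only the structure is shown). The 10-name unpacking line is the kind of positional code that keeps the sequence part frozen at 10 items:

```python
import os
import stat

st = os.stat(".")

# os.stat_result is a tuple subclass, but only 10 fields are in the sequence.
print(isinstance(st, tuple), len(st))  # True 10
print(st[stat.ST_MODE] == st.st_mode)  # True

# Positional unpacking like this breaks if the sequence ever grows.
mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st

# Later additions such as st_atime_ns are available only as attributes.
print(st.st_atime_ns)
```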
ReflectedImage 51 days ago [-]
Because namedtuple is a simpler construct than a frozen dataclass, it is always preferable in Python.
The number of characters and lines to access the members of a namedtuple is significantly less than with a dataclass.
A dataclass would have similar issues if st_dev were now defined as a string, whereas a namedtuple would not.
Whilst there may be edge cases in using simpler constructs, in scripting languages we accept the edge cases, as simply ignoring them pays off in 99.99% of cases. If something goes wrong, you catch the exception; look up "ask for forgiveness, not permission" if you want the concept.
ericvsmith 51 days ago [-]
I think you're saying that it takes fewer characters to define a namedtuple. If you're interested in less typing, there's also dataclasses.make_dataclass.
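A minimal sketch comparing the two one-line forms (the `PointNT`/`PointDC` names are made up). Fields given as plain strings get the type `typing.Any`:

```python
from collections import namedtuple
from dataclasses import make_dataclass

# One-line definitions of comparable record types.
PointNT = namedtuple("PointNT", "x y")
PointDC = make_dataclass("PointDC", ["x", "y"], frozen=True)

p = PointDC(1, 2)
print(p)                                # PointDC(x=1, y=2)
print(hash(p) == hash(PointDC(1, 2)))   # True: frozen, so hashable

# Unlike the namedtuple, the dataclass is not a sequence.
print(PointNT(1, 2)[0])                 # 1
print(hasattr(p, "__getitem__"))        # False
```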
The Pythonic "simpler" does not mean "fewer characters" otherwise APL or Perl/Raku would be more Pythonic.
Namedtuple is not strictly simpler. It implements additional features which would not be in a frozen dataclass, which makes "simpler" a personal bias.
I've been using Python continuously since 1998 and well remember the "look before you leap" vs. "ask for forgiveness not for permission" debate from the bygone comp.lang.python forum. And I learned the concept of "easier to ask for forgiveness" from the 60 Minutes interview with Grace Hopper back in the 1980s.
That has nothing to do with this issue, which is one of taking on an API burden without due consideration simply because it means less typing.
personjerry 51 days ago [-]
I feel like getting mouse coordinates is the perfect time to return a named tuple, though?
Lvl999Noob 51 days ago [-]
Yes. That was a positive case for NamedTuple. The negative case was: what if the function needs to grow further and return more stuff, and then it's no longer clear what the return values are? For example, what if `get_mouse_coordinates()` becomes `get_peripheral_coordinates()` which, for some reason, needs to return the coordinates of all the peripherals as one flat namedtuple `NamedTuple(mouse_x: int, mouse_y: int, pointer_x: int, pointer_y: int, ...)`. I know it's a contrived example, but it can happen for other kinds of functions.
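A sketch of that failure mode with hypothetical `MouseV1`/`MouseV2` return types: callers that unpack by position break as soon as a field is added, while attribute access keeps working:

```python
from typing import NamedTuple

# Version 1 of a hypothetical API's return type.
class MouseV1(NamedTuple):
    x: int
    y: int

# Version 2 adds fields to the same logical return value.
class MouseV2(NamedTuple):
    x: int
    y: int
    pointer_x: int
    pointer_y: int

def caller(result):
    # Code written against version 1 unpacks by position.
    x, y = result
    return x + y

print(caller(MouseV1(1, 2)))  # 3
try:
    caller(MouseV2(1, 2, 3, 4))
except ValueError as exc:
    print("broken by the new fields:", exc)

# Attribute access is unaffected by the extra fields.
print(MouseV2(1, 2, 3, 4).x)  # 1
```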
doctorpangloss 51 days ago [-]
Data classes can gracefully replace tuples everywhere. Set frozen, then use a mixin or just write the __getitem__ and __iter__ magic methods, and you’re done.
from dataclasses import dataclass, fields

class IWantToHaveNamedTupleInterfaceButWasToldTheyAreBad:
    def __iter__(self):
        # Yield the field values in declaration order.
        return (getattr(self, f.name) for f in fields(self))
and voila:
@dataclass(frozen=True)
class Point(IWantToHaveNamedTupleInterfaceButWasToldTheyAreBad):
    x: int
    y: int

p = Point(1, 2)
x, y = p  # unpacks like a tuple: x == 1, y == 2
heavyset_go 51 days ago [-]
Yes, but that is insane.
awinter-py 51 days ago [-]
these can be more memory-efficient than classes or dictionaries.
there was a point a while back where python added __slots__ to classes to help with this; and in practice these days the largest systems are using numpy if they're in python at all
not sure what modern versions do. but in the olden days, if you were creating lots of small objects, tuples were a low-overhead way to do it
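A rough way to check this on a given interpreter, assuming CPython (`sys.getsizeof` reports shallow sizes and the numbers vary by version and platform, so none are assumed here). A namedtuple instance is the same size as a plain tuple because it declares empty `__slots__`; `PointSlots` is a made-up comparison class:

```python
import sys
from collections import namedtuple

PointNT = namedtuple("PointNT", "x y z")

class PointSlots:
    __slots__ = ("x", "y", "z")

    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

as_tuple = (1, 2, 3)
as_namedtuple = PointNT(1, 2, 3)
as_dict = {"x": 1, "y": 2, "z": 3}
as_slots = PointSlots(1, 2, 3)

# Shallow sizes in bytes; exact numbers depend on the Python build.
for name, obj in [("tuple", as_tuple), ("namedtuple", as_namedtuple),
                  ("dict", as_dict), ("__slots__ class", as_slots)]:
    print(f"{name:>16}: {sys.getsizeof(obj)}")
```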
solarkraft 51 days ago [-]
> But there are three more ways to do the same data structure
Thanks, I hate it. There’s a lot I like about Python, but this is a major pain point.
NamedTuple, TypedDict, Dataclass, Record ... Remember the Zen of Python? „There should be one-- and preferably only one --obvious way to do it“ - it feels like Python has gone way overboard with ways to structure data.
In Javascript everything is an object, you can structurally type them with Typescript and I don’t feel like I’m missing much.
pipeline_peak 50 days ago [-]
You’d think such an easy-to-use high-level language would have:
Point(x,y,z)