Not bad, an article aiming to popularize product and sum types. I am not the target audience, but wished there was a way to get the masses to understand that elementary type theory and logic are at the heart of these programming concepts that remain the same across languages.
Back when I worked on translation of pattern matching in the Scala compiler (pre Kotlin, pre "Java 5" with generics..) I surely hoped but could not imagine that the world would turn out like this, with python, Java, Rust all supporting matching and sum types.
I don't see many texts that bridge type theory (mathematical logic) and data modeling, but I am certainly rooting for sum and product types to become pervasive common knowledge to all programmers one day.
js8 1 days ago [-]
> I am certainly rooting for sum and product types to become pervasive common knowledge to all programmers one day
I agree. And it's not just programming, but also databases. One of the flaws of relational data model is that sum types are awkward to express - typically this is done through inheritance.
juancn 1 days ago [-]
The sum/product dichotomy is very confusing. Sums and products are operations in muggle math (functions), but in the example a product is a data type? It feels off.
It's not intuitive at all and so far removed from normal programming practice, the terminology doesn't help.
It's the overload of concepts that makes it harder to grasp.
mrkeen 1 days ago [-]
It's nice when you can use common, general language to talk between languages.
I made a joke about Java finally getting 'structs' [1] and was told that records weren't structs. I don't think I would have gotten the same pushback if I just called them product types instead.
> It's the overload of concepts that makes it harder to grasp.
Have you come across the OR-PLUS similarity in predicate logic, as well as AND-MULTIPLY? Those ones help me think about logic even though they don't have anything to do with each other. It helps me parse their precedence in my mind, and think about their commutativity/distributivity quickly, e.g.
P ∧ (Q ∨ R)
could be expanded to
P ∧ Q ∨ P ∧ R
which is the same as
(P ∧ Q) ∨ (P ∧ R)
> Sums and products are operations in muggle math (functions)
Anyway, these ones refer to adding up their state spaces (number of possible inhabitants). A sum type can have this inhabitant (TRUE) or [2] that inhabitant (FALSE). A product type can have this type (e.g. first element in a tuple) and [3] that type (second element in a tuple). And since you mentioned "functions" - those are exponential types!
[1] https://docs.oracle.com/en/java/javase/17/language/records.html
[2] Oh, there's that "or" again lol!
[3] "and !
js8 1 days ago [-]
Actually, sum type would be more analogous to XOR in Boolean logic.
abeppu 1 days ago [-]
I think it lines up with OR in the sense that
x:(S+T) <==> x:S OR x:T
If you know x:S you may find conclude x:(S+T) without also stipulating that x is not a T. It may be permitted (if pointless) to use S+T even in contexts where S is a subtype of T (and hence the XOR is not true).
ndriscoll 1 days ago [-]
OR and XOR are both "wrong". It's a disjoint union. T+T = 2T, not T or 0. When S and T are disjoint, then it's OR-like (and also XOR-like).
You can't conclude x: S+T from x:S (because it's false). What you can do is use x to build an S+T. If S=T there are two ways to do that.
From what I've seen, usually subtyping is also defined via embeddings, and if x: T then x: S is false for all S != T by definition.
taeric 1 days ago [-]
What makes records not structs? Seems a fair statement.
fjuwnsd 1 days ago [-]
Because most devs don't think from type theory principles.
They see the keyword "struct" and associate it to C# struct which is stack allocated.
Then they see the keyword "record" and think Java record which is syntax sugar for class which is heap allocated.
They never draw the connection that they are both ways to express product types.
taeric 1 days ago [-]
I still don't necessarily follow. I would expect them to be structs in the LISP DEFSTRUCT sense. Outside of somewhat fringe reasons, caring about allocation style is something I'd imagine most people just don't do?
ndriscoll 1 days ago [-]
Whether something is boxed or not is relevant in practice for performance. It's reasonable to have a jargon where products, records, and structs are distinct concepts.
ndriscoll 1 days ago [-]
Sums and products are also operations in non-muggle math, e.g. for vector spaces if you take a proper linear algebra course. The meaning in type theory is consistent with that usage. Even in muggle math, e.g. matrix operations are an overloading. In fact overloading is extremely common and generally good practice (when there's an analogy to be had) in math.
There are also connections to muggle math/arithmetic. e.g. for types as for numbers, A^(X+Y)=A^X*A^Y. For types, this says a function that takes a sum works by doing case analysis. You can also "forget" things are types and pretend A, X, Y are the number of possible values the types can have, and your equations about types become true statements about numbers.
pdpi 1 days ago [-]
Let's say you have a type L (for letters) with values A and B, and you have another type N (for numbers) with values 1, 2 and 3.
The sum type L + N is composed of the values A, B, 1, 2, 3. It's a set union, and the cardinality of the union is the sum of the two cardinalities.
Likewise, the product type L x N has the values (A, 1), (A, 2), (A, 3), (B, 1), (B, 2), and (B, 3). It's the usual cartesian product of sets, and you get the cardinality of that type by multiplying the two cardinalities.
(For bonus points: The type of functions L -> N is an exponential type, with cardinality N^L)
rkallos 1 days ago [-]
I find it very intuitive, but maybe it's because of a missing link or two.
Boolean arithmetic sometimes uses + for OR and * for AND. When considering only zero and non-zero values, the semantics are the same. 1 + 1 + 0 is non-zero, 1 * 1 * 0 is zero. Substitute + for OR and * for AND, and you wind up with the "same" result.
From the linked article:
> one is „and data“, the other is „or data“.
Structs/records/product types fit the description of "and data", because it's a compound value comprised of this type AND that type AND that other type, etc.
Unions/sum types fit the description of "or data" because it's either this type OR that type OR that other type.
gugagore 1 days ago [-]
I don't think "type unions" correspond neatly to sums. When I think of a type union in mypy, typescript, Julia, etc, those languages have Union[int, int] as equal to int.
But int + int is isomorphic to bool*int.
The unions you're talking about necessarily require a notion of subtyping, I believe.
yen223 1 days ago [-]
Yeah, it is very unfortunate that languages that have sum types each call them slightly different things:
- sealed classes/interfaces
- discriminated unions
- enums
- case classes
It would be nice if we could agree on a single term to describe this, the same way we standardised on "class" to describe, well, classes.
louthy 1 days ago [-]
> the same way we standardised on "class"
Did we, when did this happen? Languages like Haskell use type-class to mean something closer to traits/interfaces in other languages. I wouldn't be surprised if other languages have their own definition of what 'class' means.
> to describe, well, classes.
What is a class? Not trying to be an arse here, genuinely, what is it? In OO languages it would be a type with a collection of named-members. How is that different to a record with fields that are either data-types or first-class functions? Is inheritance implied? If so, what's a 'sealed class'?
What does 'class' really mean? Classification? Doesn't the type-name alone classify something? Really, 'class' is the weird one imho; it's just everyone's so used to it now that it seems correct. It's similar with other OO terminology like 'object'. On the surface it seems sensible, but when you really think about it, it's kinda odd.
Ultimately, we all just end up being most comfortable with what we know.
_dain_ 1 days ago [-]
It's about the cardinality (number of possible values) of the compound type.
The cardinality of a product type is the product of the cardinalities of its constituents. Likewise a sum type's cardinality is the sum of the cardinalities of its constituents.
Examples:
- a struct with a boolean and an 8-bit integer as its members. A boolean has 2 possible values, the integer has 256 possible values. So the struct has 2x256 = 512 values in total. So a struct is a product type.
- an Optional<int8> is either Some(n: int8) or None. The first one has 256 possible values, the second has one possible value. So the Optional<int8> has 256+1 = 257 values in total. So Optional<...> is a sum type.
Why does this matter? Because when modelling problems it's very nice if you come up with a representation where the number of legal states in the domain exactly matches the number of states of the type. It's awkward and confusing to have possible states of the type that are redundant or don't correspond to anything real. e.g. Golang's error handling uses what is morally a product type like (value, err) := foo(). What does it mean if both value and err are non-nil? That's prohibited only by convention, not the language. The type is "too big". But with sum types we can have Result<value, error>, where it can be a value or an error but not both, guaranteed by the compiler.
marcosdumay 1 days ago [-]
You do sum types in SQL the same way you do everything else. You create more tables, and add relationships.
Rendered at 21:34:05 GMT+0000 (Coordinated Universal Time) with Vercel.
Back when I worked on translation of pattern matching in the Scala compiler (pre Kotlin, pre "Java 5" with generics..) I surely hoped but could not imagine that the world would turn out like this, with python, Java, Rust all supporting matching and sum types.
I don't see many texts that bridge type theory (mathematical logic) and data modeling, but I am certainly rooting for sum and product types to become pervasive common knowledge to all programmers one day.
I agree. And it's not just programming, but also databases. One of the flaws of relational data model is that sum types are awkward to express - typically this is done through inheritance.
It's not intuitive at all and so far removed from normal programming practice, the terminology doesn't help.
It's the overload of concepts that makes it harder to grasp.
I made a joke about Java finally getting 'structs' [1] and was told that records weren't structs. I don't think I would have gotten the same pushback if I just called them product types instead.
> It's the overload of concepts that makes it harder to grasp.
Have you come across the OR-PLUS similarity in predicate logic, as well as AND-MULTIPLY? Those ones help me think about logic even though they don't have anything to do with each other. It helps me parse their precedence in my mind, and think about their commutativity/distributivity quickly, e.g.
> Sums and products are operations in muggle math (functions)Anyway, these ones refer to adding up their state spaces (number of possible inhabitants). A sum type can have this inhabitant (TRUE) or [2] that inhabitant (FALSE). A product type can have this type (e.g. first element in a tuple) and [3] that type (second element in a tuple). And since you mentioned "functions" - those are exponential types!
If you know x:S you may find conclude x:(S+T) without also stipulating that x is not a T. It may be permitted (if pointless) to use S+T even in contexts where S is a subtype of T (and hence the XOR is not true).
You can't conclude x: S+T from x:S (because it's false). What you can do is use x to build an S+T. If S=T there are two ways to do that.
From what I've seen, usually subtyping is also defined via embeddings, and if x: T then x: S is false for all S != T by definition.
They see the keyword "struct" and associate it to C# struct which is stack allocated.
Then they see the keyword "record" and think Java record which is syntax sugar for class which is heap allocated.
They never draw the connection that they are both ways to express product types.
There are also connections to muggle math/arithmetic. e.g. for types as for numbers, A^(X+Y)=A^X*A^Y. For types, this says a function that takes a sum works by doing case analysis. You can also "forget" things are types and pretend A, X, Y are the number of possible values the types can have, and your equations about types become true statements about numbers.
The sum type L + N is composed of the values A, B, 1, 2, 3. It's a set union, and the cardinality of the union is the sum of the two cardinalities.
Likewise, the product type L x N has the values (A, 1), (A, 2), (A, 3), (B, 1), (B, 2), and (B, 3). It's the usual cartesian product of sets, and you get the cardinality of that type by multiplying the two cardinalities.
(For bonus points: The type of functions L -> N is an exponential type, with cardinality N^L)
Boolean arithmetic sometimes uses + for OR and * for AND. When considering only zero and non-zero values, the semantics are the same. 1 + 1 + 0 is non-zero, 1 * 1 * 0 is zero. Substitute + for OR and * for AND, and you wind up with the "same" result.
From the linked article:
> one is „and data“, the other is „or data“.
Structs/records/product types fit the description of "and data", because it's a compound value comprised of this type AND that type AND that other type, etc.
Unions/sum types fit the description of "or data" because it's either this type OR that type OR that other type.
But int + int is isomorphic to bool*int.
The unions you're talking about necessarily require a notion of subtyping, I believe.
- sealed classes/interfaces
- discriminated unions
- enums
- case classes
It would be nice if we could agree on a single term to describe this, the same way we standardised on "class" to describe, well, classes.
Did we, when did this happen? Languages like Haskell use type-class to mean something closer to traits/interfaces in other languages. I wouldn't be surprised if other languages have their own definition of what 'class' means.
> to describe, well, classes.
What is a class? Not trying to be an arse here, genuinely, what is it? In OO languages it would be a type with a collection of named-members. How is that different to a record with fields that are either data-types or first-class functions? Is inheritance implied? If so, what's a 'sealed class'?
What does 'class' really mean? Classification? Doesn't the type-name alone classify something? Really, 'class' is the weird one imho; it's just everyone's so used to it now that it seems correct. It's similar with other OO terminology like 'object'. On the surface it seems sensible, but when you really think about it, it's kinda odd.
Ultimately, we all just end up being most comfortable with what we know.
The cardinality of a product type is the product of the cardinalities of its constituents. Likewise a sum type's cardinality is the sum of the cardinalities of its constituents.
Examples:
- a struct with a boolean and an 8-bit integer as its members. A boolean has 2 possible values, the integer has 256 possible values. So the struct has 2x256 = 512 values in total. So a struct is a product type.
- an Optional<int8> is either Some(n: int8) or None. The first one has 256 possible values, the second has one possible value. So the Optional<int8> has 256+1 = 257 values in total. So Optional<...> is a sum type.
Why does this matter? Because when modelling problems it's very nice if you come up with a representation where the number of legal states in the domain exactly matches the number of states of the type. It's awkward and confusing to have possible states of the type that are redundant or don't correspond to anything real. e.g. Golang's error handling uses what is morally a product type like (value, err) := foo(). What does it mean if both value and err are non-nil? That's prohibited only by convention, not the language. The type is "too big". But with sum types we can have Result<value, error>, where it can be a value or an error but not both, guaranteed by the compiler.