I'd expect negative integer ids in an API to break even more integrations than unexpectedly large integers.
Though I guess that likelihood is influenced by the choice of protocol. For example, when using protobuf, the client code generated from the specification file will use a 32-bit integer if that's how it was defined, while in JSON I'd generally assume it's a positive integer smaller than 2^53.
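To make the 2^53 point concrete, here is a minimal Python sketch. It assumes the consumer holds JSON numbers in IEEE 754 doubles (as JavaScript's `Number` does); the helper name is purely illustrative and not from the article.

```python
# Largest integer below which every whole number is exactly representable
# as an IEEE 754 double (the "safe integer" limit in JavaScript terms).
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double_round_trip(n: int) -> bool:
    """Return True if n is unchanged after being stored in a 64-bit float."""
    return int(float(n)) == n


# Any 32-bit id is far below the limit, so it survives.
int32_ok = survives_double_round_trip(2**31 - 1)

# The limit itself still survives.
limit_ok = survives_double_round_trip(MAX_SAFE_INTEGER)

# One step past 2**53 gets rounded to a neighbouring value.
past_limit_ok = survives_double_round_trip(2**53 + 1)
```

So a widened 64-bit id is only safe for such clients while it stays under that limit, which is one reason ids are often sent as strings.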
swiftcoder 2 days ago [-]
You don’t have to expose the negative to the customer - convert it to unsigned at the API layer, and Bob's your uncle
OptionOfT 2 days ago [-]
Right. You can have the best documentation:
If they expose them as string and mention they're opaque? Then customers who parse them to uint will get bugs and be unhappy.
Did they expose them as ints? Customers who used uints will be unhappy.
At `jobs[-2]` the front-end parsed the ids (exposed as strings, but ints under the cover).
The backend left them alone.
That caused some issues when building out shared libraries.
Demiurge 2 days ago [-]
What kind of API specifies that your number is int, uint, or bigint? According to a quick search, the formats for APIs are: JSON ~80%, XML ~15%, ~5% other.
lazide 2 days ago [-]
Anyone storing them in a DB, or using them in internal fields, will likely have a surprise on their hands. Unless they store them as opaque strings, which is the saner thing to do in these situations anyway.
arjvik 2 days ago [-]
SQL requires setting the max length of a string, and it's quite reasonable to set it to len(2147483647)=10 if you were expecting 32-bit int IDs.
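A quick Python check of that length arithmetic, as a sketch only: the 10-character column is the hypothetical from the comment above, and `fits_varchar` is an illustrative name, not anything from the article.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -(2**31)

# Width of the widest positive and negative 32-bit ids when serialized.
widest_positive = len(str(INT32_MAX))
widest_negative = len(str(INT32_MIN))  # the minus sign costs one character


def fits_varchar(value: int, max_len: int = 10) -> bool:
    """Would str(value) fit in a (hypothetical) VARCHAR(max_len) column?"""
    return len(str(value)) <= max_len


positive_fits = fits_varchar(INT32_MAX)
negative_fits = fits_varchar(INT32_MIN)
```

Under that assumption, a column sized for positive int32 ids is one character too short for the most negative ones.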
lazide 2 days ago [-]
If your goal is storing opaque strings, that is a very silly thing to do.
At that point you’re just blowing up storage for no reason. Just use an int if you’re that sure.
Setting a string length to what is coincidentally the length of an int serialized to a string, while doing no other validation on it, is… just special.
layer8 2 days ago [-]
If you expose them as strings, you might as well convert them to unsigned at the conversion point.
zeograd 2 days ago [-]
I often see code relying on the increasing property of primary key (keeping track of processed vs unprocessed by the last processed pk only).
This wrap into the negative domain would wreak havoc for sure.
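A minimal Python sketch of that failure mode, assuming ids simply continue into the negative range once they pass the int32 maximum. `wrap_int32` and `unprocessed` are illustrative names, not code from the article.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(n: int) -> int:
    """Reinterpret n as a signed 32-bit two's-complement value."""
    return (n + 2**31) % 2**32 - 2**31


def unprocessed(ids, last_processed_id):
    """The 'cursor' pattern: everything above the last seen id is new."""
    return [i for i in ids if i > last_processed_id]


# Four consecutive inserts straddling the wrap point.
ids = [wrap_int32(INT32_MAX - 1 + k) for k in range(4)]

# A consumer that had already processed up to INT32_MAX never sees
# the two rows that landed in the negative range.
missed_after_wrap = unprocessed(ids, last_processed_id=INT32_MAX)
```

The two newest rows compare as smaller than every id seen so far, so a "greater than last processed" filter silently skips them.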
CodesInChaos 2 days ago [-]
You generally can't rely on strict monotonicity of primary keys, since the order in which transactions commit isn't necessarily the order in which the ids were generated. But I have relied on primary keys being "monotonic enough" to sort output by creation time for display purposes.
OptionOfT 2 days ago [-]
I've worked on invoicing software where we had to introduce a public, always +1 counter to ensure there are no gaps between invoices. Not +2, not +5.
That way you couldn't make them disappear.
chiph 2 days ago [-]
In the days when you used custom printed forms that had a number printed on them by the printer - when you loaded a new box of paper into your printer you had to input the first form number into the system so they'd match.
If you opened boxes in "whatever" order you'd have invoice numbers that would run contiguous for 150 or so counts (the number of forms in the box), then skip to the next multiple of 150 to correspond to when the next (or previous!) box had been used.
stefs 2 days ago [-]
That mustn't be the primary key, though, but a serial that counts (and is unique) per-customer.
OptionOfT 2 days ago [-]
This was before the SaaS days.
On-prem, single company who issued invoices to customers.
When there was an audit the government could ask to see invoices in a certain range. If some of them were missing, what does that mean? Paid under the table?
My wife worked at a place where they did manual PDFs, but there they had a tool to change properties of a PDF to change the creation time / last editing time, for when 'modifications' were needed.
And this reminds me of the other post here where some people assume cash means shady. Definitely the case there.
veyh 2 days ago [-]
Well, I'd imagine that before returning the value through their API they could just check that if the number is negative, then add 2^32 to it, which would make it look like an unsigned 32 bit integer.
conradfr 2 days ago [-]
But isn't that exactly what they were trying to not do as their problem was the api users and not their internal use?
veyh 2 days ago [-]
It was definitely a problem with their database but I suppose it's possible that the customers were also expecting 32 bit signed ints.
layer8 2 days ago [-]
In most languages that support differently sized integer types and/or unsigned integer types, you wouldn’t have to check, but can just apply the appropriate modulo or bit operation on all values.
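The two approaches in this subthread (check the sign and add 2^32, or just mask every value) can be sketched in Python like this. The function names are mine, and this is only an illustration of the arithmetic, not what the article's authors did.

```python
def to_unsigned32_checked(n: int) -> int:
    """Add 2**32 only when the signed 32-bit value is negative."""
    return n + 2**32 if n < 0 else n


def to_unsigned32_masked(n: int) -> int:
    """Apply the same mask to every value, no sign check needed."""
    return n & 0xFFFFFFFF


def to_signed32(u: int) -> int:
    """Inverse mapping, for turning an exposed id back into the stored one."""
    return u - 2**32 if u >= 2**31 else u


samples = [0, 1, 2**31 - 1, -1, -(2**31)]
checked = [to_unsigned32_checked(n) for n in samples]
masked = [to_unsigned32_masked(n) for n in samples]
round_trip = [to_signed32(u) for u in masked]
```

Both versions agree on the whole int32 range, and the mapping is reversible, so the API layer can translate in both directions.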
TheCowboy 2 days ago [-]
> No-one really likes engineering war stories
Is that really true? I did keep reading the entire piece. I think they're often interesting and can contain nuggets of wisdom or insight. Or sometimes they're just funny. When I meet someone who worked on something interesting, I often start trying to pry stories like this post out of them.
shermantanktop 2 days ago [-]
Everyone likes engineering war stories!!! Never heard of an engineer who didn’t.
yk 2 days ago [-]
No, but it is an amazing first sentence. Everybody goes, this story is specifically for me, I'm very special.
eCa 2 days ago [-]
I read the piece (and enjoyed it) despite the first sentence. I’ve become increasingly sensitive to this kind of fluff.
It’s not a hook, it’s bad read-bait.
tclancy 2 days ago [-]
Well then you are very special.
Introverts hate this one weird trick!
bobthebuilders 2 days ago [-]
Half the time I read the stories they're just a thinly disguised ad for some flavor-of-the-day SaaS, so at least in this instance the hook was somewhat useful. Now if everyone uses this to shill their SaaS, then maybe not.
derekcheng08 2 days ago [-]
LOL came here to say this exactly. Everyone LOVES war stories in my experience :)
Demiurge 2 days ago [-]
I don't understand, what was the issue with changing the column type from `int` to `bigint`? What does exposing the IDs have to do with how large those ints can be? This seems like a backend issue, if we're talking about HTTP/REST APIs. Now, if we're talking compiled C style APIs, then yes, obviously widening the types will cause issues. This is very important context that is missing from this article.
icedchai 2 days ago [-]
The issue was probably database migration time. I was once at a startup that had over 1 billion rows in MySQL. We were approaching the `int` limit in another year or so. Many tables would need to be migrated due to foreign key constraints. Migrating one of the tables required significant downtime (6 to 8 hours, IIRC) due to slow spinning disks. Some servers didn't have enough space to rebuild the tables, so we'd want to add disks just in case. There were several servers.
A few "alter table" commands cascade into an operational PITA.
yawnr 2 days ago [-]
I guess if in the API documentation you are saying the pkey is an int, then someone consuming that data and storing it in their own table would also likely make that the column type. So when it crosses that threshold, your customers’ tables will break.
I think he did a pretty bad job of explaining it if that’s the case though.
nikanj 2 days ago [-]
I wonder how many API users needed the attribute to be an integer (instead of just treating it as an opaque handle string), but didn't mind the integer turning negative
cesaref 2 days ago [-]
I think the point is that if the API doesn't specify that the returned integers are positive, or are monotonically increasing, then it's fine for the service to return any unique integer.
If a client application makes an assumption about this, then their engineers will accept this as being their bad and will fix it.
I'd defend this as being pragmatic - minimising disruption to clients instead of the more 'correct' solution of changing the API. I'm hoping that they managed to roll out the new API update alongside the old one and avoid a 'big bang' API change with this. Sometimes this isn't possible, but it's great when that works out.
CodesInChaos 2 days ago [-]
I'm far more likely to assume that an integer-id I get from an API is non-negative or even positive than to assume that they're always smaller than 2^31. And I'd be far more likely to blame the API provider for violating the former assumption.
NetMageSCW 2 days ago [-]
That sounds like a you problem.
Kwpolska 2 days ago [-]
Probably none needed it to be an integer. At the same time, if the API contract says {id: integer, name: string}, then you are likely to have developers, especially in statically-typed languages, that will create a class with an int32 field, and tell the JSON parsing library to create instances of that class when deserializing the API response.
btown 22 hours ago [-]
Negative integers seem like a nightmare if somewhere downstream someone has a route like /widgets/(\d+), or if someone is scanning text outputs for IDs with a similar regex. The BigInt expansion seems far less risky IMO.
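Here is a small Python sketch of that regex concern. The `/widgets/(\d+)` pattern is taken from the comment above; `widget_id` and the sample log line are hypothetical.

```python
import re

ROUTE = re.compile(r"/widgets/(\d+)")


def widget_id(path: str):
    """Return the id if the path matches the route, else None."""
    match = ROUTE.fullmatch(path)
    return int(match.group(1)) if match else None


positive = widget_id("/widgets/42")
negative = widget_id("/widgets/-42")  # the minus sign is not a digit

# Scanning free text is arguably worse: the sign is silently dropped,
# so the extracted id points at a different record.
scanned = re.findall(r"\d+", "processed widget id=-42")
```

The route simply stops matching for negative ids, while the text scan keeps "working" and returns the wrong number.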
sorrythanks 2 days ago [-]
maybe i'm too far gone, but this doesn't even feel hacky to me. the key needs to be a unique number, -1 and 1 are two different numbers.
slipperybeluga 2 days ago [-]
Yeah but how many of those customers were relying on the key not being a negative number?
NetMageSCW 2 days ago [-]
Assuming the API was properly documented as returning signed int, that’s not my problem. Abuse of the API or misunderstanding of the API doesn’t trump running out of space.
drob518 2 days ago [-]
Exactly. I mean, if the end solution is to convert to a big int, who’s to say that some customer didn’t assume it would always be 32 bits and blow up then, too.
This does highlight the fact that 32 bit is just a small number these days. Personally, I prefer UUIDs instead of incrementing integers for primary keys since they also scale out without having to have global coordination, but at least choose a 64-bit number.
willtemperley 1 days ago [-]
Yes. It's just so much easier to create a UUID client-side, use that to identify data in temporary UI state and commit without having to worry about getting the incremented identifier.
I find this significantly reduces decision fatigue. Deciding which hack to use for temporary identifiers is not much fun.
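As a rough illustration of the client-side UUID idea, here is a minimal Python sketch using the standard `uuid` module. The `new_draft` helper and the record shape are made up for the example.

```python
import uuid


def new_draft(name: str) -> dict:
    """Create a record whose id is final before it is ever sent to a server."""
    return {"id": str(uuid.uuid4()), "name": name}


# Two drafts created independently, with no counter or coordination.
first = new_draft("invoice draft")
second = new_draft("another draft")

# The same id can key temporary UI state now and the stored row later.
ui_state = {first["id"]: first, second["id"]: second}
```

No placeholder id has to be swapped out after the insert, which is the "decision fatigue" being avoided.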
dusted 2 days ago [-]
> No-one really likes engineering war stories,
I love engineering war stories
thewisenerd 2 days ago [-]
can't wait for solutions of a similar nature around 2038-01-19
a free 68 more years!
(hopefully nobody optimized for the 1 signed bit when allocating memory tho)
The file format is obsolete (it assumes a fixed number of terminal lines per system) and has unfixable locking issues, so it has to be replaced anyway.
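The "free 68 more years" arithmetic can be checked with a short Python sketch. It assumes a 32-bit seconds-since-epoch counter, read once as signed and once as unsigned.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Last second representable in a signed 32-bit timestamp.
signed_limit = EPOCH + timedelta(seconds=2**31 - 1)

# Last second if the same 32 bits are reinterpreted as unsigned.
unsigned_limit = EPOCH + timedelta(seconds=2**32 - 1)

extra_years = unsigned_limit.year - signed_limit.year
```

`signed_limit` is 2038-01-19 03:14:07 UTC and `unsigned_limit` is 2106-02-07 06:28:15 UTC, at the cost of no longer representing dates before 1970.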
1egg0myegg0 2 days ago [-]
Whoever gets that magical -2,147,483,648 is going to be really surprised that things keep working
IshKebab 2 days ago [-]
Hard to believe that all their customers had written their code to work with signed IDs though.
Honestly I would expect that to break more users' code (and in weirder ways) than just changing the type. It's unclear from the story how the type was exposed though.
gcanyon 2 days ago [-]
Yeah, this was my immediate thought as well, but if the spec for the API says signed int, then at least you're defensible: you haven't broken the letter of the spec, even if you're pounding on the spirit of the spec pretty hard. You have a fairly reasonable likelihood that most/all of your customers have implemented to your spec, and therefore any negative consequences are down to secondary effects of how they handle the negative values, not directly because of failure to be able to store them.
That said, to your point, there was almost certainly someone comparing IDs to determine recency, and during the transition from large-positive to large-negative, that would absolutely cause havoc.
I'd be curious if their API spec actually said anywhere that the IDs increased consistently.
crazygringo 2 days ago [-]
Came here to say exactly this. Programming languages usually default to signed, but if you're storing these things in databases it's common to explicitly choose unsigned, since IDs are virtually always unsigned and it gives you twice the space until you run out.
Like, instead of using negative primary keys, they could also have just converted to an unsigned int32. I would assume both of those would break a bunch of customer implementations though.
sgarland 2 days ago [-]
Postgres doesn’t have unsigned column types out of the box. There’s an extension that enables it, but you’d have to know about that (which you should, if you’re managing a DB, but I digress).
MySQL does have unsigned ints out of the box, FWIW.
NetMageSCW 2 days ago [-]
One of them would presumably break every customer if the API was properly documented.
hn92726819 2 days ago [-]
I'd believe it. Not sure when this is, but if it's a few years old and business software, they could probably assume everyone uses Java, which doesn't even have unsigned integers.
swsieber 2 days ago [-]
With $MY_JOB in Java, that was my assumption
IshKebab 2 days ago [-]
Right but just because it's `int id` doesn't mean all code that uses it will still work when it's negative.
hn92726819 19 hours ago [-]
True, but it does seem like the best alternative here. If it's a SOAP API in 2005 for business customers, for example, then it sounds like the least bad option of the four (tell consumers to update, hold up the whole company's deployment, push negative ints, or push longs). I'm just saying that to me, it isn't hard to believe this was the best option here.
sheepscreek 2 days ago [-]
This is engineering at its finest. Working within tight constraints to find solutions that minimize impact. An equally important part of the “solution” is communication - to the leadership, departments and customers. Start early, communicate often and you will almost always come out ahead, even if mistakes are made.
valicord 2 days ago [-]
I don't get it. How would switching to bigint break the existing integrations?
Kwpolska 2 days ago [-]
If the existing code was using int32, a switch to anything larger would cause integer overflows or JSON parsing errors in languages with strongly-typed fixed-width integer types.
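Python's own ints never overflow, so the sketch below imitates a strict int32 deserializer with an explicit range check. `parse_id_as_int32` is a hypothetical stand-in for what a typed JSON library might do, not a real library API.

```python
import json

INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def parse_id_as_int32(payload: str) -> int:
    """Parse {"id": ...} and reject values a 32-bit signed field cannot hold."""
    value = json.loads(payload)["id"]
    if not INT32_MIN <= value <= INT32_MAX:
        raise OverflowError(f"id {value} does not fit in int32")
    return value


# A negative id still fits the declared type...
negative_id = parse_id_as_int32('{"id": -2147483648}')

# ...but the first bigint-sized id does not.
try:
    parse_id_as_int32('{"id": 2147483648}')
    bigint_rejected = False
except OverflowError:
    bigint_rejected = True
```

That asymmetry is the trade-off under discussion: negative values pass the type check and may break later logic, while wider values fail right at parsing.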
masklinn 2 days ago [-]
Any call from a typed language distinguishing between 32b and 64b integers (that being most popular typed languages I reckon) would break if it had assumed / used the smaller of the two.
TBF using the negative range could also break callers distinguishing between signed and unsigned if they’d used the latter on their side depending how the API was documented.
VladVladikoff 2 days ago [-]
I wonder how many Unix timestamps are going to wrap around to negative in 2032?
OptionOfT 2 days ago [-]
None!
But 2038 is gonna be awesome!
drob518 2 days ago [-]
I wonder how many airplanes are going to fall out of the sky? Or maybe we have to wait until January 1, 10000, for that.
6stringmerc 2 days ago [-]
I applaud your enthusiasm for life in the wasteland post US collapse / Balkanization and will meet you at the last functioning terminal in 2042!
estimator7292 2 days ago [-]
As my last job was winding down (much to the disbelief and utter denial of the CEO) we'd run out of money for Unity licenses and run out of staff to use Unity. CEO decided that we absolutely must have a Unity demo that worked with the slightly newer generation of hardware I was wrapping up. Being the only programmer left, it was of course my problem to figure out. Oh and also this has to be ready for a show next week, so chop-chop.
I ended up decompiling some android APKs our last Unity dev had built like eight months prior. I figured out how to extract our device driver library, then painstakingly rewrote the entire library to support new hardware while also maintaining a compatible ABI and stuffed it all back into the APK. I think I also had to forge some keys or something? It was a fucking mess. Anyway, that was the last work I ever did for him because he didn't pay me for about two months after that, and I quit the moment he gave me the wages he owed me.
He's only got one employee and zero customers, but hey his stupid demo worked for all that mattered.
zmj 2 days ago [-]
If you're not doing math with it, it's a string.
hobs 2 days ago [-]
I would say there are times that doing math with a primary key is a useful property (say, getting the Nth primary key (or so)) but if you are exposing it in an API I would say you would never even want a primary key projected in the first place.
A primary key is almost an implementation detail - a key that an API knows about something is one of many things that might point to this thing, might need to change, and generally might need a different representation (so don't make it your primary key.)
I also tell people to just use the bottom of any primary key space (when choosing monotonic stuff) but so many engineers just complain that they don't like the numbers (and yet many of them have had to deal with the migration a few years later so ... enjoy that I guess.)
faxmeyourcode 2 days ago [-]
> No-one really likes engineering war stories
This is so wrong. I love reading these kinds of stories