Ouch. I think the blame is partly on the build configuration. IMO build configurations shouldn't degrade silently this way. If the user is OK without a 32-bit vDSO then they should explicitly specify that.
o11c 111 days ago [-]
It's worth noting that even on x86, -m32 isn't as complete as a real i?86 build of gcc. It's "complete enough" for the kernel and many other things, but, say, I found it very difficult to build a 32-bit program that didn't rely on SSE, since the 64-bit toolchain's defaults assume it.
makomk 111 days ago [-]
In theory you should just be able to set -march to the lowest common denominator kind of CPU you expect your code to run on and it'll avoid relying on SSE if appropriate.
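As a rough sanity check (assuming GCC, which predefines __SSE__/__SSE2__ whenever SSE code generation is enabled), you can verify what a given flag combination actually turns on:

    /* Sketch: report the SSE level baked into the current target flags.
     *   gcc -m32 -march=i586 sse_check.c   ->  no SSE
     *   gcc sse_check.c                    ->  x86-64 default, SSE2 on */
    #include <stdio.h>

    int main(void)
    {
    #if defined(__SSE2__)
        puts("SSE2 enabled for this target");
    #elif defined(__SSE__)
        puts("SSE enabled for this target");
    #else
        puts("no SSE for this target");
    #endif
        return 0;
    }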
dwattttt 111 days ago [-]
An x86_64 toolchain is allowed to assume SSE2 is present, it's mandated by the AMD64 spec.
WhyNotHugo 111 days ago [-]
What’s the use case for running a 32-bit binary on a 64-bit CPU/OS? Is there any advantage? Or is it simply to avoid having to compile twice to support two architectures?
nickcw 111 days ago [-]
Apart from backwards compatibility, 32-bit apps can use less memory than 64-bit apps, thanks to half-size pointers (and half-size `long`s under the usual ABIs).
Dwedit 111 days ago [-]
I once read an article about a project that placed everything into 64 KB blocks. Any pointer within a block is 16-bit, and you can also have references to other blocks.
The first advantage is that the pointers become tiny, so the objects take up a lot less space. The other advantage is that when it's time to serialize and deserialize the data, you don't have to do any processing at all: intra-block pointers are just offsets, so a block's contents don't depend on where it sits in memory.
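A minimal sketch of the scheme (the names here are mine, not from the article):

    /* Hypothetical intra-block 16-bit "pointers": an offset relative to
     * the block base instead of an absolute address, which is what makes
     * a block position-independent and trivially serializable. */
    #include <stdint.h>

    #define BLOCK_SIZE (64 * 1024)

    typedef uint16_t ref16;            /* offset from the block start */

    struct block {
        uint8_t bytes[BLOCK_SIZE];
    };

    static void *deref(struct block *b, ref16 r)
    {
        return &b->bytes[r];
    }

    static ref16 make_ref(struct block *b, void *p)
    {
        return (ref16)((uint8_t *)p - b->bytes);
    }

References to other blocks would then pair a block ID with such an offset.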
dmitrygr 110 days ago [-]
> placed everything into 64KB size blocks. Any pointer within the block is 16-bit, and you can also have references to other blocks.
Just reinvented segmentation
the_mitsuhiko 111 days ago [-]
That, and some applications use algorithms that scale really badly with the much larger address space.
NavinF 110 days ago [-]
That only applies to apps that allocate a ton of objects and refer to them using pointers, instead of using a dynamic array with 16-bit/32-bit indices.
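For example (a sketch, not from the parent comment): a pool that hands out 32-bit indices keeps each reference at 4 bytes even on a 64-bit build, where a pointer would cost 8.

    /* Sketch: refer to nodes by 32-bit index into one array rather than
     * by pointer, halving the per-reference cost on LP64 targets. */
    #include <stdint.h>

    #define NIL UINT32_MAX

    struct node {
        int      value;
        uint32_t next;   /* index of the next node, or NIL: 4 bytes, not 8 */
    };

    static struct node pool[1 << 20];

    static int list_sum(uint32_t head)
    {
        int sum = 0;
        for (uint32_t i = head; i != NIL; i = pool[i].next)
            sum += pool[i].value;
        return sum;
    }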
rwmj 111 days ago [-]
Many server-class AArch64 chips have dropped support for 32-bit, so in those cases it's not even an option.
adrian_b 111 days ago [-]
Also, in some of the latest smartphone chips, some of the cores can no longer execute 32-bit code.
The next generations will not include any core supporting the 32-bit ISA.
ithkuil 111 days ago [-]
You can still use 32-bit-wide addresses in 64-bit mode and zero-extend them into 64-bit registers.
If you have a statically compiled binary, all you need is compiler support for a 32-bit address mode on the otherwise 64-bit instruction set. On x86 you then get access to the extra registers etc., but you don't pay the price of wider addresses stored in data structures.
OTOH if you use shared libraries, those have to be compiled in that mode too. How hard that is to deal with depends on whether you have a hybrid system or a full 32-bit userspace.
leni536 111 days ago [-]
Yes, that's an alternate ABI with 32-bit addressing targeting the same 64-bit instruction set. But it's not trivial, as you described. You also need some kernel support, and there might not be enough demand to maintain that support.
namibj 109 days ago [-]
AKA x32 on AMD64.
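A quick way to see the difference between the three x86 ABIs (this assumes a GCC with the relevant multilibs installed; -mx32 support is not universally packaged):

    /* Sketch: print type sizes under each x86 ABI. Expected results:
     *   gcc -m64 abi.c   ->  pointer 8, long 8   (LP64)
     *   gcc -mx32 abi.c  ->  pointer 4, long 4   (x32: ILP32 on x86-64)
     *   gcc -m32 abi.c   ->  pointer 4, long 4   (classic i386)  */
    #include <stdio.h>

    int main(void)
    {
        printf("sizeof(void *) = %zu, sizeof(long) = %zu\n",
               sizeof(void *), sizeof(long));
        return 0;
    }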
notepad0x90 111 days ago [-]
Under Linux, I have no idea, but the ARM world has a wide variety of use cases. 64-bit support means more transistors on the CPU and wider registers and datapaths, and the people who implement ARM's designs would like to save money on all that. If it is an embedded device or a limited-use device, it may not need 64-bit. Especially when you're mass manufacturing, every $0.01 counts.
Just a side-note: I am always intrigued by 16-bit Thumb and 32-bit ARM "interworking", basically flipping the LSB of the branch-target address loaded into the PC (the EIP/RIP-equivalent register) to tell the CPU whether the next instructions are Thumb or A32, letting different instruction sets mix within the same program.
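You can even observe that bit from C (a sketch; it only shows something interesting when built for AArch32 as Thumb, and the cross-toolchain name in the comment is an assumption):

    /* Per the ARM EABI, the "address" of a Thumb function has bit 0 set;
     * BX/BLX consume that bit to pick the instruction set, so it never
     * lands in the real PC. Build e.g. with:
     *   arm-linux-gnueabihf-gcc -mthumb lsb.c   (assumed toolchain) */
    #include <stdio.h>
    #include <stdint.h>

    static void f(void) { }

    int main(void)
    {
        uintptr_t a = (uintptr_t)&f;
        printf("f at %#lx, Thumb bit = %lu\n",
               (unsigned long)a, (unsigned long)(a & 1));
        return 0;
    }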
Dalewyn 111 days ago [-]
Backwards compatibility with older binaries (some of which might not have devs anymore), and infrastructure and workflows already in place.
Granted, this might not actually be worth considering here, given ARM's much shorter history compared to x86(-64).
t-3 111 days ago [-]
The advantage of 64-bit is only a wider integer range and address space, which most applications don't need, so trading those for portability is probably a good deal. I've also heard that many people doing ARM assembly prefer A32 NEON over A64 NEON.
pm215 111 days ago [-]
Depends what you want to be portable to -- now that 64-bit-only Arm CPUs are becoming more common, building for 32-bit means you cannot run on those systems.
chasil 111 days ago [-]
For ARM, 64-bit removed features from the instruction set.
Notably, general conditional execution is gone (see the sketch below).
If you have a compiled binary but lack source, then this subsystem is helpful.
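For illustration (compiler output varies; the mnemonics in the comments are typical, not guaranteed), here is a pattern that A32 expresses with a predicated instruction and A64 with a conditional select:

    /* A32 (typical):  cmp r0, #0 ; addne r1, r1, #1
     * A64 (typical):  cmp w0, #0 ; csinc w0, w1, w1, eq
     * A64 dropped general conditional execution but kept conditional
     * selects (csel/csinc/...), so simple cases still avoid a branch. */
    int bump_if_nonzero(int x, int count)
    {
        if (x != 0)
            count++;
        return count;
    }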
0x0 111 days ago [-]
Closed-source applications or games, I'd say.
caerwy 111 days ago [-]
Some legacy applications were designed around having a 32-bit word size for pointers, so calculations of offsets into memory where a data structure includes pointers depend on the pointer size.
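A sketch of how that goes wrong: the same struct has a different layout per ABI, so anything that baked in the 32-bit offsets misreads the 64-bit one.

    /* ILP32: next = 4 bytes, value at offset 4, sizeof = 8
     * LP64:  next = 8 bytes, value at offset 8, sizeof = 16
     * Code (or an on-disk format) that hard-coded the 32-bit numbers
     * silently breaks on the 64-bit layout. */
    #include <stdio.h>
    #include <stddef.h>

    struct node {
        struct node *next;
        int value;
    };

    int main(void)
    {
        printf("sizeof = %zu, offsetof(value) = %zu\n",
               sizeof(struct node), offsetof(struct node, value));
        return 0;
    }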
winter_blue 111 days ago [-]
I wonder how the vDSO works for an x32 ABI program (i.e. a program with 32-bit pointers, but access to the rest of the x86-64 feature set).
andirk 111 days ago [-]
Anyone able to give a TL_IDID_R (I did read) but didn't understand a damn thing?
> ensure that CROSS_COMPILE_COMPAT is directed to a 32-bit toolchain. Failure to do so might lead to performance issues.
OK so set up your CONSTs. I agree.
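Concretely (the toolchain prefixes are assumptions; substitute whatever 64-bit and 32-bit cross compilers you actually have), an arm64 kernel build that also produces the 32-bit compat vDSO looks something like:

    make ARCH=arm64 \
         CROSS_COMPILE=aarch64-linux-gnu- \
         CROSS_COMPILE_COMPAT=arm-linux-gnueabihf-

Leave CROSS_COMPILE_COMPAT out and the build still succeeds, just without the 32-bit vDSO, which is exactly the silent degradation complained about upthread.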
st_goliath 111 days ago [-]
1) System calls need to switch into kernel mode and back; this can be a massive performance hit.
2) This is especially bad if you want to do precise time measurements: a chunk of time is now spent just calling `gettimeofday`. The performance impact (benchmarked and shown in the article) is substantial.
3) Linux (the kernel) at some point added a "vdso", basically a shared library automatically mapped into every process. The libc can call into the vdso version of `gettimeofday` and use the real system call only as a fallback.
4) A 64-bit kernel that can run 32-bit programs needs both a 64-bit and a 32-bit vdso.
5) On x86, the kernel build process can "simply" use the same compiler to produce 32-bit code for the 32-bit vdso. GCC for ARM can't, because AArch64 and AArch32 are effectively separate compiler targets; you need two toolchains, a 64-bit one and a 32-bit one.
6) If you don't know that, the kernel ends up with only a 64-bit vdso; a 32-bit program will still run, but silently use the system call instead of the vdso, causing unexpected performance issues. A minimal benchmark sketch follows.
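To see the effect yourself (a sketch; the toolchain names in the comment are assumptions), time a tight loop of clock_gettime() calls in a 64-bit and a 32-bit build of the same program, and compare:

    /* libc routes clock_gettime() through the vdso when one is mapped;
     * without a matching (e.g. 32-bit) vdso it falls back to a real
     * syscall and each call gets dramatically slower.
     * Build (assumed cross toolchains):
     *   aarch64-linux-gnu-gcc -O2 bench.c
     *   arm-linux-gnueabihf-gcc -O2 bench.c  */
    #include <stdio.h>
    #include <time.h>

    #define N 10000000

    int main(void)
    {
        struct timespec t0, t;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < N; i++)
            clock_gettime(CLOCK_MONOTONIC, &t);
        double s = (t.tv_sec - t0.tv_sec) + (t.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%d calls: %.3f s (%.1f ns/call)\n", N, s, s / N * 1e9);
        return 0;
    }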