NHacker Next
Jank is C++ (jank-lang.org)
johnnyjeans 23 hours ago [-]
I'm not surprised to see that Jank's solution to this is to embed LLVM into their runtime. I really wish there was a better way to do this.

There are a lot of things I don't like about C++, and close to the top of the list is the lack of standardization for name-mangling, or even a way to mangle or de-mangle names at compile-time. Sepples is a royal pain in the ass to target for a dynamic FFI because of that. It would be really nice to have some way to get symbol names and calling semantics as constexpr const char* and not have to deal with generating (or writing) a ton of boilerplate and extern "C" blocks.
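
The boilerplate being described looks something like this; the class and shim names here are invented for illustration. Because the mangled symbol for a C++ method is implementation-defined, the usual workaround is a hand-written C shim with stable, unmangled names:

```cpp
namespace mylib {
    class Counter {
    public:
        void add(int n) { total_ += n; }
        int total() const { return total_; }
    private:
        int total_ = 0;
    };
}

// A dynamic FFI can't predict the mangled symbol for
// mylib::Counter::add, so every method gets an unmangled wrapper:
extern "C" {
    void* counter_new() { return new mylib::Counter(); }
    void counter_add(void* c, int n) { static_cast<mylib::Counter*>(c)->add(n); }
    int counter_total(void* c) { return static_cast<mylib::Counter*>(c)->total(); }
    void counter_free(void* c) { delete static_cast<mylib::Counter*>(c); }
}
```

Multiply that by every class and overload you want to expose, and the scale of the boilerplate problem becomes clear.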

It's absolutely possible, but it's not low-hanging fruit so the standards committee will never put it in. Just like they'll never add a standardized equivalent for alloca/VLAs. We're not allowed to have basic, useful things. Only more ways to abuse type deduction. Will C++26 finally give us constexpr dynamic allocations? Will compilers ever actually implement one of the three (3) compile-time reflection standards? Stay tuned to find out!

benreesman 23 hours ago [-]
Carmack did very much almost exactly the same with the Trinity / Quake3 Engine: IIRC it was LCC, maybe tcc, one of the C compilers you can actually understand totally as an individual.

He compiled C with some builtins for syscalls, and then translated that to his own stack machine. But, he also had a target for native DLLs, so same safe syscall interface, but they can segv so you have to trust them.

Crazy to think that in one computer program (that still reads better than high-concept FAANG C++ from elite legends, truly unique) this wasn't even the most dramatic innovation. It was the third most dramatic revolution in one program.

If you're into this stuff, call in sick and read the plan files all day. Gives me goosebumps.

no_wizard 21 hours ago [-]
Carmack actually deserves the moniker of 10x engineer. His work in his domain has truly reached far outside it because of the quality of his ideas and methodologies.
bitwize 17 hours ago [-]
I have a bit I do where I do Carmack's voice in a fictional interview that goes something like this:

Lex Fridman: So of all the code you've written, is there any that you particularly like?

Carmack: I think the vertex groodlizer from Quake is probably the code I'm most proud of. See, it turns out that the Pentium takes a few cycles too long to render each frame and fails to hit its timing window unless the vertices are packed in canonically groodlized format. So I took a weekend, 16-hour days, and just read the relevant papers and implemented it in code over that weekend, and it basically saved the whole game.

The point being that not only is he a genius, but he also has an insane grindset that allows him to talk about doing something incredibly arcane and complex over a weekend -- devoting all his time to it -- the way you and I talk about breakfast.

upghost 15 hours ago [-]
Another weird thing about Carmack, now that you mention it -- and Romero, coincidentally -- is their remarkable ability to remember technical challenges they've solved over time.

For whatever reason the second I've solved a problem or fixed a bug, it basically autopurges from my memory when I start on the next thing.

I couldn't tell you the bugs I fixed this morning, let alone the "groodlizer" I optimized 20 years ago.

Oh btw, Jank is awesome and Jeaye is a great guy, and also a game industry dev!

whstl 8 hours ago [-]
The trick for remembering those things is debriefing with other devs and then documenting. And then keep talking about it.

I don't do mind-blowing stuff like Carmack, but: just yesterday I came across a bug that helps support my thesis that "splitting methods by LOC" can cause subtle programmer mistakes. Wanna write a blog post about it asap.

pankajdoharey 5 minutes ago [-]
Yup this is the way.
benreesman 13 hours ago [-]
I find it's actually a good guideline for what to work on. If I'm 1, 3, 6, 12 months into a job or some other project and I can't remember what I was doing X months ago it tends to mean that I'm not improving during that period of time either.

Carmack is always trying to get better, do more, push the envelope further. He was never in it for money or fame, he was in it to be the best near as I can tell. And he's still among the truly terrifying hackers you wouldn't want to be up against, he just never stopped. You get that with a lot of the people I admire and try to emulate as best I can, Thompson comes to mind, Lamport, bunch of people. They just keep getting more badass from meeting their passion to the grave, a lifelong project of unbounded commitment to excellence in their craft.

That's who I look up to.

mjevans 13 hours ago [-]
I tend to (more easily) remember things that frustrate me but I overcome. Annoyance is a real factor in it.
ajkjk 16 hours ago [-]
I like this word, 'grindset'
MangoToupe 19 hours ago [-]
Linking directly to C++ is truly hell just considering symbol mangling. The syntax <-> semantics relationship is ghastly. I haven't seen a single project tackle the C++ interface in its entirety (outside of clang). It nearly seems impossible.

There's a reason Carmack tackled the C abi and not whatever the C++ equivalent is.

PaulDavisThe1st 19 hours ago [-]
There is no C ABI (Windows compilers do things quite differently from Linux ones, etc.) and there is certainly no C++ equivalent.
caim 18 hours ago [-]
The C ABI is the System V ABI on Unix, since C was literally created for it. And that is the ABI followed by pretty much every Unix successor: Linux, Apple's OSes, FreeBSD.

Windows has its own ABI.

The differing ABIs are pretty much legacy: the x86_64 ABI was built by AMD + Linux et al., while Microsoft had worked with Intel on the Itanium ABI.

Someone 4 hours ago [-]
> And that is the abi followed by pretty much any Unix successor: Linux, Apple's OS, FreeBSD.

Even limiting that to “on x64”, I don’t see how that’s true. To make a syscall, the ABI on Linux says “make the call”, while MacOS (and all the BSDs, I think) says “call the provided library function”.

Also (https://developer.apple.com/documentation/xcode/writing-64-b...): “Apple platforms typically follow the data representation and procedure call rules in the standard System V psABI for AMD64, using the LP64 programming model. However, when those rules are in conflict with the longstanding behavior of the Apple LLVM compiler (Clang) on Apple platforms, then the ABI typically diverges from the standard Processor Specific Application Binary Interface (psABI) and instead follows longstanding behavior”

Some of the exceptions mentioned there are:

- Asynchronous Swift functions receive the address of their async frame in r14. r14 is no longer a callee-saved register for such calls.

- Integer arguments that are smaller than int are required to be promoted to int by the caller, and the callee may assume that this has been done. (This includes enumerations whose underlying type is smaller than int.) For example, if the caller passes a signed short argument in a register, the low 32 bits of the register at the moment of call must represent a value between -32,768 and 32,767 (inclusive). Similarly, if the caller passes an unsigned char argument in a register, the low 32 bits of the register at the moment of call must represent a value between 0 and 255 (inclusive). This rule also applies to return values and arguments passed on the stack.
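
The following sketch doesn't observe registers directly, but it shows the language-level cousin of that ABI rule: C and C++ apply "default argument promotions" to small integer types at certain call boundaries, so a signed char or short always arrives widened to int (the function name here is invented for illustration):

```cpp
#include <cstdarg>

// Variadic arguments undergo default argument promotions, so the
// callee reads chars and shorts back as ints.
int sum_as_ints(int count, ...) {
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int);  // promoted values are read as int
    va_end(ap);
    return total;
}
```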

caim 53 minutes ago [-]
Swift has its own ABI and calling convention, so that makes sense that Apple adapted to it.

The system v abi doesn't say anything about syscall.

The Windows x86_64 ABI is the same as the ABI for x86; for this reason, you can only pass arguments in 4 registers (while Unix uses 6), because x86 only had 8 registers.

I think people have expectations that are misaligned with history and reality about this, to be honest. We can't expect all OS to do things in the same way.

C was created to rewrite the UNIX system, and POSIX compliance is followed by all successors, with minimal differences.

When it became clear that "Itanium" was a failure, Microsoft couldn't just pull an ABI out of the box and break all applications, so they just reused the same x86 ABI.

cryptonector 14 hours ago [-]
The C ABI is basically per-platform (+ variations, like 32- vs 64-bit). But you can get by quite well pretending there is something like a C ABI if you use <stdint.h>.
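
"Pretending there is a C ABI" in practice means a boundary that only uses fixed-width types and unmangled names, so the interface means the same thing on every platform's C ABI. A minimal sketch (the function here is invented for illustration):

```cpp
#include <cstdint>

extern "C" {
    // Only fixed-width integers and raw pointers cross the boundary;
    // no std:: types, no implementation-defined int widths.
    int32_t checksum(const uint8_t* data, uint64_t len) {
        uint32_t acc = 0;
        for (uint64_t i = 0; i < len; ++i)
            acc = acc * 31 + data[i];
        return static_cast<int32_t>(acc);
    }
}
```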
MangoToupe 19 hours ago [-]
[flagged]
MangoToupe 10 hours ago [-]
I just want to double-down on the "fuck windows" bit. No other operating system has been more hostile to developers than microsoft has been. Fuck those assholes and may they burn in hell. I'll never lift a finger to give a shit about windows or any microsoft effort, just like they never devoted any effort to caring about anyone but shareholders and clients who also hate humanity.

I'll help their users insofar as "help" means "shedding windows at any cost".

caim 18 hours ago [-]
Just parsing C++ is already a freaking hell.

It's no wonder that every other day a new mini C compiler drops, while no one even attempts to parse C++.

fuhsnn 17 hours ago [-]
There is one pretty serious C++ parser project: https://github.com/robertoraggi/cplusplus
caim 17 hours ago [-]
Wow, thanks! I didn't know this project.

To parse C++ you need to perform type checking and name resolution at the same time. And C++ is pretty complex, so it's not an easy task.

johnnyjeans 22 hours ago [-]
Any particular year?
wging 20 hours ago [-]
Quake III Arena was released in 1999. It was open-sourced in 2005.

https://github.com/id-Software/Quake-III-Arena

https://en.wikipedia.org/wiki/Id_Tech_3

(from the source release you can see benreesman remembered right: it was lcc)

johnisgood 1 hour ago [-]
Id Tech 3 was great, gave us a lot of forks.
Jeaye 22 hours ago [-]
I hear you when it comes to C++ portability, ABI, and standards. I'm not sure what you would imagine jank using if not for LLVM, though.

Clojure uses the JVM, jank uses LLVM. I imagine we'd need _something_ to handle the JIT runtime, as well as jank's compiler back-end (for IR optimization and target codegen). If it's not LLVM, jank would embed something else.

Having to build both of these things myself would make an already gargantuan project insurmountable.

o11c 23 hours ago [-]
> the lack of standardization for name-mangling, or even a way mangle or de-mangle names at compile-time.

Like many things, this isn't a C++ problem. There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.

Now, standards do evolve, and this does give room for different system libraries/tools to have a different view of what is acceptable/correct (I still have nightmares of trying to work through `I...E` vs `J...E` errors) ... but all the functionality does exist and work well if you aren't on the bleeding edge (fortunately, C++11 provided the bits that are truly essential; everything since has been merely nice-to-have).
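
The functionality in question is the Itanium ABI's demangler, shipped by GCC and Clang (not MSVC) as abi::__cxa_demangle. A minimal wrapper looks like this:

```cpp
#include <cxxabi.h>   // GCC/Clang-specific, per the Itanium C++ ABI
#include <cstdlib>
#include <string>
#include <typeinfo>
#include <vector>

// Demangle an Itanium-mangled symbol; on failure, return the input.
std::string demangle(const char* mangled) {
    int status = 0;
    char* out = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string result = (status == 0 && out) ? out : mangled;
    std::free(out);  // __cxa_demangle mallocs its result
    return result;
}
```

On these platforms typeid(...).name() returns the mangled form, so the two compose into a readable type printer.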

mort96 22 hours ago [-]
Like many things people claim "isn't a C++ problem but an implementation problem"... This is a C++ problem. Anything that's not nailed down by the standard should be expected to vary between implementations.

The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.

The fact that the standard doesn't specify a mechanism to mangle and demangle names (be it at runtime or at compile time) leads to the completely predictable result that different implementations provide different mechanisms to mangle and demangle names, and that some implementations don't provide such a mechanism.

These issues could, and should, have been fixed in the only place they can be fixed -- the standard. ISO is the mechanism through which different implementation vendors collaborate and find common solutions to problems.

josefx 19 hours ago [-]
> The fact that the standard doesn't specify a name mangling scheme leads to the completely predictable result that different implementations use different name mangling schemes.

The ABI mess predates the standard by years and if we look that far back the Annotated C++ Reference Manual included a scheme in its description of the language. Many compiler writers back then made the intentional choice to ignore it. The modern day ISO standard would not fare any better at pushing that onto unwilling compiler writers than it fared with the c++03 export feature.

mort96 19 hours ago [-]
Well yeah, it can't be fixed now. It could have been specified near the beginning of the life of the language though, and the standard would've been the right place to do that.

I'm saying that the non-standard name mangling is a problem with C++. I'm not saying that it's an easily solvable problem.

vlovich123 21 hours ago [-]
> Anything that's not nailed down by the standard should be expected to vary between implementations.

When you have one implementation you have a standard. When you have two implementations and a standard you don’t actually have a standard in practice. You just have two implementations that kind of work similarly in most cases.

While the major compilers do a fantastic job they still frequently disagree about even “well defined” behavior because the standard was interpreted differently or different decisions were made.

SideQuark 21 hours ago [-]
> When you have two implementations and a standard you don’t actually have a standard in practice

This simply isn't true. Plenty of standardized things are interchangeable, from internet RFCs followed by zillions of players and implementations of various RFCs, medical device standards, encryption standards, weights and measures, currency codes, country codes, time zones, date and time formats, tons of file formats, compression standards, the ISO 9000 series, ASCII, testing standards, and on and on.

The poster above you is absolutely correct - if something is not in the standard, it can vary.

vlovich123 18 hours ago [-]
Lots of things have standards that are mostly interchangeable. But pretending like the implementations don't interpret standards differently and have important differences is naive. This is doubly so for language implementations, which frequently leave things as "implementation defined". Clang and GCC have intentionally made significant efforts to minimize their differences, which is why it's less noticeable if you're just swapping between them (it didn't start out this way). MSVC has not made these efforts. Intel abandoned their compiler for Clang. So basically you already have MSVC & clang/GCC as dialects, ignoring more minor differences that readily exist between Clang and GCC.

Compare this with languages like Zig, Rust, and Python that have 1 compiler and don't have any of the problems of C++ in terms of interop or dialects.

Java is the closest to C++ here but even there it's 1 reference implementation (OpenJDK that Oracle derives their release from) and a bunch of smaller implementations that everyone derives from. Java is aided here by the fact that the JDK code itself is shared between JVMs, the language itself is a very thin translation of code --> byte code, and the language is largely unchanging. JavaScript is also in a similar boat but they're aided by the same thing as Java - the language is super thin and has almost nothing in it with everything else deferred as browser APIs where there is this dialect problem despite the existence of standards.

suprtx 8 hours ago [-]
HN is a censorship haven and all, but I'd like to point out just one thing:

>Compare this with languages like Zig, Rust, and Python that have 1 compiler and doesn't have any of the problems of C++ in terms of interop and not having dialects.

For Python, this is straight up just wrong.

Major implementations: CPython, PyPy, Stackless Python, MicroPython, CircuitPython, IronPython, Jython.

jdiff 19 hours ago [-]
If you have an area where the standard is ambiguous about its requirements, then you have a bug in the standard. And hopefully you also have a report to send along to help it communicate itself more clearly.
o11c 21 hours ago [-]
This is like getting mad at ISO 8601 because it doesn't define the metric system.

No standard stands alone in its own universe; complementary standards must necessarily always exist.

Besides, even if the C++ standard suddenly did incorporate ABI standards by reference, Microsoft would just refuse to follow them, and nothing would actually be improved.

no_wizard 21 hours ago [-]
A better situation than today would be only having to deal with Microsoft and Not Microsoft, rather than multiple different ways of handling the problem that can differ unexpectedly
rs186 16 hours ago [-]
> There is a standard and almost every target uses it ... and then there's what Microsoft does. Only if you have to deal with the latter is there a problem.

Sounds like there isn't a standard, then.

duped 10 hours ago [-]
It's Itanium and MSVC. That's it, that's the list.
Someone 3 hours ago [-]
I would think name mangling is out of scope for a programming language definition, more so for C and C++, which target running on anything under the sun, including systems that do not have libraries, do not have the concept of shared libraries or do not have access to function names at runtime.

> It would be really nice to have some way to get symbol names and calling semantics

Again, I think that’s out of scope for a programming language. Also, is it even possible to have a way to describe low level calling semantics for any CPU in a way such that a program can use that info? The target CPU may not have registers or may not have a stack, may have multiple types of memory, may have segmented memory, etc.

plq 22 hours ago [-]
> the lack of standardization for name-mangling

I don't see the point of standardizing name mangling. Imagine there is a standard; now you need to standardize the memory layout of every single class found in the standard library. Without that, instead of failing at link time, your hypothetical program would break in ugly ways while running, because e.g. two functions that invoke one another have differing opinions about where exactly the length of a std::string can be found in memory.
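
This layout problem is why real-world C++ libraries often fall back to the "hourglass" pattern: C++ objects stay opaque at the boundary and only C types cross it, so the caller never needs to know where std::string keeps its length. A sketch (the shim names are invented for illustration):

```cpp
#include <cstdint>
#include <string>

extern "C" {
    // std::string never crosses the boundary by value, so callers are
    // insulated from its implementation-defined memory layout.
    void* str_create(const char* s) { return new std::string(s); }
    uint64_t str_length(void* h) { return static_cast<std::string*>(h)->size(); }
    void str_destroy(void* h) { delete static_cast<std::string*>(h); }
}
```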

johnnyjeans 21 hours ago [-]
The naive way wouldn't be any different than what it's like to dynamically load sepples binaries right now.

The real way, and the way befitting the role of the standards committee, is actually putting effort into standardizing a way to talk to and understand the interfaces and structure of a C++ binary at load time. That's exactly what linking is for. It should be the responsibility of the software using the FFI to move its own code around and adjust it to conform with information provided by the main program as part of the dynamic linking/loading process... which is already what it's doing. You can mitigate a lot of the edge cases by making interaction outside of this standard interface undefined behavior.

The canonical way to do your example is to get the address of std::string::length() and ask how to appropriately call it (to pass "this", for example).

duped 10 hours ago [-]
That standard already exists; it's called the ABI, and the reason the STL can't evolve past 90s standards in data structures is because breaking it would cause immeasurable (read: quite measurable) harm.

Like, for fuck's sake, we're using red/black trees for hash maps, in std - just because thou shalt not break thy ABI

int_19h 8 hours ago [-]
We're using self-balancing trees for std::map because the specification for std::map effectively demands that given all the requirements (ordering, iterator and pointer stability, algorithmic complexity of various operations, and the basic fact that std::map has to implement everything in terms of std::less - it's emphatically not a hash map). It has nothing to do with ABI.

Are you rather thinking of std::unordered_map? That's the hash map of standard C++, and it's the one where people (rightfully) complain that it's woefully out of date compared to SOTA hashmap implementations. But even there an ABI break wouldn't be enough, because, again, the API guarantees in the Standard (specifically, pointer stability) prevent a truly efficient implementation.
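
The pointer-stability guarantee mentioned above can be demonstrated directly: the Standard says rehashing invalidates iterators into std::unordered_map but not pointers or references to its elements, which is exactly what rules out the flat, open-addressing layouts modern hash maps use:

```cpp
#include <string>
#include <unordered_map>

// Returns true iff a pointer into the map survives forced rehashes,
// as the Standard guarantees for std::unordered_map.
bool stable_across_rehash() {
    std::unordered_map<int, std::string> m;
    m[42] = "hello";
    std::string* p = &m[42];            // pointer into the map
    for (int i = 0; i < 10000; ++i)
        m[i + 1000] = "x";              // grow enough to force rehashing
    return p == &m[42] && *p == "hello";
}
```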

mkl 6 hours ago [-]
Are there open source libraries that provide a better hash map? I have an application which I've optimized by implementing a key data structure a bunch of ways, and found boost::unordered_map to be slightly faster than std::unordered_map (which is faster than std::map and some other things), but I'd love something faster. All I need to store are ~1e6 things like std::array<int8_t, 20>.
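
One lever that often helps with keys like std::array<int8_t, 20>, regardless of which map you pick, is supplying a cheap byte-wise hash functor (FNV-1a in this sketch) rather than relying on a generic one; the same functor can be dropped into boost::unordered_map or third-party maps:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using Key = std::array<int8_t, 20>;

// FNV-1a over the raw bytes of the key.
struct KeyHash {
    size_t operator()(const Key& k) const {
        uint64_t h = 1469598103934665603ull;   // FNV offset basis
        for (int8_t b : k) {
            h ^= static_cast<uint8_t>(b);
            h *= 1099511628211ull;             // FNV prime
        }
        return static_cast<size_t>(h);
    }
};

// Hypothetical usage: std::array already provides operator==,
// so only the hash needs to be supplied.
int demo() {
    std::unordered_map<Key, int, KeyHash> m;
    Key a{}; a[0] = 1;
    Key b{}; b[0] = 2;
    m[a] = 10;
    m[b] = 20;
    return m[a] + m[b];
}
```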
kazinator 9 hours ago [-]
> embed LLVM into their runtime

That comically reads like "embed a blue whale into your hammock".

almostgotcaught 23 hours ago [-]
> LLVM into their runtime

they're not embedding LLVM - they're embedding clang. if you look at my comment below, you'll see LLVM is not currently sufficient.

> [C++] is a royal pain in the ass to target for a dynamic FFI because of that

name mangling is by far the easiest part of cpp FFI - the hard part is the rest of the ABI. anyone curious can start here

https://github.com/rust-lang/rust-bindgen/issues/778

Jeaye 23 hours ago [-]
To be fair, jank embeds both Clang and LLVM. We use Clang for C++ interop and JIT C++ compilation. We use LLVM for IR generation and jank's compiler back-end.
johnnyjeans 23 hours ago [-]
> they're not embedding LLVM - they're embedding clang

They're embedding both, according to the article. But it's also just sloppy semantics on my part; when I say LLVM, I don't make a distinction of the frontend or any other part of it. I'm fully relying on context to include all relevant bits of software being used. In the same way I might use "Windows" to refer to any part of the Windows operating system like dwm.exe, explorer.exe, command.com, ps.exe, etc. LLVM is a generic catch-all for me; I don't say "LLI", I say "the LLVM VM", for example. I can't really consider clang to be distinct from that ecosystem, though I know it's a discrete piece of software.

> name mangling is by the easiest part of cpp FFI

And it still requires a lot of work, and increases in effort when you have multiple compilers, and if you're on a tiny code team that's already understaffed, it's not really something you can worry about.

https://en.m.wikiversity.org/wiki/Visual_C%2B%2B_name_mangli...

You're right, writing platform specific code to handle this is more than possible. But it takes manhours that might just be better spent elsewhere. And that's before we get to the part where embedding a C++ compiler is extremely inappropriate when you just want a symbol name and an ABI.

But this is beside the point: the fact that it's not a problem solved by the gargantuan standard is awful. I also consider the ABI to be the exact same issue, that being absolutely awful support for runtime code loading, linking, and interoperation. There's also no real reason for it, other than the standards committee being incompetent.

kccqzy 22 hours ago [-]
> de-mangle names at compile-time

Far from being standardized but it's possible today on GCC and Clang. You just abuse __PRETTY_FUNCTION__.
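
The trick being referred to: __PRETTY_FUNCTION__ in a function template expands to the full signature including the deduced template argument, so the human-readable type name can be sliced out of it. A minimal sketch (GCC/Clang only; MSVC spells it __FUNCSIG__ and formats it differently):

```cpp
#include <string>
#include <vector>

// On GCC this returns something like:
//   "std::string type_name() [with T = std::vector<int>; ...]"
// from which the readable type name can be extracted.
template <typename T>
std::string type_name() {
    return __PRETTY_FUNCTION__;
}
```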

dataflow 20 hours ago [-]
That's not demangling a mangled name, it's retrieving the unmangled name of a symbol.
kmeisthax 11 hours ago [-]
[dead]
Mathnerd314 23 hours ago [-]
Ok so jank is Clojure but with a C++/LLVM runtime rather than the JVM. So already all of its types are C++ types, which presumably makes things a lot easier. Basically it just uses libclang / CppInterOp to get the corresponding LLVM types and then emits a function call. https://github.com/jank-lang/jank/blob/interop/compiler%2Bru...
papichulo2023 23 hours ago [-]
Recently I tried D lang and was surprised by the nice interop with C++ (the language in general feels pretty good). Carbon is nowhere to be seen and I haven't tried Swift's yet. I hope this is a good one.
Imustaskforhelp 22 hours ago [-]
Now of course jank has already gotten C++ support, but if I may ask: if it had instead gotten, let's say, D lang support, would that have been easier/more doable/practical?
actionfromafar 23 hours ago [-]
Shedskin lang has excellent integration with C++.
almostgotcaught 23 hours ago [-]
shedskin isn't actively developed ... or at least it wasn't for like 10 years https://github.com/shedskin/shedskin/graphs/contributors
actionfromafar 19 hours ago [-]
It suddenly sprang to life and gained Python 3 compatibility, which makes it much more interesting than before despite its ten-year hiatus in Python 2 land.
dmoy 23 hours ago [-]
jekwoooooe 27 minutes ago [-]
Clojure syntax (or clojure like, whatever this is) is easily the worst I’ve ever seen. (How [do you guys] (live) like this) it’s just awful. One could say it’s jack
caim 18 hours ago [-]
That's great! Interop with C++ is such a complex task. Congrats on your work! It's definitely not an easy thing.

I've always wondered what is the best way to interact with C++ template instantiation while keeping performance.

For a static language, you'd probably need to translate your types to C++ during compilation, ask Clang/GCC/MSVC to compile the generated C++ file, and then link the final result.

And finally, pray to the computer gods that name mangling was done right.

YuriNiyazov 19 hours ago [-]
A long long time ago, at ClojureConj 2014, I asked Rich Hickey whether a cpp-based clojure was possible, and his answer was "well, the primary impediment there is a lack of a garbage collector". There were a lot of conversations going on at the same time, so I didn't get an opportunity to "delve" into it, but:

1. Does that objection make sense?

2. How does jank approach that hurdle?

ethan_smith 2 hours ago [-]
Jank likely uses a combination of LLVM's garbage collection support (GC intrinsics) and smart pointers, similar to how Clasp implemented GC for Common Lisp on C++.
Jeaye 18 hours ago [-]
A GC is nowhere near the most difficult part of this. In 2014, there was no viable technology for JIT compiling C++, and very little technology for JIT compiling native code in general.
bertmuthalaly 19 hours ago [-]
It's the first section in the article -

"I have implemented manual memory management via cpp/new and cpp/delete. This uses jank's GC allocator (currently bdwgc), rather than malloc, so using cpp/delete isn't generally needed. However, if cpp/delete is used then memory collection can be eager and more deterministic.

The implementation has full bdwgc support for destructors as well, so both manual deletion and automatic collection will trigger non-trivial destructors."

bluGill 18 hours ago [-]
In the article: it always garbage collects, but if you call delete the garbage collector will be more aggressive about cleaning that up.
Jach 22 hours ago [-]
Neat project, I can only marvel at your ability to deal with such madness. But it would be nice to have better C++ interop in higher level languages, there's some useful C++ code out there. I also appreciate the brief mention of Clasp, as I was immediately thinking of it as I was reading through.
superdisk 9 hours ago [-]
Cool stuff for sure, I've been brainstorming making a language that has some of the same characteristics as Jank. I'm jelly that you took the opportunity to work full time on this for a year, wish I could do the same!
netbioserror 21 hours ago [-]
I used Clojure back in the day and use Nim at work these days. Linking in to C is trivially easy in Nim. Happy to see this working for jank, but C++ is...such a nightmare target.

Any chance of Jank eventually settling on reference counting? It checks so many boxes in my book: Simple, predictable, few edge cases, fast. I guess it really just depends on how much jank programs thrash memory, I remember Clojure having a lot of background churn.

Jeaye 20 hours ago [-]
I started with reference counting, but the amount of garbage Clojure programs churn out ends up bogging everything down unless a GC is used. jank's GC will change, going forward, and I want jank to grow to support optional affine typing, but the Clojure base is likely always going to be garbage collected.
fnordsensei 10 hours ago [-]
For a novice, could you elaborate on the difference that a GC makes? Naively, it seems like the only difference would be whether you pay the deallocation fee immediately or later on.

Is there less of a problem when done in bulk if the volume of trash to collect is high enough?

software-is-art 7 hours ago [-]
GCs typically fall into two categories:

1. Reference counting - tracks how many references point to each object. When references are added or removed, the count is updated. When it hits zero, the object is freed immediately. This places overhead on every operation that modifies references.

2. Mark and sweep - objects are allocated in heap regions managed by the GC. Periodically the GC traces from roots (stack, globals) to find all live objects, then frees the rest. Usually generational: new objects in a nursery/gen0 are collected frequently, survivors are promoted to older generations collected less often.

In general reference counting is favoured for predictable latency because you’re cleaning up incrementally as you go. Total memory footprint is similar to manual memory management with some overhead for counting refs. The cost is lower throughput as every reference change requires bookkeeping (see Swift ARC for a good example).

Mark and sweep GCs are favoured for throughput as allocations and reference updates have zero overhead - you just bump a pointer to allocate. When collection does occur it can cause a pause, though modern concurrent collectors have greatly reduced this (see Java G1GC or .NET for good examples). Memory footprint is usually quite a bit larger than manual management.

In the case of Clojure which in addition to being a LISP also uses immutable data structures, there is both object churn and frequent changes to the object graph. This makes throughput a much larger concern than a less allocation heavy language - favouring mark and sweep designs.
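
The per-operation bookkeeping described for reference counting can be seen in miniature with std::shared_ptr: every copy bumps a count, every destruction decrements it, and the object dies the instant the count hits zero, with no collector pause but overhead on each reference change:

```cpp
#include <memory>

// Observe the reference count before and after a copy goes out of scope.
// Encodes both observations in one return value for easy checking.
long refcount_demo() {
    auto p = std::make_shared<int>(7);
    long after_copy, after_scope;
    {
        auto q = p;                  // copy: count incremented to 2
        after_copy = p.use_count();
    }                                // q destroyed: count decremented
    after_scope = p.use_count();     // back to 1; object freed at 0
    return after_copy * 10 + after_scope;
}
```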

almostgotcaught 23 hours ago [-]
i commented on reddit (and got promptly downvoted) but since i think jank's author is around here (and hopefully is receptive to constructive criticism): the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls. there's no reason to do this except that libclang currently doesn't support any other way. that's not jank's fault but it could be "fixed" in libclang. at a minimum you could use https://github.com/llvm/llvm-project/blob/main/clang/lib/Cod... to emit the code based on clang ast. at a maximum would be to use something like

https://github.com/Mr-Anyone/abi

or this if/when it comes to fruition

https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-...

to generate ABI compliant calls/etc for cpp libs.

note, i say all this with maximum love in my heart for a language that would have first class cpp interop - i would immediately become jank's biggest proponent/user if its cpp interop were robust.

EDIT: for people wanting/needing receipts, you can skim through https://github.com/compiler-research/CppInterOp/blob/main/li...

Jeaye 23 hours ago [-]
Hey! I'm here and receptive.

I completely agree that Clang could solve this by actually supporting my use case. Unfortunately, Clang is very much designed for standalone AOT compilation, not intertwined with another IR generating mechanism. Furthermore, Clang struggles to handle some errors gracefully which can get it into a bad state.

I have grown jank's fork of CppInterOp quite significantly in the past quarter; the full change list is here: https://gist.github.com/jeaye/f6517e52f1b2331d294caed70119f1... I'm hoping to get all of this upstreamed, but it's a lot of work and not high priority for me right now.

I think, based on my experience in the guts of CppInterOp, that the largest issue is not the C++ code generation. Basically any code generation is some form of string building. You linked to a part of CppInterOp which is constructing C++ functions. What's _actually_ wrong with that, in terms of robustness? The strings are generated not based on arbitrary user input, but based on Clang QualTypes and Decls. i.e. you need valid Clang values to actually get there anyway. Given that the ABI situation is an absolute mess, and that jank is already using Clang's JIT C++ compiler, I think this is a very viable solution.

However, in terms of robustness, I go back to Clang's error handling, lack of grace, and poor tooling for use cases like this. Based on my experience, _that_ is what will cause robustness issues.

Please don't take my response as unreceptive or defensive. I really do appreciate the discussion and if I'm saying something wrong, or if you want to explain further, please do. For alternatives, you linked to https://github.com/Mr-Anyone/abi which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing). You also linked to https://discourse.llvm.org/t/llvm-introduce-an-abi-lowering-... which I agree would be great, _if/when it becomes available_.

So, out of all of the options, I'll ask clearly and sincerely: is there really a _better_ option which exists today?

CppInterOp is an implementation detail of jank. If we can replace C++ string generation with more IR generation and a portable ABI mechanism, _and_ if Clang can provide the sufficient libraries to make it so that I don't need to rely on C++ strings to be certain that my template specializations get the correct instantiation, I am definitely open to replacing CppInterOp. From all I've seen, we're not there yet.

rjsw 22 hours ago [-]
I think that some packages that generate Python bindings for C++ use Clang to do it as well.
almostgotcaught 23 hours ago [-]
> which is 3 months old and has 0 stars (and so I assume 0 users and 0 years of battle testing)

ah my bad i meant to link to this one https://github.com/scrossuk/llvm-abi

which inspired the gsoc.

> is there really a _better_ option which exists today?

today the "best in class" approach is swift's which fully (well tries to) model cpp AST and do what i suggested (emitting code directly):

https://github.com/swiftlang/swift/blob/c09135b8f30c0cec8f5f...

Jeaye 22 hours ago [-]
There are upsides to this approach. Coupling Swift's AST with Clang's AST will allow for the best codegen, for sure.

However, the huge downside to this approach, which cannot be overlooked, is that Clang (not libclang) is not designed to be a library. It doesn't have the backward compatibility of a library. Swift (i.e. Apple) is already deep into developing Clang, and so I'm sure they can afford the cost of keeping up with the breaking changes that happen on every Clang release. For a solo dev, I'm not yet sure this is actually viable, but I will give it more consideration.

However, I think that raising alarms at C++ codegen is unwarranted. As I said before, basically any query builder or codegen involves some form of string generation. The way we make those safe is to put types in front of them, so we're not just formatting user strings into other strings. That's exactly what CppInterOp does, where the types in front are Clang QualTypes and Decls.

caim 17 hours ago [-]
You're right. It's always good to remember that Apple was, and still is, the main company behind LLVM.

Swift was built and is maintained by the same team that worked on LLVM.

Also, Swift has its own fork of LLVM, and LLVM has a lot of built-in features designed for Swift, like its calling convention and async transformations.

The number of features Swift ships while also maintaining its own LLVM fork is just not something you can do without a lot of money and years of accumulated expertise.

almostgotcaught 13 hours ago [-]
> still is the main company behind LLVM.

lol people really say whatever comes to their mind around here don't they? I'm pretty sure all of the companies associated with these targets would strongly disagree with you

https://github.com/llvm/llvm-project/tree/main/llvm/lib/Targ...

caim 29 minutes ago [-]
> lol people really say whatever comes to their mind around here don't they?

Are you talking about yourself? Because it's clear that you don't understand this.

Apple literally hired Chris Lattner in 2005 and built a team to work on LLVM. After GNU refused to integrate LLVM into GCC, Apple saw an opportunity to have its own C/C++ compiler.

To this day, Apple still has the core team behind LLVM.

Anyone can come in, add a feature or target, and then leave the project. But the main LLVM maintainers are hired, directly or indirectly, by big tech companies.

almostgotcaught 20 hours ago [-]
> For a solo dev, I'm not yet sure this is actually viable, but I will give it more consideration.

look i'm not trying to shit on your project - i promise - i know calling you out like this publicly almost requires a political kind of response (i probably shouldn't have done it). i agree with you that as a solo dev you can't (shouldn't) solve this problem - you have enough on your plate making jank great for your core users (who probably don't really care about cpp).

> As I said before, basically any query builder or codegen takes some form of string generation.

i mean this is a tautology on the level of "everything can be represented as strings". yes that's true, but types (as you mention) are important, and all i'm arguing is that it's much more robust to start with types and end with types instead of starting with strings and ending with types.

anyway you don't need to keep addressing my complaints - you have enough on your plate.

wk_end 23 hours ago [-]
> the CppInterOp approach to cpp interop is completely janky (no pun intended). the approach literally string munges cpp and then parses/interprets it to emit ABI compliant calls.

So, I agree that this sounds janky as heck. My question is: besides sounding janky as heck, is there something wrong with this? Is it slow/unreliable?

almostgotcaught 23 hours ago [-]
i mean it's as prone to error as any other thing that relies on string munging. it's probably not that much slower than the alternative i proposed - because the trampolines/wrappers are jitted and then reused - but it's just not robust enough that i would ever imagine building a prod system on top of it (eg using cppyy in prod) let alone baking it into my language/runtime.
Jeaye 21 hours ago [-]
> i mean it's as prone to error as any other thing that relies on string munging.

This is misleading. Having done a great deal of both (since jank also supports C++ codegen as an alternative to IR), if the input is a fully analyzed AST, generating IR is significantly more error-prone than generating C++. Why? C++ is statically typed, and one can enable warnings and errors for all sorts of issues. LLVM IR has a verifier, but it doesn't check that much. Handling references, pointers, closures, ABI issues, and so many more things ends up being a huge effort in IR.

For example, want to access the `foo.bar` member of a struct? In IR, you'll need to access foo, which may require loading it if it's a reference. You'll need to calculate the offset to `bar`, using GEP. You'll need to then determine if you're returning a reference to `bar` or if a copy is happening. Referencing will require storing a pointer, whereas copying may involve a lot more code. If we're generating C++, though, we just take `foo` and add a `.bar`. The C++ compiler handles the rest and will tell us if we messed anything up.

If you're going to hand wave and say anything that's building strings is error prone and unsafe, regardless of how richly typed and thoroughly analyzed the input is, the stance feels much less genuine.

refulgentis 23 hours ago [-]
The delta between the title and the content gave me extreme pause, thanks for sharing that there's, uh, worse problems.

I'm a bit surprised I've seen two articles about jank here the last 2 days if these are exemplars of the technical approach and communication style. Seems like that wouldn't be enough to get on people's radars.

Jeaye 23 hours ago [-]
Which particular delta between the title and the content gave you extreme pause?
refulgentis 22 hours ago [-]
It said "jank is C++", which I assumed would be explaining that jank compiles down to C++ or something similar, i.e. there is a layer of abstraction between jank and C++, but it effectively "works like" C++.

On re-read, I recognize where it is used in the article:

"jank is C++. There is no runtime reflection, no guess work, and no hints. If the compiler can't find a member, or a function, or a particular overload, you will get a compiler error."

I assume other interop scenarios don't pull this off*, thus it is distinctive. Additionally, I'm not at all familiar with Clojure, sadly, but it also sounds like there's some special qualities there ("I think that this is an interesting way to start thinking about jank, Clojure, and static types")

Now I'll riff and just write out the first 3-5 titles that come to mind with that limited understanding:

- Implementing compile-time verifiable C++ interop in jank

- Sparks of C++ interop: jank, Clojure, & verifying interop before runtime

- jank's progress on C++ interop

- Safe C++ interop lessons from jank

* for example, I write a lot of Dart day to day and rely on Dart's "FFI" implementation to call C++, which now that I'm thinking about, only works because there's a code generator that creates "Dart headers" (my term) for the C++ libraries. I could totally footgun and call arbitrary functions that don't exist.

Jeaye 22 hours ago [-]
My reasoning is this:

jank is written in C++. Its compiler and runtime are both in C++. jank can compile to C++ directly (or LLVM IR). jank can reach into C++ seamlessly, which includes reaching into its own compiler/runtime. Thus, the boundary between what is C++ and what is Clojure is gone, which leaves jank as being both Clojure and C++.

Achieving this singularity is a milestone for jank and, I think, is worthy of the title.

quuxplusone 5 hours ago [-]
FWIW, I saw that the title was false (after all, Jank and C++ are two different things), but I assumed it was playing on the snowclone "Are we _X_ yet?" and therefore the blog post was going to be explaining why the answer to "Is Jank C++ yet?" should be "Yes, Jank is C++ now."
actionfromafar 23 hours ago [-]
Given how the world works, that might mean we will all sit and curse Jank instead of cursing Node. :)
xxr 23 hours ago [-]
These recursive initialism PL names are getting out of hand /s
Jeaye 22 hours ago [-]
I've pondered this for a while and I have no idea how jank is a recursive acronym. What're you seeing that I'm not?
eurleif 21 hours ago [-]
Jank's A Native Klojure? :)
xxr 19 hours ago [-]
It’s a joke (hence the “/s”) on the “[PL name] is [words beginning with the rest of the letters of the Pl name]” snowclone. However as time approaches infinity I’m sure it will get a recursive backronym.