NHacker Next
  • new
  • past
  • show
  • ask
  • show
  • jobs
  • submit
Borrow Checking, RC, GC, and Eleven Other Memory Safety Approaches (verdagon.dev)
naasking 2 days ago [-]
More people need to read up on C#'s ref's:

https://em-tg.github.io/csborrow/

These kinda-sorta fall under borrow checking or regions, just without any annotations. Then again, Ada/Spark's strategy also technically falls under Tofte-Talpin regions:

https://www.cs.cornell.edu/people/fluet/research/substruct-r...

Someone 2 days ago [-]
Isn’t that a form of “5. a mechanism where a pointer cannot point to an object that is more deeply scoped than itself”?

Also, since it’s an error, I guess it must be different from gcc and clang (and, likely other C compilers) -Wreturn-local-addr warnings (https://www.emmtrix.com/wiki/Clang:Flag/-Wreturn-local-addr), which can have false positives.

What’s the difference? Is the language stricter, disallowing some constructs that are valid, but hard to prove or is the error not always triggered for buggy code?

naasking 1 days ago [-]
> Isn’t that a form of “5. a mechanism where a pointer cannot point to an object that is more deeply scoped than itself”?

That constraint is just a Tofte-Talpin region restriction, eg. lexically scoped regions.

bob1029 2 days ago [-]
Here's an example where ref returns make a big impact:

https://github.com/prasannavl/WinApi

The performance of this interop layer is so close to native it's difficult to argue for doing things the much more painful way.

mastax 2 days ago [-]
Yeah C# is very well designed for gradually introducing low level concepts for performance.
jimbob45 2 days ago [-]
The reverse is arguably the one trait Rust is missing that is holding it back from mass adoption. You can write high-level C# code and seamlessly introduce low-level concepts into it as you see fit. However, although you can write low-level Rust code easily, introducing high-level concepts is very painful and, in many cases, impossible.
akkad33 1 days ago [-]
Also F#
neonsunset 1 days ago [-]
As of now - F# needs more work w.r.t. support for [<Struct; IsByRefLike>] types and byref<'T>s. There's a reason ref lifetime analysis in Roslyn is quite a sophisticated part of its implementation even if most developers aren't aware of it.

F# has other cool features like `inline` bindings and [<IlineIfLambda>] function attributes but I found it more difficult to work with for systems programming when attempting to stay within safe constructs.

But if you don't need byrefs, ref structs and spans, then regular pointer-based or even array-based code is quite pleasant to write, for example https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

With that said - F# is an incredible language, but for systems programming you are likely to get better results in C#. I hope eventually someone contributes subsequent spec and compiler work to change this (and if you think you have time and could try it - please do! it's a nice small community).

akkad33 1 days ago [-]
Doesn't F# have byrefs, inrefs and outrefs. How are they different from C# refs? Asking as someone with surface level knowledge of both languages https://learn.microsoft.com/en-us/dotnet/fsharp/language-ref...
chrismorgan 2 days ago [-]
> Curséd

With an acute accent, that should be roughly /ˌkɜːrˈseɪd/ “curse-ay-d”. (Think “café” or “sashayed”.)

The stylised pronunciation being evoked is roughly /ˈkɜːrˌsɛd/, “curse-ed”, and would be written with a grave accent: “cursèd”.

bloppe 2 days ago [-]
Accents mean different things in different languages. I'm assuming the author was invoking Spanish, where the acute accent impacts syllable stress but not pronunciation.
rzzzt 2 days ago [-]
Italians will get close to the first pronunciation both ways, I think. The Zalgo line noise is an international way of signaling the level of curse in writing.
tialaramex 2 days ago [-]
The list gets very woolly by the end. CHERI exists (though not at volume), Cornucopia Reloaded is a research paper, "plus some techniques to prevent use-after-free on the stack" is entirely hand waving.

It is really good as food for thought though.

willvarfar 2 days ago [-]
Meta comment, but I really like the formatting of the blog post!

It reminds me of the early days of the web, when text was king and content was king. I particularly like the sidenotes in the margins approach.

(Hope the author sees this comment :) Hats off)

_kb 2 days ago [-]
Side notes are a great layout for most deeper reads.

There's some great tooling for that via https://edwardtufte.github.io/tufte-css/ and https://tufte-latex.github.io/tufte-latex/.

brabel 2 days ago [-]
Yeah the author always uses this in his blog about his language, Vale (which is very unfortunately not being developed anymore, at least for now). The other posts are also worth a read: https://vale.dev/
ivell 2 days ago [-]
He now works on Mojo, to bring linear types into Mojo.
dgan 2 days ago [-]
I am sorry, I am maybe dumb but i can't see the 14 techniques been listed anywhere? Where do i even click?
hawski 2 days ago [-]
You need to read the post.

> Wait a minute, this list goes to 17, yet the intro only mentions 14! I actually did that because a couple might overlap and a couple of them are half-approaches, and that last one is just here for fun. Besides, as I learn more approaches and add them to the list, the title will get more and more out of date anyway.

bitbasher 2 days ago [-]
I'm kinda torn. It seems there are only three approaches.

1. laissez-faire / manual memory management (c, c++, etc)

In this approach, the programmer decides everything.

2. dictatorship / garbage collection (java, go, etc)

In this approach, the runtime decides everything.

3. deterministic / lifetime memory management (rust, c with arenas, etc)

In this approach, the problem determines everything.

ryao 1 days ago [-]
There is another option, which is to use a sound static analyzer that can prove the absence of memory safety issues like astree, and fix things that cause it to complain until it stops complaining:

https://www.absint.com/astree/index.htm

For those who think static analyzers cannot do that, notice the word “sound”. This is a different type of static analyzer than the more common ones that do not catch everything.

Sadly, there is no open source option that works across a broad range of software. NASA’s IKOS is open source for example, but it does not support multithreading and some other things that I do not recall offhand, which makes it unable to catch all memory safety bugs in software using the features that it does not support. For now, people who want to use sound static analyzers need to use closed source tools or restrict themselves to a subset of what C/C++ can do so they can use IKOS:

https://github.com/NASA-SW-VnV/ikos

voxl 1 days ago [-]
Even in their own list of features "Memory Safety" does not pop up, and none of the listed features indicate to me that they would entail memory safety. Academics aren't publishing droves of work on separation logic for a problem that can be solved by 1980s static analyzers.
ryao 18 hours ago [-]
They say that they can prove the absence of classes of bugs that make up memory safety:

* out-of-bounds array indexing,

* erroneous pointer manipulation and dereferencing (NULL, uninitialized and dangling pointers),

* read access to uninitialized variables,

NIST gave them a glowing review:

https://nvlpubs.nist.gov/nistpubs/ir/2020/NIST.IR.8304.pdf

It is possible to make sound static analyzers that can prove code to be free from memory safety bugs, but it is difficult and you need to implement checks that either complain or mathematically prove the absence of each class to do it.

These tools have been used for years in the aviation and nuclear industries, but almost nobody outside of those industries knows anything about them. If others had broader awareness of the existence of these tools, we could get open source equivalents and memory safety issues would be a thing of the past in C/C++ code.

Finally, your 1980s remark reveals enormous ignorance of what the field of formal methods has produced. Astree was not available in the 1980s. It took over 40 years of work to make and only became available about 20 years ago. C++ support took much longer for it to add, with it only adding it around 6 years ago if my recollection of the public documentation is correct. Some other things in this space are the Polyspace Code Prover and Frama-C.

voxl 9 hours ago [-]
I'm published in formal methods and see plenty of papers on static analysis of C. They ALL fail against a well designed type system.
mgaunard 2 days ago [-]
The fact that re-using a slot for a different object of the same type is considered a memory safety technique is ridiculous.
obl 2 days ago [-]
It is not ridiculous at all. Those things have pretty precise definitions and type segregation absolutely does remove a bunch of soundness issues related to type confusion.

You can think of it as the rather classic "Vec of struct + numeric IDs" that is used a lot e.g. in Rust to represent complex graph-like structures.

This combined with bound checking is absolutely memory safe. It has a bunch of correctness issue that can arise due to index confusion but those are not safety issues. When combined with some kind of generational counters those correctness issue also go away but are only caught at runtime not at compile time (and they incur a runtime cost).

Rust's memory safety is about avoiding liveness issues (that become type confusions since all memory allocators will reuse memory for different types), nothing more, nothing less.

gpderetta 23 hours ago [-]
> but those are not safety issues.

there are not memory safety issues. But they definitely can lead to security issues with some sort of confused deputy attack.

For example a capability based system that relied on just this form of memory safety would be pointless.

Of course this can be mitigated by adding version counters to objects and object selectors.

jandrewrogers 2 days ago [-]
FWIW, arrays of structs + integer handles is the primary way objects are represented in performance-engineered C++.
ptx 1 days ago [-]
Maybe we need to introduce the term "value safety" to complement "type safety".

If a language is merely type-safe, then it might be OK to silently replace a value with a different one of the same type, sure, fine. Who cares if the program transmits the wrong message to the wrong recipient as long as it's definitely some message and some recipient?

But a value-safe language, I suggest, is one that doesn't pull this kind of switcheroo.

jcarrano 1 days ago [-]
Reminds me of a video I saw about "cloning" on Super Mario 64[1]. SM64 uses slots and one can mess around with slot deletions to get random objects.

[1] https://www.youtube.com/watch?v=X2AhyDI58-I

AlotOfReading 2 days ago [-]
It absolutely is. Some embedded mallocs do this under the hood against a predefined linker table. Genuinely helps even with the restrictions it imposes.
eddd-ddde 2 days ago [-]
Are you implying that it is unsafe? Or that it is unrelated to safety properties?
mgaunard 2 days ago [-]
It just shows memory safety is a joke, simply replacing a class of bugs with another.
IshKebab 1 days ago [-]
It isn't a joke. With memory safety bugs the value of an object can unexpectedly be any bit pattern. That breaks the assumptions of basically every language and leads to pretty much anything happening.

If you have an array of objects of the same type and you just pick the wrong one, then the data still has to be a valid bit pattern. Yes it might still be a security bug, but it's much less likely because you aren't completely subverting the language.

Surely you don't think all bugs are the same because they are all bugs?

mgaunard 22 hours ago [-]
It takes a lot more for a program to be correct than having valid bit patterns.

To begin with, the whole point of classes is to maintain invariants. Guaranteeing that a location in memory matches the valid bit patterns of its members is far from sufficient.

IshKebab 20 hours ago [-]
> It takes a lot more for a program to be correct than having valid bit patterns.

Obviously. I never said otherwise. What's your point?

adamrezich 2 days ago [-]
“Safety” is a very overloaded English word with strong connotations, and the popularization of it in the context of “memory safety” has been good in some ways, but has really poisoned the discourse in many others.
hawski 2 days ago [-]
That is very informational. Thank you.

I am interested in Vale and it feels very promising, though because my interested in bootstrapping I don't like that it is written in Scala. I know, that is shallow, but that's a thing that limits my enthusiasm.

If you are like me and don't like jumping around between notes and text and you prefer to read the notes anyway, here is a little snippet you can run in Web Inspector's Console:

  document.querySelectorAll(".slice-contents a[data-noteid]").forEach(e => {document.querySelectorAll('.slice-notes [data-noteid="' + e.attributes["data-noteid"].nodeValue + '"] p').forEach(p => {p.style.fontSize = 'smaller'; e.parentNode.insertBefore(p, e)}); e.remove() })
It will replace note links with notes themselves making them smaller, because they will not always fit smoothly.
alexisread 2 days ago [-]
dang 1 days ago [-]
Thanks! Macroexpanded:

Borrow Checking, RC, GC, and the Eleven (!) Other Memory Safety Approaches - https://news.ycombinator.com/item?id=41974185 - Oct 2024 (1 comment)

Borrow Checking, RC, GC, and the Eleven () Other Memory Safety Approaches - https://news.ycombinator.com/item?id=40146615 - April 2024 (68 comments)

1 days ago [-]
lilyball 1 days ago [-]
I don't see any mention of epoch-based garbage collection (see crossbeam https://docs.rs/crossbeam/latest/crossbeam/epoch/index.html). Generational References sounds like a related concept but it's not the same. I'm also surprised nobody's mentioned that one lance corporal goat yet.
eklitzke 1 days ago [-]
I don't understand why they say that reference counting is "slow". Slow compared to what? Atomic increments/decrements to integers are one of the fastest operations you can do on modern x86 and ARM hardware, and except in pathological cases will pretty much always be faster than pointer chasing done in a traditional mark and sweep VMs.

This isn't to say reference counting is without problems (there are plenty of them, inability to collect cyclical references being the most well known), but I don't normally think of it as a slow technique, particularly on modern CPUs.

gpderetta 23 hours ago [-]
Atomic reference counting per se is fairly slow compared to other simple operations [1]. But the biggest issue with reference counting is that it doesn't scale well in multithreaded programs: even pure readers have to write to shared memory locations. Also acquiring a new reference from a shared atomic pointer is complex and need something like hazard pointers or a lock.

[1] an atomic inc on x86 is typically ~30 clock cycles, doesn't really pipeline well and will stall at the very least other load operations.

nahuel0x 2 days ago [-]
It's surprising to see an article with such a large encompassing of different techniques, hybrid techniques and design interactions with the type system, but is more surprising that a whole dimension of memory (un)management was left out: memory fragmentation
Quekid5 2 days ago [-]
It's probably because fragmentation isn't a safety issue. (In the sense of 'safety' being discussed here.)
galangalalgol 2 days ago [-]
It doesn't create UB, but it is something safety critical software has to address.
Quekid5 2 days ago [-]
... which is why I had that little bit at the end there.
DanielHB 2 days ago [-]
I am not experienced with rust and borrow checkers, but my impression is that borrow checkers also statically ensures thread/async safety while most other memory safety systems don't. Is this accurate?
kibwen 2 days ago [-]
The borrow checker is only one component of the means by which Rust statically enforces thread safety. If you design a language that doesn't allow pointers to be shared across threads at all, then you wouldn't need a borrow checker. Likewise if you have an immutable-only language. What's interesting about Rust is that it actually supports this safely, which is still unbelievable sometimes (like being able to send references to the stack to other threads via std::thread::scoped).
nemaar 2 days ago [-]
> If you design a language that doesn't allow pointers to be shared across threads at all, then you wouldn't need a borrow checker.

Is that actually true? I'm pretty sure you need the borrow checker even for single threaded Rust to prevent use after frees.

chlorion 2 days ago [-]
Theres even more than just UAFs that you have to worry about in a single threaded context, but yes you are correct.

Here's a good post that talks about why shared mutability even in single threaded contexts is dangerous: https://manishearth.github.io/blog/2015/05/17/the-problem-wi...

kibwen 2 days ago [-]
Let me clarify: "wouldn't need a borrow checker for the specific requirement of ensuring thread safety". Clearly the borrow checker is quite useful in single-threaded contexts on its own. :P The point is just that it's perfectly valid to have a language that doesn't have "reference semantics" at all.
jerf 2 days ago [-]
You may have to define your terms more carefully and extend past "doesn't allow pointers", but it can be true. Look to Erlang; the key to Erlang's safety isn't actually its immutable values, but the fact that no messages can travel between the various "processes" (threads in conventional parlance, just not "OS threads") that carry any sort of reference or pointer is what makes it so Erlang is safe without any concept of borrow checking. It also semantically copies all values (there are some internal optimizations for certain types of values that make it so they are technically not copied, but at the language level, it's all copies) and each process is GC'd, so in terms of the Rust borrow checker I mention this only in the context of cross-thread sharing safety.

But in general, if threads can't communicate any sort of pointer or reference that allows direct, unmediated access to the same value that some other thread will see, there's no need for a "borrow checker" for thread safety.

(Note that "but what if I have a thing that is just a token for a value that people can potentially read and write from anywhere?" is not an exception to this, because in this context, such access would not be unmediated. This access would still require messages to and from the "holding" process. This sort of safety won't stop you from basically deliberately re-embedding your own new sorts of races and unsafe accesses in at the higher level of your own code, it just won't be a data race in the same way that multiple threads reading and writing through the same pointer is a data race at the lower level. The main solution to this problem is "Doctor, it hurts when I do this." -> "Don't do that.")

PeterWhittaker 2 days ago [-]
The first part - that the Rust borrow checker and overall memory model ensures thread/async safety - is true. I cannot speak to the second part - that other systems don't have this assurance.
tialaramex 2 days ago [-]
Just the borrowck isn't enough, you need the Send and Sync marker traits. Marker traits are something lots of languages could do but they'd be useless (or always unsafe) without a lot of other machinery Rust had already.
kibwen 2 days ago [-]
And Sync might just be there for performance, I think you could get away with only Send if you didn't mind some additional copying?
DanielHB 2 days ago [-]
> that other systems don't have this assurance

My understanding is that most (all?) GC languages are memory safe, but do not ensure statically verifiable thread safety at all. Like Java, Go, C#, Python, etc.

pizlonator 2 days ago [-]
The way you make garbage collection deterministic is not by doing regions but by making it concurrent. That’s increasingly common, though fully concurrent GCs are not as common as “sorta concurrent” ones because there is a throughput hit to going fully concurrent (albeit probably a smaller one than if you partitioned your heap as the article suggests).

Also, no point in calling it “tracing garbage collection”. Its just “garbage collection”. If you’re garbage collecting, you’re tracing.

roetlich 2 days ago [-]
> Also, no point in calling it “tracing garbage collection”.

You're against more explicit naming just for the sake of it? In the literature reference counting is also referred to as a type of garbage collection, and doesn't involve tracing. If you talking about a specific context you can probably drop the "tracing", but in a general article like this it would just be very confusing?

This way, someone can google "tracing garbage collection", and will find the relevant wikipedia article: https://en.wikipedia.org/wiki/Tracing_garbage_collection

pizlonator 2 days ago [-]
The literature does not always put the “tracing” in front of the “garbage collection”.

For example, nobody says that Objective-C is garbage collected just because it has ARC. Nobody says that C++ is garbage collected even though shared_ptr is widespread. And systems that do tracing GC just call it GC (see for example https://www.oracle.com/webfolder/technetwork/tutorials/obe/j...)

To think clearly about the tradeoff between GC and RC it’s important to acknowledge the semantic differences:

- GC definitely collects dead cycles.

- RC knows exactly when objects die, which allows for sensible destructor semantics and things like CoW.

- it’s possible to use RC as an optimization in a GC, but then you end up with GC semantics and you still have tracing (hence: if it’s got tracing, it’s a garbage collector).

It’s a recent fad to say that RC is a kind of GC, but I don’t think it ever took off outside academia. Folks who write GCs call them GCs. Folks who do shared_ptr or ARC say that they don’t use GC.

And its good if this fad dies because saying that RC is a kind of GC causes folks to overlook the massive semantic elephant in the room: if you use a GC then you can’t swap it for RC because you’d leak memory (RC would fail to delete cycles), and if you use RC and swap it for a GC then you’d leak resources (your destructors would no longer get called when you expect them to).

On the other hand, it is possible to change the guts of an RC impl without anyone noticing. And it’s possible to change the guts of a GC while preserving compatibility. So these are really two different worlds.

roetlich 2 days ago [-]
> The literature does not always put the “tracing” in front of the “garbage collection”.

Not always, but often enough that an introductory article that presents an overview of different memory managment techniques should maybe use the longer name to avoid confusion.

And I kinda agree with you, using the name "garbage collection" for RC doesn't really make sense, there is no metaphorical garbage truck driving around to collect your unused memory. :)

What's your opinion on the term "RC with cycle detection" that some use for things like Python's GC?

> And it’s possible to change the guts of a GC while preserving compatibility.

Moving to a conservative GC might also introduce memory leaks, if you're unlucky. But yes, "tracing" gc and rc obeviously behave very differently, and have very different performance considerations.

pizlonator 2 days ago [-]
> Not always, but often enough that an introductory article that presents an overview of different memory managment techniques should maybe use the longer name to avoid confusion.

Referring to garbage collection as tracing garbage collection creates more confusion and should be avoided.

It confuses folks into thinking that there is some garbage collection that isn’t tracing. There’s no such thing.

> What's your opinion on the term "RC with cycle detection" that some use for things like Python's GC?

Depends on how you detect cycles. Python uses a garbage collector. Therefore I would say that python has a GC and is a GC’d language.

> Moving to a conservative GC might also introduce memory leaks, if you're unlucky.

Folks who adopt conservatism in production do so only if they have a story for avoiding those leaks. (That’s what we did in JavaScriptCore.)

> But yes, "tracing" gc and rc obeviously behave very differently, and have very different performance considerations.

Just call it “GC” and everyone will know what you mean. No need to be a contrarian and put “tracing” in front.

And it’s not perf considerations if it’s the difference between your program running at all and crashing. Failing to collect all cycles as RC does would cause a program written in a GC’d language to simply crash if it ran for more than just a short while. Failing to invoke destructors the way RC’d programs expect, which would happen if you tried to switch to GC, will cause observably different behavior in addition to possible performance issues.

tonyg 2 hours ago [-]
> It confuses folks into thinking that there is some garbage collection that isn’t tracing. There’s no such thing.

It is standard to consider reference counting as garbage collection.

Bacon, D.F., Cheng, P. and Rajan, V.T. 2004. A unified theory of garbage collection. ACM SIGPLAN Notices. 39, 10 (Oct. 2004), 50–68. DOI: https://doi.org/10.1145/1035292.1028982

Abstract:

Tracing and reference counting are uniformly viewed as being fundamentally different approaches to garbage collection that possess very distinct performance properties. [...] Using this framework, we show that all high-performance collectors (for example, deferred reference counting and generational collection) are in fact hybrids of tracing and reference counting.

roetlich 2 days ago [-]
> And it’s not perf considerations if it’s the difference between your program running at all and crashing.

Yeah, that's what I meant to include by "behave very differently". I don't think we disagree on anything technical here. The problem is if you are currently googling for garbage collection you will mostly get garbage results. Here's duckduckgo to avoid my search bubble: https://duckduckgo.com/?q=garbage+collection+programming

The first result is the wikipedia article: https://en.wikipedia.org/wiki/Garbage_collection_%28computer... It's pretty bad, under "Strategies" it lists three: Tracing, Reference Counting and Escape Analysis. I'm sure these three are similar things.

The second result is this blog post, also listing rereference counting as gc: https://www.freecodecamp.org/news/a-guide-to-garbage-collect...

And the third result looks okay. Searching for "tracing garbage collection" has better results. The text in question already uses "gc" most of the time, and has a footnote saying:

> By "garbage collection", we're referring to tracing garbage collection.

I think that's as clear as it gets, without going on rant about the names of things. You are clearly an expert in garbage collectors, but most people in the target audience of that article are not. The article compares the differences between rc and gc. If someone then goes and reads the wikipedia articles about either of those they will be very confused because wikipedia will tell them rc is gc. A "fad" like this can't be undone, once a usage of a word becomes this popular you can't undo it.

Okay, sorry, this was too long, and we agree to like 99% anyway. Have a nice day! :)

pizlonator 2 days ago [-]
That reminds me of the time I went on a first date and she asked, "so what do you do exactly?" and I said I work on "garbage collection". You should have seen her face!
azornathogron 2 days ago [-]
Using concurrent GC has various advantages but in no way makes it deterministic.
pizlonator 2 days ago [-]
True, not by itself.

But concurrent GC is the basis for making deterministic GC, since it gives you the option of scheduling GC work whenever you like rather than pausing the world.

Some concurrent GCs are also deterministic while others aren’t. I’ve written both kinds.

gpderetta 22 hours ago [-]
you have some very non-standard definition of determinism.
pizlonator 17 hours ago [-]
How so?
jakewins 2 days ago [-]
Do you have any recommended reading material on this?

Intuitively it feels like making it concurrent should do the opposite of making GC deterministic! I’d love to read something showing that intuition is wrong

the__alchemist 2 days ago [-]
After pondering, my single favorite capability of rust is this:

  fn modify(val: &mut u8) {
      // ...
  }
No other language appears to have this seemingly trivial capability; their canonical alternatives are all, IMO, clumsier. In light of the article, is this due to Rust's memory model, or an unrelated language insight?
constantcrying 2 days ago [-]
How is a mutable reference as an argument to a function in any way unique? C++ has mutable references as arguments to functions.
pornel 8 hours ago [-]
Because it's not merely mutable, it's exclusive. You get a static guarantee that, for as long as you can use this reference, this is the only reference in the entire program that can mutate this memory.

This is automatically thread-safe, without any locks. It's guaranteed that there can't be any side effects that could affect this memory, no matter what code you call. You don't need any defensive coding copying the memory just in case. It optimizes well, because it's certain that it won't overlap with any other region.

C++ doesn't have that kind of strong no-alias guarantee. Even memory behind const pointers can be mutated by something else at distance. The closest equivalent is C's restrict pointers, but they're more coarse-grained, and aren't checked by the compiler.

the__alchemist 2 days ago [-]
You're right; you can do this in C++, although the patterns I've seen usually involve pointers.

There is no equivalent in Python, Javascript, Typescript, Java, or Kotlin.

constantcrying 2 days ago [-]
>I've seen usually involve pointers.

If only C++ had a thing called "references", which conveniently also used the "&" symbol and are a way to do exactly this without pointers.

https://en.cppreference.com/w/cpp/language/reference

the__alchemist 2 days ago [-]
I stand corrected on C++. Can you think of any other languages? Of the ones I listed, I use all of, and am continuously frustrated by this limitation.
constantcrying 2 days ago [-]
>Can you think of any other languages?

Yes. E.g. every language where you can pass pointers can do this. Reference are a construct around pointers.

>Of the ones I listed, I use all of, and am continuously frustrated by this limitation.

Why? This is only a limitation for a few basic types. All the language you mention allow you to modify objects passed to a function.

If it ever were any issue though you can always return the modified object and overwrite the original.

a = modify(a)

works even if you can only pass a value.

In practice this should never be a limitation, certainly I have never encountered it as one.

the__alchemist 2 days ago [-]
I challenge you to write the function signature I posted in Python, Javascript, or Kotlin; I'm confident you will not be able to, and the alternatives will be more provincial and unergonomic.

Hint: Python and Javascript will use the type passed to determine mutability. Kotlin will let you pass a mutableStateFlow etc that can be used for, e.g., top-level data structure, but not an arbitrary variable. (e.g. a data structure field, or free variable).

The most popular languages, outside of C and C++, cannot pass pointers. And, raw pointers are not ideal; they lose type safety, memory safety etc.

There are always canonical patterns for a given language to write a function like I did. I find them to be universally more complicated and less explicit/versatile than what I posted; they require code to permeate beyond the function signature.

constantcrying 2 days ago [-]
It works the moment you wrap the int in a class.

class A:

    a = 0
def modify (v):

    v.a = 2
a1 = A()

a1.a = 1

print(a1.a)

modify(a1)

print(a1.a)

the__alchemist 2 days ago [-]
You're correct re the class wrapper (HN nesting limit). Can you see why that solution is complicated, and not versatile? (hint: This limitation will cause ripples of complication throughout your data structures)
constantcrying 2 days ago [-]
I explicitly excluded primitive types in my previous comment.

I have never seem a codebase where this was an issue. The most important concept in programming is defining your data structures. Professional programs almost never operate on primitive data structures, but always on larger structures, like classes or structs.

This is why this is only a problem for you. I don't want to be insulting, but you need to brush up on your data structures. The need to pass a modifiable data structure almost never arises, the idiom in your OP is extremely uncommon and would likely not pass any code review.

constantcrying 2 days ago [-]
You don't need to wrap fields. You just pass the classes.

Think about why nobody but you is inconvenienced by this. I am sorry, this is a you problem, because you are violating common idioms of the language.

There is basically never a reason not to pass your data structure to a function, which is why what you are doing is an incredibly niche functionality.

Also functions should be particular to data structure. That is why types exists.

the__alchemist 2 days ago [-]
General functions are useful.

It is not a me problem, which is why Rust has fixed this. You are arguing that functions which mutate parameters generally are not useful, which isn't true. You have implied that I have a lack of knowledge of data structures, my code wouldn't pass code review. Reflect: You are proposing it is preferable to have a simple function accept a whole data structure when it may only act on one or more of its fields, and may be used in other places. That is a semantic misdirection, and provincialization.

the__alchemist 2 days ago [-]
You have missed the point: Functions should not need to be provincial to a specific data structure. Nor should data structures include field wrappers to subvert a language's sloppy mutation semantics.

The point is that with mutable references, you can modify only the part of the struct you need to, using a general function. You can use this same function to modify a different part of the structure, or a different structure, or a free variable.

Your inference that preferring mutable references to wrapper structs implies insufficient knowledge of data structures is puzzling and incorrect.

diath 1 days ago [-]
Most low-level languages support this, references in C++, pointers in C, ref keyword in C# and Dlang, and so on. It's mostly higher-level languages that do not support such things because it's in the VM semantics that dictate how objects are passed (i.e. in Lua tables are passed by reference, non-table values are passed by copy).
renox 2 days ago [-]
Ada has inout parameters.
andrewstuart 2 days ago [-]
I’d love to see a language that kept everything as familiar as possible and implement memory safety as “the hard bit”, instead of the Rust approach of cooking in multiple different new sub languages and concepts.
pornel 2 days ago [-]
Safety is not an extra feature a'la carte. These concepts are all inter-connected:

Safety requires unions to be safe, so unions have to become tagged enums. To have tagged enums usable, you have to have pattern matching, otherwise you'd get something awkward like C++ std::variant.

Borrow checking works only on borrowed values (as the name suggests), so you will need something else for long-lived/non-lexical storage. To avoid GC or automatic refcounting, you'll want moves with exclusive ownership.

Exclusive ownership lets you find all the places where a value is used for the last time, and you will want to prevent double-free and uninitialized memory, which is a job for RAII and destructors.

Static analysis of manual malloc/realloc and casting to/from void* is difficult, slow, and in many cases provably impossible, so you'll want to have safely implemented standard collections, and for these you'll want generics.

Not all bounds checks can be eliminated, so you'll want to have iterators to implement typical patterns without redundant bounds checks. Iterators need closures to be ergonomic.

…and so on.

Every time you plug a safety hole, it needs a language feature to control it, and then it needs another language feature to make this control fast and ergonomic.

If you start with "C but safe", and keep pulling that thread, nearly all of Rust will come out.

AlotOfReading 2 days ago [-]
I've experienced that frustration myself several times and tried to do "rust but simpler". I just recently failed attempt #4 at not reinventing rust but worse. Attempt #2 ended with Erlang but worse, which was a pleasant surprise.
phicoh 2 days ago [-]
As a long time C programmer I like Rust because it combines two things from C that are important to me (low runtime overhead, no runtime system required) with a focus on writing correct programs.

Memory safety is just one aspect where the compiler can help making sure a program is correct. The more the compiler helps with static analysis, the less we need to rely on creating tests for edge cases.

zamalek 2 days ago [-]
> Memory safety is just one aspect

I feel as though not enough attention is given to how std is designed. For example: [u8], str, Path, and OsStr may be confusing at first, but when you understand why they are there any other approach feels icky. std guides you down a path of caring about things that really should matter (at least if you're only unwrapping provably safe values).

Have you considered what happens if not-utf8 data winds up in an environment variable that you are writing to stdout? What if it contains malicious VT commands?

shiomiru 2 days ago [-]
> Have you considered what happens if not-utf8 data winds up in an environment variable that you are writing to stdout? What if it contains malicious VT commands?

Unless you're talking about terminal bugs in parsing invalid UTF-8 - and parsing invalid UTF-8 is easier than rendering valid UTF-8 - VT commands are UTF-8 compatible. You just need to embed an ASCII escape character.

adgjlsfhk1 2 days ago [-]
yeah I think this is an area where rust (and python) just get it wrong. files, the Internet and input devices can all give you invalid Unicode. IMO it's better to have a primary string type that includes invalid Unicode since most algorithms will handle it correctly anyway, and the ones that won't can pretty clearly check and throw errors appropriately (especially since very few algorithms work correctly for all of Unicode in the first place)
dwattttt 1 days ago [-]
There's a primary type that holds invalid utf8; it's [u8]. If you want a string from that, you try turn it into a string, and deal with the errors then.

If an algorithm works for invalid Unicode, it should probably be an algorithm on bytes, not strings.

nicoburns 2 days ago [-]
Familiar to whom? I came from a JavaScript background, and Rust's syntax and "functional lite" style felt very familiar.
ramon156 2 days ago [-]
Tbf I would already consider a different language when it has al the nice syntax sugar and design choices of Rust. I like almost every choice they made, and I miss things like `if let Some` or unwrapping in other languages. It's just not the same
frou_dh 2 days ago [-]
"Familiar" is subjective so it's not really something to hang your hat on.
brabel 2 days ago [-]
The author of the post is trying pretty much that with his language, Vale.
karmakurtisaani 2 days ago [-]
Isn't that just C/C++?
chrismorgan 2 days ago [-]
As an experienced Rust developer, I have absolutely no idea what you mean by this. Could you write a little more about what you have in mind, and even what you mean by sub-languages and concepts in Rust?
xmcqdpt2 2 days ago [-]
Not a fan of the framing of the article. Firstly, there are millions of Mayans alive today,

https://en.wikipedia.org/wiki/Maya_peoples

and secondly, the reason why the pre-Colombian cultural texts and script are not in use today, even by the people who speak the 28 Mayan languages currently in use, is because of genocide by Columbus and those that followed. The Catholic church destroyed every piece of Mayan script they could get their hands on.

The article reads like the author is not aware of these basic facts of American geography and history.

constantcrying 2 days ago [-]
I thought it was just annoying to read. Irrelevant analogies never helped me understand anything.
4ad 2 days ago [-]
> Interaction nets are a very fast way to manage purely immutable data without garbage collection or reference counting.[...] HVM starts with affine types (like move-only programming), but then adds an extremely efficient lazy .clone() primitive, so it can strategically clone objects instead of referencing them.

This is wrong, Interaction nets (and combinators) can model any kind of computational systems, including ones that use mutation. In fact, ICs are not really about types at all, although they do come from a generalization of Girard's proofs nets, which came from work in linear logic.

The interesting thing about ICs is that they are beta-optimal (any encoding of a computation will be done in the minimum number of steps required -- there is no useless work being done), and maximum-parallel with only local synchonization (all reduction steps are local, and all work that can be parallelized will be parallelized).

Additionally ICs have the property that any encoding of a different computational system in ICs will preserve the asymptotic behavior of all programs written for the encoded computational system. In fact, ICs are the only computational system with this property.

Interaction nets absolutely require garbage collection in the general sense. However, interaction combinators are linear and all garbage collection is explicit (but still exists). HVMs innovation is that by restricting the class of programs encoded in the ICs you can get very cheap lambda duplication and eschew the need for complex garbage collection while also reducing the overhead of implementing ICs on regular CPUs (no croissants or brackets, see Asperti[1] for what that means).

Having a linear language with the above restriction allows for a very efficient implementation with a very simple GC, while maximizing the benefits of ICs. In principle any language can be implemented on top of ICs, but to get most benefits you want a language with these properties. It's not that HVM starts with affine types and an efficient lazy clone operation, it's that a linear language allows extremely efficient lazy cloning (including lambda cloning) to be implemented on top of ICs, and the result of that is HVM.

> The HVM runtime implements this for Haskell.

This is very wrong. HVM has nothing to do with Haskell. HVM3 is written in C[2], HVM2 has three implementations, one in C[3], one in Rust[4], and a CUDA[5] one. HVM1 was just a prototype and was written in Rust[6].

HOC[7], the company behing HVM provides two languages that compile to HVM, Bend[8], and Kind[9]. Bend is a usual functional language, while Kind is a theorem prover based on self types.

Haskell is not involved in any of these things except that the HVM compiler (not runtime) is written in Haskell, but that is irrelevant, before Haskell it used to be written in TypeScript and then in Agda (Twitter discussion, sorry, no reference). It's an implementation detail, it's not something the user sees.

Please note that HVM adds some stuff on top of ICs that makes it not strictly beta-optimal, but nevertheless the stuff added is useful in practice and the practical downgrade from theoretical behaviour is minimal.

[1] Andrea Asperti, The Optimal Implementation of Functional Programming Languages, ISBN-13: 978-0060815424

[2] https://github.com/HigherOrderCO/HVM3/blob/main/src/HVML/Run...

[3] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.c

[4] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.rs

[5] https://github.com/HigherOrderCO/HVM/blob/main/src/hvm.cu

[6] https://github.com/HigherOrderCO/HVM1

[7] https://higherorderco.com

[8] https://github.com/HigherOrderCO/bend

[9] https://github.com/HigherOrderCO/kind

immibis 1 days ago [-]
MMM++ is a variation of standard malloc/free. You can still UAF, but only to another object of the same type, which may or may not prevent an exploit.

Something that's missing is full-on formal verification where you write unrestricted C code and then mathematically prove it doesn't have any bugs. Nobody does this because proving a C program is correct is harder than mining a bitcoin block by hand, but it's useful to anchor one end of the safety/freedom spectrum. Many other approaches (such as borrow checking) can also be viewed as variants of this where you restrict the allowed program constructs to ones that are easier to prove things about.

westurner 2 days ago [-]
nemetroid 2 days ago [-]
No mention of RCU?
gpderetta 22 hours ago [-]
RCU, despite the name, is indeed a reclamation algorithm, but not a general one. I.e. you would use RCU (or some other deferred reclamation algorithm like hazard pointers) for specific data structures when you do not have generalized garbage collection.

A generalized RCU is just a tracing GC.

nemetroid 13 hours ago [-]
The part where the article limits itself to general techniques eludes me.
amelius 2 days ago [-]
I like many of the ideas of Rust, but I still think it is an unsuitable language for most projects.

The problem is that it is very easy to write non-GC'd code in a GC'd language, but the other way around it is much much harder.

Therefore, I think the fundamental choice of Rust to not support a GC is wrong.

3836293648 2 days ago [-]
You've got this one wrong. Rust is designed for a specific use case. Most projects are not that use case. Therefore the choice to use Rust is wrong.

If GC is an option and you want all the nice parts of Rust, use OCaml

pjmlp 2 days ago [-]
If Go designers weren't so much anti-modern language design, in many scenarios where people are rushing to do RIIR, they would be better served with Go, if those modern features were part of the language.

Having said that, there are still OCaml (as you noted), Haskell, .NET languages with Native AOT, JVM languages with GraalVM/OpenJ9, D, Nim, Swift, ....

And if one wants to get fancy with type systems, Idris, Dafny, FStar,..

in-pursuit 2 days ago [-]
I think OPs general point, although maybe not what they stated is correct: it’s easy to write GC’d code. It’s “easy” to write code with manual memory management. It’s “easy” to write RC code. But it’s hard to write borrow checker code. And that will probably limit adoption, even though the goals of Rust are good.
amelius 2 days ago [-]
> If GC is an option and you want all the nice parts of Rust, use OCaml

So are you saying it would be possible to use a hypothetical "non-GC-enabled" OCaml compiler that complains if GC'd code is invoked/generated, and it would be a similar experience as using Rust?

johnnyjeans 2 days ago [-]
No, simply because GC isn't an optional part of ocaml's design. You can "disable" it only in the sense that you stop any frees from happening.

In order to do away with Garbage collection you'd need to completely change the type system, and while you might get a result which ends up looking a lot like ocaml, it won't be ocaml any more than ocaml is sml.

This is what Rust started out as, by the way. In my opinion, it should have stayed that way. Post-Mozilla Rust has been nothing but a gigantic disappointment.

zozbot234 2 days ago [-]
Early versions of Rust "started out" with GC: the language was broadly similar to Golang, though with a lot of influence from Ocaml and functional languages more broadly. The comprehensive use of borrowck to avoid GC altogether was only implemented around 2014 or so, hence shortly before the Rust "1.0" release in 2015.
amelius 2 days ago [-]
> Most projects are not that use case. Therefore the choice to use Rust is wrong.

Do you think that projects that have a large GUI component should be written in Rust?

What if a project has both a "systems" and a GUI component to it?

2 days ago [-]
f1shy 2 days ago [-]
Yet another PoV: for some things with critical timing or so, GC might be a problem. But most of the time, it isn’t. The performance/predictability topic could also be reviewed…

I was talking with a colleague about that, he said “in C I know exactly where things are when” And I replied that under any OS with virtual memory, you have basically no clue where are things at any time, in the N levels of cache, and you cannot do accurate time predictions anyway… [1]

I’m convinced today GC is the way to go for almost all. And I was until 5 years ago or so, totally opposed to that view.

[1] https://news.ycombinator.com/item?id=42456310

pjmlp 2 days ago [-]
Even with critical timing, real time GCs exist for decades now, PTC and Aicas are two surviving companies selling software tooling for embedded markets, including their own JVM implementations, with AOT compilers, bare metal deployments and real time GC.

Many of their customers are factory processes and military deployments with weapons control, two scenarios where any kind of stall might produce deadly results.

f1shy 2 days ago [-]
Thank you so much! Now I have a more powerful argument, when at work somebody says a motor ECU cannot use GC, because of time constraints! Of course I assume there might be a catch from the cost PoV, where Mil may be ok, automotive consumer goods may be a problem (today)
dwattttt 1 days ago [-]
While I agree with the point, the word "surviving" doesn't do that point any favours.
pjmlp 22 hours ago [-]
Sadly in modern times many devs aren't willing to pay for their tools...

Yet they surely appreciate the income they earn by using them.

FridgeSeal 2 days ago [-]
It does have a GC.

It just runs at compile time. Bonus feature, it helpfully prevents a number of common bugs too.

amelius 2 days ago [-]
GC is short for automatic GC.

If you have to do it yourself, then it does not "have" a GC.

FridgeSeal 2 days ago [-]
Ssshhh you’re ruining my silly and “hasn’t gone over terribly well” joke.
bluGill 2 days ago [-]
Why is garbage collection called memory safety? Garbage collection in whatever form is only memory safe if it doesn't free memory that will still be used. (which means if you actually get all your free calls correct C is memory safe - most long lived C code bases have been beat on enough that they get this right for even the obscure paths).

Use after free is important, but in my experience not common and not too hard to track down when it happens (maybe I'm lucky? - we generally used a referenced counted GC for the cases where ownership is hard to track down in C++)

I'm more worried about other issues of memory safety that are not addressed: write into someone else's buffer - which is generally caused by write off the end of your buffer.

foota 2 days ago [-]
To answer your question, I'd say it's memory safe when it's a part of the runtime. At some point, you're relying on your runtime to be correct, so if it says it does garbage collection then you can rely on it, in the same way you rely on the allocator not to randomly trash your memory etc.,.
bluGill 19 hours ago [-]
You misunderstand. Sure that is a part of memory safe, but why is the much larger problem of running off the end of the buffer into something else not considered a larger part. In my experience the later is a worse problem (the blame for issues goes to someone else who's code is working perfectly correct and so they spend months trying to find a logic error before someone finally looks elsewhere - often the fix is just a random fix by those who are at fault and so the team will spend months more looking before closed as "doesn't happen anymore, no idea why". Memory leaks by contrast are hard to track down, but at least they leave obvious clues and so the blame doesn't go to the wrong person.
constantcrying 2 days ago [-]
>Why is garbage collection called memory safety? Garbage collection in whatever form is only memory safe if it doesn't free memory that will still be used.

Yes. A garbage collector is only safe if it works correctly. What an irrelevant observation. Nothing can guarantee that something works correctly if it doesn't work correctly.

bluGill 2 days ago [-]
Keep reading. That is all a garbage collector gives me. there are lots of other things that that are memory unsafe that garbage collectors don't give me.
Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact
Rendered at 09:30:14 GMT+0000 (Coordinated Universal Time) with Vercel.