I Wrote a WebAssembly VM in C (irreducible.io)

331 points by irreducible 149 days ago | 96 comments

davexunit 149 days ago [-]

This was a fun read! I wrote a Wasm interpreter in Scheme a while back, so it makes me happy to see more people writing their own. It is less difficult than you might think. I encourage others to give the spec a look and give it a try. No need to implement every instruction, just enough to have fun.

whizzter 149 days ago [-]

One tip for the author from another one: the spec tests contain various weird forms of textual wasm that aren't obvious how to compile, but the wast2json converter can produce a simpler JSON description accompanied by regular binary wasm files.
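
A rough Python sketch of driving a VM from that JSON. The field names (`commands`, `type`, `filename`, `action`, `expected`) are what wast2json emits as far as I know; the embedded sample, `run_spec`, and `fake_invoke` are made up for illustration, and `fake_invoke` stands in for a call into your own interpreter.

```python
import json

# Hypothetical, trimmed-down description file. Integer values are
# encoded as decimal strings.
SAMPLE = """
{
  "source_filename": "add.wast",
  "commands": [
    {"type": "module", "line": 1, "filename": "add.0.wasm"},
    {"type": "assert_return", "line": 5,
     "action": {"type": "invoke", "field": "add",
                "args": [{"type": "i32", "value": "1"},
                         {"type": "i32", "value": "2"}]},
     "expected": [{"type": "i32", "value": "3"}]},
    {"type": "assert_trap", "line": 9,
     "action": {"type": "invoke", "field": "div",
                "args": [{"type": "i32", "value": "1"},
                         {"type": "i32", "value": "0"}]},
     "text": "integer divide by zero"}
  ]
}
"""


def run_spec(description, invoke):
    """Walk the commands and return (passed, failed, skipped) counts.

    `invoke(module_filename, field, args)` is a stand-in for your VM: it
    should return a list of result values, or raise on a trap.
    """
    passed = failed = skipped = 0
    current_module = None
    for cmd in description["commands"]:
        kind = cmd["type"]
        if kind == "module":
            current_module = cmd["filename"]  # binary file next to the JSON
        elif kind == "assert_return":
            action = cmd["action"]
            args = [int(a["value"]) for a in action["args"]]
            want = [int(e["value"]) for e in cmd["expected"]]
            try:
                got = invoke(current_module, action["field"], args)
            except Exception:
                got = None
            if got == want:
                passed += 1
            else:
                failed += 1
        else:
            skipped += 1  # assert_trap, assert_invalid, ... not handled here
    return passed, failed, skipped


def fake_invoke(module_filename, field, args):
    # Toy stand-in for a real interpreter; it only knows "add".
    if field == "add":
        return [(args[0] + args[1]) & 0xFFFFFFFF]
    raise NotImplementedError(field)


print(run_spec(json.loads(SAMPLE), fake_invoke))
```

Starting with only `module` and `assert_return` and counting everything else as skipped gives a pass rate to watch while the rest of the VM fills in.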

bhelx 149 days ago [-]

Same tip here. We did this with Chicory: https://github.com/dylibso/chicory

I'd add to that: the earlier you can get this test suite running, the better for the iteration speed and correctness of your project.

It took a bit of time to make everything work, but once we did, we very quickly got to the point of running anything. The test-suite is certainly incomplete but gets you 95% there: https://github.com/WebAssembly/testsuite

_cogg 149 days ago [-]

Thank you! Not the author, but I'm also building a compiler. I've stumbled across these tests before and mostly just been irritated and confused about what to do with them.

pcmoore 149 days ago [-]

I found this article very interesting with regard to direct WASM interpretation: https://arxiv.org/abs/2205.01183

I produced https://github.com/peterseymour/winter on the back of it and learnt WASM is not as simple as it should be.

doctor_radium 149 days ago [-]

Newbie questions:

How do you debug an interpreter when you aren't coding for it directly? How far does fuzzing strings of opcodes get you?

How much practical difference is there between a server side WASM engine and a browser-based one? How much work would be involved converting one to the other?

syrusakbary 149 days ago [-]

This is an interesting approach, great work!

For anyone who wants to check where the meat is, it's mostly in this file: https://github.com/irrio/semblance/blob/main/src/wrun.c

Thinking out loud, I think it would have been a great idea to conform with the Wasm-C-API (https://github.com/WebAssembly/wasm-c-api) as a standard interface for the project (which most of the Wasm runtimes: Wasmer, V8, wasmi, etc. have adopted), as the API is already in C and it would make it easier to try for developers familiar with that API.

Note for the author: if you feel familiar enough with Wasm and you would like to contribute to Wasmer, we would also welcome any patches or improvements. Keep up the good work!

benatkin 149 days ago [-]

Um, the author is clearly familiar enough with Wasm, but probably knows enough to know to avoid a company that tried to trademark WebAssembly.

> understandable concerns about the fact we, Wasmer, a VC-backed corporation, attempted to trademark the name of a non-profit organization, specifically WebAssembly

Acknowledgement of wrongdoing.

kowlo 149 days ago [-]

I hadn't heard about this - terrible.

arjvik 149 days ago [-]

Read the whole blogpost that quote was taken from:

https://wasmer.io/posts/wasmer-and-trademarks-extended

I don't think this is as much of a smoking gun as it is made out to be.

dapperdrake 149 days ago [-]

Attention is all we need.

szundi 149 days ago [-]

Just for defensive purposes obviously

syrusakbary 149 days ago [-]

"Mistakes teach us, forgiveness frees us" - ChatGPT (o3-mini)

https://wasmer.io/posts/wasmer-and-trademarks

oguz-ismail 149 days ago [-]

> Wasmer

> Installed-Size: 266 MB

What the hell

syrusakbary 149 days ago [-]

Indeed, we need to further reduce the base binary size!

Most of the size comes from the LLVM backend, which is a bit heavy. Wasmer ships many backends by default, and if you were to use Wasmer headless, that would be just a bit less than 1 MB.

If you want, you can always customize the build with only the backends that you are interested in using.

Note: I've seen some builds of LLVM under 5-10 MB, but those require heavy customization. It's clear that we still have some work to do to reduce size on the general build!

CyberDildonics 149 days ago [-]

So you are shipping all of llvm?

dapperdrake 149 days ago [-]

Well, if you want to just-in-time compile, then it seems like a compiler is one way to go.

They are now in the size realm of Lisp and Smalltalk. Forth may lean towards the lighter side.

CyberDildonics 149 days ago [-]

Lisp and Smalltalk are 266 MB?

tcc is 100KB

https://www.bellard.org/tcc/

folmar 148 days ago [-]

Squeak Smalltalk is 53 MB on Windows (excluding sources). 48 MB of that is the image with UI framework and so on, but it's not immediately easy to say which library/ui parts are core.

lifthrasiir 149 days ago [-]

TCC doesn't do much optimization in turn, while commercial implementations of Lisp and Smalltalk are surely much larger than that.

CyberDildonics 148 days ago [-]

I'm not sure what your point is. The person I replied to was justifying a 266MB binary by saying a compiler was included. Are you saying optimization would make tcc 2,660 times as big?

lifthrasiir 148 days ago [-]

The point is that LLVM does much, much more than TCC and, while it is definitely possible that LLVM could have done that in a smaller binary, TCC is probably not a good thing to compare with because it only does a bare minimum. I should also note that LLVM is a cross-compiler by default...

CyberDildonics 148 days ago [-]

The person I replied to just said it is 266MB because it includes a compiler, and that obviously isn't true.

https://github.com/bytecodealliance/wasm-micro-runtime

This says 4000 lines

https://github.com/explodingcamera/tinywasm

What are we talking about here? There is obviously no reason a wasm jit has to be 266 MB

tondrej 148 days ago [-]

afaik it's far from adopted by Wasmer. https://github.com/wasmerio/wasmer/issues/2615 - stale then autoclosed

syrusakbary 148 days ago [-]

It seems the issue was not up to date; thanks for pointing it out (I just commented on GitHub to make it clear for future readers).

Wasmer supports most of the Wasm-C-API, with some exceptions for APIs that are not that commonly used: finalize, hostref and threads (tables just had some quirks in the implementation that we had to polish, but are generally implemented [1]).

https://github.com/wasmerio/wasmer/blob/main/lib/c-api/tests...

If you are interested in running any of these cases using Wasmer via the Wasm-C-API please let us know... it should be mostly trivial to add support!

[1] https://github.com/wasmerio/wasmer/blob/main/lib/c-api/src/w...

tondrej 148 days ago [-]

Thank you! I'll take a look...

dapperdrake 149 days ago [-]

Here is a more controversial point: Are you interested in adding a preliminary tail-call instruction?

The WASM spec people rejected it for being too "high-level". But the C committee also rejected proposals from Dennis Ritchie. My money is still on Ritchie. Rob Pike's money seems to be on Ritchie's direction as well. Otherwise, why create Golang?

Tail-calls are only high-level if calls are high-level.

tlively 149 days ago [-]

The WebAssembly tail call proposal has been accepted, finished, and implemented for over two years now. https://github.com/WebAssembly/meetings/blob/main/main/2023/...

davexunit 148 days ago [-]

And WebKit finally shipped it recently! Love return_call and friends.

abnercoimbre 149 days ago [-]

Take a look at Orca [0] since I think you'd be a great contributor there.

[0] https://orca-app.dev

UncleEntity 149 days ago [-]

Heh, I also made the decision to focus on one project instead of hopping between new and shiny, with the exception that I'm getting our AI overlords to do all the yak shaving -- which is frustrating to say the least...

--edit--

Oh, and I also was going to suggest using a library like libffi to make calls into C so you can do multiple arguments and whatnot.

deivid 149 days ago [-]

This is a really nice write up! It's giving me motivation to go back to my WASM implementation

Jyaif 149 days ago [-]

Regarding using WebAssembly as a plugin API, like for Zed:

How do plugin developers debug their code? Is there a way for them to do breakpoint debugging, for example? What happens if their code crashes? Do they get a stack trace?

hemant1041 148 days ago [-]

This was a great read! Really cool to see someone dive deep into WebAssembly by building an interpreter from scratch.

greasy 149 days ago [-]

This is awesome.

pdubroy 149 days ago [-]

This is great! The WebAssembly Core Specification is actually quite readable, although some of the language can be a bit intimidating if you're not used to reading programming language papers.

If anyone is looking for a slightly more accessible way to learn WebAssembly, you might enjoy WebAssembly from the Ground Up: https://wasmgroundup.com

(Disclaimer: I'm one of the authors)

amw-zero 149 days ago [-]

I think it's much better to just learn how to read inference rules. They're actually quite simple, and are used ubiquitously to define PL semantics.

Constraining this on "that's not an option" is a big waste of time - learning this will open up all of the literature written on the subject.

shpongled 149 days ago [-]

The WASM spec is so well defined presumably because Andreas Rossberg is the editor - and he did a bunch of PL research on extensions to Standard ML, which is famous for its specification!

amw-zero 148 days ago [-]

I totally agree. Standard ML set the stage for everyone to finally formally specify a full language, and only WebAssembly has carried the torch as far as I know (other than small research languages).

MuffinFlavored 149 days ago [-]

I know one of WebAssembly's biggest features by design is security / "sandbox".

But I've always gotten confused with... it is secure because by default it can't do much.

I don't quite understand how to view WebAssembly. You write in one language, it compiles things like basic math (nothing with network or filesystem) to another and it runs in an interpreter.

I feel like I have a severe lack/misunderstanding. There's a ton of hype for years, lots of investment... but it isn't like any case where you want to add Lua to an app you can add WebAssembly/vice versa?

jeroenhd 149 days ago [-]

WebAssembly can communicate through buffers. WebAssembly can also import foreign functions (JavaScript functions in the browser).

You can get output by reading the buffer at the end of execution/when receiving callbacks. So, for instance, you pass a few frames' worth of buffers to WASM, WASM renders pixels into the buffers, calls a callback, and the JavaScript reads data from the buffer (sending it to a <canvas> or similar).

The benefit of WASM is that it can't be very malicious by itself. It requires the runtime to provide it with exported functions and callbacks to do any file I/O, network I/O, or spawning new tasks. Lua and similar tools can go deep into the runtime they exist in, altering system state and messing with system memory if they want to, while WASM can only interact with the specific API surface you provide it.

That makes WASM less powerful, but more predictable, and in my opinion better for building integrations with as there is no risk of internal APIs being accessed (that you will be blamed for if they break in an update).
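
A toy Python sketch of that "specific API surface" idea, not real Wasm: the guest below can only name functions, and the host decides which names resolve. `ToyInstance`, `MissingImport`, and the `log` / `open_file` names are all hypothetical.

```python
class MissingImport(Exception):
    """Raised at link time when the guest asks for something the host withheld."""


class ToyInstance:
    """A stand-in for a Wasm instance: it can only call what it was given."""

    def __init__(self, wanted_calls, imports):
        # Resolve every import up front, the way instantiation does:
        # an unresolved name fails before any guest code runs.
        for name, _arg in wanted_calls:
            if name not in imports:
                raise MissingImport(name)
        self._calls = list(wanted_calls)
        self._imports = dict(imports)

    def run(self):
        return [self._imports[name](arg) for name, arg in self._calls]


log_lines = []
host_imports = {"log": lambda msg: log_lines.append(msg) or len(log_lines)}

# A well-behaved guest: only uses the one function the host exposed.
ok = ToyInstance([("log", "hello"), ("log", "world")], host_imports)
print(ok.run())  # each call returns the running line count

# A nosy guest: asks for file access the host never offered.
try:
    ToyInstance([("open_file", "/etc/passwd")], host_imports)
except MissingImport as missing:
    print("refused:", missing)
```

The point is only that the import table is the whole interface; whatever the host leaves out of `host_imports` does not exist from the guest's point of view.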

brabel 149 days ago [-]

> Lua and similar tools can go deep into the runtime they exist in, altering system state and messing with system memory if they want to

That's not correct: when you embed Lua, you can choose which APIs are available; to make the full stdlib available you must explicitly call `luaL_openlibs` [1].

[1] https://www.lua.org/manual/5.3/manual.html#luaL_openlibs

westurner 149 days ago [-]

WASI Preview 1 and WASI Preview 2 can do file and network I/O IIUC.

Re: tty support in container2wasm and fixed 80x25 due to lack of SIGWINCH support in WASI Preview 1: https://github.com/ktock/container2wasm/issues/146

The File System Access API requires granting each app access to each folder.

jupyterlab-filesystem-access only works with Chromium based browsers, because FF doesn't support the File System Access API: https://github.com/jupyterlab-contrib/jupyterlab-filesystem-...

The File System Access API is useful for opening a local .ipynb and .csv with JupyterLite, which builds CPython for WASM as Pyodide.

There is a "Direct Sockets API in Chrome 131" but not in FF; so WebRTC and WebSocket relaying is unnecessary for WASM apps like WebVM: https://news.ycombinator.com/item?id=42029188

westurner 149 days ago [-]

WASI Preview 2: https://github.com/WebAssembly/WASI/blob/main/wasip2/README.... :

> wasi-io, wasi-clocks, wasi-random, wasi-filesystem, wasi-sockets, wasi-cli, wasi-http

panic 149 days ago [-]

I don’t believe it is currently possible for a WebAssembly instance to access any buffer other than its own memory. You have to copy data in and out.
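
In practice that copying looks roughly like this. A Python sketch with a `bytearray` standing in for the instance's linear memory; `LinearMemory`, `guest_invert`, and the offsets are illustrative, not any runtime's API.

```python
PAGE_SIZE = 65536  # one Wasm page is 64 KiB


class LinearMemory:
    def __init__(self, pages):
        self.data = bytearray(pages * PAGE_SIZE)

    def _check(self, offset, length):
        # Every host access is bounds-checked against the current size.
        if offset < 0 or length < 0 or offset + length > len(self.data):
            raise IndexError("out of bounds memory access")

    def write(self, offset, payload):
        self._check(offset, len(payload))
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset, length):
        self._check(offset, length)
        return bytes(self.data[offset:offset + length])


def guest_invert(memory, offset, length):
    """Stand-in for an exported guest function: it only sees its own memory."""
    for i in range(offset, offset + length):
        memory.data[i] ^= 0xFF


memory = LinearMemory(pages=1)
pixels = bytes([0x00, 0x10, 0xFF])

memory.write(1024, pixels)                # host copies the input in
guest_invert(memory, 1024, len(pixels))   # guest works on its own copy
result = memory.read(1024, len(pixels))   # host copies the output back out
print(result.hex())
```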

deathanatos 149 days ago [-]

The embedder could hand the module functions for manipulating external buffers via externrefs. (I'm not sure if that's a good idea, or not, just that it could.)

But if the module wants to compute on the values in the buffer, at some level it would have to copy the data in/out.

davexunit 149 days ago [-]

Use the GC instructions and you can freely share heap references amongst other modules and the host.

panic 149 days ago [-]

How do you access the contents of a heap reference from JavaScript in order to “send it to a <canvas> or similar”?

davexunit 149 days ago [-]

Assuming you're talking about reading binary data like (array i8), the GC MVP doesn't have a great answer right now. Have to call back into wasm to read the bytes. Something for the group to address in future proposals. Sharing between wasm modules is better right now.

pdubroy 149 days ago [-]

You should check out the book :-)

We have a chapter called "What Makes WebAssembly Safe?" which covers the details. You can get a sneak peek here: https://bsky.app/profile/wasmgroundup.com/post/3lh2e4eiwnm2p

pizlonator 149 days ago [-]

> But I've always gotten confused with... it is secure because by default it can't do much.

Yes. That’s a super accurate description. You’re not confused.

> I don't quite understand how to view WebAssembly. You write in one language, it compiles things like basic math (nothing with network or filesystem) to another and it runs in an interpreter.

Almost. Wasm is cheap to JIT compile and the resulting code is usually super efficient, sometimes on par with native execution.

> I feel like I have a severe lack/misunderstanding. There's a ton of hype for years, lots of investment... but it isn't like any case where you want to add Lua to an app you can add WebAssembly/vice versa?

It’s definitely a case where the investment:utility ratio is high. ;-)

Here’s the trade off between embedding Lua and embedding Wasm:

- Both have the problem that they are only as secure as the API you expose to the guest program. If you expose `rm -rf /` to either Lua or Wasm, you’ll have a bad time. And it’s surprisingly difficult to convince yourself that you didn’t accidentally do that. Security is hard.

- Wasm is faster than Lua.

- Lua is a language for humans, no need for another language and compiler. That makes Lua a more natural choice for embedded scripting.

- Lua is object oriented, garbage collected, and has a very principled story for how that gets exposed to the host in a safe way. Wasm source languages are usually not GC’d. That means that if you want to expose object oriented API to the guest program, then it’ll feel more natural to do that with Lua.

- The wasm security model is dead simple and doesn’t (necessarily) rely on anything like GC, making it easier to convince yourself that the wasm implementation is free of security vulnerabilities. If you want a sandboxed execution environment then Wasm is better for that reason.

Karellen 149 days ago [-]

> You write in one language

Not quite. WebAssembly isn't a source language, it's a compiler target. So you should be able to write in C, Rust, Fortran, or Lua and compile any of those to WebAssembly.

Except that WebAssembly is a cross-platform assembly language/machine code which is very similar to the native machine code of many/most contemporary CPUs. This means a WebAssembly interpreter can be very straightforward, and could often translate one WebAssembly instruction to one native CPU instruction. Or rather, it can compile a stream of WebAssembly instructions almost one-to-one to native CPU instructions, which it can then execute directly.
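
To make "straightforward" concrete, here is a minimal Python sketch of an interpreter loop for five real opcodes (`local.get` 0x20, `i32.const` 0x41, `i32.add` 0x6A, `i32.mul` 0x6C, `end` 0x0B). It skips validation, control flow, and everything else a real VM needs; the function names are my own.

```python
MASK32 = 0xFFFFFFFF


def read_uleb128(code, pc):
    """Decode an unsigned LEB128 integer; return (value, new_pc)."""
    result = shift = 0
    while True:
        byte = code[pc]
        pc += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result, pc


def read_sleb128(code, pc):
    """Decode a signed LEB128 integer; return (value, new_pc)."""
    result = shift = 0
    while True:
        byte = code[pc]
        pc += 1
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final group
                result -= 1 << shift
            return result, pc


def run(code, local_vars):
    """Execute a function body and return the value left on the stack."""
    stack = []
    pc = 0
    while True:
        opcode = code[pc]
        pc += 1
        if opcode == 0x20:    # local.get
            index, pc = read_uleb128(code, pc)
            stack.append(local_vars[index])
        elif opcode == 0x41:  # i32.const
            value, pc = read_sleb128(code, pc)
            stack.append(value & MASK32)
        elif opcode == 0x6A:  # i32.add
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs + rhs) & MASK32)
        elif opcode == 0x6C:  # i32.mul
            rhs, lhs = stack.pop(), stack.pop()
            stack.append((lhs * rhs) & MASK32)
        elif opcode == 0x0B:  # end
            return stack.pop()
        else:
            raise NotImplementedError(hex(opcode))


# (local0 + local1) * 2
body = bytes([0x20, 0x00, 0x20, 0x01, 0x6A, 0x41, 0x02, 0x6C, 0x0B])
print(run(body, [3, 4]))
```

A JIT would walk the same byte stream but emit a native add or multiply for each arm instead of executing it on a Python list.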

whizzter 149 days ago [-]

A JIT should be able to translate most arithmetic and binary instructions to single opcodes; however, anything involving memory and function calls needs safety checks that become multi-instruction. Branches could mostly be direct _unless_ the runtime has any kind of metering (it should) to stop eternal loops (if it also wants to be crash-safe even if it's exploit-safe).

kouteiheika 149 days ago [-]

> anything involving memory [..] needs safety checks that becomes multi-instruction

Not necessarily; on AMD64 you can do memory accesses in a single instruction relatively easily by using the CPU's paging machinery for safety checks plus some clever use of address space.

> branches could mostly be direct _unless_ the runtime has any kind of metering (it should) to stop eternal loops

Even with metering the branches would be direct, you'd just insert the metering code at the start of each basic block (so that's two extra instructions at the start of each basic block). Or did you mean something else?

whizzter 147 days ago [-]

I can't remember exactly which one, but I remember reading an article about some VM that added interruption checks not at block boundaries, but rather only at _backwards_ branches and call sites, so "safe" forward jumps (if/else/break) wouldn't cost anything extra but anything that could go on "forever" had the checks.

Reserving 4 GB of address space oughta work on any 64-bit machine with a decent OS/paging system though? I was looking into it but couldn't use it in my case, since it needs to cooperate with another VM that already hooks the PF handler (although maybe I should take another stab if there is a way to put in a hierarchy).
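
The backward-branch idea is easy to sketch. Below is a toy Python VM (a made-up instruction set, not Wasm, and without calls) that charges fuel only when a jump goes backward, so forward jumps stay free; `run`, `OutOfFuel`, and the opcode names are hypothetical.

```python
class OutOfFuel(Exception):
    pass


def run(program, fuel):
    """Run a toy program and return (accumulator, fuel_left)."""
    acc = 0
    pc = 0
    while True:
        op = program[pc]
        name = op[0]
        if name == "inc":
            acc += 1
            pc += 1
        elif name == "halt":
            return acc, fuel
        elif name in ("jmp", "jmp_if_lt"):
            # ("jmp", target) always jumps;
            # ("jmp_if_lt", limit, target) jumps while acc < limit.
            target = op[-1]
            taken = name == "jmp" or acc < op[1]
            if not taken:
                pc += 1
                continue
            if target <= pc:  # backward edge: the only way to loop
                if fuel == 0:
                    raise OutOfFuel(f"stopped at pc={pc}")
                fuel -= 1
            pc = target
        else:
            raise ValueError(name)


count_to_five = [("inc",), ("jmp_if_lt", 5, 0), ("halt",)]
print(run(count_to_five, fuel=10))

spin_forever = [("jmp", 0)]
try:
    run(spin_forever, fuel=1000)
except OutOfFuel as stop:
    print("interrupted:", stop)
```

Compared with charging at the start of every basic block, straight-line and forward-branching code pays nothing here, at the cost of a coarser notion of how much work was done.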

charleslmunger 149 days ago [-]

Metering also doesn't require a branch if you implement it with page faults. See "Implicit suspend checks" in https://android-developers.googleblog.com/2023/11/the-secret...

kouteiheika 149 days ago [-]

Yep. That's a nice trick; unfortunately it's non-deterministic.

csjh 147 days ago [-]

How's it nondeterministic?

dist1ll 148 days ago [-]

How do you avoid branches in 64-bit WASM?

kouteiheika 148 days ago [-]

You can run the guest in another thread/process and give it its own dedicated address space, or use something like memory protection keys.

beardyw 149 days ago [-]

Yes, interpretation on the fly was never its intention. The intention was to provide interpreted languages with a way to implement fast compiled functions.

Gupta2 149 days ago [-]

Speaking of WebAssembly security, is it vulnerable to Spectre/CPU style attacks like those in JavaScript? (WASM without imported JS functions)

jeffparsons 149 days ago [-]

Yes, if you give the Wasm instance access to timers.

saagarjha 149 days ago [-]

Yes.

coliveira 149 days ago [-]

I think the biggest advantage of wasm in terms of security is that it doesn't accept machine language written for the target machine, only this artificial machine language. This means that it cannot encode arbitrary code that could be executed by the host machine. Everything it runs necessarily has to go through the wasm interpreter.

wyldfire 149 days ago [-]

> This means that it cannot encode arbitrary code that could be executed by the host machine.

But the host machine still can, so it's not as big of an advantage in that regard. If you could somehow deliver a payload of native code and jump to it, it'd work just fine. But the security you get is the fact that it's really hard to do that because there are no wasm instructions to jump to arbitrary memory locations (even if all the host ISAs do have those). Having a VM alone doesn't provide security against attacks.

It's often the case that VMs are used with memory-safe languages, and those languages' runtime bounds checks and other features are what give them safety, more so than their VM. In fact, most bytecode languages provide a JIT (including some wasm deployments), so you're actually running native code regardless.

hb-robo 149 days ago [-]

That's quite interesting. This is way outside of my wheelhouse - has this kind of approach been tried in other security contexts before? What would you even call that, virtualization?

dmitrygr 149 days ago [-]

The word is "bytecode" and the idea is as old as computing.

tubs 149 days ago [-]

Java.

jedisct1 145 days ago [-]

It's like JavaScript, Python or PHP. There are no pointers, only arrays, which cause the app to crash if it attempts to read out of bounds.

crabmusket 149 days ago [-]

I bought the early access of your book a while ago and completed the first few chapters which were available. I found it a fantastic resource, and I was keen to continue but other responsibilities have gotten in the way since. I recommend it!

pdubroy 149 days ago [-]

Glad to hear!

If you haven't looked at it in a while — we just published a draft of the final technical chapter, and are planning an official launch on March 4. So, might be a good time to dig back in :-)

veltas 149 days ago [-]

> actually quite readable, although some of the language can be a bit intimidating if you're not used to reading programming language papers

You're more generous than me, I think it's rubbish.

Would have been easier to read if they had written it more like an ISA manual.

mananaysiempre 149 days ago [-]

You can understand the WASM spec in your sleep if you’ve ever worked through a type-system paper from the last two decades (or a logic paper from even earlier I guess).

Granted, not many people have, but there’s a reason why it makes sense for it to be written in that style: they want it to be very clear that the verification (typechecking, really) algorithm doesn’t have any holes, and for that it’s reasonable to speak the language of the people who prove that type of thing for a living.

The WASM spec is also the ultimate authoritative reference for both programmers and implementers. That’s different from the goals of an ISA manual, which usually only targets programmers and just says “don’t do that” for certain dark corners of the (sole) implementation. (The RISC-V manual is atypical in this respect; still, I challenge you to describe e.g. which PC value the handler will see if the user code traps on a base RV32IMA system.)

veltas 148 days ago [-]

> You can understand the WASM spec in your sleep if you’ve ever worked through a type-system paper from the last two decades

Is there a lot of crossover between those people and people who work with assemblers or code generation? There's even more crossover with those people and understanding how to read a minimal ISA document.

amw-zero 149 days ago [-]

This is an opportunity to learn. The way WebAssembly is defined is the standard way PL semantics are defined.

veltas 148 days ago [-]

WebAssembly isn't a programming language, it's an ISA or a bytecode VM. They've defined it in a very over-engineered way, with different representations all the way through. It has created issues for people working on assemblers and code generation because it's hard to choose a good representation for compilers to generate. This all could have been avoided.

amw-zero 148 days ago [-]

Those are all programming languages.

autumnlani 149 days ago [-]

This is awesome. Nicely done
