Working through 'Writing A C Compiler' (jollygoodsw.wordpress.com)
sanufar 15 days ago [-]
I love this book! I worked through a bunch of it during my winter break last year and found the incremental teaching style extremely rewarding. For readers of the book, Sandler’s reference OCaml implementation is super useful for getting your bearings. I was kind of thrown off by the use of TACKY as an IR, but it was nice to have a solid reference as I worked through the book. For those more experienced with compilers: what are some good resources for stuff like SSA and optimisation? I’ve looked at some of the resources here https://bernsteinbear.com/pl-resources/ but are there other canonical resources?
starkparker 15 days ago [-]
> Sandler’s reference OCaml implementation is super useful for getting your bearings

The author also maintains a list of implementations created from the book: https://github.com/nlsandler/c-compiler-implementations

rootnod3 14 days ago [-]
sanufar 14 days ago [-]
This looks great! Added that and this https://www.amazon.com/Engineering-Compiler-Keith-D-Cooper/d... to my list
sim7c00 14 days ago [-]
The Dragon Book is a good start for going deeper, I think (modern editions). It's less practical and more theory-heavy, but it does have all the algos and some example materials. If you have a bit of a basis, it will be a useful reference.
stellalo 15 days ago [-]
From what the blog author says (I haven’t looked into the book), the approach reminds me of

> Abdulaziz Ghuloum, 2006, An Incremental Approach to Compiler Construction http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf

stellalo 15 days ago [-]
Oh, that’s exactly what the book author’s blog mentions: https://norasandler.com/2017/11/29/Write-a-Compiler.html
sn9 15 days ago [-]
You can also find this approach in this book that comes in Racket and Python flavors [0].

[0] https://mitpress.mit.edu/9780262047760/essentials-of-compila...

UncleOxidant 15 days ago [-]
The author of the book also has a series of blog entries: https://norasandler.com/2017/11/29/Write-a-Compiler.html
pshirshov 14 days ago [-]
My personal experience of writing various DSL/general purpose compilers (I've created at least 4 DSL compilers and one general purpose) is kinda different from the books I've read.

Scala is an awesome language which frees one from working on many boring details and makes it possible to keep the codebase tiny. With such an expressive language I can concentrate on the logic instead of thinking about minor things.

We have enough memory and cpu power to use worse than linear algorithms without noticeable performance impact.

Parsers aren't an issue at all these days; PEG combinators like fastparse allow one to be extremely productive.

I tend to stick to an immutable multi-staged pipeline with several immutable trees, use error-accumulating data structures (Either[NonEmptyList[Issue], T]), and explicitly express entity (e.g. type definition) dependencies as graphs (which can be processed iteratively and in parallel).
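
For readers unfamiliar with error accumulation, here is a rough Python analogue of the idea (the comment above is about Scala, so this is a swapped-in sketch, not the commenter's code). The names `Ok`, `Err`, `collect`, and `check_type` are made up for illustration. The point is that one failing check does not stop the others from reporting their issues.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Ok:
    value: object


@dataclass
class Err:
    issues: List[str]  # non-empty by convention (hypothetical stand-in for NonEmptyList[Issue])


Result = Union[Ok, Err]


def collect(results: List[Result]) -> Result:
    # Gather every issue instead of stopping at the first failure.
    issues = [issue for r in results if isinstance(r, Err) for issue in r.issues]
    if issues:
        return Err(issues)
    return Ok([r.value for r in results])


def check_type(name: str, known: Set[str]) -> Result:
    # A toy validation stage: does the referenced type exist?
    if name in known:
        return Ok(name)
    return Err([f"unknown type: {name}"])


known_types = {"Int", "String"}
outcome = collect([check_type(t, known_types) for t in ["Int", "Strin", "Bool"]])
```

Here `outcome` carries both problems at once, which is what lets a compiler report several errors per run.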

mkw5053 15 days ago [-]
Sounds like a great book. I worked through nand2tetris ages ago and remember enjoying it as well.
kragen 15 days ago [-]
This makes the book sound very well structured! I also found Ghuloum's paper inspirational.
afaeta 15 days ago [-]
[dead]
jokoon 15 days ago [-]
Crafting Interpreters asks the reader to use the visitor pattern, and this was quite a turn-off for me; I stopped there.
alabhyajindal 15 days ago [-]
You can choose to implement it differently based on your implementation language. Data Classes and If statements work really well for this in Python, for example.

Statement Data Classes: https://github.com/alabhyajindal/plox/blob/main/stmt.py

If statements in the parser matching against them: https://github.com/alabhyajindal/plox/blob/main/parser.py#L3...
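
As a rough illustration of the data-class approach (a sketch only, not code from the linked repository; `Literal`, `Binary`, and `evaluate` are names invented for this example), the tree walker can branch on node type directly:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Literal:
    value: float


@dataclass
class Binary:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Literal, Binary]


def evaluate(expr: Expr) -> float:
    # Plain isinstance checks stand in for the accept()/visitX() pair.
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Binary):
        left = evaluate(expr.left)
        right = evaluate(expr.right)
        if expr.op == "+":
            return left + right
        if expr.op == "*":
            return left * right
        raise ValueError(f"unknown operator: {expr.op}")
    raise TypeError(f"unknown node: {expr!r}")


# 1 + (2 * 3)
tree = Binary(Literal(1), "+", Binary(Literal(2), "*", Literal(3)))
result = evaluate(tree)
```

Adding another operation (say, a pretty-printer) means writing another function with the same shape, with no changes to the node classes.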

UncleEntity 15 days ago [-]
> Crafting Interpreters asks the reader to use the visitor pattern...

...or just a big old, plain jane switch statement.

In my current project I modified my ASDL generator to output a C instead of C++ AST and the visitor pattern carried over until realizing a switch statement is just as good (or better) in C so I ripped out that part of the template file. The choice was to write a dispatch function which called the various methods based on the AST node type or have a generated struct full of function pointers with a generated dispatch function which calls the various methods based on the AST node type. Same difference, really, just one has an added level of indirection.

The amazing part is I didn't rewrite the ASDL generator for the fifth time and just decided it's 'good enough' for what I need it for. Aside from one small C++ism, which is easily worked around and turns out wasn't even needed in the C++ template, the thing is 100% language and 'access pattern' agnostic in generating the output code.

There was probably a point I was trying to make when I started typing, dunno?

grg0 15 days ago [-]
My takeaway from your verbose description is:

- You don't need a visitor pattern if you have predetermined the data you are going to work with and all the operations on it (i.e., the open/closed principle does not apply.)

- For the same reason, you don't need dynamic dispatch, which is often how the visitor (and other) pattern(s) are implemented.

- The code is much simpler to understand (and debug) because it's all there in one place. It's also faster than the dynamic dispatch version because it's all known at compile-time.

- Personally: OOP is stupid, confusing, and inefficient; I think universities should only teach it as an optional course. These patterns are 50% lack of functional programming features and 50% sheer stupidity. Universities should go back to teaching real abstraction with Scheme and SICP, a MIPS-style assembly language, and stop confusing students.

markus_zhang 15 days ago [-]
I think I did something similar for an emulator. Instead of using a big switch I simply used a big array of function pointers. So if it is a BLAH opcode, the execution code simply calls fp_list[BLAH](op). But I guess it is a bit too much for CPUs that have tons of operations.
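
A minimal sketch of that table-of-handlers idea, written in Python rather than C and using a made-up three-instruction stack machine (the opcodes, handler names, and `run` function are all assumptions for illustration):

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical opcodes; the numeric value doubles as the index into HANDLERS.
OP_PUSH, OP_ADD, OP_NEG = 0, 1, 2


def op_push(stack: List[int], operand: Optional[int]) -> None:
    stack.append(operand)


def op_add(stack: List[int], operand: Optional[int]) -> None:
    b = stack.pop()
    a = stack.pop()
    stack.append(a + b)


def op_neg(stack: List[int], operand: Optional[int]) -> None:
    stack.append(-stack.pop())


# The "array of function pointers": position N handles opcode N.
HANDLERS: List[Callable[[List[int], Optional[int]], None]] = [op_push, op_add, op_neg]


def run(program: List[Tuple[int, Optional[int]]]) -> List[int]:
    stack: List[int] = []
    for opcode, operand in program:
        # One central place to validate the opcode before dispatching.
        if not 0 <= opcode < len(HANDLERS):
            raise ValueError(f"bad opcode: {opcode}")
        HANDLERS[opcode](stack, operand)
    return stack


final_stack = run([(OP_PUSH, 2), (OP_PUSH, 3), (OP_ADD, None), (OP_NEG, None)])
```

The table has to stay in sync with the opcode numbering, which is the main maintenance cost compared with a switch.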
UncleEntity 14 days ago [-]
I actually use that pattern in the VM, just with a dispatch function instead of calling the function out of the array directly. The compiler (more than likely) inlines the dispatch call and it lets me add some error checking on the opcode without it being scattered all over the instruction functions.

The Next Big Trick™ is to just embed the function pointer into the opcode itself and do away with the dispatching completely, getting rid of a single pointer dereference per opcode has to be worth at least a 0.01% speed gain, right? I'm kidding, of course, as the original copy and patch (using C labels as references to mark the code boundaries of the code templates) should allow actual measurable gains in the single digit range.

billforsternz 14 days ago [-]
I recall back in the day when I was an embedded systems programmer at the coal face I had similar antipathy towards a pattern. I think it was called the State pattern or similar. It involved a class hierarchy and virtual functions, one for basically every state, event pair.

I preferred my simple C design which used a lookup table to quickly access a structure for each state, event pair instead. The structure would provide an output state, a message to send, an action to emit, a bitmask to select zero or more other standard operations, and yes a function to run in those rare cases where something really special was needed. All fields optional.

The point is I didn't need to write N x M functions, I just needed to edit a table. And I didn't need to understand any rocket science.
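
A rough sketch of such a table-driven state machine, in Python rather than C and with only some of the fields described above (the states, events, `Transition` fields, and `step` function are invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Transition:
    # Every field is optional, mirroring the "all fields optional" design.
    next_state: Optional[str] = None  # None means "stay in the current state"
    message: Optional[str] = None
    special: Optional[Callable[[], None]] = None  # rare custom hook


# Only the (state, event) pairs that matter get a row.
TABLE: Dict[Tuple[str, str], Transition] = {
    ("idle", "start"): Transition(next_state="running", message="started"),
    ("running", "tick"): Transition(message="working"),
    ("running", "stop"): Transition(next_state="idle", message="stopped"),
}


def step(state: str, event: str, outbox: List[str]) -> str:
    entry = TABLE.get((state, event))
    if entry is None:
        return state  # pairs without a row are ignored in this sketch
    if entry.message is not None:
        outbox.append(entry.message)
    if entry.special is not None:
        entry.special()
    return entry.next_state if entry.next_state is not None else state


outbox: List[str] = []
state = "idle"
for event in ["tick", "start", "tick", "stop"]:
    state = step(state, event, outbox)
```

Changing behavior means editing rows in `TABLE`; `step` itself stays the same.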

markus_zhang 15 days ago [-]
This part confused me quite a bit so I turned it into the more verbose format by copy-pasting. I don’t like the boilerplate code generation either so I converted that part too. The whole book is still pretty interesting though.
almostgotcaught 15 days ago [-]
Lolol weirdest reason to reject that book - 90% of production parsers are recursive descent parsers.
markus_zhang 15 days ago [-]
It probably has nothing to do with recursive descent parsing, which is intuitive, but with the visitor pattern as mentioned. I myself find it very distracting too.
almostgotcaught 15 days ago [-]
.... They're the same thing....
ossopite 15 days ago [-]
What?

The visitor pattern is a technique for dynamic dispatch on two values (typically one represents 'which variant of data are we working with' and the other 'which operation are we performing'). You would not generally use that in recursive descent parsing, because when parsing you don't have an AST yet, so 'which variant of data' doesn't make sense, you are just consuming tokens from a stream.

almostgotcaught 15 days ago [-]
> you are just consuming tokens from a stream.

My guy... Do you think that parsers just like... concat tokens into tuples or something....??? Do you not understand that after lexing you have tokens (which are a "type") and AST node construction (an "operation") and that the grammar of a language is naturally a graph.... Like where else would you get the "recursion" from....

If that doesn't make sense I invite you to read some literature:

> makeAST():

> asks the tokenizer for the next token t, and then asks t to call the appropriate factory method
>
> the int token and the id token call makeLeaf(),
>
> the left parenthesis token calls makeBinOp()
>
> all other tokens should flag an error!
>
> does the above "smell" like the visitor pattern to you or not? Who are the hosts and who are the visitors?

https://www.clear.rice.edu/comp212/02-fall/labs/11/

markus_zhang 15 days ago [-]
OK I might be wrong about the visitor pattern, but what I really did not like is to use the accept() and visitBlah() way to execute AST nodes: https://craftinginterpreters.com/representing-code.html#the-...

I did continue reading the book (not the original author of that reply) but I do think it is distracting for newbies. I had to come back to this page over and over again to recollect memory about the pattern, because I usually read it one chapter or a few sections every week, so every time I had to remind myself how this visitBlah() and accept() pair works. I really think a big switch() (or anything that works but is simpler) would be a lot easier to understand.

The other reason I dislike this kind of stuff is that I have someone on the team who really likes to use patterns for every piece of code. It's kinda difficult to tell whether it is over-engineering or not, but my principle is that intuition always beats fewer lines of code (or DRY), unless it is absurdly more lines of code or repetition. And to test that principle you just grab a newbie and see which one makes more sense to him.

almostgotcaught 15 days ago [-]
> I really think a big switch() (or anything that works but is simpler) would be a lot easier to understand.

It's much easier conceptually to implement this using recursion instead of a while loop and a token stack (it's basically DFS). So I disagree with you there.

> The other reason I dislike this kind of stuff is that I have someone on the team who really likes to use patterns for every piece of code. It's kinda difficult to tell whether it is over-engineering or not, but my principle is that intuition always beats fewer lines of code (or DRY), unless it is absurdly more lines of code or repetition. And to test that principle you just grab a newbie and see which one makes more sense to him

I'm with you - I really don't give a shit about patterns (which was my whole original point - who cares). But that last part I don't agree with - systems code (like a parser) doesn't need to be legible to a noob. Of course we're talking about a textbook so you're probably right, but like I said most production parsers and AST traversals are written exactly this same way. So anyone learning this stuff hoping to get a job doing it should just get used to it.

jpc0 14 days ago [-]
> so every time I had to remind myself how this visitBlah() and accept() pair works. I really think a big switch()…

This is just an alternative implementation of the visitor pattern. Whether you implement it using dynamic dispatch or a switch or an if stack, it's all the same pattern…

mrkeen 15 days ago [-]
Nope, you had it right.

Visitor thoroughly confuses me in the context of parsing (maybe in all contexts.)

visit and accept are not the verbs I want to be seeing in the code. I want to see then, or, and try.

almostgotcaught 15 days ago [-]
Parsers "accept" or "reject" programs. It's completely standard language.
mrkeen 14 days ago [-]
Alright, so where can I read more about the visitor pattern's "reject" method?
almostgotcaught 14 days ago [-]
> all other tokens should flag an error

Ie the link I posted above

ossopite 15 days ago [-]
I see that you've found an example of how recursive descent parsing actually can be implemented with the visitor pattern, which I've never come across before, and I didn't read it carefully enough to understand the motivation - but that doesn't mean they are the same thing - the recursive descent parsers I've seen before just inspect which tokens are seen and directly construct AST nodes

as an addendum, the reason I don't understand the motivation is that the visitor pattern in the way I described it is useful when you have many different operations to perform on your AST. If you have only one operation on tokens - parsing into an AST - I'm not sure why you need dynamic dispatch on a second thing, the first thing being the token type. Maybe the construction is that different operations correspond to different 'grammar rules'?

almostgotcaught 15 days ago [-]
> why you need dynamic dispatch on a second thing

You're overindexing on maximally generic visitor pattern. If you have one type of visitor but nonetheless dispatch based on type that's still visitor pattern.

EDIT: to be honest who even cares. My initial point was why in the hell would you stop reading a book because a particular "pattern" offends you. And I'll reassert it here: who cares whether a recursive descent parser fits the exact definition of visitor pattern or not - you have members of a class that do stuff (construct AST nodes) and possibly track other data and then call other members. I usually call that a visitor class even if it's the only one that ever exists <shrug>

ossopite 15 days ago [-]
Ok, that's true, but my claim is that recursive descent parsing does not have to use the visitor pattern and indeed using recursive descent parsing is not the same as using the visitor pattern (you can do the former without the latter and I claim that you usually do)
almostgotcaught 15 days ago [-]
> just inspect which tokens are seen and directly construct AST nodes

I'll repeat myself: this is not possible because you need to recursively construct the nodes (how else would you get a tree...).

ossopite 15 days ago [-]
I think I'm missing something here. if you have a grammar rule R with children A and B, and a function in your recursive descent parser that corresponds to R, why can R not call the parser functions for A and B, which return AST nodes themselves, and then construct another AST node using the result of those? Where was the visitor pattern required here?
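
To make that concrete, here is a minimal sketch of a recursive descent parser in which each rule function calls the functions for its children and builds a node from their results. The toy grammar (`expr := term ('+' term)*`, `term := NUMBER | '(' expr ')'`), the tuple-based nodes, and all names are assumptions made for this example.

```python
import re
from typing import List, Optional


def tokenize(src: str) -> List[str]:
    # Numbers, '+', and parentheses; whitespace is skipped.
    return re.findall(r"\d+|[+()]", src)


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def parse_expr(self):
        # expr := term ('+' term)*
        node = self.parse_term()
        while self.peek() == "+":
            self.advance()
            node = ("add", node, self.parse_term())
        return node

    def parse_term(self):
        # term := NUMBER | '(' expr ')'
        tok = self.advance()
        if tok.isdigit():
            return ("num", int(tok))
        if tok == "(":
            node = self.parse_expr()  # the recursion lives here
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        raise SyntaxError(f"unexpected token: {tok}")


tree = Parser(tokenize("1 + (2 + 3)")).parse_expr()
```

The recursion comes from `parse_term` calling back into `parse_expr`; no `accept()` or `visit()` methods are involved.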
mrkeen 15 days ago [-]
Me too. No-one's denying that recursion is happening. We're just not sure about it being synonymous with the Visitor Pattern.
jokoon 14 days ago [-]
I think the book claims to be accessible and easy

The visitor pattern is not something I find simple and easy to approach

quibono 15 days ago [-]
Couldn't you write the interpreter without it?