Very much looking forward to spending some time implementing this alongside the article. I really enjoyed your posts about making a Teeny Tiny compiler a while back too!
It's very nice to see a small type checker in Python, for Python! This became much easier in the last 10 years, since the MyPy team basically "upstreamed" the typed_ast library they were using into the stdlib.
I found that there are not enough good teaching materials on type checkers -- e.g. the second edition of the Dragon Book lacks a type checker, which is a glaring hole IMO - https://news.ycombinator.com/item?id=38270753
Also, teaching material tends to have a bias toward type inference and the Hindley-Milner algorithm, which are NOT used by the most commonly used languages
So I appreciate this, but one thing in this code that I find (arguably) confusing is the use of visitors. e.g. for this part, I had to go look up what this method does in Python:
  # Default so every expr returns a Type.
  def generic_visit(self, node):
      super().generic_visit(node)
      if isinstance(node, ast.expr):
          return ANY
Also, main() calls visit(), but the visitor methods ALSO call visit(), which obscures the control flow IMO. Personally, if I need to use a visitor, I like there to just be a single pass.
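For anyone else who had to look it up, here is a rough sketch of what that override is doing. This is not the article's code: `TinyChecker`, `type_of`, and the string stand-ins for `ANY` and `INT` are made up for illustration. `ast.NodeVisitor.visit` looks for a method named `visit_<ClassName>` and falls back to `generic_visit`, whose stock version walks the children and returns None, so the override is what guarantees a type comes back for every expression.

```python
import ast

ANY = "Any"   # hypothetical stand-ins for the article's Type objects
INT = "int"

class TinyChecker(ast.NodeVisitor):
    def generic_visit(self, node):
        # The stock generic_visit walks child nodes and returns None.
        super().generic_visit(node)
        if isinstance(node, ast.expr):
            return ANY

    def visit_Constant(self, node):
        # type(...) is int, so that bool constants do not count as int here.
        return INT if type(node.value) is int else ANY

    def visit_BinOp(self, node):
        # Visitor methods recurse by calling visit() themselves.
        left = self.visit(node.left)
        right = self.visit(node.right)
        return INT if left == right == INT else ANY

def type_of(source):
    tree = ast.parse(source, mode="eval")
    return TinyChecker().visit(tree.body)

print(type_of("1 + 2"))   # int
print(type_of("1 + x"))   # Any (no visit_Name, so generic_visit answers)
```

So the outer visit() call only kicks things off, and every visit_* method decides for itself which children to descend into.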
---
In contrast, Essentials of Compilation was released 1 or 2 years ago, in Racket and in Python. And the Python version uses the same typed AST module.
> I found that there are not enough good teaching materials on type checkers -- e.g. the second edition of the Dragon Book lacks a type checker, which is a glaring hole IMO
Pierce’s Types and Programming Languages[1] is excellent. It starts with very little (if you understand basic set-theory notation, you’re probably OK), gets you to a pretty reasonable point, and just generally makes for very pleasant reading. You should probably pick something else if you want a hands-on introduction with an immediate payoff, but then you probably wouldn’t pick the Dragon Book, either.
TaPL really falls down when you try to bootstrap your way to understanding the notation. A lot of the notation and theory revolves around, essentially, implementing a concurrent virtual machine. I like the original Algorithm W paper because it doesn't gloss over this conceptual step: it is very much a virtual machine, and you can see the authors handling the edge cases. The operational semantics in TaPL are (frankly) abstruse. Also, TaPL makes it seem like new features can be desugared to old features (and they can), but a little more prose explaining each feature's behavior directly, without just tossing you into the semantic deep end, would've made for a much nicer text.
Everyone always says that, but I don't think it's a good intro :-) (e.g. I think the top comment in the lobste.rs thread is suffering from the type inference / functional bias, which is not necessarily due to TAPL, but it's a common thing I've noticed)
Right now I think Siek's book is better for what I want to do, though admittedly I didn't get that far into it, because my type checking project is way on the back burner
I would like to see any type checkers that people wrote after reading TAPL!
I’m writing one now as part of a hobby language project, about which I’ll do a Show HN once I have enough to share. I enjoyed Pierce but to your point I am going mostly down the functional route. Programming it in Python, with the book closed but after two readings I have what I need in my head (it clicked much better second time through).
Edit: This project (best fun I’ve had programming in a long while) is what got me sharing Eli Bendersky’s Unification post a couple of weeks back https://news.ycombinator.com/item?id=44938156
once you get used to it, visitors are a very pleasant way to write ast walking code in python. they essentially generate your case statement for you, so instead of `case ast.Expr: handle_expr(node)` you just write a `self.visit_Expr` method (the suffix is the node class name, case-sensitive), and the visitor matches the node type to the method name and calls it.
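To make the "generated case statement" concrete, here is a hand-rolled sketch of the name-based dispatch. `MiniVisitor` and `Evaluator` are hypothetical names for illustration; the `getattr` lookup is roughly what `ast.NodeVisitor.visit` does, except that this toy `generic_visit` raises instead of walking the children.

```python
import ast

class MiniVisitor:
    """Hand-rolled stand-in for ast.NodeVisitor's name-based dispatch."""

    def visit(self, node):
        # Look up visit_<ClassName>; fall back to generic_visit.
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise NotImplementedError(type(node).__name__)

class Evaluator(MiniVisitor):
    def visit_Expression(self, node):
        return self.visit(node.body)

    def visit_Constant(self, node):
        return node.value

    def visit_BinOp(self, node):
        left, right = self.visit(node.left), self.visit(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Mult):
            return left * right
        raise NotImplementedError(type(node.op).__name__)

result = Evaluator().visit(ast.parse("2 + 3 * 4", mode="eval"))
print(result)  # 14
```

Each visit_* method plays the role of one branch of the case statement, and the method name is the only thing tying it to the node type.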
Doing it this way maxes coupling and minimises cohesion.
Your language will have a number of phases/passes to carry out. Let's say LambdaLifting, TypeChecking and Inlining.
All the code for lambda lifting belongs in one module, all the code for type-checking in another module, etc.
If you instead use the visitor pattern, you will be looking at all the code related to Variable, Function, Literal in those files respectively.
So when you're working on Function.typecheck(), it will sit in source code just under Function.lambdalift() and just above Function.inline() - things which you don't want to consider together. Meanwhile, you'll need to switch between source files to work on Variable.typecheck() and Literal.typecheck().
No it's not pleasant at all. It's boilerplate heavy, non-local and indirect. It's presumably a large part of why pattern matching is arriving in Python.
I guess that's subjective - I'm as big a fan of pattern matching as anyone, but when I was writing a type checker in python we made heavy use of visitors and it made the code pleasant to maintain.
Just curious. Isn't that how development tools generally work? Would you be surprised if it was in and for a compiled language? (This isn't a dismissal. I'm curious about the aspect of this specific case that amuses you.)
Foundational tooling not being written in a compiled language (fast is good, it could be jitted, but ideally it's a single binary) is actually a huge tax that I'm quite glad we're getting over as an industry.
Python is probably the apex of the "slow + doesn't work without a magic environment" problem
I suppose it is how this kind of tool generally works. I think it's just some subset of the feeling I get when someone writes (implements?) $LANGUAGE in $LANGUAGE (e.g. brainf*ck in brainf*ck).
You can see from how quickly the code becomes extremely busy and annoying to read that python being flexible is a blessing and a curse. Maybe curse is the wrong word, but none of this was really designed cohesively so it's usually very janky and a bit slow.
In contrast, Essentials of Compilation was released 1 or 2 years ago, in Racket and in Python. And the Python version uses the same typed AST module.
https://www.amazon.com/Essentials-Compilation-Incremental-Ap...
But it uses a more traditional functional style, rather than the OO visitor style:
https://github.com/IUCompilerCourse/python-student-support-c...
So one thing I did was to ask an LLM to translate this code from OO to functional style :-) But I didn't get around to testing it
(I looked at this code a week ago when it appeared on lobste.rs [1], and sent a trivial PR [2])
[1] https://lobste.rs/s/opwycf/baby_s_first_type_checker
[2] https://github.com/AZHenley/babytypechecker/pull/1
Pierce’s Types and Programming Languages[1] is excellent. It starts with very little (if you understand basic set-theory notation, you’re probably OK), gets you to a pretty reasonable point, and just generally makes for very pleasant reading. You should probably pick something else if you want a hands-on introduction with an immediate payoff, but then you probably wouldn’t pick the Dragon Book, either.
[1] https://www.cis.upenn.edu/~bcpierce/tapl/
EDIT: escaped censorship