A Basic Just-In-Time Compiler (2015)

(nullprogram.com)

85 points | by ibobev 16 hours ago

6 comments

  • kscarlet 14 hours ago
    It seems that JIT is overloaded with at least 2 meaning.

    The canonical definition of JIT is "compilation during execution of a program". Usually, a program is being interpreted first, then switches to compiled code in the middle of execution. This is not what this article does.

    What this article does is sometimes called on-the-fly AOT, or just on-the-fly compilation. I'd prefer not overloading the term "JIT".

    • dataflow 11 hours ago
      > Usually, a program is being interpreted first, then switches to compiled code in the middle of execution.

      I agree what they have isn't JIT compilation, but not for that reason. Tiered execution was never a central part of JIT compilation either. It was a fairly new invention in comparison.

      The reason what they describe isn't JIT compilation is IMO fairly boring: it's not compiling the input program in any meaningful way, but simply writing hard-coded logic into executable memory that it already knows the program intended to perform. Sure there's a small degree of freedom based on the particular arithmetic operations being mentioned, but that's... very little. When your compiler already knows the high-level source code logic before it's even read the source code, it's... not a compiler. It's just a dynamic code emitter.

      As to the actual difference between JIT vs. AOT... it may just come down to accounting. That is, on whether you can always exclude the compilation time/cost from the overall program execution time/cost or not. If so, you're compiling ahead of (execution) time. If not, you're compiling during execution time.

      • kscarlet 40 minutes ago
        > As to the actual difference between JIT vs. AOT... it may just come down to accounting. That is, on whether you can always exclude the compilation time/cost from the overall program execution time/cost or not. If so, you're compiling ahead of (execution) time. If not, you're compiling during execution time.

        Well, this includes what I refer to as "on-the-fly" AOT, like SBCL, CCL, Chez Scheme... Even ECL can be configured to work this way. As I mentioned in another comment, people in those circles do not refer to these as "JIT" at all, instead saying "I wish my implementation was JIT instead of on-the-fly AOT"!

      • ordu 9 hours ago
        > it's not compiling the input program in any meaningful way, but simply writing hard-coded logic into executable memory that it already knows the program intended to perform.

        The program reads the logic from stdin and translates it into machine instructions. I can agree that there is not a lot of a freedom in what can be done, but I think it just means that the source language is not Turing complete. I don't believe that compiler needs to deal with a Turing complete language to claim the title "JIT compiler".

        • dataflow 9 hours ago
          > The program reads the logic from stdin and translates it into machine instructions. I can agree that there is not a lot of a freedom in what can be done, but I think it just means that the source language is not Turing complete. I don't believe that compiler needs to deal with a Turing complete language to claim the title "JIT compiler".

          "Not Turing-complete" is quite the understatement.

          A "compiler" is software that translates computer code from one programming language into another language. Not just any software that reads input and produces output.

          The input language here is... not even a programming language to begin with. Literally all it can express is linear functions. My fixed-function calculator is more powerful than that! If this is a programming language then I guess everyone who ever typed on a calculator is a programmer too.

    • orthoxerox 6 hours ago
      I think if your program starts to execute machine code that wasn't present in the binary, then it counts as having a JIT compiler.

      Of course, there are edge cases like embedding libtcc, but I think it's a reasonable definition.

    • MobiusHorizons 13 hours ago
      I always took the distinction to be the executable memory bit as opposed to writing an executable and then launching it in the usual way. Of course high performance runtimes that contain JITs do typically interpret first in order to get type information and reduce startup latency, but that’s more a function of the language not being efficiently AOT compileable rather than fundamental to the concept of a JIT
      • throwaway17_17 12 hours ago
        Can you explain what mean by a “language not being efficiently AOT compileable”? I am guessing you are referring to languages that emphasize REPL based development or languages that are ‘image’ based (I think Smalltalk is the most common example).

        Like the guesses above, I can understand difficulty with AOT compilation in conjunction with certain use cases; however, I can not think of a language that based on its definition would be less amenable to AOT compilation.

        • Nevermark 10 hours ago
          The more context is narrowed down, the more optimizations that can applied during compilation.

          AOT situations where a lot of context is missing:

          • Loosly typed languages. Code can be very general. Much more general than how it is actually used in any given situation, but without knowing what the full situation is, all that generality must be complied.

          • Increment AOT compilation. If modules have been compiled separately, useful context wasn't available during optimization.

          • Code whose structure is very sensitive to data statistics or other conditional runtime information. This is the prime advantage of JIT over AOT. Unless the AOT compiler is working in conjunction with representative data and a profiler.

          Those are all cases where JIT has advantages.

          A language where JIT is optimal, is by definition, less amenable to AOT compilation.

        • MobiusHorizons 9 hours ago
          What I meant was dynamic typing primarily. Think JavaScript or lua. These are both languages where the choice is between interpreter and JIT rather than JIT or AOT. VM based languages like Java also fall in that category, not because of dynamic typing so much as because of shipping bytecode to the client for portability reasons.
      • kscarlet 13 hours ago
        In that sense almost every compiled Lisp/Scheme implementation, GHC, etc. or any other interactive programming system, count as JIT. But virtually nobody in those circles refer to such implementations as JIT. Instead, people says "I wish our implementation was JIT to benefit from all those optimizations it enables"!
        • MobiusHorizons 13 hours ago
          Do they generate machine code in ram and jump to it? Or do they interpret byte code?

          EDIT: at least GHC seems to be a traditional AOT compiler.

          • kscarlet 12 hours ago
            They generate native machine code.
  • mananaysiempre 1 hour ago
    > On x86-64, pages may be 4kB, 2MB, or 1GB

    But I believe sysconf(_SC_PAGESIZE) will always be 4KB, because the “may” is at the user’s discretion, not the system’s. Except on Cosmopolitan where it will always be 64KB, because Windows NT for Alpha (yes, seriously).

  • possiblywrong 14 hours ago
    Following is the closed form solution linked in the article (from a since-deleted Reddit comment):

        from functools import reduce
    
        def recurrence(ops, a0, n):
            def transform(x, op):
                return eval(repr(x) + op)
            ops = ops.split()
            m = reduce(transform, [op for op in ops if op[0] in ('*', '/')], 1)
            b = reduce(transform, ops, 0)
            for k in range(n + 1):
                print('Term 0: ', a0 * m ** k + b * (m ** k - 1) / (m - 1))
    
    > This is really only interesting if a particular (potentially really large) term of the sequence is desired, as opposed to all terms up to a point. The key observation is that any sequence of the given set of operations reduces to a linear recurrence which has the given solution.
  • senko 5 hours ago
    Ah, I thought this was going to be about a JITted BASIC. No such luck.
  • the-smug-one 8 hours ago
    Man, I friggin despise how hard (in a very not fun way) generating x86 assembly is :-). Generating Aarch64 is a lot easier.
    • mananaysiempre 1 hour ago
      I mean, I can’t do it by heart, but it’s not fundamentally very hard? Thumb-2 is definitely much more annoying.
  • MobiusHorizons 13 hours ago
    Is it just me or is the author using static wrong? Someone mentioned it in the thread on the previous post, where it felt more like an oversight. But in this article it seems much more like an actual misunderstanding. Should it actually be the function level static variable? I feel like in the right context the optimizer might be able to leave that in a register if there the calling function is in the compilation unit.