This paper: https://cs.brown.edu/~sk/Publications/Papers/Published/pmmwplck-python-full-monty/paper.pdf

… has been out for several years now, and the CPython authors don’t seem to be taking any heed of it. The question one is faced with when looking at the inner workings of CPython’s VM is:

Is Python a lazy language, or is it not? Should types and symbols be resolved by the VM, or through semantic analysis? Should there be explicit tree-building and DAG-based value-numbering optimization, or should the compiler just shit out the bytecode?

Because the VM seems to build classes on the go [list of opcodes]. I don’t pretend to know enough about this, but would it not be better if they did a full semantic analysis first, then emitted the bytecode? That way execution would be faster, albeit at the cost of a small delay for the heavier semantic analysis.

Of course, the answer is clear: Python may not officially be a lazy language, but it virtually is one. The class syntax, as the paper says, is syntactic sugar around type with three arguments. Since type with three arguments is invoked at runtime, it would be rather stupid, and slow, to do semantic analysis on a runtime function call, right!? So classes are not ‘really’ classes!

For further clarity, this:

cls = type("Cls", (), {"foo": "bar"})

is equivalent to this:

class Cls:
   foo = "bar"
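
If you want to watch the VM do this on the fly, the dis module shows the opcodes CPython emits for a class statement (a small sketch; the exact listing varies by CPython version, but you should see LOAD_BUILD_CLASS in there):

import dis

src = '''
class Cls:
    foo = "bar"
'''

# compile to a code object without executing it, then disassemble it
dis.dis(compile(src, "<example>", "exec"))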

They might have looked at this paper and said ‘nah, don’t fix what’s broken’, and this exact attitude the Python community has, from top to bottom, is why I have not used it in about 2 years, and, unless paid handsomely, won’t use it in any project.

I believe Python needs to decide whether it’s a scripting language, a cross-platform juggernaut like Java, or what it actually is: a piece of crap hyped to high heaven!

These are my opinions; I don’t think I am educated enough for them to be facts. But look into your heart and compare CPython’s VM opcodes with the JVM’s opcodes. The JVM is also a stack machine, like CPython’s VM, but its opcodes are low-level and tightly specified, designed to get things done fast and portably. It has an infrastructure and an ecosystem. Several languages run on it; hell, even Python itself runs on it!

Sadly, because that dang C FFI is so sweet, CPython seems to be the de facto Python implementation. And Python is not even badly specified the way Perl is. I’d take a highly non-orthogonal language like Perl for scripting any day of the week; I use Perl a lot for preprocessing C source files, or just as an AWK replacement. Is Python supposed to be that? Or Java? Decide, goddammit.

So what we get from this is: Python is a simple, AWK-ascended UNIX scripting language that lazy people have turned into a de facto Java! lol

Again, I am not very educated on this matter, so please don’t take my opinions as facts. I just made this thread to share this nice paper and a bit of trivia.

Thanks.

  • I’m a simp for Alan Kay and I goon to his Snapchat, but I subscribe to Andrew Appel’s OnlyFans (along with many others), so I haven’t looked much into Smalltalk when it comes to ‘70s languages’. I guess I should do that. Thanks. I could never have put two and two together to realize Python uses prototypes. This blows my mind. Funny thing is, just the other day I wrote out JavaScript’s grammar: https://gist.github.com/Chubek/0ab33e40b01a029a7195326e89646ec5

    I guess I still have a lot to learn, so better get moving. By ‘full semantic analysis’ I meant doing a full type analysis before you emit the bytecode, not after. What is the protocol here exactly? I have seen several variants and supersets of Python that do an ML-style type analysis. They achieve it via the `NAME [':' TYCON]` annotation syntax, so the regular Python interpreter still works.

    So thanks. I learned something from your post.

    • lolcatnip@reddthat.com · 9 months ago

      Historically Python has done no semantic analysis at all, and as far as I know CPython still ignores type annotations except for checking their syntax and (I think) checking that type expressions can be evaluated as ordinary expressions. It’s also one of the slowest languages around, and it used to be much worse in the 1.x days. The only actual declarations are global and nonlocal, unless they’ve added something else recently. Everything else that looks like a declaration is actually a statement executed for its side effects. The super function used to only be callable with two arguments, because automatically supplying self and the lexically enclosing class was considered too magical.
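
      A quick way to see that on stock CPython 3.x (a minimal sketch): the annotations get evaluated and stored, but nothing ever checks them.

      def double(x: int) -> int:
          return x * 2

      print(double("ha"))            # no error: prints 'haha', the annotation is never enforced
      print(double.__annotations__)  # {'x': <class 'int'>, 'return': <class 'int'>}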

      If you’re looking for something like Java or C#, Python isn’t for you. It was designed for use cases like fancy scripts and small applications that aren’t CPU bound. It’s about as dynamic as a language can be, meaning it’s possible to break almost any analysis you might do with a call to eval, and a lot of what you’d expect to be core language primitives, like accessing a field of an object, can execute arbitrary code.
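
      To make “accessing a field can execute arbitrary code” concrete, here’s a minimal sketch (the class and names are made up for illustration):

      class Sneaky:
          def __getattr__(self, name):
              # any attribute lookup that misses runs this hook
              print(f"running arbitrary code for .{name}")
              return 42

      obj = Sneaky()
      print(obj.whatever)   # attribute access triggers __getattr__ and prints 42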

    • Corbin@programming.dev · 9 months ago

      There are Python compilers which do AST analysis instead of bytecode analysis, particularly Nuitka and Shed Skin. They aren’t very good, but it’s not clear whether that’s because working with the AST is somehow harder than working with the bytecode. RPython doesn’t compile all bytecodes; most generator/coroutine functionality is missing, for example.

      Think of type-checking as a syntactic analysis; this is how it avoids Rice’s theorem. Like you say, we can annotate names with type information, and we can do it without evaluating the code. The main problem here is that Python’s semantics don’t require these annotations to enforce the types of values; you may be interested in E, a research language from the 90s which did enforce type annotations on otherwise-untyped names. In Python, this doesn’t error:

      >>> x :int = "42"
      

      But in E, this does error:

      ? def x :int := "42"
      # problem: <ClassCastException: String doesn't coerce to an int>
      

      Sadly, E is long dead, and something of an archeological artifact rather than a usable system. But it may be inspiring to your future efforts, especially since it sounds like you’re learning how to build compilers. (I helped write Monte, a language which blends E and Python; it is also dead, but was more enjoyable than E.)

      • Why did you use a ? as the prompt for E, but >>> as the prompt for Python? I know CPython uses >>> in its terminal prompt (and I don’t know how they brought that to Windows?), but why would E have used ??

        • Corbin@programming.dev · 9 months ago

          I copied and pasted from the terminal to ensure that I formatted the error message properly. The question-mark prompt is what E used, or at least E-on-Java. Monte used a little Unicode mountain:

          ⛰  currentProcess.getProcessID() :Int
          Result: 2805098
          ⛰  def x :Int := "42"
          Exception: "42" does not conform to Int
          ⛰  "42" :Int
          Exception: "42" does not conform to Int
          

          I can’t really give a reason other than that the prompt characters on Unix-like systems are arbitrary and most REPL libraries allow them to be customized.
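
          For what it’s worth, CPython exposes the same knob through sys.ps1 and sys.ps2, so you can give it a mountain too:

          >>> import sys
          >>> sys.ps1 = "⛰  "
          ⛰  1 + 1
          2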