This paper: https://cs.brown.edu/~sk/Publications/Papers/Published/pmmwplck-python-full-monty/paper.pdf
… has been out for several years now, and the CPython authors don’t seem to be paying any heed to it. The question one faces when viewing the inner workings of CPython’s VM is:
Is Python a lazy language, or is it not? Should types and symbols be resolved by the VM, or by semantic analysis? Should there be explicit tree-building and DAG-based value-numbering optimization, or should the compiler just dump out the bytecode?
Because the VM seems to build classes on the go [list of opcodes]. I don’t pretend to know enough about this, but would it not be better if they did a full semantic analysis and then emitted the bytecode? That way execution would be faster, at the cost of small lags from a heavier semantic analysis.
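One way to see the build-at-runtime behaviour for yourself is the standard dis module. This is a minimal sketch, assuming CPython 3; the source string and the name Cls are only illustrative, and the exact opcode names vary between CPython versions, so no particular listing is claimed here:

```python
import dis

# Illustrative source: a module containing a single class statement.
SOURCE = '''
class Cls:
    foo = "bar"
'''

# Compiling only produces bytecode; no class object exists yet.
module_code = compile(SOURCE, "<example>", "exec")

# Collect the opcode names the compiler emitted for the module body.
opnames = [ins.opname for ins in dis.get_instructions(module_code)]
print(opnames)

# The class only comes into existence when the bytecode is executed.
namespace = {}
exec(module_code, namespace)
print(namespace["Cls"].foo)
```

The point of the sketch is only that compile() and exec() are separate steps, and the class object appears in the second one.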
Of course, the answer is clear: Python may not officially be a lazy language, but it virtually is one. The class syntax, as the paper says, is syntactic sugar around type with three arguments. Since type with three arguments is invoked at runtime, it would be rather stupid, and slow, to do semantic analysis on a runtime function, right!? So classes are not ‘really’ classes!
For further clarity, this:
cls = type("Cls", (), { "foo": "bar" })
is equivalent to this:
class Cls:
    foo = "bar"
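To make the comparison concrete, here is a small sketch putting both forms side by side. The names Dyn, make_class and Point are hypothetical, and "equivalent" is meant loosely: the class statement also fills in details such as __qualname__ that a bare type() call handles a little differently.

```python
# Class built by the usual statement form.
class Cls:
    foo = "bar"

# The same shape built by calling type(name, bases, namespace) directly.
Dyn = type("Cls", (), {"foo": "bar"})

print(Dyn.__name__, Dyn.__bases__, Dyn.foo)
print(type(Cls) is type(Dyn))

# Because type() is an ordinary call, its arguments can be computed at runtime.
def make_class(name, **attrs):
    return type(name, (), dict(attrs))

Point = make_class("Point", x=1, y=2)
print(Point.x + Point.y)
```

Since the name, bases and namespace are ordinary runtime values, a compiler cannot in general know ahead of time what class will come out.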
They might have looked at this paper and said ‘nah, don’t fix what’s broken’, and this exact attitude that the Python community has, from top to bottom, is why I have not used it in about 2 years and, unless paid handsomely, won’t use it in any projects.
I believe Python needs to decide whether it’s a scripting language, a cross-platform juggernaut like Java, or exactly what it is: a piece of crap hyped to high heaven!
These are my opinions; I don’t think I am educated enough for these to be facts. But look into your heart and compare CPython’s VM opcodes with the JVM’s opcodes. Both are stack machines, but the JVM has low-level, typed opcodes designed to get things done fast and portably. It has an infrastructure and an ecosystem. Several languages run on it; hell, even Python itself runs on it!
Sadly, because that dang C FFI is so sweet, CPython seems to be the de facto Python implementation. And Python is not even badly specified like Perl is. I prefer a highly non-orthogonal language like Perl for scripting any day of the week. I use Perl a lot for preprocessing C source files, or just as an AWK replacement. Is Python supposed to be that? Or Java? Decide, goddammit.
So what we get from this is: Python is a simple AWK-ascended UNIX scripting language that lazy people have made into a de facto Java! lol
Again, I am not very educated on this matter, please don’t take my opinion as facts. I just made this thread to share this nice paper and a bit of trivia.
Thanks.
There are Python compilers which do AST analysis instead of bytecode analysis, particularly Nuitka and Shed Skin. They aren’t very good, but it’s not clear whether that’s because working with the AST is somehow harder than working with the bytecode. RPython doesn’t compile all bytecodes; most generator/coroutine functionality is missing, for example.
Think of type-checking as a syntactic analysis; this is how it avoids Rice’s theorem. Like you say, we can annotate names with type information, and we can do it without evaluating the code. The main problem here is that Python’s semantics don’t require these annotations to enforce the types of values; you may be interested in E, a research language from the 90s which did enforce type annotations on otherwise-untyped names. In Python, this doesn’t error:
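A minimal sketch of such a case, assuming a plain annotated function (the name double is hypothetical and not taken from the original post):

```python
# The annotations say int, but CPython does not check them at call time.
def double(n: int) -> int:
    return n * 2

# Passing a str runs without error; * on a str simply repeats it.
result = double("ab")
print(result)

# The annotations are stored as metadata on the function, nothing more.
print(double.__annotations__)
```

An external checker such as mypy would flag the call, but that is a separate static pass, not something the interpreter enforces.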
But in E, this does error:
Sadly, E is long dead, and something of an archeological artifact rather than a usable system. But it may be inspiring to your future efforts, especially since it sounds like you’re learning how to build compilers. (I helped write Monte, a language which blends E and Python; it is also dead, but was more enjoyable than E.)
Why did you use a ? as a prompt for E, but a >>> as a prompt for Python? I know CPython uses >>> in its termio prompt (and I don’t know how they brought that to Windows?) but why would E have used ? ?
I copied and pasted from the terminal to ensure that I formatted the error message properly. The question-mark prompt is what E used, or at least E-on-Java. Monte used a little Unicode mountain:
I can’t really give a reason other than that the prompt characters on Unix-like systems are arbitrary and most REPL libraries allow them to be customized.