How Python Works: Behind the Scenes

This blog is completely based on my understanding of Python internals, so there might be some mistakes...

These are the Python internals I learned while studying the language. While I was learning, my educator suggested that I write down what I took away from it, so here I am with a small but (hopefully) easy-to-understand blog on how Python works internally.


1. Architecture of Python

a. Source Code (.py files)

Human-readable Python code written by programmers.

b. Byte Code Compilation.

Python compiles the source code into bytecode. This bytecode is an intermediate representation. For imported modules, CPython also caches it in .pyc files so it does not have to recompile them every time (a performance optimization).

c. Python Virtual Machine.

The PVM executes the bytecode and performs the actual computation.
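These three stages can be seen from inside Python itself with the built-in compile() and exec() functions. This is a minimal sketch: the source string and the "<example>" filename are placeholders I chose, and when you run a real .py file the interpreter performs these steps for you.

```python
# a. Source code: normally this text would come from a .py file
source = "result = 2 * 21"

# b. Compilation: turn the source text into a code object that holds the bytecode
code_obj = compile(source, "<example>", "exec")

# c. Execution: hand the bytecode to the interpreter (the PVM) to run
namespace = {}
exec(code_obj, namespace)

print(namespace["result"])  # 42
```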


2. Types of Python.

Yes, you read that right: there are several implementations of Python. To name a few: CPython, Jython (originally called JPython, it runs on the Java Virtual Machine), IronPython (runs on .NET), Stackless Python (focused on concurrency) and PyPy (focused on performance).


3. CPython: the default implementation

CPython is the default, reference implementation of Python, the one you download from python.org. What does that mean? It means that most of the time you are using this implementation of the language. But there will be some cases where you need to use one of the other implementations.

Personally, I like this feature of Python. It is a beautiful thing for a language to have several implementations like this.
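If you are not sure which implementation you are running, the standard library can tell you. Here is a small sketch using platform.python_implementation() and sys.implementation; the example values in the comments are what I would expect on a python.org build, and they will differ on other implementations or versions.

```python
import platform
import sys

# Name of the implementation, e.g. "CPython", "PyPy", "IronPython" or "Jython"
print(platform.python_implementation())

# The same information in lowercase, e.g. "cpython"
print(sys.implementation.name)

# (major, minor) version of the running interpreter, e.g. (3, 12)
print(sys.version_info[:2])
```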


4. Python’s Execution Model.

Here I want to talk about the core, step-by-step execution of a Python (.py) file. As developers or students (like me), we create a file such as hello.py and run it. So what happens then? Where does the code go, and how does the output reach us?

This is something we all should know. Let's cut out the really deep details for now, but at least we should understand what we are writing and how it gets executed.

a. Tokenization and Parsing.

First, Python analyzes the source code using a parser. What is a parser? Great question, let’s find out...

A parser in Python is simply the component that takes the input data (here, our source code) and converts it into a structured format.

But before the parser can do its job, the lexer breaks the code into tokens (e.g., keywords, names, operators, literals). What are tokens? Tokens are the smallest meaningful pieces the code is broken into. Other languages such as Java have them too. They are the smallest units of code that are ready to be processed further.

What is the lexer, then? The lexer (also called the tokenizer) is the part of Python that performs lexical analysis: it reads the source code and breaks it into a sequence of tokens, which the parser then consumes.
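You can watch the lexer at work with the standard tokenize module. This sketch tokenizes the small example used later in this post; generate_tokens() expects a readline-style callable, so the source string is wrapped in io.StringIO.

```python
import io
import token
import tokenize

source = "x = 2 + 3\nprint(x)\n"

tokens = []
# generate_tokens() needs a readline() callable that returns one str line per call
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    # token.tok_name turns the numeric type into a readable name such as NAME, OP or NUMBER
    tokens.append((token.tok_name[tok.type], tok.string))

for kind, text in tokens:
    print(f"{kind:10} {text!r}")
```

Besides the NAME, OP and NUMBER tokens, the output also contains bookkeeping tokens such as NEWLINE and ENDMARKER.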

After all this, we get an AST (Abstract Syntax Tree), whose job is to represent the structure of the program.

What is an AST, actually?

An AST (Abstract Syntax Tree) is a tree-shaped data structure. The parser converts the sequence of tokens into this tree, where each node represents a construct in the code, such as an assignment, an operator or a constant. Python also exposes it to us through the standard library's ast module.

So we can say that an AST is a tree of nodes (built from the tokens), connected according to Python’s grammar.
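The ast module lets you look at this tree directly. In this short sketch, ast.parse() builds the tree for one line of code, ast.dump() prints it (the indent argument needs Python 3.9 or newer), and ast.walk() visits every node so we can list their types.

```python
import ast

tree = ast.parse("x = 2 + 3")

# Print the whole tree; the exact text differs slightly between Python versions
print(ast.dump(tree, indent=4))

# Visit every node in the tree and collect its type name
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)

# The assignment's value is a BinOp node: left operand, operator, right operand
assign = tree.body[0]
print(type(assign.value).__name__, assign.value.left.value, assign.value.right.value)
```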

b. Byte Code Generation.

What happens after the AST is formed?

This AST is sent to the compiler, which generates the bytecode.

Bytecode is a low-level, platform-independent set of instructions. For imported modules, it is also cached in .pyc files inside the ‘__pycache__’ folder.
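The result of compilation is a code object, and its raw bytecode is just a bytes value. This sketch compiles a snippet with the built-in compile() and inspects a few attributes of the code object; the exact bytes differ between Python versions, so no output is shown for them.

```python
source = "x = 2 + 3\nprint(x)\n"

# compile() runs the tokenizer, parser and compiler and returns a code object
code_obj = compile(source, "<example>", "exec")

print(type(code_obj).__name__)   # code
print(code_obj.co_code)          # the raw bytecode: a bytes object
print(code_obj.co_names)         # names the bytecode refers to, here x and print
```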

Now the word compiler may confuse you and raise a question like “Python is an interpreted language, it is not compiled...”. Wait a second, and I will explain why I wrote it like this.

Here, “compile” simply means that our source code is translated down into lower-level code. Not code that your OS (Linux, macOS, Windows) or CPU understands, but code that the PVM understands. What is the PVM? We will talk about it soon.

So can this bytecode run anywhere? Yes and no. Why? What do I mean by that?

I mean that this bytecode can run on any machine that has a compatible PVM, that is, the same Python implementation and version that produced it.
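Here is one way to see both the "yes" and the "no". In CPython, the code object inside a .pyc file is serialized with the marshal module, and every .pyc file starts with a magic number that identifies the bytecode version. This is a sketch for CPython only; the marshal format is an implementation detail and is not guaranteed to be stable between Python versions.

```python
import importlib.util
import marshal

code_obj = compile("greeting = 'hello from bytecode'\nprint(greeting)", "<example>", "exec")

# Serialize the code object to bytes, the same format CPython uses inside .pyc files
data = marshal.dumps(code_obj)
print(len(data), "bytes of serialized bytecode")

# Load the bytes back and run them: an interpreter of the SAME version can do this
restored = marshal.loads(data)
namespace = {}
exec(restored, namespace)

# The version-specific magic number written at the start of every .pyc file;
# a .pyc with a different magic number is ignored and the source is recompiled
print(importlib.util.MAGIC_NUMBER)
```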

So at last we are moving forward, but the key thing to keep handy is this:

The AST is compiled down to bytecode, a compact binary format, but it is not real binary that a machine understands or that you can hand to the OS directly. It is very different from the machine code that is generated from assembly.

This bytecode is meaningful only in the context of the PVM.

c. Execution by the PVM.

The Python Virtual Machine (PVM) executes the bytecode. It is a stack-based virtual machine.

The PVM uses a main execution loop to fetch, decode, and execute instructions.
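To make "stack-based" and "fetch, decode, execute" concrete, here is a toy virtual machine. It is NOT CPython's real evaluation loop; the instruction names PUSH_CONST, ADD, STORE, LOAD and PRINT are made up for this sketch, which only shows the same idea in miniature.

```python
def run(program):
    """Run a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    variables = {}
    pc = 0  # index of the next instruction to fetch
    while pc < len(program):
        op, arg = program[pc]          # fetch
        pc += 1
        if op == "PUSH_CONST":         # decode + execute
            stack.append(arg)
        elif op == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE":
            variables[arg] = stack.pop()
        elif op == "LOAD":
            stack.append(variables[arg])
        elif op == "PRINT":
            print(stack.pop())
        else:
            raise ValueError(f"unknown instruction: {op}")
    return variables


# Roughly what "x = 2 + 3; print(x)" becomes in this toy instruction set
program = [
    ("PUSH_CONST", 2),
    ("PUSH_CONST", 3),
    ("ADD", None),
    ("STORE", "x"),
    ("LOAD", "x"),
    ("PRINT", None),
]
result = run(program)  # prints 5
```

Every operation takes its inputs from the top of the stack and pushes its result back, which is why the instructions need no register names.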

The PVM is the runtime engine of Python; in CPython it is part of the interpreter you download from python.org. It executes the bytecode, and that is how your Python script runs. Both the .py files we write and the .pyc files in the ‘__pycache__’ folder end up running on the PVM: a .py file is first compiled to bytecode in memory, while a .pyc file already contains bytecode. I will talk about this in detail below.

It is a full-fledged runtime engine for Python.

Inside it, a main loop iterates over the bytecode: it repeatedly fetches the next instruction and executes it until the code has finished running.

Again, to point it out: the bytecode mentioned here is not machine code, so you cannot run it directly on a machine. It is Python-specific bytecode, the lowest-level form of code for the PVM.


5. A short example for revision.

Here is a short example that walks through all the execution steps described above.

x = 2 + 3
print(x)
  1. Lexer: Breaks the code into tokens like x, =, 2, +, 3, print, (, x and ).

  2. Parser: Constructs an AST to represent the structure of the code. You can imagine a tree with nodes placed by the parser in a specific order. Who decides that order? As I said earlier, the grammar of the Python language.

  3. Compiler: Converts the AST into bytecode (e.g., LOAD_CONST, STORE_NAME and an add instruction, BINARY_ADD in older versions or BINARY_OP in Python 3.11+). In practice, CPython folds 2 + 3 into the constant 5 at compile time, so the add instruction may not appear at all.

  4. PVM: Executes the bytecode to store 5 in x and print it.

So this is a basic example that shows the whole execution.
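The standard dis module shows the real bytecode for this example. This sketch uses dis.get_instructions(); instruction names change between CPython versions, so it only prints them rather than promising an exact listing. CPython's compiler folds constant expressions such as 2 + 3 into 5 ahead of time, so the first snippet should contain no add instruction, while the second one (a variant I added, where the operands are variables) should contain BINARY_OP, or BINARY_ADD before Python 3.11.

```python
import dis

folded = "x = 2 + 3\nprint(x)\n"       # the example from this section
unfolded = "a = 2\nb = 3\nx = a + b\n"  # operands are variables, so nothing can be folded

for label, source in (("folded", folded), ("unfolded", unfolded)):
    print(f"--- {label} ---")
    # get_instructions() accepts source text and compiles it for us
    for instr in dis.get_instructions(source):
        print(f"{instr.opname:20} {instr.argrepr}")

folded_ops = [instr.opname for instr in dis.get_instructions(folded)]
unfolded_ops = [instr.opname for instr in dis.get_instructions(unfolded)]
```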

The most beautiful thing about Python is that, as a beginner, this much knowledge of its internals is enough to understand most of what we do day to day.


6. ‘__pycache__’ folder and .pyc files

So let’s also talk about this folder and the .pyc files it contains.

This is an internal folder created by Python. When a module is imported, Python saves its compiled bytecode as a .pyc file. That is why in some cases you will see this folder and in others you won't.

Whenever you change a module's source code, Python notices on the next import (by default it compares the source file's modification time and size recorded in the .pyc) and writes a fresh .pyc into this folder. Python keeps these files in a separate folder because mixing its internal cache files with your work files would make things messy.

That's why the folder name starts and ends with "__": the double underscores signal that the folder is managed by Python for its own internal use.

Now you may be curious: ‘Okay, Python creates this folder containing .pyc files for its own use, but when do they appear and when don't they?’

I asked myself the same question. So let's do this: I will write some code and use it to explain how this folder is formed.

# file1.py
def print_msg(my_msg):
    print(my_msg)

print_msg("0 is not a natural number")

# file2.py
from file1 import print_msg
print_msg("0 is a whole number")

This import obviously won't work here on the page because these are not real files yet, but I want you to create them on your system (in the same directory).

Done? Now run file2 using python file2.py.

You will see a change in your directory. What is it? A new folder named ‘__pycache__’ has been created on its own, and it contains a file with a long, strange-looking name.

This file has a .pyc extension. When we ran a plain .py file on its own earlier, nothing like this happened, but when we ran a file containing an import statement, the imported module was treated specially and Python created this folder for its cached bytecode.

Suppose you created the two files with the names file1.py and file2.py, matching the code blocks.

Then the .pyc file will have a name like ‘file1.cpython-312.pyc’. What is this long, senseless name...

No! This is very logical naming used by Python. The name contains two things: A. the source module and B. the Python implementation and version.

A. Source module: file1 tells you which source file this .pyc was compiled from.

B. Python version: cpython-312 means the file was compiled by CPython 3.12. There are many Python versions and implementations, but we are using CPython, so that is what we focus on. In other words, our .pyc (bytecode) file was created by this particular version of CPython.
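You don't have to guess this name: importlib.util.cache_from_source() computes where CPython would cache the bytecode for a given source path, and sys.implementation.cache_tag is the "cpython-312" part. A small sketch; file1.py does not need to exist for this call, and the comments show what I would expect on CPython 3.12.

```python
import importlib.util
import sys

# Where would the bytecode for file1.py be cached? (the file does not need to exist)
path = importlib.util.cache_from_source("file1.py")
print(path)                          # e.g. __pycache__/file1.cpython-312.pyc

# The implementation-and-version tag embedded in the file name
print(sys.implementation.cache_tag)  # e.g. cpython-312
```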

A .pyc file is created only for modules that are imported; the script you run directly does not get one. (The ‘__pycache__’ folder itself was introduced in Python 3.2 by PEP 3147; before that, .pyc files were written next to the source files.)

So no .pyc is made for the top-level script, meaning the file you run directly. Python compiles it again on every run instead of storing its bytecode.

In other words, when a file imports another module, Python compiles that imported module, stores its bytecode in the ‘__pycache__’ folder and hands it to the PVM. The script you run directly is compiled in memory and passed straight to the PVM (Python Virtual Machine) without being saved.
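To check this without creating files by hand, here is a sketch that writes the two example files into a temporary directory, runs file2.py as the main script in a fresh interpreter and lists what ends up in ‘__pycache__’. It assumes your environment allows starting a subprocess. I remove a few environment variables (PYTHONDONTWRITEBYTECODE, PYTHONPYCACHEPREFIX, PYTHONOPTIMIZE, PYTHONSAFEPATH) from the child's environment because each of them would change whether, where or under which name .pyc files are written, or stop the script's folder from being importable.

```python
import os
import pathlib
import subprocess
import sys
import tempfile

FILE1 = (
    "def print_msg(my_msg):\n"
    "    print(my_msg)\n"
    "\n"
    'print_msg("0 is not a natural number")\n'
)
FILE2 = (
    "from file1 import print_msg\n"
    'print_msg("0 is a whole number")\n'
)

# Make sure bytecode caching and imports behave with their defaults in the child process
env = os.environ.copy()
for name in ("PYTHONDONTWRITEBYTECODE", "PYTHONPYCACHEPREFIX", "PYTHONOPTIMIZE", "PYTHONSAFEPATH"):
    env.pop(name, None)

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    (folder / "file1.py").write_text(FILE1)
    (folder / "file2.py").write_text(FILE2)

    # Equivalent to running `python file2.py` inside that folder
    result = subprocess.run(
        [sys.executable, "file2.py"],
        cwd=folder, env=env, capture_output=True, text=True, check=True,
    )
    print(result.stdout, end="")

    # Only the imported module (file1) is cached; the main script (file2) is not
    cached = sorted(p.name for p in (folder / "__pycache__").iterdir())
    print(cached)  # e.g. ['file1.cpython-312.pyc']
```

Notice that the output contains file1's message as well, because importing file1 runs its top-level code.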

Python does this to make things faster: the next time the module is imported and its source has not changed, Python loads the cached bytecode and skips the compilation step. Also, if you run file2, you will see that it prints file1's message too. That is not caused by caching: importing a module executes its top-level code once, so the print_msg call at the bottom of file1 runs as well. I will discuss this in detail in a future blog.

This brings us to the end of today’s topic. I am currently studying Python, so we will meet often (I hope) and discuss better things (not just loops). Since this is my first blog, I will try to be consistent and follow my passion for documenting what I learn.

And a huge apology if I was wrong somewhere; there is a human on this side, and I can make mistakes. Please take a moment to leave a comment (you can criticize), and tell me how useful this blog was for you. Thanks for being a part of my journey.

Finally, remember: engineering is about knowing what happens behind the scenes, not just knowing things.