Monday, September 17, 2012

Python and Pointers

I have been wrestling with Python for a while now, looking at it from every possible angle. While learning how variables are declared (though there is really no such thing as declaring a variable in Python), I got curious about how Python represents variables in memory. Before jumping into the problem, let me make a quick subroutine call to my previous post, where I discussed Python IDEs. I finally settled on a gorgeous black beauty, Aptana Studio 3, and she's been serving me well so far. I'm not going into the details of Aptana now, but there is a lot to talk about.
At the end of this post someone may well ask what's the point of this crap. For those who have their suspicions: I am heavily influenced by the late Dennis Ritchie and C. It is no wonder, since C was my introduction to (proper) programming, even though I had worked with BASIC, Pascal and VB 6 earlier. In C, pointers make the language more powerful and I used to exploit them very often. So whenever I play with a language, pointers are one of the first things I look for.

It's natural that anyone coming to Python from a C/C++ background would look for pointers in Python too. There is nothing called a pointer in Python. Let's put it another way: Python syntax does NOT support what we call pointers in C/C++. A pointer is simply a variable that holds the address of a memory location. When you declare a new variable, enough memory is reserved to hold its value. We use a pointer to refer to the exact memory location (the address) of that variable.
In a C/C++ environment the idea of a pointer is easy to grasp.

int x=1;
int* ptr=&x; // now the value of ptr contains the address of x
You can retrieve the data at that memory location by dereferencing the pointer: *ptr gives the current value of x.
We do not find a similar syntax in Python.
We do not specify data types in Python; they are determined by the interpreter at run time. Not only that, the same identifier can refer to values of different types over time.

e.g.
          # variable t in Python
          t=1
          print(t)
          t="PythonString"
          print(t)

The above code runs without errors and displays 1 and PythonString in that order. If you write similar code in C/C++, the compiler will reject it immediately. This might be one of the reasons why Python keeps memory management away from the coder.
Allowing the coder to manipulate the memory heap directly could cause fatal errors in a script because of this. More importantly, Python memory management is handled entirely by the interpreter and is invisible to the coder. The interpreter uses pointers internally, but they are private and not available for the user to manipulate.
Even so, you cannot keep an enthusiastic Python programmer quiet with these justifications. There are Python modules for those who really need to do some C work within Python, and ctypes is one of the most widely used. Having had a look at the ctypes manual, I noticed that the data types created with ctypes are not fully interchangeable with ordinary Python values.

e.g:

        # create two c_int() objects
        from ctypes import *
        x=c_int(1)
        y=c_int(2)
        x+y

The operation x+y raises an error: TypeError: unsupported operand type(s) for +: 'c_long' and 'c_long'

This implies that Python does not recognize x and y as valid operands for the + operator. What I feel is that ctypes data live on a kind of virtual canvas of their own, so that they do not interfere with actual Python data items: x and y wrap C values and are mainly meant to be used with ctypes functions. (The plain Python number inside is still reachable through the value attribute, as in x.value.)
A pointer to a variable can be obtained with x_ptr = pointer(x); the pointer function comes with ctypes.
The Python/C API also provides a collection of functions for custom memory management, including counterparts of the famous malloc(), realloc() and free(): PyMem_Malloc(), PyMem_Realloc() and PyMem_Free().
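As a rough sketch of the two ctypes points above (reading the wrapped number and taking a pointer), something like the following should work. It uses only c_int, pointer, the value attribute and pointer indexing from ctypes; the variable names are just for illustration.

```python
from ctypes import c_int, pointer

x = c_int(1)
y = c_int(2)

# c_int does not support +, but .value exposes the wrapped number as a Python int.
total = x.value + y.value
print(total)  # 3

# pointer() builds a ctypes pointer object that refers to x's storage.
x_ptr = pointer(x)
x_ptr[0] = 10                 # write through the pointer, like *ptr = 10 in C
print(x.value)                # 10
print(x_ptr.contents.value)   # 10, read back through the pointer
```

So the pointer is an ordinary Python object from the ctypes module, not a piece of language syntax as it is in C.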

Among all these I found something really interesting: id(Object), a built-in function in Python. The results I got from it were awesome. The Python documentation defines id(Object) as follows:

id(Object) -
Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.
CPython implementation detail: This is the address of the object in memory.
Consider the following piece of code:

# test 1
_var1=1
_var2=1
_var3=2
_var4=1
print(id(_var1))
print(id(_var2))
print(id(_var3))
print(id(_var4))

Output

505493792
505493792
505493808
505493792

Taking the definition of id(Object) at face value raises a few questions.

1. Why are the id() values of _var1, _var2 and _var4 all equal?
                     It's true that the values of _var1, _var2 and _var4 are all equal. But if id(Object) returned the memory address of the variable, like & in C, we would be in trouble, because several variables would sit at the same memory location. The observation here is that, in these tests, the output of id(Object) is the same as long as the Object holds the same value.

2. Why does id(_var3) differ from the rest?
                    For now, we can say it is because the value of _var3 is different from the values of the other variables.
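One way to read the two answers above is that a Python name is a label attached to an object, and id() reports on the object, not on the name. The sketch below is my own illustration of that reading, using the `is` operator (which compares identity) and a list, where the sharing is easier to see. The identical ids for the two 1s are what CPython does for small integers; other implementations may differ.

```python
a = 1
b = 1
c = 2

# 'is' asks whether two names refer to the very same object.
print(a is b)            # on CPython both names refer to one int object
print(id(a) == id(b))    # the same question asked through id()
print(a is c)            # 1 and 2 are different objects

# The sharing is easier to see with a mutable object.
first = [1, 2, 3]
second = first           # no copy is made: a second name for the same list
second.append(4)
print(first)             # [1, 2, 3, 4]
print(first is second)   # True
```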

# test 02
_var1=1
print(id(_var1))
_var2=2
print(id(_var2))
_var3=3
print(id(_var3))

Output
505493792
505493808
505493824

Note that _var1, _var2 and _var3 contain 1, 2 and 3 respectively, and the id() values differ from each other by 16 every time. This is just an observation, but keep it aside for a while.

# test 03
_var1=1
print(id(_var1))
_var1=2
print(id(_var1))
_var1=1
print(id(_var1))

Output
505493792
505493808
505493792

See what's going on in the above code. When _var1 is assigned 1 for the second time, the id() value goes back to what it was the first time. In these tests, at least, a given value always produces the same id() output, regardless of when or where it is referred to.
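The tests so far only use tiny numbers. CPython is known to keep a cache of small integers (roughly -5 to 256), so the "same value, same id" pattern is not something to rely on for arbitrary values. The following is a hedged sketch for checking this yourself; the values are built with int() at run time so the compiler cannot merge identical literals, and the results are implementation details that may vary between interpreters.

```python
# Small value: expected to come from the small-integer cache on CPython.
small_a = int("7")
small_b = int("7")
same_small = small_a is small_b
print(same_small)

# Larger value: equal numbers, but usually two separate objects on CPython.
big_a = int("100000")
big_b = int("100000")
same_big = big_a is big_b
print(big_a == big_b)   # True: the values are equal
print(same_big)         # identity is a separate question from equality
```

If same_big prints False, the two names hold equal values at different ids, which is why == and not id() is the right way to compare values.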

#test 4
_val1=1
_val2=2
_val3=3
_val4=4
print(id(_val1))
print(id(_val2))
print(id(_val3))
print(id(_val4))



Output
505493792
505493808
505493824
505493840

Now observe how the id() values change for the values 1, 2, 3 and 4. They are separated by the same gap of 16. (I do not know where this 16 comes from, but there is a constant gap between adjacent values. At first I thought it might be the size of an integer in Python, but sys.getsizeof(_var1) gave me 14.) Interestingly, confirming what I had in mind, print(id(5)) outputs 505493856, which is 505493840+16, or id(4)+16. It is as if Python has already assigned ids to all these values and simply points to them when we give them identifiers. This idea is crazy, I know, but honestly I could not imagine anything more reasonable. I still have no idea about the exact internals of Python's variable handling, but I'm really enjoying this strange behavior. One thing is clear from the simple tests above: id(Object) does NOT give me pointer values in the sense they are used in C/C++.
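For anyone who wants to repeat the measurement, here is a small sketch that computes the gaps between the ids of 1 to 5 and prints the size reported by sys.getsizeof. The actual numbers depend on the interpreter version and platform, so they need not match the 16 and 14 reported above.

```python
import sys

# ids of the objects for 1..5, and the gaps between neighbours.
ids = [id(n) for n in range(1, 6)]
gaps = [later - earlier for earlier, later in zip(ids, ids[1:])]
print(gaps)

# Size in bytes that the interpreter reports for the int object 1.
print(sys.getsizeof(1))
```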

p.s. I run Python 3.2.x downloaded from http://python.org/ftp/python/3.2/python-3.2.msi.

