Python is a fantastic programming language and my language of choice. In my opinion, Python is the best language to learn as you slowly break your way into the Computer Science and programming world.

However, every programming language has its strengths and weaknesses; this is why we have so many of them, after all. Different use cases call for different designs and implementations. As we say, there is no one size fits all solution.

Python is a user-friendly, easy to learn, free to use, portable, and easily extensible programming language. It adapts to your programming style and needs, supporting approaches from Object-Oriented Programming (OOP) to Functional Programming.

On the other hand, Python is an interpreted language, which sets an upper limit on how fast code can execute. It is not very suitable for mobile development, it has its memory issues, and, because it is a dynamically typed language, you discover most bugs at run time.

But not all hope is lost. I argue that the most crucial issue is that of speed. But why are faster runtimes that important? Consider the use case of Machine Learning; you need to experiment and iterate fast. When you are working with big data, it’s crucial to have your functions return in seconds, not minutes. If the new and shiny neural network architecture you just invented does not work that well, you should be able to move to your next idea rapidly!

To address this issue, we can write our demanding functions in another language (e.g., C or C++) and leverage specific bindings to call these functions from Python. This is what many numerical libraries (e.g., NumPy, SciPy, etc.) and deep learning frameworks (e.g., TensorFlow, PyTorch, etc.) do in Python. So, if you are a Data Scientist or a Machine Learning engineer who wants to call CUDA functions, this story is for you. Let’s start!

Marshalling

The first step we have to take in this journey is to understand what marshalling is and how it works. From Wikipedia:

Marshalling is the process of transforming the memory representation of an object to a data format suitable for storage or transmission.

Why is this important for our subject? To move data from Python to C or C++, the Python bindings have to transform it into a form suitable for transmission.

In Python, everything is an object. How many bytes of memory an integer uses depends on the version of Python you have installed and your operating system, among other factors. On the other hand, a uint8_t integer in C always uses 8 bits of total memory. Thus, we have to reconcile these two types somehow.
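
To get a feel for this difference, here is a quick illustration (not part of the example we build later) that compares the two with the help of ctypes:

import ctypes
import sys

# A Python int is a full-blown object, carrying a reference count and a type pointer.
print(sys.getsizeof(0))               # typically 24-28 bytes, depending on the Python build
# A C uint8_t is just the raw value.
print(ctypes.sizeof(ctypes.c_uint8))  # always 1 byte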

Marshalling is something the Python bindings take care of for us, but we may need to intervene in some cases. That won’t happen in this story, but it is something we’ll encounter in later articles.

Managing Memory

C and Python manage memory differently. In Python, when you declare an object, Python automatically allocates memory for it. When you don’t need that object, Python has a garbage collector that can destroy unused or unreferenced objects, releasing the memory back to the system.

In C, things are entirely different. It’s you, the programmer, who must allocate the memory to create an object, and it’s you again who has to release that memory back to the system.

We should take this into account and release any memory we don’t need anymore on the same side of the language barrier.
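
As a reminder of what that looks like on the C side, here is a generic sketch (unrelated to the library we build below):

#include <stdlib.h>

int main(void) {
    /* In C, we ask the system for memory ourselves... */
    double *values = malloc(100 * sizeof(double));
    if (values == NULL)
        return 1;

    /* ...use it... */
    values[0] = 4.2;

    /* ...and we are also the ones who must give it back. */
    free(values);
    return 0;
}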

A Simple Example

We have now reached the point where we are ready to get our feet wet. When you finish this section, you’ll be ready to start working with Python bindings and C. This is an absolute beginner tutorial, but we’ll dive deeper into more complex examples in later stories.

What you’ll need

For this story, we are going to need two things:

  • Python 3.6 or greater
  • The Python development tools (e.g., the python3-dev package)
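
On Debian or Ubuntu-based systems, for example, you can typically get the latter with:

sudo apt-get install python3-dev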

The C source code

To keep things simple, we will create and build a C library that adds two numbers together. Copy the source code below:

https://towardsdatascience.com/media/773f6c6255cbefe623b2fbf374e8ed43
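
In case the embedded snippet does not load, the function looks roughly like the sketch below. It is reconstructed from the behavior and output shown later in this story, so the formatting may differ slightly from the original gist:

#include <stdio.h>

/* Add an integer and a float, print what we received, and return the sum. */
float cadd(int x, float y) {
    float res = x + y;
    printf("In cadd: int %d float %.1f returning %.1f\n", x, y, res);
    return res;
}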

Next, we need to compile the source code and build a shared library. To this end, execute the command below:

gcc -shared -Wl,-soname,libcadd -o libcadd.so -fPIC cadd.c

This command should produce a libcadd.so file in your working directory. You are now ready to move on to the next step.

Prying into a C library with ctypes

ctypes is a foreign function library in the Python standard library that lets us create Python bindings. Being part of the standard library makes it ideal for our beginner tutorial, as you do not need to install anything.

To execute the C cadd function from a Python script, copy the source code below:

https://towardsdatascience.com/media/277e7bca729f0dd6474e2a2448902fb8
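
Again, if the embedded gist does not render, the script is along the following lines. This is a sketch based on the description below, so the exact line numbers referenced there may be slightly off:

import ctypes
import pathlib

if __name__ == "__main__":
    # Load the shared library we built in the previous step.
    libname = pathlib.Path().absolute() / "libcadd.so"
    c_lib = ctypes.CDLL(str(libname))

    x, y = 6, 2.3

    # Tell ctypes that cadd returns a C float.
    c_lib.cadd.restype = ctypes.c_float

    # y must be wrapped explicitly; x can stay as it is because ctypes assumes integers by default.
    answer = c_lib.cadd(x, ctypes.c_float(y))

    print(f"In Python: int: {x} float {y} return val {answer:.1f}")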

In line 7, we create a handle to the C shared library we built before. In line 12, we declare the return type of the C cadd function. This is crucial; we need to let ctypes know how to marshal objects to pass them around and what types to expect to unmarshal them correctly.

This is also the case for the y variable in line 14: we need to declare that it is of type float. Finally, we can leave x as it is because, by default, ctypes assumes everything is an integer.

We can execute this script just like any other Python script:

python3 padd.py

The result is a kind of magic!

In cadd: int 6 float 2.3 returning  8.3
In Python: int: 6 float 2.3 return val 8.3

Congratulations! You have called a function of a C library from Python!

Conclusion

Python is a user-friendly, easy to learn, free to use, portable, and easily extensible programming language.

However, it has its weaknesses, and the most prominent of those is speed. To address this issue, we can write our demanding functions in another language (e.g., C or C++) and leverage specific bindings to call these functions from Python.

In this article, we used ctypes, a Python standard library module that does precisely this. We were able to call a C library from our Python code and get the results back. I know this function did not do anything demanding, but it was just a demo. Let’s see if we can tackle something more challenging in later articles!

About the Author

My name is Dimitris Poulopoulos, and I’m a machine learning engineer working for Arrikto. I have designed and implemented AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA.

If you are interested in reading more posts about Machine Learning, Deep Learning, Data Science, and DataOps, follow me on Medium, LinkedIn, or @james2pl on Twitter.
