Python’s GIL (Global Interpreter Lock) keeps the interpreter thread-safe by letting only one thread execute Python bytecode at a time, which effectively prevents conflicts between threads. The GIL makes it easy to implement multi-threading in Python. However, it also prevents Python threads from using the multiple cores of a computer to speed up execution. This is why using the threading module in Python won’t help a CPU-bound program run faster through parallelism.
The good news is that Python has provided a multiprocessing module since Python 2.6. With the multiprocessing module we can spawn subprocesses and effectively avoid some of the limitations that the GIL brings, on both Unix and Windows platforms.
In this post I’ll briefly introduce the multiprocessing module and show how it can be used for parallel programming.
A simple example of multiprocessing
In the following example, we use the multiprocessing module to spawn a child process from a parent process.
The output of this program will be:
The program starts the subprocess using
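As a rough sketch of the parent/child pattern this section describes, the snippet below creates a child with Process, launches it with start(), and waits for it with join(). The names child_task and run_child and the printed messages are illustrative assumptions, not the original listing.

```python
import os
from multiprocessing import Process


def child_task(name):
    # Runs inside the child process (hypothetical worker function).
    print(f"child {name}: pid={os.getpid()}, parent pid={os.getppid()}")


def run_child(name):
    # Create the child, launch it, and wait for it to finish.
    child = Process(target=child_task, args=(name,))
    child.start()
    child.join()
    # exitcode is 0 when the child finished without an error.
    return child.exitcode


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block
    # when the start method re-imports the main module.
    print(f"parent: pid={os.getpid()}")
    print("child exit code:", run_child("worker-1"))
```

The process IDs differ from run to run, so no fixed output is shown here.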
Three ways to start a process
Depending on the platform, multiprocessing supports three ways to start a process:
- spawn: The parent process starts a fresh Python interpreter process. Available on both Unix and Windows, and the default on Windows. Slower compared with fork or forkserver.
- fork: The parent process uses os.fork() to fork the Python interpreter. The child process starts out identical to the parent and inherits all of the parent’s resources. Available on Unix only, where it has traditionally been the default (macOS has defaulted to spawn since Python 3.8).
- forkserver: Starts a server process. Whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. Available on Unix.
To select a start method, you can use set_start_method(). It should be called at most once in the program.
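A minimal sketch of selecting a start method, assuming spawn is the one you want (it is available on both Unix and Windows). The helper names child_task and main are hypothetical, and the allow_none check is just one way to make sure set_start_method() runs only once.

```python
import multiprocessing as mp


def child_task():
    # Runs in the child process (hypothetical worker function).
    print("child running, start method:", mp.get_start_method())


def main():
    # Fix the start method only if nothing has fixed it yet;
    # calling set_start_method() a second time raises RuntimeError.
    if mp.get_start_method(allow_none=True) is None:
        mp.set_start_method("spawn")
    child = mp.Process(target=child_task)
    child.start()
    child.join()
    return child.exitcode


if __name__ == "__main__":
    # With spawn, the child re-imports this file, so the guard is required.
    print("child exit code:", main())
```

If you need different start methods in one program, multiprocessing.get_context() returns a context object with the same API, which avoids touching the global setting.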
Two ways to exchange objects between processes
Two types of communication channel between processes are supported in multiprocessing:
- class Queue
- function Pipe()
If you need more details, the Python documentation for multiprocessing covers both.
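To make the two channels concrete, here is a small sketch under a few assumptions: the function names (produce, echo_upper, collect_from_queue, round_trip) are made up for illustration, and None is used as an end-of-data sentinel on the queue, so the items themselves must not be None.

```python
from multiprocessing import Pipe, Process, Queue


def produce(queue, items):
    # Child side of the Queue example: send each item, then a sentinel.
    for item in items:
        queue.put(item)
    queue.put(None)


def echo_upper(conn):
    # Child side of the Pipe example: receive one message, reply in upper case.
    message = conn.recv()
    conn.send(message.upper())
    conn.close()


def collect_from_queue(items):
    queue = Queue()
    producer = Process(target=produce, args=(queue, items))
    producer.start()
    received = []
    while True:
        item = queue.get()  # blocks until the child has put something
        if item is None:
            break
        received.append(item)
    producer.join()  # join only after the queue has been drained
    return received


def round_trip(message):
    # Pipe() returns two connection objects, one for each end.
    parent_conn, child_conn = Pipe()
    child = Process(target=echo_upper, args=(child_conn,))
    child.start()
    parent_conn.send(message)
    reply = parent_conn.recv()
    child.join()
    return reply


if __name__ == "__main__":
    print(collect_from_queue([1, 2, 3]))
    print(round_trip("ping"))
```

A Queue can have several producers and consumers, while a Pipe connects exactly two endpoints, which is usually the deciding factor between them.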
Use a pool of workers
The Pool class is one of the most useful parts of the multiprocessing module, because in real life you’ll often need multiple workers to execute the tasks in your program in parallel. A Pool object represents a pool of worker processes. The following example shows how to create a pool with 4 processes as workers and assign tasks to them.
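A minimal sketch of such a pool, assuming a hypothetical task function square and a helper square_all; it submits one task per number with apply_async() and then shuts the pool down with close() and join().

```python
from multiprocessing import Pool


def square(x):
    # Hypothetical CPU task executed by a worker process.
    return x * x


def square_all(numbers, workers=4):
    pool = Pool(processes=workers)
    # apply_async() returns immediately with an AsyncResult per task.
    pending = [pool.apply_async(square, (n,)) for n in numbers]
    pool.close()  # no more tasks will be submitted
    pool.join()   # wait for the workers to finish and exit
    # Tasks may finish in any order, but each AsyncResult keeps its own
    # value, so reading them in submission order gives ordered results.
    return [result.get() for result in pending]


if __name__ == "__main__":
    print(square_all(range(10)))
```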
apply_async(func, args, kwds) submits one call of func to the pool and returns right away with a result object instead of waiting for the call to finish. You can call it many times, with the same function or with different functions and arguments. The tasks do not block one another, and the order in which they finish is not guaranteed.
close() prevents any more tasks from being submitted to the pool. Once all the tasks have been completed, the worker processes will exit.
join() waits for the worker processes to exit. You must call close() or terminate() before using join().
terminate() stops all the worker processes. The difference is that close() waits for the worker processes to finish their work, whereas terminate() shuts them down immediately without completing outstanding work.
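The difference between the two shutdown paths can be seen by timing them. This is only a sketch: slow_task and shutdown_elapsed are hypothetical names, and the measured times depend on the machine, so no exact figures are claimed.

```python
import time
from multiprocessing import Pool


def slow_task(seconds):
    # Hypothetical task that just sleeps to simulate long-running work.
    time.sleep(seconds)
    return seconds


def shutdown_elapsed(graceful, task_seconds=1.0):
    pool = Pool(processes=2)
    pool.apply_async(slow_task, (task_seconds,))
    started = time.monotonic()
    if graceful:
        pool.close()      # let the outstanding task run to completion
    else:
        pool.terminate()  # stop the workers without finishing the task
    pool.join()           # valid only after close() or terminate()
    return time.monotonic() - started


if __name__ == "__main__":
    print("close() + join():     %.2fs" % shutdown_elapsed(graceful=True))
    print("terminate() + join(): %.2fs" % shutdown_elapsed(graceful=False))
```

With close(), the join should take roughly as long as the task itself; with terminate(), it should return much sooner because the sleeping task is abandoned.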
Another useful function provided by the multiprocessing module is cpu_count(), which returns the number of CPUs in the current system. You can use this value to decide how many processes to create in a pool.
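For instance, a pool can be sized to the machine it runs on. The sketch below assumes a hypothetical task function cube and uses Pool.map(), which blocks until every result is ready and returns them in input order.

```python
from multiprocessing import Pool, cpu_count


def cube(x):
    # Hypothetical CPU task executed by a worker process.
    return x ** 3


def cube_all(numbers):
    workers = cpu_count()  # one worker per CPU reported by the system
    # Leaving the with-block shuts the pool down; map() has already
    # returned every result by then.
    with Pool(processes=workers) as pool:
        return pool.map(cube, numbers)


if __name__ == "__main__":
    print("CPUs:", cpu_count())
    print(cube_all([1, 2, 3, 4]))
```

One worker per CPU is a reasonable starting point for CPU-bound tasks; workloads that spend most of their time waiting on I/O may benefit from a different pool size.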
This post is my last post in 2016. Happy new year!