Table of Contents
- Introduction
- Prerequisites
- Installation
- Overview of
multiprocessing
- Creating Processes
- Passing Data to Processes
- Process Pools
- Synchronization Between Processes
- Error Handling
- Conclusion
Introduction
In this tutorial, we will explore the multiprocessing
module in Python, which allows us to leverage multiple CPU cores to perform parallel computing. Parallel computing can significantly speed up the execution of CPU-bound tasks such as mathematical calculations, simulations, and data processing.
By the end of this tutorial, you will have a good understanding of how to use multiprocessing
to execute tasks in parallel, pass data between processes, utilize process pools, synchronize processes, handle errors, and more.
Prerequisites
To follow along with this tutorial, you should have a basic understanding of Python programming. Familiarity with functions, modules, and the concept of multiprocessing will be helpful.
Installation
The multiprocessing
module comes built-in with Python, so there is no need for any additional installation. However, if you are using a Python version earlier than 2.6, you will need to install the multiprocessing
package manually using pip
.
shell
pip install multiprocessing
Now that we have covered the prerequisites and installation, let’s dive into an overview of the multiprocessing
module.
Overview of multiprocessing
The multiprocessing
module allows the execution of multiple processes within a Python program, enabling parallel computing. It provides an interface similar to threading but uses processes instead of threads. This distinction is important because Python’s Global Interpreter Lock (GIL) prevents multiple threads from executing Python bytecode simultaneously, limiting the benefits of using threads for parallelism. However, the multiprocessing
module overcomes this limitation by using multiple processes that can execute simultaneously on different CPU cores.
The multiprocessing
module provides a variety of classes and functions for working with processes, including the following:
Process
: A class for creating and managing individual processes.Pool
: A class for managing a pool of worker processes.Queue
: A class for passing data between processes.Lock
,Event
,Condition
, etc.: Classes for synchronization between processes.
Now that we have an overview, let’s explore the different aspects of multiprocessing
in detail.
Creating Processes
To create a new process using multiprocessing
, we need to define a target function that will be executed in the new process. This target function can take arguments and perform any computation we desire. Here’s a basic example:
```python
import multiprocessing
def worker(num):
print(f"Worker {num} executing")
if __name__ == "__main__":
processes = []
for i in range(4):
p = multiprocessing.Process(target=worker, args=(i,))
processes.append(p)
p.start()
for p in processes:
p.join()
``` In this example, we define a `worker` function that prints a message indicating its execution. We then create four processes, each assigned a unique worker number. The `start` method is called to start the execution of each process. Finally, the `join` method is called to wait for all processes to complete before moving forward.
Note the if __name__ == "__main__"
check. This is required when working with multiprocessing
to avoid infinite recursion in certain operating systems.
Passing Data to Processes
One of the key aspects of multiprocessing is the ability to pass data between processes. The multiprocessing
module provides a Queue
class that allows communication between processes. Here’s an example:
```python
import multiprocessing
def worker(queue):
value = queue.get()
print(f"Worker received: {value}")
if __name__ == "__main__":
queue = multiprocessing.Queue()
p = multiprocessing.Process(target=worker, args=(queue,))
p.start()
queue.put("Hello from the main process")
p.join()
``` In this example, we create a `Queue` object and pass it as an argument to the worker process. Inside the worker function, we retrieve the value from the queue using the `get` method. Before joining the worker process, the main process puts a value into the queue using the `put` method.
Process Pools
Creating and managing individual processes as shown earlier can be cumbersome, especially when dealing with a large number of tasks. The multiprocessing
module provides a Pool
class to simplify this process. Here’s an example:
```python
import multiprocessing
def worker(num):
print(f"Worker {num} executing")
if __name__ == "__main__":
pool = multiprocessing.Pool(processes=4)
pool.map(worker, range(4))
pool.close()
pool.join()
``` In this example, we create a `Pool` object with a specified number of processes. The `map` method is then used to apply the `worker` function to each item in the iterable `range(4)`. The `close` and `join` methods ensure that all processes in the pool complete their execution before moving forward.
Synchronization Between Processes
When working with multiple processes, it is often necessary to synchronize their execution to avoid conflicts or ensure proper order of operations. The multiprocessing
module provides several synchronization primitives, including locks, events, conditions, and semaphores. Here’s an example using a lock:
```python
import multiprocessing
def worker(lock):
lock.acquire()
try:
print("Worker executing under lock")
finally:
lock.release()
if __name__ == "__main__":
lock = multiprocessing.Lock()
processes = []
for _ in range(4):
p = multiprocessing.Process(target=worker, args=(lock,))
processes.append(p)
p.start()
for p in processes:
p.join()
``` In this example, we create a `Lock` object and pass it to each worker process. Inside the worker function, we acquire the lock using the `acquire` method before executing the critical section of code. The lock is then released using the `release` method. This ensures that only one worker can execute the critical section at a time.
Error Handling
When working with multiple processes, it is important to handle errors properly. If an exception occurs in a child process, it will not propagate to the parent process by default. To handle exceptions and prevent the program from hanging, we can use a try-except
block and the Exception
class provided by multiprocessing
. Here’s an example:
```python
import multiprocessing
def worker():
try:
# Code that can raise an exception
print(1/0) # Division by zero
except Exception as e:
# Print the exception message
print(f"Exception occurred: {str(e)}")
if __name__ == "__main__":
processes = []
for _ in range(4):
p = multiprocessing.Process(target=worker)
processes.append(p)
p.start()
for p in processes:
p.join()
``` In this example, the worker function contains code that raises a `ZeroDivisionError`. We use a `try-except` block to catch the exception and print its message. This prevents the child process from terminating abruptly and allows the parent process to handle the exception gracefully.
Conclusion
In this tutorial, we explored the multiprocessing
module in Python, which allows for parallel computing using multiple CPU cores. We covered the basics of creating processes, passing data between processes, utilizing process pools, synchronizing processes, handling errors, and more. By leveraging the multiprocessing
module, you can significantly speed up CPU-bound tasks and enhance the performance of your Python programs.