Have you ever noticed that your computer has far more power than your code actually uses? If you work with large data volumes or complex calculations, you have probably seen that Python, by default, runs tasks sequentially, meaning it uses only one CPU core at a time. Learning how to use multiprocessing in Python unlocks the full potential of your machine. By distributing work across multiple parallel processes, tasks that took minutes can finish in seconds, putting every core of your processor to work. According to the official Python documentation, the multiprocessing module sidesteps the Global Interpreter Lock by using subprocesses instead of threads.
What Is Multiprocessing and Why Is It Necessary?
Multiprocessing is a programming technique that allows a script to execute several tasks simultaneously by distributing them across the different cores of your CPU. Think of a factory with eight machines but only one worker operating one of them while the other seven sit idle. Multiprocessing is the equivalent of hiring more workers so all machines run at once.
Python has a built-in constraint called the Global Interpreter Lock (GIL). The GIL prevents multiple threads from executing Python bytecode simultaneously within a single process. This means that even if you create dozens of threads, only one will actually run Python code at any given moment for CPU-bound work. The multiprocessing module bypasses this limitation by creating entirely separate processes, each with its own Python interpreter and memory space.
If you have noticed that your Python script is running slowly, there are many optimization techniques available, but few are as dramatic as switching to a multi-core model. Unlike threading, which is better for I/O operations like downloading files or reading from disk, multiprocessing is the right tool for heavy mathematical processing and intensive data manipulation.
Multiprocessing vs Threading: Choosing the Right Tool
A common question is when to use threads and when to use processes. Threads share the same memory space, making them lightweight but subject to the GIL. Processes are full copies of the program and bypass the GIL entirely.
| Scenario | Best Tool | Reason |
|---|---|---|
| Downloading files, API calls | threading / asyncio | I/O bound: CPU waits for network |
| Heavy calculations, data processing | multiprocessing | CPU bound: needs real parallelism |
| Mixed I/O and computation | Both combined | Use each where it fits |
If you are interested in the async alternative for I/O-heavy tasks, the guide on asyncio in Python covers the non-blocking event loop approach in detail, which pairs well with multiprocessing in hybrid applications.
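As a rough illustration of that hybrid pattern, the sketch below offloads a CPU-bound function to a process pool from inside an asyncio coroutine, so the event loop stays free for I/O. The names cpu_heavy and fetch_and_crunch are invented for this example:

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def cpu_heavy(n):
    # CPU-bound work runs in a separate process, outside the GIL
    return sum(i * i for i in range(n))

async def fetch_and_crunch():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        # Await the result without blocking the event loop
        result = await loop.run_in_executor(executor, cpu_heavy, 10_000_000)
    print(result)

if __name__ == "__main__":
    asyncio.run(fetch_and_crunch())
```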
Getting Started: The Process Class
No external libraries are needed since Python ships the multiprocessing module in its standard library. Before writing any parallel code, make sure your Python virtual environment is configured correctly to avoid version conflicts.
The code below shows the basic structure for starting a separate process. Note the mandatory if __name__ == "__main__": guard. Without it, your script can enter an infinite loop of process creation on Windows, as each new process re-imports the script and tries to spawn more workers:
```python
import multiprocessing

def heavy_task(name):
    print(f"Running task for: {name}")

if __name__ == "__main__":
    p = multiprocessing.Process(target=heavy_task, args=("Process_1",))
    p.start()
    p.join()  # Wait for the process to finish before continuing
```
The Pool: Distributing Work Across All Cores
The real power of multiprocessing appears when using the Pool object. It manages a group of worker processes automatically, distributing tasks from a list across however many cores you specify. If you have 1,000 images to process, a Pool with 8 workers will send roughly 125 to each core simultaneously. This is a natural complement to resizing images with Pillow, where each image can be processed independently:
```python
from multiprocessing import Pool

def calculate_square(n):
    return n * n

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 10, 20, 50]
    with Pool(processes=4) as pool:
        results = pool.map(calculate_square, numbers)
    print(results)
    # Output: [1, 4, 9, 16, 25, 100, 400, 2500]
```
The context manager (with Pool() as pool) ensures all worker processes are properly terminated and resources are released when the block exits, even if an exception occurs inside. Always use this pattern rather than calling pool.close() manually.
Identifying Bottlenecks Before Parallelizing
Not every script should be parallelized. Creating a new process has overhead in time and memory. If the task is very simple, the cost of spawning the process will exceed the execution time of the task itself. It is essential to profile your code first using tools like cProfile. Learning to use cProfile to identify bottlenecks in Python will help you decide whether multiprocessing is actually the right solution.
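A profiling pass can be as simple as the sketch below, where slow_function stands in for your real workload:

```python
import cProfile
import pstats

def slow_function():
    return sum(i * i for i in range(2_000_000))

if __name__ == "__main__":
    cProfile.run("slow_function()", "profile_stats")  # Write stats to a file
    stats = pstats.Stats("profile_stats")
    stats.sort_stats("cumulative").print_stats(5)  # Show the top 5 offenders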
Sometimes slowness comes not from the CPU but from memory-inefficient data structures. Understanding the difference between list comprehension vs generator expression can save enough RAM that the need for parallelism is reduced significantly. Always profile first, then optimize.
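A quick way to see that difference (the exact byte counts vary by interpreter and platform):

```python
import sys

squares_list = [i * i for i in range(1_000_000)]  # Materializes every element
squares_gen = (i * i for i in range(1_000_000))   # Produces values lazily

print(sys.getsizeof(squares_list))  # Several megabytes
print(sys.getsizeof(squares_gen))   # A couple hundred bytes, regardless of range
```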
Sharing Data Between Processes
Because each process has its own isolated memory, they cannot simply share variables. If you need Process A to communicate something to Process B, you must use specific inter-process communication mechanisms. The two main options are Queue and Pipe:
```python
from multiprocessing import Process, Queue

def producer(queue):
    for i in range(5):
        queue.put(f"item_{i}")
    queue.put(None)  # Signal that production is complete

def consumer(queue):
    while True:
        item = queue.get()
        if item is None:
            break
        print(f"Processed: {item}")

if __name__ == "__main__":
    q = Queue()
    p1 = Process(target=producer, args=(q,))
    p2 = Process(target=consumer, args=(q,))
    p1.start()
    p2.start()
    p1.join()
    p2.join()
```
A Queue works like a bank queue where one process places items at the end and another retrieves them from the front. It is safe for concurrent use and prevents data corruption when multiple processes write simultaneously.
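The Queue example above covers the first mechanism. For completeness, here is a minimal Pipe sketch: a Pipe connects exactly two endpoints and is lighter-weight than a Queue, but it is not safe for more than one reader or writer per end:

```python
from multiprocessing import Process, Pipe

def worker(conn):
    conn.send("result from child")  # Send a picklable object through the pipe
    conn.close()

if __name__ == "__main__":
    parent_conn, child_conn = Pipe()  # Two connected endpoints
    p = Process(target=worker, args=(child_conn,))
    p.start()
    print(parent_conn.recv())  # Blocks until the child sends
    p.join()
```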
Memory Management Considerations
Because each process carries its own copy of all data, it is easy to exhaust your machine’s RAM when working with large datasets. If you encounter a MemoryError in Python, check whether you are loading large files or DataFrames inside each Pool worker. The correct approach is to pass only file paths or small identifiers to each process and let the worker open or load the resource independently when it actually needs it.
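A sketch of that pattern, with hypothetical CSV file names and pandas assumed as the loading library:

```python
from multiprocessing import Pool
import pandas as pd  # Assumed workload; any loader works the same way

def process_file(path):
    # The DataFrame is created inside the worker, never pickled across processes
    df = pd.read_csv(path)
    return len(df)  # Stand-in for a real aggregation

if __name__ == "__main__":
    paths = ["sales_jan.csv", "sales_feb.csv", "sales_mar.csv"]  # Hypothetical files
    with Pool() as pool:
        row_counts = pool.map(process_file, paths)
    print(row_counts)
```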
Complete Project Code: Sequential vs Parallel Benchmark
Here is the full unified script that runs a CPU-intensive task both sequentially and in parallel, measuring and comparing the elapsed time so you can see the performance difference directly on your machine. Close heavy programs before running to make all CPU cores available:
```python
import time
from multiprocessing import Pool

# Function simulating heavy CPU processing
def intensive_task(x):
    counter = 0
    for i in range(5_000_000):
        counter += i + x
    return counter

def run_benchmark():
    data = [10, 20, 30, 40, 50, 60, 70, 80]

    # Sequential execution
    print("Starting sequential execution...")
    start_seq = time.time()
    seq_results = [intensive_task(d) for d in data]
    end_seq = time.time()
    print(f"Sequential total time: {end_seq - start_seq:.4f} seconds")

    # Parallel execution with Pool
    print("\nStarting parallel execution (Multiprocessing)...")
    start_par = time.time()
    with Pool() as pool:  # Pool() with no args uses all available CPU cores
        par_results = pool.map(intensive_task, data)
    end_par = time.time()
    print(f"Parallel total time: {end_par - start_par:.4f} seconds")

    # Performance gain
    speedup = (end_seq - start_seq) / (end_par - start_par)
    print(f"\nThe parallel script was {speedup:.2f}x faster!")

if __name__ == "__main__":
    run_benchmark()
```
Advanced Optimization Tips
When using pool.map() with millions of very small tasks, the communication overhead between processes can dominate the actual work. The chunksize parameter controls how many tasks are sent to each worker per batch; map picks a heuristic default, but tuning it explicitly can dramatically improve throughput for large lists. For example, pool.map(func, data, chunksize=100) sends 100 items per batch to each worker.
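A minimal sketch of the parameter in context; tiny_task is deliberately trivial so that inter-process messaging, not computation, dominates:

```python
from multiprocessing import Pool

def tiny_task(n):
    return n + 1  # So cheap that per-task messaging would dwarf the work

if __name__ == "__main__":
    data = range(1_000_000)
    with Pool() as pool:
        # Forcing a large batch size cuts the number of inter-process messages
        results = pool.map(tiny_task, data, chunksize=10_000)
    print(results[:5])
```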
If your project needs to run across different servers or be deployed in a consistent environment, combining multiprocessing with Docker to run Python scripts ensures the parallel behavior works identically on your local machine and in the cloud. Docker also makes it straightforward to scale horizontally by running multiple containers.
For data science workflows involving Python data analysis with Pandas and NumPy, multiprocessing is particularly valuable for applying expensive transformations to large DataFrames, since each chunk of rows can be processed in a separate worker and reassembled afterward.
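A sketch of that split-apply-reassemble pattern; the value and score columns are invented for the example:

```python
import pandas as pd
from multiprocessing import Pool

def transform_chunk(chunk):
    # Hypothetical expensive column transformation
    chunk = chunk.copy()  # Work on a copy to avoid chained-assignment warnings
    chunk["score"] = chunk["value"] ** 2
    return chunk

if __name__ == "__main__":
    df = pd.DataFrame({"value": range(1_000_000)})
    chunk_size = len(df) // 8 + 1  # Roughly one chunk per worker
    chunks = [df.iloc[i : i + chunk_size] for i in range(0, len(df), chunk_size)]
    with Pool(processes=8) as pool:
        result = pd.concat(pool.map(transform_chunk, chunks))
    print(result.head())
```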
Frequently Asked Questions
Does multiprocessing work the same way on Windows and Linux?
Almost, but not exactly. On Linux, new processes are created via fork, which is very fast because the child inherits the parent's memory. On Windows (and on macOS since Python 3.8), Python uses spawn, which starts a completely fresh interpreter, making startup slightly slower and making the if __name__ == "__main__": guard mandatory.
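If you want to check or pin the start method yourself, the module exposes it directly; a minimal sketch:

```python
import multiprocessing as mp

if __name__ == "__main__":
    print(mp.get_start_method())  # Typically "fork" on Linux, "spawn" on Windows/macOS
    # To get identical behavior everywhere, force spawn explicitly:
    # mp.set_start_method("spawn")  # Call at most once, early in the program
```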
How many processes should I create in a Pool?
The ideal count is usually equal to or slightly less than the number of CPU cores on your machine. Use multiprocessing.cpu_count() to read that value programmatically and set the Pool size dynamically; note that it reports logical cores (including hyper-threads), not physical ones.
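For example, a common sizing pattern (leaving one core free is a convention, not a requirement):

```python
import multiprocessing
from multiprocessing import Pool

if __name__ == "__main__":
    # Keep one core free so the OS and main process stay responsive
    workers = max(1, multiprocessing.cpu_count() - 1)
    with Pool(processes=workers) as pool:
        print(pool.map(abs, [-1, -2, -3]))
```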
Can I use multiprocessing to read Excel files faster?
Yes. When handling large Python and Excel workflows, you can load different sheets or different files in separate processes to parallelize both reading and processing, which can dramatically cut total execution time for batch report generation.
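A hedged sketch of the per-sheet approach; report.xlsx and the sheet names are invented, and pandas with an Excel engine such as openpyxl is assumed:

```python
import pandas as pd
from multiprocessing import Pool

def load_sheet(sheet_name):
    # Each worker opens the workbook independently and reads one sheet
    return pd.read_excel("report.xlsx", sheet_name=sheet_name)  # Hypothetical file

if __name__ == "__main__":
    sheets = ["January", "February", "March"]  # Assumed sheet names
    with Pool() as pool:
        frames = pool.map(load_sheet, sheets)
    combined = pd.concat(frames)
    print(len(combined))
```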
What is the difference between pool.map and pool.apply_async?
pool.map blocks the script until all results return and preserves the original order of the input list. pool.apply_async does not block and allows results to be retrieved as they finish, which is useful when tasks have very different durations.
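To illustrate the non-blocking variant, a minimal apply_async sketch (simulate is a placeholder task):

```python
from multiprocessing import Pool

def simulate(task_id):
    return f"task {task_id} done"

if __name__ == "__main__":
    with Pool(processes=4) as pool:
        # Submit everything without blocking, then collect results individually
        futures = [pool.apply_async(simulate, (i,)) for i in range(8)]
        for future in futures:
            print(future.get())  # .get() blocks only for this one result
```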
Does multiprocessing share global variables?
No. Each process receives its own copy of the program state at the moment it was created. Changes to a global variable in Process A have no effect on Process B or on the main process. To share state, use multiprocessing.Queue, multiprocessing.Value, or a Manager object.
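As a minimal sketch of shared state via multiprocessing.Value, where the lock is what makes the increments safe:

```python
from multiprocessing import Process, Value

def increment(counter):
    for _ in range(1000):
        with counter.get_lock():  # Guards the read-modify-write cycle
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)  # Shared integer living in shared memory
    workers = [Process(target=increment, args=(counter,)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter.value)  # 4000, thanks to the lock
```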
Why did my script become slower after adding multiprocessing?
This happens when the individual tasks are too fast: the time required to create and destroy processes exceeds the execution time of the tasks themselves. Only apply multiprocessing to tasks that take at least a few tenths of a second each. Profile first with cProfile to confirm the bottleneck is actually CPU-bound.
Does multiprocessing help with network or API calls?
Generally no. For API calls or database queries, the bottleneck is network latency, not CPU. In those cases, asyncio in Python or threading are more effective because the CPU can switch to another task while waiting for the network response.
How do I debug errors that occur inside worker processes?
Errors in child processes can be silent or poorly formatted in the main terminal. Wrap the worker function body in a try/except block and use the Python logging module to write errors to a centralized log file rather than relying on print statements, which may be interleaved across processes.
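A sketch of that pattern, with workers.log as an assumed filename; the except block turns a silent crash into a logged traceback:

```python
import logging
from multiprocessing import Pool

logging.basicConfig(
    filename="workers.log",  # Hypothetical centralized log file
    level=logging.INFO,
    format="%(asctime)s %(processName)s %(levelname)s %(message)s",
)

def safe_worker(x):
    try:
        return 10 / x  # Fails when x == 0
    except Exception:
        logging.exception("Worker failed on input %r", x)  # Full traceback to file
        return None  # Sentinel so pool.map still returns a complete list

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(pool.map(safe_worker, [1, 0, 5]))
```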