Learning how to create ZIP files with Python is one of the most practical skills for anyone who wants to automate data storage, organize backups, or speed up file transfers. Python ships with a built-in module called zipfile that lets you create, read, and manipulate compressed archives without installing any external tools. If you already work with automating tasks with Python, mastering this module will save hours of manual work and enable your scripts to handle backups and logs intelligently.
File compression is essential in modern software development. Beyond saving disk space, it speeds up data transfers across networks, which is especially relevant when consuming REST APIs with Python or attaching files to automated emails. This practical guide gets you to your first working compression script in under two minutes, following the best practices of the language.
Why Use Python to Compress Files?
There are several strong reasons to integrate compression directly into your code. The main one is autonomy: you do not depend on external software like WinRAR or 7-Zip to manage your data. According to the official Python documentation, the zipfile module supports the most common compression methods and guarantees full compatibility with Windows, macOS, and Linux.
Using Python also gives you the power to filter which files should be included in the archive through Python programming logic. For example, you can build a script that ignores temporary files and compresses only PDF reports generated the previous day, entirely automatically and free of human error.
The zipfile Module: Your Main Tool
The heart of compression in Python is the zipfile module. It treats ZIP archives as objects, making it easy to add items through simple method calls. No installation is needed because it is already bundled with the official Python interpreter. The compression process involves two logical steps: creating the container (the .zip file) and writing data into it.
A common beginner mistake is forgetting to close the file after writing, which can corrupt the archive. The gold standard recommendation is always to use the with context manager, which closes the file automatically even if an error occurs during the process. If you are not yet comfortable with how Python handles files safely, the guide on opening files with Python’s with statement covers the concept in full.
Setting Up the Environment
For this tutorial you only need Python installed. The zipfile module is part of the standard library, so no extra packages are required. Using a dedicated Python virtual environment is a good practice for any project, even if this particular example only uses built-in modules:
import zipfile
import osCompressing a Single File
The basic logic consists of opening a ZIP object in write mode ('w') and using the write() method. The compression=zipfile.ZIP_DEFLATED parameter is critical: without it, Python simply groups files into an archive without actually reducing their size:
with zipfile.ZipFile('my_archive.zip', 'w', compression=zipfile.ZIP_DEFLATED) as zipf:
zipf.write('document.txt')Compressing an Entire Folder with Python
In real projects, you rarely compress just one file. The more common need is to create an automatic backup with Python that captures entire folders including subdirectories. For this, you combine zipfile with the Python os module to walk the directory tree.
The os.walk() function is ideal for this task. It traverses all folders, subfolders, and files, allowing each item to be added to the ZIP. A critical detail is using os.path.relpath() to store relative paths inside the archive, so it does not recreate your full system directory structure (like C:/Users/Name/...) inside the ZIP:
def compress_directory(folder_path, output_zip):
with zipfile.ZipFile(output_zip, 'w', zipfile.ZIP_DEFLATED) as zipf:
for root, folders, files in os.walk(folder_path):
for file in files:
full_path = os.path.join(root, file)
# Preserve internal folder structure
relative_path = os.path.relpath(full_path, folder_path)
zipf.write(full_path, relative_path)Filtering Specific File Types
A very practical enhancement is compressing only files with a specific extension. Inside the loop, add a conditional check before calling write(). For example, to compress only PDF reports while ignoring log files and temporary documents:
for file in files:
if file.endswith('.pdf'): # Only PDF files
full_path = os.path.join(root, file)
relative_path = os.path.relpath(full_path, folder_path)
zipf.write(full_path, relative_path)You can extend this pattern to include multiple extensions by using a tuple with endswith(('.pdf', '.xlsx', '.csv')), which checks all three at once in a single readable condition.
Handling Errors During Compression
When working with files, several problems can arise. The source file may not exist, you may lack write permissions in the destination folder, or the disk may fill up mid-process. Ignoring these scenarios can cause your script to crash in the middle of a critical backup. Always use try and except in Python to catch these situations gracefully.
Common errors to handle include a FileNotFoundError when the source directory does not exist, and a PermissionError when the script lacks write access to the destination. Anticipating these makes your script professional and reliable in production environments.
Advanced Tips: Compression Levels and Performance
Python’s zipfile module offers different compression algorithms. ZIP_DEFLATED is the standard choice and is universally compatible. ZIP_LZMA compresses more aggressively, producing smaller files, but consumes significantly more CPU and memory and may not be readable by very old archive software. For files that are already compressed, like JPEG images or MP4 videos, using ZIP_STORED (no compression) is faster because trying to compress an already-compressed file rarely reduces its size and wastes processing time.
For large volumes of data, the bottleneck is usually disk I/O rather than CPU. You can measure the script’s execution time using the Python timeit module to identify whether optimization through multiprocessing would actually help in your specific use case.
Complete Project Code
Here is the full unified script. You can compress any folder by changing the two variables at the bottom. It includes the relative path handling, the best native compression mode, proper error catching, and a self-test that creates a sample folder if none exists:
import zipfile
import os
def create_zip(source_directory, output_file):
"""Compresses all files in a folder into a single .zip archive."""
try:
if not os.path.exists(source_directory):
print(f"Error: The folder '{source_directory}' was not found.")
return
with zipfile.ZipFile(output_file, 'w', zipfile.ZIP_DEFLATED) as zipf:
for root, _, files in os.walk(source_directory):
for file in files:
full_path = os.path.join(root, file)
name_in_zip = os.path.relpath(full_path, source_directory)
zipf.write(full_path, name_in_zip)
print(f"Added: {name_in_zip}")
print(f"nSuccess! Archive '{output_file}' created.")
except PermissionError:
print("Error: No permission to write in the destination folder.")
except Exception as e:
print(f"An unexpected error occurred: {e}")
if __name__ == "__main__":
FOLDER_TO_COMPRESS = "my_documents"
OUTPUT_ZIP_NAME = "backup_generated.zip"
# Create a test folder with a sample file if it does not exist
if not os.path.exists(FOLDER_TO_COMPRESS):
os.makedirs(FOLDER_TO_COMPRESS)
with open(f"{FOLDER_TO_COMPRESS}/test.txt", "w") as f:
f.write("Sample content for the ZIP test.")
create_zip(FOLDER_TO_COMPRESS, OUTPUT_ZIP_NAME)ZIP vs Other Archive Formats
Although ZIP is the most universal format, Python also supports TAR.GZ (common on Linux servers) through the tarfile module. The choice between them depends on your audience. If you distribute files to Windows users, ZIP is the standard choice. If you are packaging files for Linux servers or Docker containers, TAR.GZ preserves file permissions and metadata more faithfully. For anyone running scripts inside containers, the guide on running Python scripts in Docker covers environment-specific file handling patterns in detail.
Integrating Compression into Automated Workflows
Imagine a bot that generates daily sales reports. Instead of sending ten loose CSV files, your Python script compresses them into a single dated archive like sales_2025_10_27.zip. This improves the experience for recipients and keeps your file server organized automatically. You can also integrate this into log management systems: when a log folder reaches a certain size, Python compresses and archives the old logs and deletes the originals to free up disk space. That kind of intelligent automation is exactly what separates a simple script from a robust production tool.
Frequently Asked Questions
Can I add a password to a ZIP file using Python’s zipfile module?
The built-in zipfile module can extract password-protected ZIPs but does not natively support creating archives with strong AES encryption. For that purpose, the third-party pyzipper library is the recommended solution.
How do I compress files without reducing their size?
Use zipfile.ZIP_STORED as the compression method. This groups files into a single archive without applying any compression algorithm. It is useful for already-compressed formats like JPEG images or MP4 videos where further compression provides no benefit.
Can Python compress files larger than 4GB?
Yes. Python supports the ZIP64 extension, which handles archives larger than 4GB. The zipfile module enables this automatically when needed, as long as the operating system supports it.
What is the difference between ZIP_DEFLATED and ZIP_LZMA?
ZIP_DEFLATED is the standard method and is universally compatible. ZIP_LZMA achieves a much higher compression ratio (smaller files) but uses significantly more memory and CPU, and some older archive tools cannot open it.
Can I add files to an existing ZIP without overwriting it?
Yes. Open the ZIP in append mode: zipfile.ZipFile('archive.zip', 'a'). This adds new items without removing the ones already inside.
How do I compress only specific file types, like only PDFs?
Inside the file loop, add a condition: if file.endswith('.pdf'):. The write() call will only execute for files matching that criterion. You can also use a tuple to match multiple extensions at once: endswith(('.pdf', '.xlsx')).
Is the zipfile module slow for compressing thousands of files?
For thousands of small files, the overhead of opening and closing file handles can add up. To optimize, reduce unnecessary I/O operations and use fast SSDs. For general office automation and backup tasks, performance is very good out of the box.
How do I handle file names with special characters or accents?
Python 3 manages strings in Unicode natively, which prevents most encoding issues. If you encounter errors when opening the archive on very old systems, you may need to resolve UTF-8 encoding errors by adjusting the encoding parameter.





