Preserving Directory Structure While Copying Files in Python
Introduction
When working with large datasets or numerous text files, it might be necessary to copy files containing specific words to a new destination while preserving the original directory structure. This can be particularly useful for maintaining organization and context. In this blog post, we’ll explore how to achieve this using Python.
The Problem
Imagine you have a complex directory structure with numerous text files, and you need to find all files containing a specific word and copy them to another directory, preserving the original directory hierarchy. Doing this manually would be tedious and prone to errors, but Python can help automate the process.
The Initial Approach
In a previous post, we discussed a simple function to copy files containing a specific word. Here’s a quick recap:
import shutil
import os
def searchWord(paths, word, destination_folder):
res = []
for path in paths:
with open(path, "r") as f:
content = f.read()
if word in content:
filename = os.path.basename(path)
destination_path = os.path.join(destination_folder, filename)
shutil.copy(path, destination_path)
res.append(path)
return res
# Usage
source_paths = ["example/file1.txt", "example/file2.txt"] # Add your source file paths here
searched_word = "your_search_word"
destination_folder = r"C:\destination_folder"
result = searchWord(source_paths, searched_word, destination_folder)
print("Files containing the word were copied:", result)
Limitation
This approach works, but it doesn’t preserve the original directory structure. All copied files are placed directly in the destination folder without any subdirectories.
The Improved Approach
To preserve the directory structure, we need to calculate the relative path of each file and replicate this structure in the destination folder.
Here’s the updated version of the script:
import shutil
import os
def searchWord(paths, word, destination_folder):
res = []
common_prefix = os.path.commonpath(paths)
for path in paths:
with open(path, "r") as f:
content = f.read()
if word in content:
relative_path = os.path.relpath(path, start=common_prefix)
destination_path = os.path.join(destination_folder, relative_path)
os.makedirs(os.path.dirname(destination_path), exist_ok=True) # Create directories if they don't exist
shutil.copy(path, destination_path)
res.append(path)
return res
# Usage
source_paths = ["example/dir1/file1.txt", "example/dir2/file2.txt"] # Add your source file paths here
searched_word = "your_search_word"
destination_folder = r"C:\destination_folder"
result = searchWord(source_paths, searched_word, destination_folder)
print("Files containing the word were copied:", result)
Explanation
- Common Path Calculation: The
os.path.commonpath(paths)function calculates the longest common sub-path of the provided paths. This helps in determining the relative paths of the files. - Relative Path Calculation: The
os.path.relpath(path, start=common_prefix)function calculates the relative path from the common prefix, ensuring that the original directory structure is preserved. - Creating Necessary Directories: The
os.makedirs(os.path.dirname(destination_path), exist_ok=True)function creates the necessary directories in the destination path if they don’t already exist. - Copying Files: The
shutil.copy(path, destination_path)function copies the file to the destination directory while preserving the relative path.
Conclusion
By calculating relative paths and creating necessary directories, we can preserve the original directory structure while copying files containing a specific word. This approach helps maintain organization and context, making it easier to manage and understand the copied files.