Geek Logbook

Tech sea log book

Preserving Directory Structure While Copying Files in Python

Introduction

When working with large datasets or numerous text files, it might be necessary to copy files containing specific words to a new destination while preserving the original directory structure. This can be particularly useful for maintaining organization and context. In this blog post, we’ll explore how to achieve this using Python.

The Problem

Imagine you have a complex directory structure with numerous text files, and you need to find all files containing a specific word and copy them to another directory, preserving the original directory hierarchy. Doing this manually would be tedious and prone to errors, but Python can help automate the process.

The Initial Approach

In a previous post, we discussed a simple function to copy files containing a specific word. Here’s a quick recap:

import shutil
import os

def searchWord(paths, word, destination_folder):
    res = []

    for path in paths:
        with open(path, "r") as f:
            content = f.read()

            if word in content:
                filename = os.path.basename(path)
                destination_path = os.path.join(destination_folder, filename)
                shutil.copy(path, destination_path)
                res.append(path)

    return res

# Usage
source_paths = ["example/file1.txt", "example/file2.txt"]  # Add your source file paths here
searched_word = "your_search_word"
destination_folder = r"C:\destination_folder"

result = searchWord(source_paths, searched_word, destination_folder)
print("Files containing the word were copied:", result)

Limitation

This approach works, but it doesn’t preserve the original directory structure. All copied files are placed directly in the destination folder without any subdirectories.

The Improved Approach

To preserve the directory structure, we need to calculate the relative path of each file and replicate this structure in the destination folder.

Here’s the updated version of the script:

import shutil
import os

def searchWord(paths, word, destination_folder):
    res = []
    common_prefix = os.path.commonpath(paths)

    for path in paths:
        with open(path, "r") as f:
            content = f.read()

            if word in content:
                relative_path = os.path.relpath(path, start=common_prefix)
                destination_path = os.path.join(destination_folder, relative_path)

                os.makedirs(os.path.dirname(destination_path), exist_ok=True)  # Create directories if they don't exist
                shutil.copy(path, destination_path)
                res.append(path)

    return res

# Usage
source_paths = ["example/dir1/file1.txt", "example/dir2/file2.txt"]  # Add your source file paths here
searched_word = "your_search_word"
destination_folder = r"C:\destination_folder"

result = searchWord(source_paths, searched_word, destination_folder)
print("Files containing the word were copied:", result)

Explanation

  1. Common Path Calculation: The os.path.commonpath(paths) function calculates the longest common sub-path of the provided paths. This helps in determining the relative paths of the files.
  2. Relative Path Calculation: The os.path.relpath(path, start=common_prefix) function calculates the relative path from the common prefix, ensuring that the original directory structure is preserved.
  3. Creating Necessary Directories: The os.makedirs(os.path.dirname(destination_path), exist_ok=True) function creates the necessary directories in the destination path if they don’t already exist.
  4. Copying Files: The shutil.copy(path, destination_path) function copies the file to the destination directory while preserving the relative path.

Conclusion

By calculating relative paths and creating necessary directories, we can preserve the original directory structure while copying files containing a specific word. This approach helps maintain organization and context, making it easier to manage and understand the copied files.

Tags: