Geek Logbook

Tech sea log book

Copying Files Containing a Specific Word Using Python

Introduction

When working with large datasets or numerous text files, you might find yourself needing to search for files containing specific words or phrases. Automating this task can save a lot of time and effort. In this blog post, we’ll walk through a Python script that searches for a specific word in multiple files and copies those containing the word to a new destination.

The Problem

Let’s say you have a directory full of text files, and you need to find all files containing a specific word and copy them to another directory. This task can be time-consuming if done manually, especially if there are many files to search through. Python’s built-in libraries provide an efficient way to automate this process.

The Initial Approach

Here’s a simple Python function that searches for a word in a list of file paths and copies those files to a new directory:

import shutil

def searchWord(paths, word, destination_folder):
    res = []

    for path in paths:
        f = open(path, "r")
        content = f.read()

        if word in content:
            shutil.copy(path, destination_folder)
            res.append(path)

    return res

Issues with the Initial Approach

  1. File Not Closed Properly: The file is opened but not closed after reading, which can lead to resource leaks.
  2. Copying Mechanism: The shutil.copy() function is used incorrectly, as it needs the destination to be a file path, not just a directory.
  3. No Directory Creation: The code does not create the necessary directories in the destination path.

The Corrected Approach

Here’s the improved version of the script that addresses these issues:

import shutil
import os

def searchWord(paths, word, destination_folder):
    res = []

    for path in paths:
        with open(path, "r") as f:
            content = f.read()

            if word in content:
                filename = os.path.basename(path)
                destination_path = os.path.join(destination_folder, filename)
                shutil.copy(path, destination_path)
                res.append(path)

    return res

# Usage
source_paths = ["example/file1.txt", "example/file2.txt"]  # Add your source file paths here
searched_word = "your_search_word"
destination_folder = r"C:\destination_folder"

result = searchWord(source_paths, searched_word, destination_folder)
print("Files containing the word were copied:", result)

Explanation

  1. Opening Files with Context Manager: Using with open(path, "r") as f: ensures that the file is properly closed after reading, even if an error occurs.
  2. Copying Files: The shutil.copy(path, destination_path) function is used to copy the file to the destination directory, preserving the filename.
  3. Constructing Destination Path: The os.path.join(destination_folder, filename) function constructs the full path to the destination file.

Conclusion

By using Python’s built-in libraries, we can efficiently search for specific words in multiple files and copy those files to a new destination. This automated approach saves time and reduces the potential for errors compared to manual processing. In the next posts, we’ll delve into preserving directory structures and avoiding duplicate copies based on file content.

Tags: