Created: July 13th 2023
Last updated: July 13th 2023
Categories: IT Development, PHP
Author: Tim Fürer

PHP: Directory Comparison Function

Tags: function, guide, PHP

Comparing two directories for differences is a common task in scenarios like version control, backups, or simply tracking changes. Today, we're going to explore a straightforward PHP function that reports which files were added, removed, or changed between two directories.


Function Explained

The function takes in two directory paths and returns an array showing files that have been added, removed, or changed.

First, the function (we named it compareDirectories) checks whether both arguments are directories. This is done with PHP's native is_dir function, which returns true if the given path exists and is a directory, and false otherwise. If either check fails, the function throws an InvalidArgumentException.

if (!is_dir($dir1) || !is_dir($dir2)) {
    throw new InvalidArgumentException("Both parameters must be directory paths.");
}
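To see the guard clause in action, here is a minimal sketch. The helper name requireDirectories and the made-up missing path are assumptions for illustration only; the helper repeats just the check shown above so that the example runs on its own.

```php
<?php
// Hypothetical helper: repeats only the guard clause from compareDirectories.
function requireDirectories(string $dir1, string $dir2): void
{
    if (!is_dir($dir1) || !is_dir($dir2)) {
        throw new InvalidArgumentException("Both parameters must be directory paths.");
    }
}

$existing = sys_get_temp_dir();                      // a directory that should exist
$missing  = $existing . '/no-such-dir-' . uniqid();  // assumed not to exist

try {
    requireDirectories($existing, $missing);
    $message = 'ok';
} catch (InvalidArgumentException $e) {
    // The caller decides how to react: log, abort, or fall back.
    $message = $e->getMessage();
}

echo $message, PHP_EOL;
```

Throwing an exception rather than returning false leaves it to the caller to decide whether a bad path is fatal.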

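Next, it collects file data from each directory using a helper function named getFiles, which returns an array that maps each relative file path to the MD5 hash of that file's content. It then compares the keys of the two arrays with PHP's array_diff_key function: paths present only in the second directory count as added, and paths present only in the first count as removed.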

$dir1Files = getFiles($dir1);
$dir2Files = getFiles($dir2);

$addedFiles = array_diff_key($dir2Files, $dir1Files);
$removedFiles = array_diff_key($dir1Files, $dir2Files);

The array_intersect_key function identifies the files whose relative paths exist in both directories.

$commonFiles = array_intersect_key($dir1Files, $dir2Files);
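The following sketch shows how these key-based functions behave on two small arrays. The paths and hash values are made-up placeholders shaped like the output of getFiles, not real MD5 hashes.

```php
<?php
// Hypothetical path => hash maps, shaped like the arrays getFiles returns.
$dir1Files = [
    'index.php'      => 'aaa111',
    'css/style.css'  => 'bbb222',
    'old/readme.txt' => 'ccc333',
];
$dir2Files = [
    'index.php'     => 'aaa111',
    'css/style.css' => 'ddd444', // same path, different placeholder hash
    'js/app.js'     => 'eee555',
];

// Keys only in the second array: added files.
$addedFiles = array_diff_key($dir2Files, $dir1Files);
// Keys only in the first array: removed files.
$removedFiles = array_diff_key($dir1Files, $dir2Files);
// Keys in both arrays; the values are taken from the first array.
$commonFiles = array_intersect_key($dir1Files, $dir2Files);

print_r(array_keys($addedFiles));
print_r(array_keys($removedFiles));
print_r(array_keys($commonFiles));
```

Both functions compare keys only, so css/style.css counts as a common file here even though its hash differs. Detecting that difference is the job of the next step.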

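To determine the changed files, the function loops over the common files with foreach and compares the MD5 hash stored for each path in the two arrays. When the hashes differ, the file is recorded as changed, together with its hash from the second directory.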

$changedFiles = [];
foreach ($commonFiles as $file => $dir1Hash) {
    $dir2Hash = $dir2Files[$file];
    if ($dir1Hash !== $dir2Hash) {
        $changedFiles[$file] = $dir2Hash;
    }
}

The result is an array with three keys (added, removed, and changed), each holding a map of relative file paths to MD5 hashes.

return ['added' => $addedFiles, 'removed' => $removedFiles, 'changed' => $changedFiles];

Understanding the Helper Function: getFiles

The getFiles function plays a critical role in gathering file data from a directory. Let's break it down.

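function getFiles($dir, $base = '') {
    $files = [];
    $dh = opendir($dir);
    if ($dh === false) {
        // opendir returns false on failure, e.g. when the directory is not readable
        throw new RuntimeException("Unable to open directory: $dir");
    }
    while (($file = readdir($dh)) !== false) {
        // Skip the current-directory and parent-directory entries
        if ($file === '.' || $file === '..') {
            continue;
        }

        $path = $dir . '/' . $file;                      // path on disk
        $filePath = ($base ? $base . '/' : '') . $file;  // path relative to the top-level directory

        if (is_dir($path)) {
            // Recurse into the subdirectory and merge its entries
            $files += getFiles($path, $filePath);
        } else {
            $files[$filePath] = md5_file($path);
        }
    }
    closedir($dh);
    return $files;
}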

The function begins by opening the directory with opendir($dir). The opendir function in PHP opens a directory handle that can be used with readdir, rewinddir, and closedir. It returns this handle, or false if it fails.

Once the directory is open, a while loop reads the directory's entries using readdir($dh). The readdir function returns the name of the next entry in the directory, or false when no entries remain. This is why the loop condition uses the strict !== false comparison: with a loose check, an entry named "0" would end the loop early. The entries "." and ".." represent the current directory and the parent directory, respectively. These are skipped with a continue statement.

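For each entry, getFiles checks whether it is a directory using is_dir($path). If it is, the function calls itself recursively, passing the relative path as the new $base so that keys stay relative to the top-level directory, and merges the result into $files with the += array union operator. This recursion is a key part of the function, allowing it to traverse not just the base directory, but all subdirectories as well.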

if (is_dir($path)) {
    $files += getFiles($path, $filePath);
}

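If the entry is not a directory, the md5_file function calculates the MD5 hash of the file's content. The hash serves as a compact fingerprint of the file's current state: any change to the content will, in practice, produce a different hash. MD5 is not collision-resistant, so it should not be relied on to detect deliberate tampering. The hash of each file is stored in the $files array, with the relative file path as the key.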

else {
    $files[$filePath] = md5_file($path);
}
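To illustrate how md5_file reacts to content changes, here is a small sketch that uses two temporary files. The file prefix and contents are arbitrary choices for this example.

```php
<?php
// Create two temporary files in the system temp directory.
$dir   = sys_get_temp_dir();
$fileA = tempnam($dir, 'cmp');
$fileB = tempnam($dir, 'cmp');

// Identical content yields identical hashes, regardless of file name.
file_put_contents($fileA, "hello\n");
file_put_contents($fileB, "hello\n");
$sameContent = md5_file($fileA) === md5_file($fileB);

// After editing one file, the hashes no longer match.
file_put_contents($fileB, "hello world\n");
$afterEdit = md5_file($fileA) === md5_file($fileB);

var_dump($sameContent); // bool(true)
var_dump($afterEdit);   // bool(false)

// Clean up the temporary files.
unlink($fileA);
unlink($fileB);
```

Because the hash depends only on the content, a file that was merely touched, with a new modification time but the same bytes, is not reported as changed.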

Once all entries have been read, closedir($dh) is used to close the directory handle. At the end of the function, the $files array, containing all file paths and their corresponding MD5 hashes, is returned.
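As an aside, the same traversal can also be written with PHP's SPL iterators instead of opendir and manual recursion. The sketch below is an alternative and not the approach used in this article. The function name getFilesWithIterator is made up for illustration, and the result is intended to have the same path-to-hash shape as that of getFiles.

```php
<?php
// Hypothetical alternative to getFiles, built on SPL iterators.
function getFilesWithIterator(string $dir): array
{
    $files = [];

    // SKIP_DOTS drops the "." and ".." entries; the outer iterator
    // descends into subdirectories for us.
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($dir, FilesystemIterator::SKIP_DOTS)
    );

    foreach ($iterator as $fileInfo) {
        if (!$fileInfo->isFile()) {
            continue;
        }
        // getSubPathname() is relative to $dir; normalise Windows separators.
        $relative = str_replace('\\', '/', $iterator->getSubPathname());
        $files[$relative] = md5_file($fileInfo->getPathname());
    }

    return $files;
}
```

The order of entries depends on the filesystem in both variants. That does not matter for the comparison, which works on array keys.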


Practical Application

One use case for this function is version control. If you're managing a project with frequent updates, this function can help you quickly identify which files have been added, removed, or changed. Note that the function cannot tell you what changed inside a file. It only flags that the content differs, so you know where to look more closely.
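To turn the result into something readable, you could format it as a short report. The sketch below is one possible way to do that. The helper formatReport, the +, - and ~ markers, and the sample paths and hashes are all assumptions for illustration.

```php
<?php
// Hypothetical formatter for the array returned by compareDirectories.
function formatReport(array $diff): string
{
    $lines = [];
    $markers = ['added' => '+', 'removed' => '-', 'changed' => '~'];

    foreach ($markers as $section => $marker) {
        // Only the paths (array keys) are printed; the hashes are ignored.
        foreach (array_keys($diff[$section] ?? []) as $path) {
            $lines[] = $marker . ' ' . $path;
        }
    }

    return implode(PHP_EOL, $lines);
}

// Sample result with made-up paths and placeholder hashes.
$diff = [
    'added'   => ['js/app.js' => 'eee555'],
    'removed' => ['old/readme.txt' => 'ccc333'],
    'changed' => ['css/style.css' => 'ddd444'],
];

echo formatReport($diff), PHP_EOL;
```

In a real script, $diff would come from a call such as compareDirectories($oldDir, $newDir) instead of the hand-written sample.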

In conclusion, this PHP function is a handy tool for basic "diff checking". By leveraging PHP's built-in functions, it provides a simple and effective way to compare directory contents.


Complete Source Code

Here's the complete source code for the compareDirectories function and its helper function, getFiles. Feel free to incorporate it into your PHP projects as needed.

function compareDirectories($dir1, $dir2) {
    // Check if directories exist
    if (!is_dir($dir1) || !is_dir($dir2)) {
        throw new InvalidArgumentException("Both parameters must be directory paths.");
    }
    
    // Collect file data from directories
    $dir1Files = getFiles($dir1);
    $dir2Files = getFiles($dir2);
    
    // Compare the file arrays
    $addedFiles = array_diff_key($dir2Files, $dir1Files);
    $removedFiles = array_diff_key($dir1Files, $dir2Files);
    
    $commonFiles = array_intersect_key($dir1Files, $dir2Files);
    
    // Check if any common files have changed
    $changedFiles = [];
    foreach ($commonFiles as $file => $dir1Hash) {
        $dir2Hash = $dir2Files[$file];
        if ($dir1Hash !== $dir2Hash) {
            $changedFiles[$file] = $dir2Hash;
        }
    }
    
    return ['added' => $addedFiles, 'removed' => $removedFiles, 'changed' => $changedFiles];
}

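function getFiles($dir, $base = '') {
    $files = [];
    $dh = opendir($dir);
    if ($dh === false) {
        // opendir returns false on failure, e.g. when the directory is not readable
        throw new RuntimeException("Unable to open directory: $dir");
    }
    while (($file = readdir($dh)) !== false) {
        // Skip the current-directory and parent-directory entries
        if ($file === '.' || $file === '..') {
            continue;
        }

        $path = $dir . '/' . $file;                      // path on disk
        $filePath = ($base ? $base . '/' : '') . $file;  // path relative to the top-level directory

        if (is_dir($path)) {
            // Recurse into the subdirectory and merge its entries
            $files += getFiles($path, $filePath);
        } else {
            $files[$filePath] = md5_file($path);
        }
    }
    closedir($dh);
    return $files;
}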