Created: April 17th 2023
Last updated: January 2nd 2024
Categories: Php, Wordpress
Author: LEXO

A simple WordPress Website Image Scraper written in PHP to download all original images from a WordPress website

Tags: image download, image scraper, PHP, WordPress

Introduction

As a WordPress website owner, you may sometimes need to download all the original images from your website, for example when migrating the site or backing up its media. A small image scraper can save you time and effort. In this blog post, we introduce a simple, beginner-friendly PHP script that does just that.

How the Image Scraper Works

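Our PHP script takes a page URL as input and collects every <img> tag on that page. For each image, it removes the size suffix that WordPress appends to resized copies (for example, -300x200), so the original upload is downloaded rather than the thumbnail. The images are packed into a ZIP file, which is then sent to the browser for download. Images that could not be downloaded are listed in an error_log.txt file inside the archive.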
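The "original image" step described above relies on the way WordPress names resized copies: name-WIDTHxHEIGHT.ext. As a minimal sketch, the same regular expression the script uses can be tried on its own. The helper name stripSizeSuffix and the example.com URLs are made up for this illustration and are not part of the script.

```php
<?php
// Hypothetical helper for illustration; the script itself calls preg_replace() inline.
function stripSizeSuffix(string $url): string
{
    // Remove "-<width>x<height>" directly before a dot, e.g. "photo-300x200.jpg" -> "photo.jpg".
    return preg_replace('/-(\d+)x(\d+)\./', '.', $url);
}

echo stripSizeSuffix('https://example.com/wp-content/uploads/2023/04/photo-300x200.jpg'), PHP_EOL;
// https://example.com/wp-content/uploads/2023/04/photo.jpg

echo stripSizeSuffix('https://example.com/wp-content/uploads/2023/04/photo.jpg'), PHP_EOL;
// https://example.com/wp-content/uploads/2023/04/photo.jpg (no suffix, so unchanged)
```

A URL without a size suffix passes through untouched, so the replacement is safe to apply to every image on the page.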

<?php
    if (isset($_POST['submit'])) {
        /* add some basic security by sanitizing the input */
        $url = filter_var($_POST['url'], FILTER_SANITIZE_URL);

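        /* accept only well-formed http(s) URLs, so other schemes such as file:// are rejected */
        $url_scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if (!filter_var($url, FILTER_VALIDATE_URL) || !in_array($url_scheme, ['http', 'https'], true)) {
            exit("Error: Invalid URL.");
        }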

        $options = [
            'http' => [
                'header' => "User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36\r\n"
            ]
        ];
        $context = stream_context_create($options);
        $html = @file_get_contents($url, false, $context);
        $error_log = '';

        if ($html === false) {
            exit("Error: Unable to load the webpage.");
        }

        $dom = new DOMDocument;
        @$dom->loadHTML($html);

        $images = $dom->getElementsByTagName('img');
        $zip = new ZipArchive();
        $zip_name = 'images.zip';

        if ($zip->open($zip_name, ZipArchive::CREATE) !== TRUE) {
            exit("Unable to create ZIP archive.");
        }

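        foreach ($images as $image) {
            $img_url = $image->getAttribute('src');
            if ($img_url === '') {
                // No src attribute: nothing to download
                continue;
            }

            // Strip the "-<width>x<height>" suffix WordPress adds to resized copies
            $original_img_url = preg_replace('/-(\d+)x(\d+)\./', '.', $img_url);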

            // Convert the URL to an absolute URL
            $original_img_url = relativeToAbsoluteUrl($original_img_url, $url);

            $img_data = @file_get_contents($original_img_url, false, $context);
            if ($img_data !== false) {
                $added = $zip->addFromString(basename($original_img_url), $img_data);
                if (!$added) {
                    $error_log .= "Failed to add image: " . $original_img_url . PHP_EOL;
                }
            } else {
                $error_log .= "Failed to download image: " . $original_img_url . PHP_EOL;
            }
        }

        if (!empty($error_log)) {
            $zip->addFromString('error_log.txt', $error_log);
        }

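        $zip->close();

        // An archive without any entries may not be written to disk at all
        if (!file_exists($zip_name)) {
            exit("Error: No images were found on the page.");
        }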

        header('Content-Type: application/zip');
        header('Content-Disposition: attachment; filename="' . $zip_name . '"');
        header('Content-Length: ' . filesize($zip_name));

        readfile($zip_name);
        unlink($zip_name);
        exit();
    }

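    /* Resolve a possibly relative image URL against the URL of the scraped page */
    function relativeToAbsoluteUrl($rel, $base) {
        if (empty($rel)) {
            return $base;
        }

        // Already absolute (has a scheme): return unchanged
        if (parse_url($rel, PHP_URL_SCHEME) != '') {
            return $rel;
        }

        $baseComponents = parse_url($base);
        $scheme = $baseComponents['scheme'] ?? '';
        $host = $baseComponents['host'] ?? '';
        if (isset($baseComponents['port'])) {
            // Keep a non-default port of the page URL
            $host .= ':' . $baseComponents['port'];
        }

        // Protocol-relative URL (//cdn.example.com/a.jpg): reuse the scheme of the page
        if (strpos($rel, '//') === 0) {
            return $scheme . ':' . $rel;
        }

        if ($rel[0] == '#' || $rel[0] == '?') {
            return $base . $rel;
        }

        // Directory part of the base path (everything before the last slash)
        $path = isset($baseComponents['path']) ? preg_replace('#/[^/]*$#', '', $baseComponents['path']) : '';

        // Root-relative URL: ignore the base path
        if ($rel[0] == '/') {
            $path = '';
        }

        $abs = "$host$path/$rel";
        // Collapse "//" and "/./", then resolve "dir/../" segments until nothing changes
        $re = ['#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#'];

        for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {
        }

        return $scheme . '://' . $abs;
    }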
?>

<!DOCTYPE html>
<html lang="en">
    <head>
        <meta charset="UTF-8">
        <meta name="viewport" content="width=device-width, initial-scale=1.0">
        <title>Image Scraper</title>
    </head>
    <body>
        <form action="scrape_images.php" method="post">
            <label for="url">Enter page URL:</label>
            <input style="width: 900px;" type="url" id="url" name="url" required>
            <button type="submit" name="submit">Download Images</button>
        </form>
    </body>
</html>
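One detail of the listing above is worth noting: ZIP entries are named with basename(), so two images that share a file name in different upload folders (say, 2023/01/logo.png and 2023/02/logo.png) would collide inside the archive. A possible way around this, sketched under the assumption that a numeric suffix is acceptable, is to track the names already used. The helper uniqueEntryName is hypothetical and not part of the original script.

```php
<?php
// Hypothetical helper, not part of the original script.
// Returns $name unchanged the first time it is seen, otherwise "name-1.ext", "name-2.ext", ...
function uniqueEntryName(string $name, array &$used): string
{
    $info = pathinfo($name);
    $ext = isset($info['extension']) ? '.' . $info['extension'] : '';
    $candidate = $name;
    $n = 1;

    // Try numbered variants until one has not been handed out yet.
    while (isset($used[$candidate])) {
        $candidate = $info['filename'] . '-' . $n . $ext;
        $n++;
    }

    $used[$candidate] = true;
    return $candidate;
}

$used = [];
echo uniqueEntryName('logo.png', $used), PHP_EOL; // logo.png
echo uniqueEntryName('logo.png', $used), PHP_EOL; // logo-1.png
echo uniqueEntryName('logo.png', $used), PHP_EOL; // logo-2.png
```

In the script, this would mean initializing $used = [] before the foreach loop and passing uniqueEntryName(basename($original_img_url), $used) to addFromString().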

Using the Image Scraper

You can download the code here:

Download wordpress-image-scraper.zip

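Save the script as scrape_images.php (the file name the form posts to) on a server with PHP and its zip extension, and open it in your browser.
Enter the URL of the page you want to scrape the images from.
The script processes the URL, downloads the images, and creates a ZIP file containing them.
You'll then be prompted to download the ZIP file.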
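The "process the URL" step above boils down to collecting the src attribute of every <img> tag. The following is a minimal standalone sketch of that step. It assumes PHP's DOM extension is available, and the helper name extractImageSources is made up for this example. Unlike the script, it skips empty and data: sources and returns each URL only once.

```php
<?php
// Hypothetical helper, not part of the original script.
function extractImageSources(string $html): array
{
    if (trim($html) === '') {
        return []; // nothing to parse
    }

    // Collect parser warnings internally instead of suppressing them with "@".
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        // Skip tags without a usable src and inline data: URIs.
        if ($src === '' || strpos($src, 'data:') === 0) {
            continue;
        }
        // Keep each URL only once, preserving document order.
        if (!in_array($src, $sources, true)) {
            $sources[] = $src;
        }
    }
    return $sources;
}

$html = '<html><body><img src="a-300x200.jpg"><img src=""><img src="a-300x200.jpg"><img src="/b.png"></body></html>';
echo implode(', ', extractImageSources($html)), PHP_EOL;
// a-300x200.jpg, /b.png
```

Each returned value would still need the size suffix removed and the URL made absolute, as the script does, before it can be downloaded.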

Conclusion

With this simple PHP image scraper, you can easily download the original images from a page of a WordPress website. The script is beginner-friendly and a handy tool for website owners who need to quickly download images for migration or backup purposes.
