Content Based Image Recognition - a stab in PHP

~ Part 2 ~

By Finno


OK, finally I have found some time to continue with this essay. To recap, I've been looking at image analysis and comparison systems to see what benefits they offer seekers. In the process I decided to build my own tools for testing. Is it possible to perform these tasks in real time? Can we construct quick and dirty approximations of cutting edge systems for use on our own PCs, over the web?

Yes. But don't expect miracles.

The idea was to build a tool that can analyse images brought in from the web. A spider or web crawler could be kicked off to pull down images from your favourite art gallery web site, for example.

The images would then be compared against the source image for a match. Hopefully you might then find out things such as information about the source of the image (the artist, country, era, etc); or maybe an improved version of the image (quality, resolution, etc).

Obviously this isn't a system that gives you instant results, not if we are talking about crawling the web for content. Unless you already have a massive database of images in your back pocket, you are going to need to find your targets online.

The design I finally decided to go with for image comparison is based on the frequency of colours in certain regions of the source and target images. I chose this design as a compromise between speed and room for experimentation. There are many variables to tweak, some values giving good results and others terrible, but I wanted to build in flexibility so we can experiment and discover what works well for the sources and targets we have in mind.

Here's the breakdown of things we need to do for the image analysis and comparison:

  • Normalise the source and target images to standard size (150 x 150)
  • Calculate three 25x25 regions in the image to analyse (total 625 pixels per region)
  • Read in the pixel RGB values
  • Keep count of how many pixels have the same RGB values (or close within a set amount)
  • Sort the list of RGB values to build a top 10 list of the most frequent colours
  • Compare top 10 lists of source and target for each region
  • Determine how many regions match (based on top 10 list comparison)
  • Write out log with results (including thumbnails, URLs etc)
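
The counting and top 10 steps above can be sketched without touching an image at all. This is only a rough illustration, assuming the pixels are handed over as plain [r, g, b] arrays; the function names countColours and topColours are made up for this sketch and are not part of the scripts further down.

```php
<?php
// Hypothetical helpers: pixels are plain [r, g, b] arrays, not a GD image.

/**
 * Group pixels into colour buckets. A pixel joins the first bucket whose
 * red, green and blue values are all strictly within $prox of its own.
 */
function countColours(array $pixels, int $prox): array
{
    $buckets = [];
    foreach ($pixels as $pixel) {
        list($r, $g, $b) = $pixel;
        $found = false;
        foreach ($buckets as $i => $bucket) {
            if (abs($bucket['r'] - $r) < $prox
                && abs($bucket['g'] - $g) < $prox
                && abs($bucket['b'] - $b) < $prox) {
                $buckets[$i]['hits']++;   // close enough: count it as a repeat
                $found = true;
                break;
            }
        }
        if (!$found) {
            $buckets[] = ['r' => $r, 'g' => $g, 'b' => $b, 'hits' => 1];
        }
    }
    return $buckets;
}

/** Return the $n most frequent buckets, most frequent first. */
function topColours(array $buckets, int $n = 10): array
{
    usort($buckets, function ($a, $b) {
        return $b['hits'] <=> $a['hits'];
    });
    return array_slice($buckets, 0, $n);
}

// Usage: five pixels, two of them near-duplicates of the first.
$pixels = [[200, 10, 10], [203, 12, 9], [10, 10, 200], [198, 8, 11], [10, 200, 10]];
$top = topColours(countColours($pixels, 5), 2);
echo $top[0]['hits'], "\n"; // 3
```

Each new pixel is checked against every bucket found so far, so a low proximity value means more buckets and slower counting.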

But Where Do The Targets Come From?

In my first example you input both the source and target images. This works for the purpose of experimenting and tuning. It has a nice verbose output with thumbnails so you can get an idea of what is going on. Later, to speed things up, I will ditch the pleasantries and only write out text.

When you really want to use this tool to search for images you will need to spider the web.

Unfortunately I cannot provide everyone with a web spider using my resources. But I can provide the source so you can run it yourself, using your own bandwidth.

Before I jump in to discuss the code, here is the form to submit image comparisons. For those who aren't interested in the PHP, you can play around with this straight away. For those who want to see what's driving it, read on.


Web Form - Image Compare Example 1

The form takes a source image and a target image, plus three settings:

  • Colour proximity (prox): how much variation in the RGB value for each pixel still counts as a match (0-254).
  • Region variation (rvar): how many of the top 10 colours need to be the same (0-10).
  • Prevalence order (prev, y or n): does the order of the top 10 matter?

If yes, the number 1 most frequent colour in the source image is compared to the number 1 most frequent colour in the target, then the number 2 in the source with the number 2 in the target, and so on.
If no, the number 1 most frequent colour in the source image is compared with all the top 10 colours in the target. In other words the order doesn't matter, as long as the colour matches another top 10 colour, not necessarily one with the same ranking.
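
To make the difference between the two prevalence settings concrete, here is a small sketch. It assumes each top list is a simple array of [r, g, b] colours in rank order; coloursClose and countMatches are hypothetical names used only for this illustration.

```php
<?php
// Hypothetical helpers; each colour is a plain [r, g, b] array.

/** True when every channel of $a is strictly within $prox of $b. */
function coloursClose(array $a, array $b, int $prox): bool
{
    for ($c = 0; $c < 3; $c++) {
        if (abs($a[$c] - $b[$c]) >= $prox) {
            return false;
        }
    }
    return true;
}

/** Count matching colours between two ranked lists. */
function countMatches(array $source, array $target, int $prox, bool $ordered): int
{
    $count = 0;
    foreach ($source as $rank => $colour) {
        if ($ordered) {
            // Rank must line up: source #1 against target #1, and so on.
            if (isset($target[$rank]) && coloursClose($colour, $target[$rank], $prox)) {
                $count++;
            }
        } else {
            // Any rank will do: stop at the first target colour that is close.
            foreach ($target as $candidate) {
                if (coloursClose($colour, $candidate, $prox)) {
                    $count++;
                    break;
                }
            }
        }
    }
    return $count;
}

// Usage: the same two colours, ranked the other way round in the target.
$source = [[200, 10, 10], [10, 10, 200]];
$target = [[12, 9, 198], [197, 12, 8]];
echo countMatches($source, $target, 10, true), "\n";  // 0
echo countMatches($source, $target, 10, false), "\n"; // 2
```

With order switched on, two regions holding the same colours in a different ranking score nothing, so it is the stricter of the two settings.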



Image Compare Example 2

In this example you can see how these two images get recognised as a match. Click on each of them to see the differences in the originals. One has been resized and one has been run through a filter slightly blurring it. Because the images are normalised to the same size before comparing and the colours have been kept roughly the same we get a positive result.

http://finn61.sytes.net/test/resizer.php?srcimg=gi-mod_01.jpg&tarimg=gi-mod_07.jpg&prox=15&rvar=2&prev=y

Image Compare Example 3

This example shows how this system, based on colour matching, breaks down. These two paintings are both by Overbeck but were painted at different times. The copy is significantly lighter, and hence using the same parameters as above it fails to detect a match.

http://finn61.sytes.net/test/resizer.php?srcimg=http%3A%2F%2Fwww.wga.hu%2Fart%2Fo%2Foverbeck%2Fitalia1.jpg&tarimg=http%3A%2F%2Fwww.wga.hu%2Fart%2Fo%2Foverbeck%2Fitalia2.jpg&prox=10&rvar=2&prev=n
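
A toy calculation shows why a lighter copy slips through. The numbers below are invented for illustration and are not measured from the Overbeck images; withinProx is a hypothetical helper that applies the same per-channel test as the comparison code.

```php
<?php
// Toy numbers only; these are not measured from the Overbeck images.

/** True when every channel of $a is strictly within $prox of $b. */
function withinProx(array $a, array $b, int $prox): bool
{
    return abs($a[0] - $b[0]) < $prox
        && abs($a[1] - $b[1]) < $prox
        && abs($a[2] - $b[2]) < $prox;
}

$original = [120, 90, 60];                 // a mid brown
$lighter  = [120 + 30, 90 + 30, 60 + 30];  // the same colour, 30 levels brighter

var_dump(withinProx($original, $lighter, 10)); // bool(false)
var_dump(withinProx($original, $lighter, 31)); // bool(true)
```

Raising the proximity far enough to absorb the shift would also lump many unrelated colours together, so it is not much of a cure.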

You will probably have noticed by now that you can run your own comparisons by specifying the parameters, including the source and target images, in the URL.
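
If you would rather build such a URL in code than by hand, something along these lines should do it. It is a sketch using PHP's http_build_query(), with the base URL and values taken from Example 3; the variable names are my own.

```php
<?php
// The base URL and parameter values are copied from Example 3.
$base = 'http://finn61.sytes.net/test/resizer.php';

$params = [
    'srcimg' => 'http://www.wga.hu/art/o/overbeck/italia1.jpg',
    'tarimg' => 'http://www.wga.hu/art/o/overbeck/italia2.jpg',
    'prox'   => 10,   // colour proximity
    'rvar'   => 2,    // region variation
    'prev'   => 'n',  // prevalence order: y or n
];

// http_build_query() URL-encodes each value, so the image URLs survive intact.
$url = $base . '?' . http_build_query($params, '', '&');
echo $url, "\n";
```

The encoding matters because the image addresses contain colons and slashes of their own.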


Image Compare Code

Note: All this code is licensed under the GPL, which means please take and use it for your own work but you have to share the fruits of your labour.

Rather than bore you with blow by blow descriptions of each line of code I have tried to include meaningful comments within the PHP itself. Let me know if it's difficult to follow.

The image analysis is mainly in one chunk of php with a few smaller scripts called from within html image tags such as drawing boxes on images and blocks of colour.

  • resizer.php - Main php block, calls the other scripts
  • drawbox.php - When passed a source img (URL) returns a resized jpg with the region boxes drawn on it
  • rgthumb.php - When passed a source img with x and y start position will return a jpg of a 25 x 25 section
  • block.php - When passed rgb values returns a 15 x 15 jpg of solid colour.

Here's the code to display a colour block. You call it, passing rgb values and it returns a 15x15 jpg image. It's only used for display purposes.
You would use a html img tag such as this to call it:

<IMG SRC="block.php?red=120&grn=120&blu=120" BORDER=0>

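<?php
    header("Content-type: image/jpeg");

    //Read the colour values, forcing each into the 0-255 range
    $red = max(0, min(255, (int) $_GET['red']));
    $grn = max(0, min(255, (int) $_GET['grn']));
    $blu = max(0, min(255, (int) $_GET['blu']));

    $im = imagecreate(15, 15);

    //The first colour allocated to a palette image becomes its background,
    //so this alone fills the whole block
    $colour = imagecolorallocate($im, $red, $grn, $blu);

    imagejpeg($im);
    imagedestroy($im);
?>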

Here's the rgthumb.php code which displays a 25x25 section of an image (after resampling down to 150x150). Three parameters are passed: the source image and the x, y coordinates of the section you want returned from the image. Again, it's only used for display purposes.
You would use an HTML img tag such as this to call it:

<IMG SRC="rgthumb.php?srcimg=overbeck.jpg&posx=100&posy=35" BORDER=0>

<?php
    $srcimg = $_GET['srcimg'];       //url of source image
    $posx   = (int) $_GET['posx'];   //x coordinate
    $posy   = (int) $_GET['posy'];   //y coordinate

    list($srcwidth, $srcheight) = getimagesize($srcimg);

    $newsrcimg = ImageCreateFromJpeg($srcimg);
    $newresimg = ImageCreateTruecolor(150, 150);

    //Resample image to a standard 150 x 150 size
    imagecopyresampled($newresimg, $newsrcimg, 0, 0, 0, 0, 150, 150, $srcwidth, $srcheight);

    //Copy the 25 x 25 region starting at posx, posy into its own image
    $rgthumb = ImageCreateTruecolor(25, 25);
    imagecopy($rgthumb, $newresimg, 0, 0, $posx, $posy, 25, 25);

    header('Content-type: image/jpeg');    //Create jpeg header
    imagejpeg($rgthumb, null, 100);        //Send image to browser

    imagedestroy($newresimg);    //Destroy image to release memory
    imagedestroy($newsrcimg);    //Destroy image to release memory
    imagedestroy($rgthumb);      //Destroy image to release memory
?>

Here's the drawbox.php code which is pretty chunky and only for demonstrative purposes so you can get a nice display. Only one parameter is passed and that is the source image, in img. It returns the image resampled to 150x150 with three boxes showing you the regions to be analysed.
You would use an HTML img tag such as this to call it:

<IMG SRC="drawbox.php?img=overbeck.jpg" BORDER=0>

<?php
    $srcimg = $_GET['img'];    //url of source image

    list($srcwidth, $srcheight) = getimagesize($srcimg);

    $new_width = 150;
    $new_height = 150;

    $newsrcimg = ImageCreateFromJpeg($srcimg);
    $newresimg = ImageCreateTruecolor($new_width, $new_height);

    //Resample image to a standard 150 x 150 size
    imagecopyresampled($newresimg, $newsrcimg, 0, 0, 0, 0, $new_width, $new_height, $srcwidth, $srcheight);

    //Create a garish colour to draw on image so it is contrasting (hopefully)
    $fluro = imagecolorclosest($newresimg, 155, 255, 155);

    drawbox(10,10,35,35,$newresimg,$fluro);        //Draw box around region 1
    drawbox(62,62,87,87,$newresimg,$fluro);        //Draw box around region 2
    drawbox(115,115,140,140,$newresimg,$fluro);    //Draw box around region 3

    header('Content-type: image/jpeg');    //Create jpeg header
    imagejpeg($newresimg, null, 100);      //Send image to browser

    imagedestroy($newresimg);    //Destroy image to get memory back
    imagedestroy($newsrcimg);    //Destroy image to get memory back

    //Draw a 1 pixel wide box around the region.
    //Parms: $begx = left column
    //       $begy = top row
    //       $endx = right column
    //       $endy = bottom row
    //       $newresimg = image to draw box on
    //       $fluro = colour to draw box
    function drawbox($begx,$begy,$endx,$endy,$newresimg,$fluro)
    {
        //Top and bottom edges
        for ($x = $begx; $x <= $endx; $x++) {
            imagesetpixel($newresimg, $x, $begy, $fluro);
            imagesetpixel($newresimg, $x, $endy, $fluro);
        }

        //Left and right edges
        for ($y = $begy; $y <= $endy; $y++) {
            imagesetpixel($newresimg, $begx, $y, $fluro);
            imagesetpixel($newresimg, $endx, $y, $fluro);
        }
    }
?>

Here's the main body of code used in the form above. You will see calls to the three scripts above. It's still a work in progress so this is very likely to change as I implement new features or optimisations. I'll try and improve the comments at some stage too.

<HTML>
<HEAD>
<TITLE> Image Compare </TITLE>
<META NAME="Author" CONTENT="Finn61">
<META NAME="Keywords" CONTENT="">
<META NAME="Description" CONTENT="">
</HEAD>

<BODY bgcolor="#D5FFF1" TEXT=#001010 LINK="#0000FF" ALINK="#00FF00" VLINK="#3366CC">

<?php
    
    //Get the parameters passed at run time
    $srcurl = $_GET['srcimg'];      //url of source image
    $tarurl = $_GET['tarimg'];      //url of target image
    $prox = (int) $_GET['prox'];    //colour proximity
    $rvar = (int) $_GET['rvar'];    //region variation
    $prev = $_GET['prev'];          //prevalence order checking (y or n)

    //Determine size of original images
    list($srcwidth, $srcheight) = getimagesize($srcurl);
    list($tarwidth, $tarheight) = getimagesize($tarurl);

    //Create an image from the source url
    $srcimg = ImageCreateFromJpeg($srcurl);
    $newsrcimg = ImageCreateTruecolor(150, 150);

    //Create an image from the target url
    $tarimg = ImageCreateFromJpeg($tarurl);
    $newtarimg = ImageCreateTruecolor(150, 150);

    //Resample images to a standard 150 x 150 size
    imagecopyresampled($newsrcimg, $srcimg, 0, 0, 0, 0, 150, 150, $srcwidth, $srcheight);
    imagecopyresampled($newtarimg, $tarimg, 0, 0, 0, 0, 150, 150, $tarwidth, $tarheight);

    imagedestroy($srcimg);        //Destroy old source image to get memory back
    imagedestroy($tarimg);        //Destroy old target image to get memory back

    //Draw source and target image resized to 150x150 with sample regions indicated
    //The urls are escaped so that characters such as & don't break the links
    echo "<center>";
    echo "<P>Click images to see original before resampling.</P>";
    echo "<A HREF=\"".htmlspecialchars($srcurl)."\"><IMG SRC=\"drawbox.php?img=".urlencode($srcurl)."\" BORDER=0 ALT=\"\"></A>";
    echo "&nbsp;";
    echo "<A HREF=\"".htmlspecialchars($tarurl)."\"><IMG SRC=\"drawbox.php?img=".urlencode($tarurl)."\" BORDER=0 ALT=\"\"></A>";

    
    //
    //REGION 1 PROCESSING
    //

    thumbs($srcurl,$tarurl,10,10,1);                //Call function to draw thumbs for region 1

    $region = pixelread($newsrcimg,10,10,$prox);    //Call function to analyse region 1
    echo "Number of unique colours in region 1 of source: ".count($region[0])."<BR>";
    $sregion = topten($region);                     //Call function to sort top 10 colours

    $region = pixelread($newtarimg,10,10,$prox);    //Call function to analyse region 1
    echo "Number of unique colours in region 1 of target: ".count($region[0])."<BR>";
    $tregion = topten($region);                     //Call function to sort top 10 colours

    printblocks($sregion);                          //Call function to print top 10 colours
    echo "&nbsp;";
    printblocks($tregion);                          //Call function to print top 10 colours

    comp($sregion,$tregion,$prox,$rvar,$prev);      //Call function to compare top 10 lists

    //
    //REGION 2 PROCESSING
    //

    thumbs($srcurl,$tarurl,62,62,2);                //Call function to draw thumbs for region 2

    $region = pixelread($newsrcimg,62,62,$prox);    //Call function to analyse region 2
    echo "Number of unique colours in region 2 of source: ".count($region[0])."<BR>";
    $sregion = topten($region);                     //Call function to sort top 10 colours

    $region = pixelread($newtarimg,62,62,$prox);    //Call function to analyse region 2
    echo "Number of unique colours in region 2 of target: ".count($region[0])."<BR>";
    $tregion = topten($region);                     //Call function to sort top 10 colours

    printblocks($sregion);                          //Call function to print top 10 colours
    echo "&nbsp;";
    printblocks($tregion);                          //Call function to print top 10 colours

    comp($sregion,$tregion,$prox,$rvar,$prev);      //Call function to compare top 10 lists

    //
    //REGION 3 PROCESSING
    //

    thumbs($srcurl,$tarurl,115,115,3);              //Call function to draw thumbs for region 3

    $region = pixelread($newsrcimg,115,115,$prox);  //Call function to analyse region 3
    echo "Number of unique colours in region 3 of source: ".count($region[0])."<BR>";
    $sregion = topten($region);                     //Call function to sort top 10 colours

    $region = pixelread($newtarimg,115,115,$prox);  //Call function to analyse region 3
    echo "Number of unique colours in region 3 of target: ".count($region[0])."<BR>";
    $tregion = topten($region);                     //Call function to sort top 10 colours

    printblocks($sregion);                          //Call function to print top 10 colours
    echo "&nbsp;";
    printblocks($tregion);                          //Call function to print top 10 colours

    comp($sregion,$tregion,$prox,$rvar,$prev);      //Call function to compare top 10 lists

    //CLEANUP

    imagedestroy($newsrcimg);        //Destroy image to get memory back
    imagedestroy($newtarimg);        //Destroy image to get memory back

    // END OF MAINLINE

    
    //Create loop to move through each location in the region
    //         reading in the RGB values for that location.
    //Parms: $newresimg = image to analyse
    //       $row = Starting row
    //       $col = Starting column
    //       $prox = amount of variation allowed either side of a value
    function pixelread($newresimg,$row,$col,$prox)
    {
        $region = array();    //create array so it's not empty on first pass

        for ($y = $row; $y < $row + 25; $y++) {
            for ($x = $col; $x < $col + 25; $x++) {

                //Read in the RGB value from pixel at x,y location
                $ind = imagecolorat($newresimg, $x, $y);
                $colrs = imagecolorsforindex($newresimg, $ind);
                $red = $colrs['red'];
                $grn = $colrs['green'];
                $blu = $colrs['blue'];

                //Call function to add this colour to our list
                $region = palcount($red,$grn,$blu,$region,$prox);
            }
        }
        return($region);
    }

    //Create an array of all unique colours, keeping count of previous hits.
    //Parms: $red = value for red
    //         $grn = value for green
    //         $blu = value for blue
    //         $region1 = array containing previous rgb values
    //         $prox = amount of variation allowed either side of a value
    function palcount($red,$grn,$blu,$region1,$prox)
    {
        //Check if the colour matches one we already have recorded in the array
        //Allow a variation of $prox. The if's are nested just for neatness.
        //Start at 0 because [] below stores the first colour at index 0.
        $found = false;
        for ($i = 0; (isset($region1[0][$i])) && ($found == false); $i++) {
            if (($region1[0][$i] < $red + $prox) && ($region1[0][$i] > $red - $prox)) {
                if (($region1[1][$i] < $grn + $prox) && ($region1[1][$i] > $grn - $prox)) {
                    if (($region1[2][$i] < $blu + $prox) && ($region1[2][$i] > $blu - $prox)) {
                        $region1[3][$i]++;    //Increment the counter for this colour
                        $found = true;        //Set a flag so we know it's matched
                    }
                }
            }
        }

        //If the colour doesn't already match then record it in a new element
        if ($found == false) {
            $region1[0][] = $red;
            $region1[1][] = $grn;
            $region1[2][] = $blu;
            $region1[3][] = 1;
        }
        return($region1);    //pass the array back
    }

    //Sort the colour list by hit count and keep the 10 most frequent.
    //Parms: $region = array of rgb values and hit counts from pixelread
    function topten($region)
    {
        //Check we have at least 10 elements in the array, otherwise we can't compare.
        if (count($region[0]) < 10) {
            echo "<BR><B>ERROR:</B> Colour proximity is too high to get 10 unique colours. Try again.";
            exit;
        }

        //Sort in reverse highest to lowest
        arsort($region[3]);

        //List out the top 10 and save each one into a new array
        //(foreach is used because each() was removed in PHP 8)
        $sorted = array();
        $cnt = 1;        //Preset counter to keep track how many loops we make
        foreach (array_keys($region[3]) as $key) {
            if ($cnt > 10) {
                break;
            }
            $sorted[0][$cnt] = $region[0][$key];
            $sorted[1][$cnt] = $region[1][$key];
            $sorted[2][$cnt] = $region[2][$key];
            $sorted[3][$cnt] = $region[3][$key];
            $cnt++;
        }
        return($sorted);    //pass the array back
    }
    

    function printblocks($sorted)
    {
        //List out the top 10 and call routine to print a block of colour
        for ($i = 1; $i < 11; $i++) {
            //echo "<BR>".$sorted[0][$i].",".$sorted[1][$i].",".$sorted[2][$i].",".$sorted[3][$i];
            echo "<IMG SRC=\"block.php?red=".$sorted[0][$i]."&grn=".$sorted[1][$i]."&blu=".$sorted[2][$i]."\" BORDER=0 ALT=\"\">";
        }
    }

    function thumbs($srcurl,$tarurl,$posx,$posy,$regnum)
    {
        //Draw thumbnail of sample region for source and target
        //The urls are encoded so that characters such as & don't break the query string
        echo "<P>Region $regnum has been analysed.<BR>";
        echo "<IMG SRC=\"rgthumb.php?srcimg=".urlencode($srcurl)."&posx=$posx&posy=$posy\" BORDER=0 ALT=\"\">";
        echo "&nbsp;";
        echo "<IMG SRC=\"rgthumb.php?srcimg=".urlencode($tarurl)."&posx=$posx&posy=$posy\" BORDER=0 ALT=\"\">";
        echo "<BR>";
    }

    //Compare the top 10 lists of source and target for one region.
    //Parms: $sregion = top 10 colours of source region
    //       $tregion = top 10 colours of target region
    //       $var = amount of variation allowed either side of a value
    //       $rvar = how many colours must match for the region to match
    //       $prev = 'y' if the ranking order has to match as well
    function comp($sregion,$tregion,$var,$rvar,$prev)
    {
        $count = 0;  //Set counter to zero

        if ($prev == 'y') {
            //Compare colours rank against rank
            for ($i = 1; $i < 11; $i++) {
                if (($sregion[0][$i] < $tregion[0][$i] + $var) && ($sregion[0][$i] > $tregion[0][$i] - $var)) {
                    if (($sregion[1][$i] < $tregion[1][$i] + $var) && ($sregion[1][$i] > $tregion[1][$i] - $var)) {
                        if (($sregion[2][$i] < $tregion[2][$i] + $var) && ($sregion[2][$i] > $tregion[2][$i] - $var)) {
                            $count++;
                        }
                    }
                }
            }
        } else {
            //Compare each source colour against every target colour
            for ($i = 1; $i < 11; $i++) {
                $found = false;
                for ($j = 1; ($j < 11) && ($found == false); $j++) {
                    if (($sregion[0][$i] < $tregion[0][$j] + $var) && ($sregion[0][$i] > $tregion[0][$j] - $var)) {
                        if (($sregion[1][$i] < $tregion[1][$j] + $var) && ($sregion[1][$i] > $tregion[1][$j] - $var)) {
                            if (($sregion[2][$i] < $tregion[2][$j] + $var) && ($sregion[2][$i] > $tregion[2][$j] - $var)) {
                                $count++;
                                $found = true;
                            }
                        }
                    }
                }
            }
        }

        if ($count >= $rvar) {
            echo "<BR><B>Regions MATCH! $count colours are the same.</B>";
        } else {
            echo "<BR>No match. $count colours are the same.";
        }
    }
?>

</BODY>
</HTML>

I know there's some repetition in the mainline code but I was trying to avoid passing too many vars when I called a function and keep it easier to read.
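
For anyone who does want to fold the three region blocks into one, a loop over the region start positions is one possible shape. This is only a sketch: forEachRegion is a hypothetical helper, and the callback stands in for the thumbs, pixelread, topten, printblocks and comp sequence rather than calling the real functions.

```php
<?php
// Hypothetical refactoring sketch, not part of resizer.php.

$regionOrigins = [1 => 10, 2 => 62, 3 => 115]; // region number => x and y start

/** Run $processRegion once per region and collect what it returns. */
function forEachRegion(array $origins, callable $processRegion): array
{
    $results = [];
    foreach ($origins as $regnum => $start) {
        // Each region is a 25 x 25 square starting at ($start, $start).
        $results[$regnum] = $processRegion($regnum, $start, $start);
    }
    return $results;
}

// Usage with a stand-in callback that only reports the square it was given.
$report = forEachRegion($regionOrigins, function (int $regnum, int $x, int $y): string {
    return sprintf('region %d: (%d,%d) to (%d,%d)', $regnum, $x, $y, $x + 24, $y + 24);
});
echo implode("\n", $report), "\n";
```

The trade-off is the one mentioned above: the callback needs the images and the three settings passed in or captured, which is exactly the extra plumbing the repeated version avoids.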

Well that's pretty much this part finished off. In Part 3 I will bolt on a web spider so we can go crawling for images to compare and look at some attempts to improve the accuracy and speed. Hopefully I will also try a variation on this colour matching technique to overcome the problem if one of the images has been colour filtered.


Update: Part 3 didn't get written. Instead I embarked on a project to build something useful out of the code we have so far. The aim was to demonstrate a real world use. The result is Content Based Art Search. Please have a look.



You are deep inside SixtyOne - (c) Finno