Content Based Image Recognition - a stab in PHP

It has been some time since I last contributed anything to Searchlores, but during the snippets of time between moving jobs and homes and cities, I have kept some projects on and off the burner. This one, although not complete, is at a stage where I thought it could be of interest/benefit to ~seekers out there.

I must say that I am not an expert in this field and I am not very proficient at the higher mathematics. During my research some of the advanced mathematical algorithms went completely over my head and there was a lot of lingo which I was not familiar with that made some of the texts hard reading.

But that's ok, because this essay is about relating my findings and displaying some of the simple tools I have made to help us play around in this field. I want this essay to be accessible to the non-expert and I encourage them to take this info and expand on it.

I think this is an area which has yet to be exploited by web searchers and as we get more bandwidth and processing power at our disposal these same concepts and tools will be modified to work with other media besides still images.

A Quick Overview

There are a few competing terms that describe content based image recognition (CBIR) so you may have heard it by a different name.

The basic goal is to compare a number of target images against a source image and find matches based on their graphical content.

No surprises with the name; we are trying to recognise the content of an image, usually based on the content of a source image.

Colour and texture histograms are a popular way of comparing images. Texture is a measure of how a pixel's colour varies from surrounding pixels. Images that have smooth, gradient-like colour changes would have a low texture, while busy images with lots of varying colours and sharp edges would have a high texture. Pixel values are examined and histogram data saved for each image, which is then used in the comparison.
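To make the histogram idea concrete, here is a minimal sketch in PHP with GD. It only covers colour, not texture, and the function names (colourHistogram, histogramIntersection) and the choice of 4 bins per channel are my own assumptions for illustration, not taken from any particular CBIR paper.

```php
<?php
// Build a coarse colour histogram: each channel is cut into $bins ranges,
// giving $bins^3 buckets. Counts are normalised so they sum to 1.
function colourHistogram($img, int $bins = 4): array
{
    $hist = array_fill(0, $bins * $bins * $bins, 0.0);
    $w = imagesx($img);
    $h = imagesy($img);
    for ($y = 0; $y < $h; $y++) {
        for ($x = 0; $x < $w; $x++) {
            // imagecolorsforindex() copes with both palette and truecolour images
            $c = imagecolorsforindex($img, imagecolorat($img, $x, $y));
            $r = intdiv($c['red'] * $bins, 256);
            $g = intdiv($c['green'] * $bins, 256);
            $b = intdiv($c['blue'] * $bins, 256);
            $hist[($r * $bins + $g) * $bins + $b] += 1.0;
        }
    }
    $total = max(1, $w * $h);
    foreach ($hist as $i => $count) {
        $hist[$i] = $count / $total;
    }
    return $hist;
}

// Histogram intersection: 1.0 means identical colour distributions,
// 0.0 means the two images share no colour buckets at all.
function histogramIntersection(array $a, array $b): float
{
    $score = 0.0;
    foreach ($a as $i => $value) {
        $score += min($value, $b[$i]);
    }
    return $score;
}

// Usage (file names are placeholders):
//   $h1 = colourHistogram(imagecreatefromjpeg('source.jpg'));
//   $h2 = colourHistogram(imagecreatefromjpeg('candidate.jpg'));
//   echo histogramIntersection($h1, $h2);
```

Fewer bins make the comparison more forgiving of small colour shifts (recompression, slight brightness changes), at the price of more false matches.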

Another method may divide images into grids saving the average colours for each area in the grid. In more advanced experiments these techniques have been combined with shape recognition and edge detection in attempts to increase the accuracy of searches.
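The grid method can be sketched in a few lines too. This is only an illustration under my own assumptions: the names gridAverages and gridDistance are made up, the grid size is whatever you pass in, and the distance is a plain mean absolute difference rather than anything more clever.

```php
<?php
// Average colour of each cell in a $cols x $rows grid laid over a GD image.
// Returns a flat list of [r, g, b] triples, row by row.
function gridAverages($img, int $cols, int $rows): array
{
    $w = imagesx($img);
    $h = imagesy($img);
    $cells = [];
    for ($gy = 0; $gy < $rows; $gy++) {
        for ($gx = 0; $gx < $cols; $gx++) {
            // Pixel bounds of this cell
            $x0 = intdiv($gx * $w, $cols);
            $x1 = intdiv(($gx + 1) * $w, $cols);
            $y0 = intdiv($gy * $h, $rows);
            $y1 = intdiv(($gy + 1) * $h, $rows);
            $sum = [0, 0, 0];
            $n = 0;
            for ($y = $y0; $y < $y1; $y++) {
                for ($x = $x0; $x < $x1; $x++) {
                    $c = imagecolorsforindex($img, imagecolorat($img, $x, $y));
                    $sum[0] += $c['red'];
                    $sum[1] += $c['green'];
                    $sum[2] += $c['blue'];
                    $n++;
                }
            }
            $n = max(1, $n); // guard against empty cells on tiny images
            $cells[] = [intdiv($sum[0], $n), intdiv($sum[1], $n), intdiv($sum[2], $n)];
        }
    }
    return $cells;
}

// Mean absolute difference per channel between two grids of the same size.
// 0 means the grids are identical, 255 is the largest possible value.
function gridDistance(array $a, array $b): float
{
    $total = 0;
    foreach ($a as $i => $cell) {
        for ($ch = 0; $ch < 3; $ch++) {
            $total += abs($cell[$ch] - $b[$i][$ch]);
        }
    }
    return $total / max(1, count($a) * 3);
}
```

Because both images are reduced to the same small grid, this comparison does not care about the original resolution, which suits the case of hunting for a picture at a different size.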

There are many sub-categories to this field that specialise in processes like region splitting, analysing, cataloguing, and detecting shapes, textures and colours. I am only scratching the surface. This is the quick and dirty guide.

So What's it Good For?

It may not be immediately apparent what the uses for this kind of system are until you read a smattering of the papers and see where most projects are heading: retrieving images from huge databases where your query consists not of text, but of other images. For example, if you wanted pictures of owls your db query might be a picture of an owl. You might have some controls so you can tweak the sensitivity of your results to get pictures of birds, for a general query, or brown owls resting on branches, for a more specific one. Large online image databases are looking at these methods because we all know how difficult it is to describe an image with words.

Check out PicSOM for an example using Self Organizing Maps (a type of neural network).

This field also covers a lot of work in the video area. As you can imagine, things like face recognition in crowds, censoring adult material, robots with visual memories and visual recognition, object detection etc, are all hot technologies in these terrornoia days we live in. :(

I skipped most of this and concentrated on still images. Images are widely available on the web so they make a good choice for experimenting and they can be small and easier to manipulate. As I mentioned, I think a lot of the same techniques used for image searching can be transferred over to other media such as audio or video.

How Can Seekers Wield This Tech

My interest is using alternative methods to find material on the web. Break the chains and start looking for new tools.

It can be tricky to find images using the traditional text methods. Plus it is a manual process. If we could automate it, we could set it running and wait for an alert when, eventually (hopefully), it finds our target out there.

If you don't possess the image then you don't have a choice but to learn some techniques to describe it and go the textual route. But sometimes you might have the image already.

You may lift an appealing picture from somewhere and you want to find out who the artist is. You want the image in a different res, or quality, or format. Maybe you only have part of the image or someone has defaced it with a watermark or other crass text. Perhaps you just want to beat Mordred at his image riddles. :)

There are many examples where this could be useful. The key point is that you are not looking for the same file (if you were you could simply do a binary comparison) but you are looking for the same image.
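For contrast, this is roughly what the "same file" test looks like. It is a small sketch of my own (the function name sameFile is made up): it compares sizes first and then a SHA-256 hash of the contents.

```php
<?php
// True only when both files hold exactly the same bytes.
function sameFile(string $pathA, string $pathB): bool
{
    // Different sizes can never be the same file, so skip the hashing
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }
    return hash_file('sha256', $pathA) === hash_file('sha256', $pathB);
}

// Usage (file names are placeholders):
//   var_dump(sameFile('owl.jpg', 'owl_copy.jpg'));
```

Resave the same picture at a different quality, resize it, or stamp a watermark on it and this check fails, even though a person would call it the same image. That gap is what the content based methods try to close.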

Back To Earth - The Reality

As tantalising as the possibilities of all this sound, it doesn't take long to realise that most of this stuff is very computationally intensive.

The big image databases do not do this sort of recognition in real time; not many CBIR projects attempt real time at all. They usually build an index of metadata, extracted by analysing all the images in advance. Searches are then performed on this index. Hence only the source image ever needs to be analysed in real time, speeding up the process immensely.

Unfortunately we don't have that luxury. Our image analysis and comparison needs to be performed at search time, on the web. This means we need to be content with a 'set and forget' system. Set up the search criteria, start it running, and then go fishing for the weekend.
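The index approach might be sketched like this. Everything here is an assumption for illustration: the function names are mine, and the "signature" is any callable that turns an image path into a flat list of numbers (a histogram, grid averages, whatever you like). Real systems use far more elaborate index structures.

```php
<?php
// Build the index once, ahead of time: file path => signature.
function buildIndex(array $paths, callable $signature): array
{
    $index = [];
    foreach ($paths as $path) {
        $index[$path] = $signature($path);
    }
    return $index;
}

// Sum of absolute differences between two signatures of equal length.
function signatureDistance(array $a, array $b): float
{
    $d = 0.0;
    foreach ($a as $i => $value) {
        $d += abs($value - $b[$i]);
    }
    return $d;
}

// At query time only the source image needs analysing; every candidate
// is just a cheap lookup. Returns path => distance, closest first.
function searchIndex(array $index, array $query, int $limit = 10): array
{
    $scores = [];
    foreach ($index as $path => $sig) {
        $scores[$path] = signatureDistance($query, $sig);
    }
    asort($scores);
    return array_slice($scores, 0, $limit, true);
}
```

The index is just an array, so it could be stored between runs with json_encode() and file_put_contents() and loaded back with json_decode().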

Obvious key bottlenecks here are processor and internet bandwidth. With this in mind I have not implemented all of the methods employed in CBIR. Speed is more of a priority in this project than accuracy.

Why PHP?

Everyone probably has a different idea why PHP is the wrong language for this. For me though, it is a natural choice for the web, it is a language I have been playing with recently, it fits in well with the PHP Lab and unashamedly, I'm just quite fond of it.

That said, I'm not a PHP guru, so once again, I urge you to take and improve upon any of the code I have included.

Of course because we are dealing with images here the GD library is a must have in your PHP implementation.
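If you are not sure whether your PHP has GD compiled in, a quick check along these lines should tell you. The function name gdHasJpeg is my own; extension_loaded(), imagetypes() and IMG_JPG are standard PHP/GD.

```php
<?php
// Report whether the GD extension is loaded and can handle JPEG files.
function gdHasJpeg(): bool
{
    if (!extension_loaded('gd') || !function_exists('imagetypes')) {
        return false;
    }
    // imagetypes() returns a bitmask of the formats this GD build supports
    return (imagetypes() & IMG_JPG) !== 0;
}

echo gdHasJpeg()
    ? "GD with JPEG support is available\n"
    : "GD (or its JPEG support) is missing\n";
```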

I'd Rather Read About This Myself Thanks

As this is a layman's guide (being a novice in this myself) you might want to delve deeper yourself. There is quite a lot of material available on the web about this. I will include references to some documents at the end of this essay (if you make it that far!). As you may guess, the .edu's and .org's are good places to begin looking for this stuff.

Even the .edu's aren't giving all this research away. You will find some doors closed, although I was amazed at how easy it is to find info that people would like you to pay for, freely lying around, sometimes mistakenly, in other places. If you get a closed/pay database that gives you preview pages of papers, then you know what to do to find the rest.

First An Image Trick To Warm Us Up

This trick is really an old, old trick that I expanded a little with some PHP.

It is the fishing trick using an image in a message board (or other suitable location). For those who may have forgotten, it goes like this: you are seeking information about a specific subject that is hard to find.

You target a web message board or similar and post a message relevant to your subject. You make it enticing, a honeypot, to attract those who have what you are seeking. In the message you have an image tag linking to an external image which you host. You then check your web logs and collect the ip addresses of everyone who requests the image, knowing that they will all have read your message. You have logged them all.

You then have a nice list of IPs you can check for http servers, ftp or whatever. They have high relevance to your goal, and one of them may just lead you to your target.

That's the old trick. I was messing around with PHP image functions and decided to put some more twists on it.

Here's an image:

Mostly Harmless

It looks pretty harmless, just like any other image on a web page.

Here's the HTML that displays the image:

<IMG SRC="http://finn61.sytes.net/log_em.php" BORDER="0" ALT="Mostly Harmless">

The IMG tag is standard but notice the SRC is pointing to a PHP script. Not an image at all! This is where the fun starts, because once we have got them to run the script we can do all sorts of stuff with the data they have provided (referrer, IP address, browser etc).

Here's the log_em.php script with comments:

<?php
    $time = date("d/m/y, H:i"); //Build the time and date
    $ip = $_SERVER['REMOTE_ADDR'];  //Save the ip address

    //The referer and browser headers are optional, so fall back to '-' if missing
    $referer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : '-';
    $browser = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '-';

    //Open a file pointer to our log in append mode
    $fp = fopen("log_em.txt", "a");

    //Write the variables out and close (skip quietly if the log can't be opened)
    if ($fp) {
        fwrite($fp, "$time IP: $ip Referer: $referer Browser: $browser \n");
        fclose($fp);
    }

    //Now we send the image
    header("Content-type: image/jpeg"); //Setup the header for a jpeg

    $newimg = imagecreatefromjpeg('tux.jpg'); //Create an image from a file
    imagejpeg($newimg); //Send the new image to the browser
    imagedestroy($newimg); //No need to keep the image we created
?>

The first thing we do is write everything they give us to a log file, for posterity of course.

This gives us what we could have got from the web logs in the old trick, but I think it's nicer writing it to our file rather than wading through web logs. Also, we may not have access to the web server logs so it gives us an alternative and we can check it right away to see all the people who didn't disable this 'info spew' from their browser:

http://finn61.sytes.net/log_em.txt

You did remember to turn yours off didn't you?
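To boil the log down to the list of addresses the old trick was after, something like this would do. It is a sketch that assumes the exact line format written by log_em.php above; the function name uniqueIps is my own.

```php
<?php
// Collect the unique IP addresses from lines shaped like:
//   "21/03/04, 14:05 IP: 10.0.0.1 Referer: ... Browser: ..."
function uniqueIps(array $lines): array
{
    $seen = [];
    foreach ($lines as $line) {
        if (preg_match('/ IP: (\S+) Referer: /', $line, $m)) {
            $seen[$m[1]] = true; // using keys keeps each address once
        }
    }
    return array_keys($seen);
}

// Usage:
//   print_r(uniqueIps(file('log_em.txt')));
```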

The other thing we can do is display different images based on the info we received. PHP is quite good at image creation and manipulation so we could even write text over the image if we need to leave a message for someone.

You might make it your dead letter drop. What seems to everybody else like an innocent avatar in your signature could display different text every week, only to certain friendly IPs.

Anyway, you get the picture (bad pun intended). The important part is that after you do whatever you need to do with your script you send the JPG stream back to the browser. It will be expecting an image, because of the IMG tag, so an image you must send it.
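A rough sketch of the dead letter drop might look like this. The friendly address list, the message and the function names (messageFor, sendAvatar) are all hypothetical; imagestring() with a built-in font is simply the quickest way GD offers to put text on an image.

```php
<?php
// Decide what, if anything, this visitor gets to read.
// $friendlyIps is a hypothetical list of addresses you trust.
function messageFor(string $ip, array $friendlyIps, string $secret): ?string
{
    return in_array($ip, $friendlyIps, true) ? $secret : null;
}

// Send the avatar, with the message drawn on top when there is one.
function sendAvatar(string $jpegPath, ?string $message): void
{
    $img = imagecreatefromjpeg($jpegPath);
    if ($img === false) {
        http_response_code(404);
        return;
    }
    if ($message !== null) {
        $white = imagecolorallocate($img, 255, 255, 255);
        imagestring($img, 2, 4, 4, $message, $white); // built-in font 2 at (4,4)
    }
    header('Content-Type: image/jpeg');
    imagejpeg($img);
}

// Usage inside the image script (addresses and text are placeholders):
//   $msg = messageFor($_SERVER['REMOTE_ADDR'], ['192.0.2.10'], 'Usual place, Friday');
//   sendAvatar('tux.jpg', $msg);
```

Everybody else simply gets the plain avatar, so there is nothing unusual for a casual visitor to notice.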

Well this has dragged on, so I will end this part of the essay. We have covered what CBIR is, some things others are trying to do with it, what ~seekers can hope to do with it and we have played with some PHP image functions and a little trick to get us started.

In part 2 I will begin straight away with the meat and we will look at some basic ways to get started with PHP and image searching.

Finn61