Searching The Internet

Welcome to the 'Searching' wing of SixtyOne. This old manor has been in my family for years but I have only recently decided to open it up to guests. Everyone is welcome here but I have a feeling only certain types may find their stay appealing.

Quick Jump




Advanced Image Searching


A Seekers Linux

About Searching Through Proxy Logs

Content Based Art Search

The Challenge

You don't need me to let you know the web is a big place. It's massive, and finding something out there can be a truly daunting job.

Enter the classical search engines. Everybody knows them, Google being the big player, followed by Yahoo and a raft of others nipping at each other for minor places.

These services are pretty good at what they do: traversing the web, harvesting pages and providing a searchable index of what they find. But what about the pages they don't find, or can't find?

To understand this better you should be familiar with the structure of the web, as viewed by a seeker. A good model and explanation can be found here at Fravia's.

In a nutshell, there is a central core of web pages, all interlinking. Some (not all) of this core is indexed by the search engines, as their spiders crawl the web, following links. Pages at the fringes may not get indexed, for they may not have as many paths leading to them, but they are connected and can therefore be found by following hyperlinks.

Seekers Topology

Another part, perhaps in the core, consists of hidden or locked databases. Usually these pages are physically connected to the internet, but they are locked behind closed doors. Mostly so that someone can profit by selling you a key.

Two other components that are outside the core are the 'Outside Linked' and the 'Outside Linkers'. The Outside Linked are pages that are linked to from within the core but do not link back to it. Search engine spiders can easily find these by following the links from within the core.

Finally the Outside Linkers are at the fringes of the known net. They link into various parts of the core but there are no links back to them. These are areas of the web that are hidden from the prying eyes of the search engines and thus, most of the unknowing masses. They may possibly be the hardest parts of the web to search in, but as we will see, potentially the most rewarding. These make up part of what is known as the 'dark web'.
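For those who like to see a map drawn out, here is a minimal sketch of the topology described above. It is only my own illustration, not Fravia's model itself: the toy web, the page names and the helper functions (reachable, reverse, classify) are all hypothetical, and the locked databases are left out entirely. The idea is that a spider starting inside the core can only walk forward along links, so it finds the core and the Outside Linked, while the Outside Linkers only show up if you can walk the links backwards.

```python
from collections import deque

def reachable(graph, start):
    """Return every page reachable from `start` by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

def reverse(graph):
    """Flip every link, so we can ask 'which pages can reach this one?'."""
    flipped = {}
    for page, targets in graph.items():
        flipped.setdefault(page, set())
        for target in targets:
            flipped.setdefault(target, set()).add(page)
    return flipped

def classify(graph, seed):
    """Split pages into regions relative to a seed page assumed to sit in the core."""
    forward = reachable(graph, seed)             # what a spider starting at seed can find
    backward = reachable(reverse(graph), seed)   # pages that have some path into seed
    core = forward & backward                    # reachable in both directions
    return {
        "core": core,
        "outside_linked": forward - core,    # linked from the core, no path back
        "outside_linkers": backward - core,  # link into the core, nothing links to them
    }

# Hypothetical toy web: page -> pages it links to.
toy_web = {
    "portal": ["news", "forum"],
    "news": ["portal", "archive"],
    "forum": ["portal"],
    "archive": [],
    "hidden_list": ["forum"],
}

regions = classify(toy_web, "portal")
crawled = reachable(toy_web, "portal")
print(regions)
print("hidden_list" in crawled)
```

In this toy web the spider starting at "portal" never sees "hidden_list", because no page links to it. On the real web nobody hands you the reversed link graph, which is exactly why the Outside Linkers are so hard to search.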

The Reward

In these litigious days it's not often that somebody leaves treasure lying around on their doorstep. Usually it's buried treasure, probably deep on a remote island someplace. Welcome to our adventure; finding this treasure and digging it up!

Please take a seat in this parlour off the main entrance where we can talk quietly, for this is a delicate business. I can see in your eyes that you want to know more about this treasure. What treasure, you are thinking? How valuable? You might like to read my thoughts On The Nature of the Value of Data.

You see, most advanced web seekers usually end up finding whatever they want on the internet. Almost anything that can end up on the net will end up there somewhere. It may not be lit up in neon. It may be buried on that remote, deserted island, unlinked by the charted web, waiting for an adventurer such as yourself to dig it up.

So you see the enormous benefit in being a Master Seeker. And the Outside Linkers are some of the last bastions of hidden treasure. For the web of yesterday has changed and now greedy data merchants are madly scrambling to persecute those involved in the free exchange of information and multimedia. You won't see this kind of stuff sitting on someone's doorstep anymore.

But the data merchants and their minions aren't sailing their web ships off the charts yet. We can do this, and the thought of the riches to be discovered can be tantalising.

The Tools

If you're going to dig for treasure you better get yourself a good shovel.

The searching wing here at SixtyOne will mostly concentrate on this 'dark web'. Sure, I have included some basic forms for general searching, and even some specialised and regional offerings, but these are things you can find elsewhere, probably with more advanced options.

Keep in mind that the web is only a part of the Internet at large. We shouldn't forget this because it's easy to concentrate on the world wide web and forget about Usenet, IRC, telnet, FTP, email and mailing lists, personal messaging and the raft of P2P protocols that are now popular.

As the laboratory here at SixtyOne expands, I hope to be able to provide tools to address some of these unmapped regions of cyberspace.

To begin you can experiment with some Content Based Image Recognition tools if you like.

Of course, your contributions to the project will be invaluable, so please forward your discoveries to me so I can publish them for the benefit of fellow guests.

* ~* ~**

You are deep inside SixtyOne - (c) Finn61