Searching The Internet



Welcome to the 'Searching' wing of Sixty One. This old manor has been in my family for years but I have only recently decided to open it up to guests. Everyone is welcome to wander around and explore but I have a feeling not everyone will enjoy the aesthetic or the discourse.

It's been about 15 years since I first opened this place up and a lot has changed on the internet and in the world. Not least, the amazing and highly admired person who inspired the creation of this site has now left us forever. It is his important legacy that I attempt to preserve, and it is to his memory that this place on the web is dedicated. Find out more about Fravia here.

15 years on and the skill to 'find information' on the net is needed more than ever - particularly with the amount of disinformation, ads, spam, junk and noise we now have to deal with.

So I have started renovating. Little by little, room by room, we will get this place repaired. Some paint here, some CSS or PHP there. But not too much - there's method behind this clean, minimal purity. You will gradually learn why this place is built the way it is as you wind and twist your way through the scaffolding and learn more about 'seeking'.

Find me, and you too can grab a brush or spanner and help with the repairs.

Top Tools

*clang* *clang* "The tools sire - they being refreshed for the modern age." - Smithy


The main engine room with the big indexes. A good starting point.

When you need a balanced result set try the Meta Search Engines.

When you need to narrow your search to a culture or geography.

You may need to search for specific content or data formats.

Hungry? Head to the kitchen and help yourself.

Essays

Advanced Image Searching - Takes a look at Content Based Image Recognition in PHP for advanced seekers.

A Seekers Linux - Free and open source tools to make a data seeker's life easier.

About Searching Through Proxy Logs - Advanced and deep searching through internet logs.

Content Based Art Search - A practical example of image recognition for searching in PHP.

The Seekers Challenge

You don't need me to tell you the web is a big place. It's massive! Finding something out there in the sea of noise can be a truly daunting job.

Enter the classical search engine. Everybody knows the giants, Google being the big gorilla, followed by perhaps Bing and a raft of others nipping at each other for minor places.

These services are pretty good at what they do: traversing the web, harvesting pages and providing a searchable index of what they find. And while search results are becoming more polluted with ads and spam and tricksters who game the system, the outcome is still good enough for most things. But what about the pages that aren't in their index? Or the information they simply can't and never will find?

To understand this better you should be familiar with the structure of the web, as viewed by a seeker. A good model and deeper explanation can be found here at Fravia's Searchlores.

Here's something to get you thinking about the complexity of it. The world wide web has a central core of web pages, all interlinking each other this way and that way, back and forwards. Some (not all, but maybe a lot) of this core is indexed by the search engines, as their bots crawl the web, following the links and pathways of this interconnected core.

Pages at the fringes of the core may not get indexed, for they have fewer paths connecting them and rely on a diligent search bot to follow those links to the edges of cyberspace. Perhaps it is a forgotten page of some long gone university student? Who knows what run-down web real estate still lives out at the fringes of the constantly mutating blob. But they are still connected, even tenuously, and therefore can still be found by the diligent seeker or bot by following the trail of hyperlinks.

Seekers Topology

Another tantalising part of the blob, perhaps in the core, consists of the hidden or locked databases of information. Mostly these pages are still highly linked to everything else in the core, but they are locked behind closed doors. Usually so someone can profit by selling you the key. But as dear +Fravia used to say, the web was designed from the ground up for sharing and disseminating information. It is only later that people tried adding locks and gates on top of this model. We know from history whenever these sorts of devices get bolted on as an afterthought, there often are chinks in the armour. We will see some of this as we explore further.

Two other interesting parts to this model that exist outside the main core are the 'Outside Linked' and the 'Outside Linkers'.

The Outside Linked are pages that are linked to from within the core but do not link back into it. It's like a one-way dead-end street. Search engine spiders can easily find these Outside Linked pages by following the links from the core to these edge pages but then that's it. Time to put it in reverse and back out because there's nowhere else to go from these guys.

Finally the Outside Linkers are also at the fringes of the known web. They have links into various parts of the core but there are no links coming back out to them. Imagine a page of bookmarks that no-one ever shared or listed anywhere else on the web. These are areas of the web that are hidden from the capturing eyes of the search engine bots and hence everyone else. They may possibly be the hardest parts of the web to search in, but as we will see, potentially the most rewarding. These make up part of what is known as the 'dark web'. We will see if we can shine a light in this direction.
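For those who like to see an idea in code, here is a minimal sketch of the topology described above, written in Python. It is my own simplification, not Fravia's model: it assumes the web is a small directed graph held in a dictionary, that one page (the `seed`) is known to sit in the core, and that the page names in `toy_web` are invented purely for illustration. A page that can be reached from the seed and can also reach the seed counts as core; one that can only be reached is Outside Linked; one that can only reach in is an Outside Linker.

```python
from collections import deque


def reachable(graph, start):
    """Return every page reachable from `start` by following links (breadth-first)."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def reverse(graph):
    """Build the graph with every link flipped, so we can walk links backwards."""
    flipped = {}
    for page, targets in graph.items():
        flipped.setdefault(page, set())
        for target in targets:
            flipped.setdefault(target, set()).add(page)
    return flipped


def classify(graph, seed):
    """Split pages into core, Outside Linked and Outside Linkers relative to `seed`."""
    forward = reachable(graph, seed)             # what a bot starting in the core can find
    backward = reachable(reverse(graph), seed)   # pages that have some path into the core
    return {
        "core": forward & backward,
        "outside_linked": forward - backward,
        "outside_linkers": backward - forward,
    }


# Hypothetical toy web: page -> pages it links to.
toy_web = {
    "hub": ["news", "archive"],
    "news": ["hub", "old-student-page"],
    "archive": ["hub"],
    "old-student-page": [],               # linked from the core, links nowhere
    "private-bookmarks": ["hub", "news"],  # links into the core, nobody links to it
}

regions = classify(toy_web, "hub")
for name, pages in regions.items():
    print(name, sorted(pages))
```

Note that `forward` is all a link-following bot starting in the core will ever see. The bookmarks page never appears in it, which is exactly why the Outside Linkers stay hidden.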

The Seekers Reward

In these litigious days it's not often that somebody leaves treasure just lying around on their doorstep. Usually it's buried treasure, and probably buried deep on a remote island someplace. Welcome to the spelunking adventure; finding this treasure and digging it up!

Please take a seat in this parlour off the main entrance where we can talk quietly, for this is a delicate business. I can see in your eyes that you want to know more about this treasure. What is this treasure, you are thinking? How valuable could it be? You might like to read my thoughts On The Nature of the Value of Data.

You see, most advanced web seekers usually end up finding whatever they want on the internet. Almost anything that can be digitised will end up on the net somewhere. It may not be marked on the map or lie glistening in the search engine sunlight. It may be buried on that remote, deserted island. Uncharted by the hyperlinked web, waiting for an adventurer such as yourself to sail ashore and dig it up.

So you see the enormous benefit in being a Master Seeker ~s~. And the Outside Linkers are some of the last unexplored bastions of hidden treasure. For the web of yesterday has changed and data merchants scramble to restrain those involved in the free exchange of information and data. Everything that can be monetised is being monetised and you won't see this kind of old, valuable treasure sitting on someone's front porch. Not anymore.

But the data merchants aren't sailing their web ships off the charts yet. Determined and adventurous explorers can and do. We can. And we can dream of the digital riches waiting for us to discover, and be tantalised by them.

The Tools

"If you're going to dig for treasure you better get yourself a good shovel." - Finno 2006

The searching wing here at Sixty One will mostly concentrate on advanced tools and techniques. Sure I have included some basic search forms for general searching, and even some specialised and regional offerings for when you need to narrow down your search field. These are things you can also find elsewhere.

Keep in mind that the web is only a part of the Internet at large. We shouldn't forget this because it's so easy to get stuck on just the world wide web and forget about all the other internet connected methods of storing, sorting and presenting data. Usenet, IRC, FTP, email mailing lists, messaging platforms and the raft of peer to peer protocols.

The laboratory here at Sixty One will explore and showcase some ways we can build our own tools and robots to extend our searching into these different horizons and truly sail off the map using the power of web languages such as PHP.

To begin you can experiment with some Content Based Image Recognition programming if you like.

Of course, your contributions to this old place will be welcome and invaluable, so please forward your discoveries to me so I can publish them for the benefit of us and our future fellow guests.

* ~* ~**

You are deep inside Sixty One - (c) Finno