Proxy Logs - The Other White Meat

~ By Finn61 ~

Originally published at fravia's searchlores in March 2002
Very slightly edited by fravia+

"Like I said earlier. There are logs all over the internet." - Unknown

The log files of a proxy server are interesting to look at because, unlike a normal webserver, which only logs access to pages on that server, a proxy server also logs access to locations all over the net.

Wherever the users go, you can go there too. Outside linkers? Hidden files? You may find them here.

So how do you find a proxy server that publishes its logs on the web? With a search engine of course, but using careful search terms to pluck out our logs.

Case 1 - PWEBSTATS

Pwebstats is a Perl script that analyses server logs and produces stats and graphs.

By default, all the HTML pages that pwebstats produces include a footnote saying "These statistics produced by pwebstats".

You could do a search for this in Google if you want to see what kind of output this thing produces.

"These statistics produced by pwebstats"

To narrow it down, there is one particular page that is the most interesting to us. That is the requests list.

Once again there is another default setting we can utilise. The page displaying the requests uses the following format :

/servername.n.total-accesses.html

servername = the name of the server (strangely enough)
n = an incremental number
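
If you collect a lot of these URLs, the naming format above can be picked apart automatically. What follows is only a rough Python sketch: the function name `parse_requests_page` and the regular expression are my own illustration and not part of pwebstats, and it assumes the default file naming has not been changed in the config.

```python
import re
from urllib.parse import urlsplit

# Matches "<servername>.<n>.total-accesses.html" at the end of a URL path.
# The pattern is an illustration based on the default naming described above.
REQUESTS_PAGE = re.compile(r"(?P<server>[^/]+?)\.(?P<n>\d+)\.total-accesses\.html$")


def parse_requests_page(url):
    """Return (servername, n) for a pwebstats requests page URL, or None."""
    path = urlsplit(url).path
    match = REQUESTS_PAGE.search(path)
    if match is None:
        return None
    return match.group("server"), int(match.group("n"))
```

Anything that does not end in the default file name simply returns None, so the same function can be used to filter a list of search results.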

So now we can search for :

allinurl:"total-accesses.html"

But this gives us other servers as well as proxies. Narrow it down some more with :

allinurl:"total-accesses.html" proxy

This gets any that have the word proxy in the url.

If you are looking for forbidden fruit you may find it doesn't hang around for very long in a particular location. That's why it's sometimes important to find the latest version of the logs. You may not have any luck if you are looking at a log from 1998.

Try changing the incremental number in the URL. The highest number is usually the latest log.
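
Changing the number by hand gets tedious, so here is a minimal sketch that builds the neighbouring URLs for you. The helper name `neighbouring_logs` and the `span` parameter are hypothetical, and the code only builds candidate URLs; it does not fetch anything or check that a given number exists on the server. Whether the numbering starts at 0 or 1 is not something I know, so 0 is kept as a candidate.

```python
import re

# Locates the incremental number in "<servername>.<n>.total-accesses.html".
NUMBER_IN_NAME = re.compile(r"\.(\d+)\.total-accesses\.html$")


def neighbouring_logs(url, span=3):
    """Return candidate URLs whose number is within `span` of the original.

    Higher numbers come first, since the highest is usually the latest log.
    The original URL itself is left out of the result.
    """
    match = NUMBER_IN_NAME.search(url)
    if match is None:
        raise ValueError("not a pwebstats requests page URL")
    n = int(match.group(1))
    prefix = url[:match.start(1)]
    suffix = url[match.end(1):]
    # Count down from n + span to n - span, never going below zero.
    numbers = range(n + span, max(n - span, 0) - 1, -1)
    return [prefix + str(k) + suffix for k in numbers if k != n]
```

Feed it a requests page URL and try the results from the top of the list down.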

OK, now here's a nice one in Taiwan that you may find contains quite a few interesting links :

http://proxy2.ncku.edu.tw/usage/days/requests/proxy2.360.total-accesses.html

...and if you would like to find out exactly which day each incremental number corresponds to, go here :

http://proxy2.ncku.edu.tw/usage/g-index.html

Case 2 - WEBALIZER

You didn't think pwebstats was the only utility for analysing log files, did you?

Same deal, different software. This time we look for the footnote "generated by webalizer".

This search will give you some pretty stats to look at :

inurl:proxy "generated by webalizer"

As before, I've added the word "proxy" to the search to weed out any boring servers.

Now if you look at one of these pages you may get a summary like this one :

http://proxy.intechworld.net/webalizer/usage_200203.html

You'll notice you only get the top 100 URLs. When the total is 102348 you can be sure the top 100 are very common and very well known sites. If you're looking for that hidden jewel it is probably somewhere a little more secretive! :)

You can click on "View All URLs" to get them all. You may not want to do that with the above example because the page will be quite massive! Try this one instead :

http://www.pals.msus.edu/webstats/proxy/UCR/url_200108.html

I've noticed not all of the tables have the "View All URLs" link. I presume it depends on the config.

So now you should have some large lists of URLs you can scan for that hard-to-find document or program. Try amassing a few pages and searching through them all with a text scanner. You might find that hidden link.
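
If you don't have a text scanner handy, a few lines of Python will do. This is only a sketch under some assumptions: the pages have already been saved to disk as text or HTML, the function name `find_urls` and the URL pattern are mine, and the pattern is deliberately crude (it just grabs anything starting with http:// or https:// up to the next space, quote or angle bracket).

```python
import re
from html import unescape

# Crude URL matcher: scheme, then everything up to whitespace, a quote or < >.
URL_PATTERN = re.compile(r"""https?://[^\s"'<>]+""", re.IGNORECASE)


def find_urls(page_text, keywords):
    """Return the unique URLs in page_text containing any keyword.

    Matching is case-insensitive and the order of first appearance is kept.
    """
    wanted = [k.lower() for k in keywords]
    hits = []
    seen = set()
    for raw in URL_PATTERN.findall(page_text):
        url = unescape(raw)  # turn &amp; back into & and so on
        if url in seen:
            continue
        seen.add(url)
        if any(k in url.lower() for k in wanted):
            hits.append(url)
    return hits
```

To scan a pile of saved pages, read each file and pass its contents in, for example `find_urls(open(name, encoding="latin-1").read(), [".pdf", ".zip"])` for every `name` in your collection. The latin-1 encoding is a guess that never fails to decode; adjust it to whatever the pages really use.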

There are more logs out there. There are more log analysers too, each with its own distinguishable defaults. Have a look for them.
Send back your findings, let's all build on each other's shoulders!

Ciao,

Finn61

DQ's addition (March 2002, probably true for all other services à la webalizer as well):

Finn61, very nice essay!! 

A short addition: Webalizer stores its findings 
by default in a raw file in the same subdirectory 
as the generated html files. This raw file is named 
webalizer.current. 

Using your examples, we (and not google ;-) 
are able to fish this file and extract whatever 
additional info is held back from our prying eyes. 

http://www.pals.msus.edu/webstats/proxy/UCR/webalizer.current 
http://proxy.intechworld.net/webalizer/webalizer.current 
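
Since the raw file sits in the same subdirectory as the generated pages, its address can be worked out from any report URL. A small Python sketch follows; the function name `webalizer_current_url` is hypothetical, and it assumes the default file name webalizer.current has not been changed by the server's admin.

```python
from urllib.parse import urljoin


def webalizer_current_url(report_url):
    """Return the URL of webalizer.current in the same directory as report_url."""
    # urljoin swaps the last path segment (the report page) for the new name.
    return urljoin(report_url, "webalizer.current")
```

Given the two report pages from the essay, this produces exactly the two addresses listed above.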

HTH,

DQ


You are deep inside SixtyOne - (c) Finn61