
python http log downloader

If you host your own blog on your own domain with a hosting company, you might have access to your raw HTTP log files, just like me.
Those raw logs are, first of all, quite interesting to run a statistics tool on. Secondly, they take up precious disk space on your web server, so you might want to download them now and then to make sure you don’t run out of space and into trouble.

Since I’m the kind of guy who would rather have his computer perform tedious tasks for him than do them himself, I wrote a little Python script to get the latest logfiles for me and store them somewhere on my local machine for later processing. It leaves the logfile for the current day untouched and downloads every other logfile (provided the logfiles have the date in their filename). So if the script didn’t run for a few days, it picks up those old logs as well the next time it is run. Kinda sweet, isn’t it?

The script is written in the mighty Python language, is licensed under the GNU GPL, and comes without any guarantees, wheee! You can get it right here as a zip archive. To make things work, you just need to fill in your domain and FTP account data in the downloadlogs() function call at the bottom of the script.

The trick is that it gets the list of logfiles on the server, sorts them alphabetically and downloads all but the last one, which is the file for the current day if your files are named like exYYMMDD.log. If your files are not named that way, you might not want to use this script, so you avoid having the wrong files deleted from the server. Of course the script has already downloaded the files before deleting them, so you should be fine in case this happens anyway.
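For the curious, here is a rough sketch of that trick. It is not the code from the zip archive, just my description above turned into a few lines of Python 3 with the standard ftplib module. The function names select_old_logs() and fetch_old_logs(), their parameters, and the ".log" filename filter are all made up for this illustration.

```python
import os
from ftplib import FTP


def select_old_logs(names):
    """Return every logfile name except the alphabetically last one.

    Assumes names like exYYMMDD.log, so that alphabetical order is also
    date order and the last name belongs to the current day.
    """
    # The ".log" filter is an assumption; it skips entries such as "." or "..".
    logs = sorted(name for name in names if name.endswith(".log"))
    return logs[:-1]


def fetch_old_logs(host, user, password, remote_dir, local_dir):
    """Download all but the newest logfile over FTP, then delete them remotely."""
    os.makedirs(local_dir, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login(user, password)
        ftp.cwd(remote_dir)
        for name in select_old_logs(ftp.nlst()):
            local_path = os.path.join(local_dir, name)
            with open(local_path, "wb") as local_file:
                ftp.retrbinary("RETR " + name, local_file.write)
            # Delete on the server only after the local copy is written.
            ftp.delete(name)
```

You would call it with your own data, something like fetch_old_logs("ftp.example.com", "user", "secret", "/logs", "downloaded-logs"), where every value is a placeholder. Keeping the sort-and-skip step in its own little function makes it easy to check what would be downloaded before letting anything get deleted.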

I hope this is useful for some folks out there, so enjoy.
