While playing around with Jekyll for a new website idea last night (and I mean late into the early hours of this morning), I noticed I had a fair few 404s that kept taking time to stamp out. I went to install MOMSpider and found that, even when installed from the FreeBSD ports tree, it appears to be broken as shit. Maybe it hasn't been updated in a long time? It seems to depend on a bunch of deprecated and/or removed Perl modules.

It sounded like an interesting project, so I set about writing my own spider in Python. It's terribly trashy code: it's extremely intolerant of badly formed HTML (it doesn't even check the MIME type of each response to see whether it should try parsing it as HTML at all), and it probably only just works, but it's doing the job for me. Maybe I'll fix all these bugs later on.
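For anyone wanting to roll their own, here's a minimal sketch of the same idea. It is not my actual script, just an illustration under a few assumptions: it does a breadth-first crawl that stays on the starting host, only parses responses whose `Content-Type` is `text/html` (fixing one of the bugs above), and uses the standard library's `html.parser`, which is fairly forgiving of malformed markup. The names `LinkParser`, `fetch` and `crawl` are hypothetical; `fetch` can be swapped out for testing.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.error import HTTPError, URLError
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects href/src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def fetch(url):
    """Return (status, content_type, body_text); status is None on network errors."""
    try:
        with urlopen(url, timeout=10) as resp:  # redirects are followed automatically
            ctype = resp.headers.get_content_type()
            charset = resp.headers.get_content_charset() or "utf-8"
            body = resp.read().decode(charset, errors="replace")
            return resp.status, ctype, body
    except HTTPError as err:
        return err.code, None, ""
    except URLError:
        return None, None, ""


def crawl(start_url, fetcher=fetch):
    """Breadth-first crawl of start_url's host; returns {url: status} for non-200 responses."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    broken = {}
    while queue:
        url = queue.popleft()
        status, ctype, body = fetcher(url)
        if status != 200:
            broken[url] = status
            continue
        # Only parse HTML; images, CSS, feeds etc. are checked but not parsed.
        if ctype != "text/html":
            continue
        parser = LinkParser()
        parser.feed(body)
        for link in parser.links:
            # Resolve relative links and drop #fragments so /page#a and /page are one URL.
            absolute, _fragment = urldefrag(urljoin(url, link))
            # Stay on the same host (this also skips mailto: and javascript: links).
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return broken
```

Pointed at a local `jekyll serve` instance (which listens on `http://localhost:4000/` by default), `print(crawl("http://localhost:4000/"))` would list every internal URL that didn't come back with a 200.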

Red Lion VIC 3371, Australia fwaggle

Modified:

Never
