I found that some sites actively block the Wget user-agent string to prevent automated grabbing of web pages. In particular, ctrl-alt-del was blocking it. I tried hitting cad-comic.com many times and couldn't figure out what was wrong until I tested with a user-agent switcher in Firefox and found that the Wget agent string is blocked.
I was furious for a moment, but of course:
---
man wget
/user-agent
---
yielded the solution:
wget --user-agent="opera"
At first I was angry, but then amused that their efforts are thwarted by an option that comes stock with the tool. Before that, I had also found a workaround in "w3m -dump_source", as w3m is not blocked.
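For anyone who wants to script this, here is a minimal sketch of both workarounds wrapped in shell functions. The function names (fetch_page, fetch_page_w3m) and the browser-like agent string in UA are my own placeholders, not anything the site requires; any string the server does not block should behave the same way.

```shell
#!/bin/sh
# Hypothetical browser-like agent string; swap in whatever works for you.
UA="Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

# fetch_page URL OUTFILE
# Download URL to OUTFILE, replacing wget's default "Wget/<version>" agent.
fetch_page() {
    url=$1
    out=$2
    wget --user-agent="$UA" --output-document="$out" "$url"
}

# fetch_page_w3m URL OUTFILE
# Same idea using w3m, which prints the page source to stdout.
fetch_page_w3m() {
    url=$1
    out=$2
    w3m -dump_source "$url" > "$out"
}
```

After sourcing the file, something like `fetch_page http://example.com/ page.html` (a placeholder URL) should save the page to page.html.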
I thought I'd vent my anger a bit here. :) Hope it helps anyone in the future trying to automate the downloading of webpages...
Incidentally, if anyone wants to read ctrl-alt-del or questionablecontent en masse without clicking "next" repeatedly, let me know; I have some scripts that make this convenient. :)
(email me as I often forget to check these nowadays)
I'm pretty sure there are Firefox extensions to automate such processes as well. You might want to look at http://pipes.yahoo.com/pipes/search?r=source%3Acad-comic.com for some inspiration.
--
Andrew
Perhaps use cURL?
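A rough sketch of what the cURL route might look like. Whether a given site blocks curl's default agent string is something I have not checked, so the sketch sets the agent explicitly. The function name fetch_with_curl and the UA value are placeholders of mine.

```shell
#!/bin/sh
# Hypothetical browser-like agent string; not required by any particular site.
UA="Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"

# fetch_with_curl URL OUTFILE
# --user-agent (short form -A) sets the User-Agent header,
# --location follows redirects, --output writes the body to OUTFILE.
fetch_with_curl() {
    url=$1
    out=$2
    curl --silent --show-error --location \
         --user-agent "$UA" --output "$out" "$url"
}
```

Usage would be along the lines of `fetch_with_curl http://example.com/ page.html`, again with a placeholder URL.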