wget: more versatile than a trusty Swiss Army knife

A tool that makes life that much easier…

This is by far the most useful command-line tool when it comes to dealing with the web. wget is a small, simple-to-use tool that lets you get content from other sites. This includes HTML, tarballs, and anything else the server is willing to provide.

If you’re wondering what good it is when there are numerous browsers that do this, read on.

wget (unlike browsers) is a non-interactive tool, meaning it doesn’t need you once it has been given a job, such as downloading an entire directory, say this one. Rather than clicking each item on that list, you can simply issue this command:

wget -r ftp://ftp.geda.seul.org/pub/geda/release/v1.4

wget can also help by making those pesky online-only documents available offline. Take this for example (I know this is available in downloadable form, but this method is applicable to other sites as well).
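The exact command depends on the site, so here is only a rough sketch of what such a download could look like. The helper name mirror_docs and the example.com address are made up for illustration; the flags themselves are standard wget options.

```shell
# Sketch: save a documentation tree for offline reading.
# mirror_docs is a made-up helper name; pass it the starting URL.
mirror_docs() {
    # --recursive        follow links from the starting page
    # --no-parent        never climb above the starting directory
    # --convert-links    rewrite links so they work from the local copy
    # --page-requisites  also grab the images and stylesheets the pages need
    wget --recursive --no-parent --convert-links --page-requisites "$1"
}

# Example call (placeholder address):
# mirror_docs https://example.com/docs/
```

Without --no-parent, a recursive fetch can wander up into the rest of the site, which is rarely what you want when you only need one manual.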

wget is a well-behaved tool: it will not download from any site that specifies rules prohibiting it. These rules are stored in a file in the root directory of the server (robots.txt). However, most rules are meant to be broken. You can ignore them by adding the following to the command:

-e robots=off --wait 1

The “wait” has nothing to do with ignoring rules; it simply makes it easier on the server by waiting 1 second between fetches. Please add this when downloading from good sites, and conveniently forget it when downloading from Microsoft :p
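Put together with a recursive download, the whole thing might look like the sketch below. The helper name fetch_ignoring_robots and the example.com address are placeholders of mine, not anything wget provides.

```shell
# Sketch: recursive download that ignores robots.txt
# but still pauses 1 second between fetches.
# fetch_ignoring_robots is a made-up helper name.
fetch_ignoring_robots() {
    wget -r -e robots=off --wait 1 "$1"
}

# Example call (placeholder address):
# fetch_ignoring_robots https://example.com/files/
```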

Also, when recursing, wget will only go down 5 levels by default. If the site you want to download has a directory structure that goes deeper than this, add the following, replacing N with the depth you need:

-l N
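As a small sketch of the depth option in use: the helper name fetch_deep, its fallback depth of 10, and the example.com address are all arbitrary choices for illustration.

```shell
# Sketch: recursive download with a custom depth limit.
# fetch_deep is a made-up helper name.
# Usage: fetch_deep URL [DEPTH]
fetch_deep() {
    url=$1
    depth=${2:-10}   # arbitrary fallback; wget's own default is 5
    wget -r -l "$depth" "$url"
}

# Example calls (placeholder address):
# fetch_deep https://example.com/archive/ 8
# fetch_deep https://example.com/archive/ inf   # "inf" means no depth limit
```

Be careful with an unlimited depth on a big site, since the download can get very large very quickly.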

Finally, some sites analyze traffic, can determine whether an automated application such as wget is downloading, and can block it. There is a solution to this, but I won’t go into it since I never really used it.
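For the curious, one option wget does offer in this area is --user-agent, which changes how it identifies itself to the server. This is only a sketch and may not be the solution meant above: the helper name fetch_as_browser and the agent string are placeholders, and whether this is enough depends entirely on how the site detects automated clients.

```shell
# Sketch: send a browser-like User-Agent header instead of wget's default.
# fetch_as_browser is a made-up helper name.
fetch_as_browser() {
    # The agent string is an arbitrary placeholder, not a recommendation.
    wget --user-agent="Mozilla/5.0 (X11; Linux x86_64)" "$1"
}

# Example call (placeholder address):
# fetch_as_browser https://example.com/
```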

Enjoy the web with wget 🙂


Written by seininn

April 7, 2009 at 12:50 pm

