
How To Use wget To Download Websites

7/31/2019

A friend needed to move her website from one hosting provider to another. The problem was that she used a website builder on the first provider so she didn't have the content on her computer; she recoiled at the thought of re-entering it. wget to the rescue!

GNU Wget is a tool that can download a web page or an entire site using HTTP or HTTPS, as a browser would, or using FTP (the File Transfer Protocol) if a site allows it.

Some usage examples


If I wanted to retrieve just a single page from a website, I could do that easily.

wget https://www.example.com

would retrieve the main page for the site, probably a file called index.html.

If I wanted to get a particular file or page from a site I could specify it, instead:

wget https://www.example.com/files/file.txt

If I wanted all of example.com, I could use

wget -r https://www.example.com

That could be a lot of data, though! Fortunately, by default wget follows links only five levels deep unless I tell it otherwise (with -l). There are also options to mirror the whole site (-m, which removes the depth limit) and to convert the copy for local viewing by making its internal links refer to the downloaded files (-k).
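For a site move like the one described at the start, these options can be combined into a single command. The following is a minimal sketch under stated assumptions, not a definitive recipe: the function name mirror_site and the DRY_RUN variable are hypothetical names made up for this example, and the one-second wait is an arbitrary courtesy setting. The wget flags are the long forms of -m, -k, and -p, plus --adjust-extension, --no-parent, and --wait.

```shell
#!/bin/sh
# Sketch: mirror a site for offline viewing.
# mirror_site and DRY_RUN are hypothetical names used only in this example.
mirror_site() {
    url=$1
    # --mirror            (-m) recursion with no depth limit, plus timestamping
    # --convert-links     (-k) rewrite links to point at the local copy
    # --page-requisites   (-p) also fetch the images, CSS, and scripts each page needs
    # --adjust-extension  save HTML pages with an .html suffix
    # --no-parent         never climb above the starting directory
    # --wait=1            pause one second between requests
    set -- wget --mirror --convert-links --page-requisites \
        --adjust-extension --no-parent --wait=1 "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"    # print the command instead of running it
    else
        "$@"         # run wget with the arguments built above
    fi
}

# Dry run: show the command that would be executed, without touching the network.
DRY_RUN=1 mirror_site https://www.example.com
```

Calling mirror_site without DRY_RUN=1 would run the download and place the copy in a directory named after the host.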

wget is feature-rich


There are many options for wget users. If you don't want to read the entire manual, you can check out a concise list.

Some options I tend to use include:

-r -A wget -r -A.pdf https://www.example.com retrieves just the PDF files found

-c wget -c https://www.example.com/bigdoc.pdf resumes downloading bigdoc.pdf after a failure (-b, by contrast, runs wget in the background)

-p wget -p https://www.example.com/page.html grabs all the files necessary to display page.html

-P wget -P c:\tmp\page -p https://www.example.com/page.html as above but put the files in c:\tmp\page instead of the current directory/folder.
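The options in this list can also be combined. As a hedged sketch, the function below collects the PDF files linked within two levels of a starting page and stores them under a chosen directory. The name fetch_pdfs, the DRY_RUN variable, the default directory "downloads", and the depth of 2 are all assumptions made for illustration; the flags are the long forms of -r, -l, -np, -A, and -P.

```shell
#!/bin/sh
# Sketch: download only PDF files from a site into a chosen directory.
# fetch_pdfs and DRY_RUN are hypothetical names used only in this example.
fetch_pdfs() {
    url=$1
    dest=${2:-downloads}    # assumed default target directory
    # --recursive         (-r)  follow links
    # --level=2           (-l)  but only two levels deep
    # --no-parent         (-np) stay below the starting directory
    # --accept=pdf        (-A)  keep only files whose names end in "pdf"
    # --directory-prefix  (-P)  save everything under $dest
    set -- wget --recursive --level=2 --no-parent --accept=pdf \
        --directory-prefix="$dest" "$url"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$@"    # print the command instead of running it
    else
        "$@"         # run wget with the arguments built above
    fi
}

# Dry run: show the command that would be executed.
DRY_RUN=1 fetch_pdfs https://www.example.com docs
```

During a real run wget still fetches the HTML pages so that it can follow their links, then deletes the ones that do not match the accept list.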

Installing wget


Wget is free software and covered by the GNU General Public License (GPL).

Linux, Mac, and UNIX users can install wget using their favorite package manager.
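As a rough illustration of what that looks like, the sketch below prints the usual install command for a few common package managers. The helper name install_hint is hypothetical, the list of package managers is only a sample, and the package is assumed to be called wget on each of them (it may differ on some systems).

```shell
#!/bin/sh
# Sketch: print the usual wget install command for a given package manager.
# install_hint is a hypothetical helper, not part of wget.
install_hint() {
    case $1 in
        apt-get) echo "sudo apt-get install wget" ;;   # Debian, Ubuntu
        dnf)     echo "sudo dnf install wget" ;;       # Fedora, RHEL
        pacman)  echo "sudo pacman -S wget" ;;         # Arch Linux
        brew)    echo "brew install wget" ;;           # Homebrew on a Mac
        *)       echo "no hint for '$1'" >&2; return 1 ;;
    esac
}

# Report the command for the first package manager found on this machine.
for pm in apt-get dnf pacman brew; do
    if command -v "$pm" >/dev/null 2>&1; then
        install_hint "$pm"
        break
    fi
done
```

The script only prints a suggestion; it installs nothing by itself.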



Windows users will need to go to the download page for Windows and choose what they want to install. For most people that will be the "Complete package, except sources". I clicked on the link, and my browser asked whether I wanted to download it. I said I did, and the installer showed up in my downloads folder. I ran the installer, approved the license, and answered the questions by accepting the defaults. That's it.

Wget itself is a command-line program, so I have to start cmd to use it.

There is (currently) a newer version for Windows at https://eternallybored.org/misc/wget/. I tried it and it seemed to work.

Wget is a powerful tool for website maintenance and file retrieval. My friend was quite grateful for my help in mirroring her site. Hopefully, this essential intro to wget will help you with your website maintenance or help you retrieve files from the web more easily.


Written by John McDermott

John McDermott, CPLP, started his work in computer security in 1981 when he caught an intruder in a system he was managing. In recent years his consulting has included security consulting for small businesses. He is Security+ and CCP certified. In his 30 years with Learning Tree John has written and taught courses in programming, networking and computer security. He is the co-author of Learning Tree’s course System and Network Security: A Comprehensive Introduction. John is currently a learning and development consultant in northern New Mexico. He lives in a house made of earth with his wife, who is an artist.
