PowerShell equivalent for wget switches "-nc" and "-i"


What is the PowerShell equivalent of this wget command?

wget -nc -i downloadList.txt

where

-i downloadList.txt
downloads every URL listed in the specified file.

-nc
skips files that have already been downloaded.

Renuka

Posted 2015-10-04T07:10:26.907

Reputation: 151

Question was closed 2015-10-19T22:43:44.977

@DavidPostill Nope. I edited the question to make it clearer. – Renuka – 2015-10-04T08:01:18.783

It's not an exact duplicate, but it is a starting point. There's nothing stopping you from modifying it to meet your exact needs. Hint: Invoke-WebRequest -InFile<String> will do the `-i` bit. – DavidPostill – 2015-10-04T08:02:36.380

I did my googling before posting this question. I cannot find how to get the -nc and -i behavior in PS. – Renuka – 2015-10-04T08:05:51.043

Try googling harder. Invoke-WebRequest is the PS command for "Gets content from a web page on the Internet." ... – DavidPostill – 2015-10-04T08:08:59.607

Still couldn't get it. I also tried extracting the URLs by looping through the text file line by line. It was slow as hell (on a large file). – Renuka – 2015-10-04T08:17:45.293

Why don't you just use a Windows version of wget then? – DavidPostill – 2015-10-04T08:19:07.300

Would prefer a native alternative ;) – Renuka – 2015-10-04T08:22:35.790

Answers


You can use not only PowerShell cmdlets but .NET classes too.

For the -nc part, get the contents of the file and keep only unique lines with cat (alias for Get-Content) and sort -Unique (alias for Sort-Object). Then run wget (in Windows PowerShell, an alias for Invoke-WebRequest) on that list of strings, extracting the output file name from each URL with GetFileName:

cat downloadList.txt | sort -Unique | foreach {wget $_ -OutFile ([System.IO.Path]::GetFileName($_))}
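Selecting unique lines only removes duplicate URLs from the list; it does not look at what is already on disk, which is the part -nc actually handles. A minimal sketch that adds a Test-Path check before each download, assuming one absolute URL per line and that the local file name is the last segment of the URL path (query strings are ignored). Get-MissingUrl and Invoke-NoClobberDownload are hypothetical helper names, not built-in cmdlets:

```powershell
# Sketch of `wget -nc -i downloadList.txt`; helper names are hypothetical.
# Assumes one absolute URL per line; the local name is the URL's last path segment.
function Get-MissingUrl {
    param(
        [string[]]$Url,         # URLs read from the list file
        [string]$Destination    # folder the files are saved to
    )
    # trim whitespace, drop blank lines and duplicate URLs
    $unique = $Url | ForEach-Object { "$_".Trim() } | Where-Object { $_ } | Sort-Object -Unique
    foreach ($u in $unique) {
        $name = [System.IO.Path]::GetFileName(([uri]$u).AbsolutePath)
        if (-not $name) { continue }   # URL ends in '/', no file name to use
        $target = Join-Path $Destination $name
        # the -nc part: skip the download entirely if the file already exists
        if (-not (Test-Path -LiteralPath $target)) {
            [pscustomobject]@{ Url = $u; OutFile = $target }
        }
    }
}

# the -i part: read the list, then download only what is missing
function Invoke-NoClobberDownload {
    param([string]$ListPath, [string]$Destination = (Get-Location).Path)
    $urls = Get-Content -LiteralPath $ListPath
    foreach ($item in (Get-MissingUrl -Url $urls -Destination $Destination)) {
        Invoke-WebRequest -Uri $item.Url -OutFile $item.OutFile
    }
}
```

Usage: `Invoke-NoClobberDownload -ListPath .\downloadList.txt`. Keeping the selection in its own function means the "what would be downloaded" list can be inspected without touching the network.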

Mikhail Tumashenko

Posted 2015-10-04T07:10:26.907

Reputation: 111


There is no PowerShell switch that behaves exactly like -nc.

It does not only prevent overwriting a file; it also checks whether the target file already exists and, if so, doesn't start a second download. The whole point of -nc is to prevent the actual download.

-nc

--no-clobber

If a file is downloaded more than once in the same directory, Wget's behavior depends on a few options, including -nc. In certain cases, the local file will be clobbered, or overwritten, upon repeated download. In other cases it will be preserved.

When running Wget without -N, -nc, or -r, downloading the same file in the same directory will result in the original copy of file being preserved and the second copy being named file.1. If that file is downloaded yet again, the third copy will be named file.2, and so on. When -nc is specified, this behavior is suppressed, and Wget will refuse to download newer copies of file. Therefore, "no-clobber" is actually a misnomer in this mode. It's not clobbering that's prevented (as the numeric suffixes were already preventing clobbering), but rather the multiple version saving that's prevented.

When running Wget with -r, but without -N or -nc, re-downloading a file will result in the new copy simply overwriting the old. Adding -nc will prevent this behavior, instead causing the original version to be preserved and any newer copies on the server to be ignored.

When running Wget with -N, with or without -r, the decision as to whether or not to download a newer copy of a file depends on the local and remote timestamp and size of the file. -nc may not be specified at the same time as -N.

Note that when -nc is specified, files with the suffixes .html or (yuck) .htm will be loaded from the local disk and parsed as if they had been retrieved from the Web.
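To make the contrast in the quoted manual concrete, here is a minimal sketch of how the local file name is chosen: by default wget takes the first free name among file, file.1, file.2, ..., while with -nc an existing file means no download at all. Get-WgetTargetName is a hypothetical helper; it models only the naming rule, not -N or -r:

```powershell
# Hypothetical helper (not a built-in cmdlet) modelling the naming rule quoted above.
function Get-WgetTargetName {
    param(
        [string]$Folder,
        [string]$Name,
        [switch]$NoClobber   # mimic -nc: return $null instead of numbering
    )
    $candidate = Join-Path $Folder $Name
    if (-not (Test-Path -LiteralPath $candidate)) { return $candidate }
    # -nc: the file already exists, so no download should happen at all
    if ($NoClobber) { return $null }
    # default: file.1, file.2, ... until a free name is found
    $n = 1
    while (Test-Path -LiteralPath (Join-Path $Folder "$Name.$n")) { $n++ }
    return (Join-Path $Folder "$Name.$n")
}
```

A caller would run Invoke-WebRequest only when the returned path is not $null.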

So, in PowerShell v3 you have to imitate this behavior. In a nutshell:

  • Get all base names (no extension) of the files in a given folder (the download destination)
  • Get all URLs from a given text file (downloadList.txt)
  • Compare both lists and retrieve the missing URLs
  • Send only the missing URLs to Invoke-WebRequest and append .html as the extension

$folder = "D:\my\folder"
Compare $(Dir $folder).BaseName (gc "D:\downloadList.txt")  -PassThru | 
    where {$_.SideIndicator -eq '=>'} | 
    foreach { wget $_ -OutFile "$folder\$_.html" }

And the non-golfed version:

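$folder = "D:\my\folder"
# Base names (no extension) of files already in the destination folder
$exists = (Get-ChildItem $folder).BaseName
# One URL per line
$urls = Get-Content "D:\downloadList.txt"
# '=>' marks entries that exist only in $urls, i.e. not downloaded yet
$missing = Compare-Object $exists $urls -PassThru | Where-Object {$_.SideIndicator -eq '=>'}
# Download each missing entry and save it with an .html extension
$missing | ForEach-Object { Invoke-WebRequest -Uri $_ -OutFile "$folder\$_.html" }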
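Both snippets above use each whole line of downloadList.txt as the file base name, so "$folder\$_.html" only works if the list holds plain names; a full URL contains characters such as ':' that are not allowed in Windows file names. A minimal sketch of the same "compare base names" idea, assuming each local file should be named after the last segment of the URL path and that those segments are distinct across the list. Get-MissingByBaseName is a hypothetical helper name:

```powershell
# Sketch: compare base names, but derive the local base name from the URL's
# last path segment instead of using the whole URL. Hypothetical helper name;
# assumes absolute URLs whose last segments differ.
function Get-MissingByBaseName {
    param([string]$Folder, [string[]]$Url)
    # @() keeps an array even when the folder is empty
    $exists = @(Get-ChildItem -LiteralPath $Folder | ForEach-Object { $_.BaseName })
    foreach ($u in $Url) {
        $base = [System.IO.Path]::GetFileNameWithoutExtension(([uri]$u).AbsolutePath)
        # skip URLs with no usable name and those already saved in $Folder
        if ($base -and ($exists -notcontains $base)) {
            [pscustomobject]@{ Url = $u; OutFile = (Join-Path $Folder "$base.html") }
        }
    }
}
```

Usage: `Get-MissingByBaseName -Folder 'D:\my\folder' -Url (Get-Content 'D:\downloadList.txt') | ForEach-Object { Invoke-WebRequest -Uri $_.Url -OutFile $_.OutFile }`. Using -notcontains instead of Compare-Object also avoids passing an empty result for an empty folder.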

nixda

Posted 2015-10-04T07:10:26.907

Reputation: 23 233