Okay, I give up. How do I limit which files are downloaded by size, say if I don't want any files bigger than 2 MB?
The only size-related option I know that wget supports is the -Q switch for quota. This is not what you want, though, as it stops after a combined limit across all files you've downloaded, not per file. Passing each link to it separately with the -Q switch won't work either, as explained in the man page.
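To illustrate the quota behaviour, here is a minimal sketch (the example.com URLs and the 2m value are placeholders):

# -Q sets a total quota for the whole run: wget stops starting new
# downloads once the combined size passes 2 MB, but it never skips or
# truncates an individual file because of that file's own size.
wget -r -Q 2m http://example.com/

# Per the man page, the quota has no effect when fetching a single URL,
# so this still downloads the whole file even if it is far bigger than 2 MB:
wget -Q 2m http://example.com/large.iso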
I don't know what environment you're using, but the Heritrix crawler supports file size limitations with max-length-bytes and runs on the Java platform.
From its user manual:
- max-length-bytes
Maximum number of bytes to download per document. Will truncate file once this limit is reached.
By default this value is set to an extremely large value (in the exabyte range) that will never be reached in practice.
If it's about "downloading at most 2 MB of each file" rather than "only downloading files smaller than 2 MB", you could just limit the output saved to disk:
wget -O - $url | head -c 1024
(with an optional > $SaveAsFile)
-> saves the first KB and the rest gets truncated.
(Enough to see an "OK:$Message" without killing my /tmp with tons of error messages from the remote ;-))
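To match the 2 MB from the question, a minimal sketch of the same idea (the urls.txt input file, the output naming, and GNU head's M suffix are my assumptions):

# Keep at most the first 2 MB of each file; anything beyond that is
# cut off, not skipped. The "2M" size suffix requires GNU coreutils head.
while read -r url; do
  wget -q -O - "$url" | head -c 2M > "$(basename "$url")"
done < urls.txt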
This is possible with the help of third-party patches: http://yurichev.com/wget.html
@KronoS there's an "edit" button right there if you think the answer needs to be expanded. Personally it seems fine as-is, given that sentence #1 of the linked page explains the new option… – supervacuo – 2015-06-27T19:39:57.333
Hmmm. Okay. That reiterated a lot of what I had found out, but good answer anyway. I didn't know that Heritrix truncated files instead of skipping them, though. – Nathaniel – 2010-03-18T22:28:47.203