Invoking wget from Java (Runtime.exec) hangs and caps the download at a specific file size (while the same command with curl works)

1

Issue details

I am trying to invoke wget from Java to download a file, but I keep hitting a weird issue where the file size is capped.

For example, when issuing "wget https://speed.hetzner.de/1GB.bin", I correctly get 1GB.bin with a file size of 1,048,576,000 bytes (1000 MiB). But when invoking the same command from Java, I consistently end up with a file of approximately 40 MB.

Debugging

Assuming you have JDK installed, here is an MCVE that reproduces this behavior:

echo 'class RunCommand {
    public static void main(String[] args) throws Exception {
        String s = "";
        for (int i=0; i < args.length; i++)
            s += (i > 0 ? " " : "") + args[i];
        System.out.println(Runtime.getRuntime().exec(s).waitFor());
    }
}' > RunCommand.java

javac RunCommand.java

java RunCommand wget https://speed.hetzner.de/1GB.bin

I have tried this on a clean AWS CentOS 7.6 machine with all of:

  • OpenJDK 7
  • OpenJDK 8
  • Oracle JDK 8

I always end up with the same result: Java hangs and the file size is around 40 MB.

I have also tried increasing the heap size with -Xms1024m -Xmx1024m, to no avail, so heap size does not seem to be the problem.

Now, running the exact same thing again with curl instead:

java RunCommand curl https://speed.hetzner.de/1GB.bin -o 1GB.bin

This surprisingly works and I successfully end up with a 1GB file!

Questions

So there are many questions here:

  1. Why is Java hanging after 40 MB?
  2. Why always almost exactly 40 MB? (grepping 40 in -XX:+PrintFlagsFinal gives no clue)
  3. What difference is there between the wget and curl commands that could lead to one failing and the other succeeding?

Jbezos

Posted 2019-02-12T16:18:56.503

Reputation: 13

Answers

0

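Try adding --quiet to the command. The pipe that carries wget's output is probably full, since you're not reading it through the process's input or error stream (wget writes its progress log to stderr). Once the pipe is full, wget blocks on its next write, and waitFor() never returns.

This is from the wget manual: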

    -q
    --quiet
    Turn off Wget’s output.

Check the code snippet below.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Created on 2/13/2019.
 *
 * @author Julien Saab
 */
public class TestCommand {
    public static void main(String[] args) throws Exception {

        final List<String> commands = new ArrayList<>(Arrays.asList(args));
        commands.add("--quiet");
        final Process process = new ProcessBuilder().command(commands).start();
        final int i = process.waitFor();

        System.out.printf("Process exited with code %1$s\n", i);
    }
}

I've tried this with the same file that you're using, and it gets well past 41 MB (though of course I didn't download it fully).

java TestCommand wget https://speed.hetzner.de/1GB.bin
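
If you would rather keep wget's progress log than silence it, the other way out is to read the output so the pipe never fills. Below is a minimal sketch of that idea, not a definitive implementation: the class name DrainOutput and the helper drain are made up for illustration, and the output is simply counted and discarded. It only uses ProcessBuilder calls that exist since Java 7.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class DrainOutput {
    // Hypothetical helper: reads the stream until EOF, discards the data,
    // and returns the number of bytes that were read.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        ProcessBuilder builder = new ProcessBuilder(Arrays.asList(args));
        // Merge stderr into stdout so that a single loop drains both;
        // wget writes its progress log to stderr.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        long discarded;
        try (InputStream out = process.getInputStream()) {
            // Keep reading until the child closes its output, so the
            // child never blocks on a full pipe.
            discarded = drain(out);
        }

        int exitCode = process.waitFor();
        System.out.printf("Process exited with code %d after writing %d bytes of output%n",
                exitCode, discarded);
    }
}
```

It is invoked the same way as above, e.g. java DrainOutput wget https://speed.hetzner.de/1GB.bin. If you just want the progress to show up on your own terminal, calling builder.inheritIO() before start() should also avoid the pipe altogether.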

Julien Saab

Posted 2019-02-12T16:18:56.503

Reputation: 16

wget -q does the job.

So it seems that wget output (progress bar) is causing this issue. The question still remains, however, about what exactly in java is trying to read stdin and where the limit on this is coming from. It seems to be almost exactly 40m across different Linux distributions and different versions of Java which is quite intriguing to say the least. – Jbezos – 2019-02-13T15:26:01.843

Another question I guess is why this problem is not happening with curl since curl also has a progress output... – Jbezos – 2019-02-13T15:33:49.417

You can capture the output in a file using wget https://speed.hetzner.de/1GB.bin 2>&1 | tee wget_output.log . With wget the log grows larger than 1.5MB, while with curl it remains under 30k. – Jbezos – 2019-02-22T18:18:04.817

I couldn't verify this precisely, but at around 40M downloaded with wget, the log file was about 64k. – Jbezos – 2019-02-22T18:19:38.663
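
The 64k figure lines up with the default capacity of a Linux pipe (64 KiB), which suggests a back-of-the-envelope explanation for the 40 MB mark. The sketch below is only an estimate under stated assumptions: when its output is not a terminal, wget falls back to the dot progress style, where by default each dot stands for 1K and a line holds 50 dots, so one log line covers 50 KiB of download. The figure of roughly 80 bytes per log line is a guess, not a measured value, and the class and method names are made up for illustration.

```java
class StallEstimate {
    // Estimates how many payload bytes are downloaded before a writer of
    // fixed-size progress lines fills a pipe that nobody reads.
    static long bytesBeforeStall(long pipeCapacity, long logBytesPerLine, long payloadBytesPerLine) {
        long linesThatFit = pipeCapacity / logBytesPerLine;
        return linesThatFit * payloadBytesPerLine;
    }

    public static void main(String[] args) {
        long pipeCapacity = 64 * 1024;        // default Linux pipe capacity in bytes
        long logBytesPerLine = 80;            // ASSUMPTION: rough length of one dot-style log line
        long payloadBytesPerLine = 50 * 1024; // wget default dot style: 50 dots of 1K each

        long estimate = bytesBeforeStall(pipeCapacity, logBytesPerLine, payloadBytesPerLine);
        System.out.printf("Estimated stall point: %.1f MiB%n", estimate / (1024.0 * 1024.0));
    }
}
```

With these numbers the estimate lands very close to 40 MiB. The same assumed line length gives about 1.6 MB of log for the full 1,048,576,000-byte file, in line with the "larger than 1.5MB" observed above, while curl's output staying under 30k would explain why it never fills the pipe.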