Calculate how much disk space would have been used


Is there a program on Linux that can calculate how much data another program would produce?

For example, if I want to back up my MySQL database, I would usually run

mysqldump > dumpfile.sql

Instead, I would like to redirect the output to /dev/null but still calculate how much disk space would have been used, like this:

mysqldump | fancy_space_calc_program

Output:

123456789 Bytes would have been used

Note: the MySQL backup is just an example. I'm very well aware of how I could estimate the size beforehand, so please, no comments about that.

fancyPants

Posted 2017-05-29T13:37:36.117

Reputation: 353

I don't even think you can really make one; for specific cases yes, but not general usage, because how could you estimate it if some app calls some server and downloads data from there? There's no chance you can estimate such things in foreign apps. So this would have to be per app: as you write, you already know how for MySQL, so no explanation is needed there, but other apps would each need their own approach. No general tool could make such a prediction correctly. – Drako – 2017-05-29T13:43:46.813

I hope you realize that any attempt to make the estimate would require actually running the program and observing the output while it is sent somewhere safe. This is going to be impossible if the program has some sort of irreversible effect on something else, so that you can ONLY run it once without unintended side effects. The other problem is that if the program derives its output from a changing input, the next run is going to create another output file of a different size. Last but not least: disk space is not the same as bytes of output, and various filesystems have different bookkeeping overheads. – Tonny – 2017-05-29T18:52:41.420

Yes, I'm well aware of that. It's still good enough for me. – fancyPants – 2017-05-29T19:55:51.230

@Drako You can have a general way of measuring the text output of a program. That does not need to be per app (see e.g. the accepted answer). Whether or not the text output will be reliably identical on subsequent runs is app-specific, but that does not prevent you from measuring the output in a general way. Presumably the OP and anyone else trying to measure output would only do so if the data was meaningful for any given application. – Jon Bentley – 2017-05-30T12:59:34.530

@JonBentley I never said you can't have one; read more carefully: "as I wrote general prediction is not going to be precise or even close :)". Now imagine that my app, after running, checks for updates of itself, of plugins, etc., downloads some amount of data from the internet and stores it on your HDD. How are you going to measure precisely in advance, with a general tool that knows nothing about my app, how much storage will be needed after running it? Still, you can make your best guess with the accepted answer and in many cases even be pretty precise. – Drako – 2017-05-30T14:53:07.717

@drako Yes you did, you wrote "I don't even think you can really make one - for specific cases yes, but not general usage". You can have one for general usage, and it will perform its job even for applications whose output varies. It's your job as the user to make sure you're using it on an application where the data you get will be meaningful. No reasonable person would use the tool on something which will output random data that it got from the internet - that's obvious. – Jon Bentley – 2017-05-30T18:49:39.467

Answers


Taken from https://stackoverflow.com/questions/13418688/use-pipe-with-du-to-compute-size-of-stdin

You can pipe it to wc -c to count the number of bytes that go through the pipeline:

mysqldump | wc -c
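If you want exactly the output format from the question, a minimal sketch is a small shell function around wc -c. The name fancy_space_calc_program is just the placeholder from the question, not a real tool, and the sketch assumes a POSIX shell:

```shell
#!/bin/sh
# Hypothetical implementation of the question's fancy_space_calc_program:
# read everything from stdin, discard it, and report how many bytes passed.
fancy_space_calc_program() {
    # wc -c counts bytes; tr strips the padding some wc implementations add
    bytes=$(wc -c | tr -d '[:space:]')
    echo "$bytes Bytes would have been used"
}

# Usage with the question's example: mysqldump | fancy_space_calc_program
printf 'hello\n' | fancy_space_calc_program   # 6 Bytes would have been used
```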

Of course, this is just the raw byte count and has nothing to do with sector size etc., so take it with a grain of salt...
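To get a rough on-disk figure from the raw count, one option is to round it up to whole filesystem blocks. This sketch assumes a 4096-byte block size (common, but check your own filesystem, for example with GNU `stat -f -c %S .`, which prints the fundamental block size) and ignores metadata, compression and sparse files. The helper name round_up_to_blocks is hypothetical:

```shell
#!/bin/sh
# Assumed block size; override by setting BLOCK_SIZE in the environment.
BLOCK_SIZE=${BLOCK_SIZE:-4096}

# Hypothetical helper: round a byte count up to a whole number of blocks
# and print the result in bytes. Filesystem metadata is not included.
round_up_to_blocks() {
    bytes=$1
    # ceiling division: (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE
    blocks=$(( (bytes + BLOCK_SIZE - 1) / BLOCK_SIZE ))
    echo $(( blocks * BLOCK_SIZE ))
}

# Usage: round_up_to_blocks "$(mysqldump | wc -c)"
round_up_to_blocks 123456789   # 123457536 with 4096-byte blocks
```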

Magnus

Posted 2017-05-29T13:37:36.117

Reputation: 1 548

as I wrote general prediction is not going to be precise or even close :) – Drako – 2017-05-29T13:44:49.653

At which point does this run out of memory? Is wc buffered or unbuffered / will it try to buffer all of stdin into memory before counting? – cat – 2017-05-29T15:39:11.777

@cat a good implementation of wc will discard data it no longer needs as soon as practical. – Ruslan – 2017-05-29T15:54:52.773

@cat I think it's unlikely to be buffered, since you don't need buffering to count lines or characters. GNU coreutils wc on my computer easily handles 40 GB stdin data, with only 8 GB memory. – Frxstrem – 2017-05-29T15:55:51.577
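A quick way to try the streaming behaviour described above, without needing 40 GB, is to generate a known amount of data and count it. This sketch uses only dd and wc, and it relies on wc reading its input in small chunks, as the comments here describe, rather than holding everything in memory:

```shell
#!/bin/sh
# Generate 100 MiB (100 x 1048576 bytes) of zeros and count them with wc -c.
# dd's own summary on stderr is discarded; only wc's count is printed.
dd if=/dev/zero bs=1048576 count=100 2>/dev/null | wc -c   # 104857600
```

Increasing count= while watching wc in a process monitor is one way to see that its memory use does not grow with the input.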

@Magnus. I think you missed the wordplay. WC is a British term for what Americans call a bathroom. You're piping the unused data into the WC. – Fund Monica's Lawsuit – 2017-05-30T09:26:01.060

@Frxstrem You certainly do need buffering to count lines or characters - as soon as you're no longer working with an isomorphic encoding. Since POSIX.2, wc -c doesn't count characters - it counts bytes. wc -m counts characters. The most obvious difference is in multi-byte characters like in UTF-16 or the Windows \r\n (two bytes in ASCII, but one character). It doesn't necessarily need a lot of buffering most of the time, but Unicode can have an arbitrary number of bytes to represent a single character; not something you'd see in trusted data, but a possible buffer overflow vector. – Luaan – 2017-05-30T11:37:15.900
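For disk-space purposes, wc -c (bytes) is the count you want, not wc -m (characters). A small illustration, assuming a printf that understands octal escapes in its format string (POSIX requires this):

```shell
#!/bin/sh
# "café" encoded as UTF-8: the é is two bytes (octal 303 251, i.e. 0xC3 0xA9).
printf 'caf\303\251' | wc -c   # 5 bytes

# wc -m counts characters according to the current locale. In a UTF-8
# locale this should report 4; in the C/POSIX locale every byte is a
# character, so it reports 5, the same as wc -c.
printf 'caf\303\251' | wc -m
```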

@cat It will be buffering. But the buffers will only be a few KB. – kasperd – 2017-05-30T22:25:29.143


The command pv is perfect for this.

mysqldump | pv -b > /dev/null

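I think the above is the command you want. (I couldn't test it at the time, but no adjustment such as pv -b | > /dev/null is needed: pv writes its display to standard error, so redirecting standard output to /dev/null discards only the data, not the byte counter.)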

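-b turns on only the total byte counter. pv abbreviates large totals with units such as MiB or GiB; adding -n (numeric output) makes it print the byte count as a plain integer instead.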
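If you want just the number from pv, a sketch is below. It assumes pv's documented behaviour that -n together with -b prints byte counts as integers, one per line, on standard error, with the last line being the final total. The helper name count_bytes is hypothetical, and it falls back to wc -c when pv is not installed:

```shell
#!/bin/sh
# count_bytes: print how many bytes arrive on stdin, as a plain integer.
count_bytes() {
    if command -v pv >/dev/null 2>&1; then
        # 2>&1 sends pv's numeric output into the pipe, then >/dev/null
        # discards pv's copy of the data; keep only the final total
        pv -n -b 2>&1 >/dev/null | tail -n 1
    else
        # fallback: wc -c, with any padding removed
        wc -c | tr -d '[:space:]'
    fi
}

# Usage: mysqldump | count_bytes
printf 'hello\n' | count_bytes
```

The order of the redirections matters: 2>&1 must come before >/dev/null, otherwise both streams would end up in /dev/null.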

djsmiley2k TMW

Posted 2017-05-29T13:37:36.117

Reputation: 5 937

Holy, I forgot about pv as well as wc. Shame on me. I'd like to accept both answers. So, sorry, but Magnus was a little faster and he can use the reputation. – fancyPants – 2017-05-29T13:49:47.270

Yeah, no worries, the wc trick is really nice; not sure why that didn't immediately occur to me, tbh. I first went 'bar!' then realised what I meant was pv! :) – djsmiley2k TMW – 2017-05-29T13:57:11.187

And now you've got me wondering about grabbing the file handle, and checking for a size in /proc somewhere.... – djsmiley2k TMW – 2017-05-29T13:58:17.047

I've never heard of pv before. You learn something new every day :) – Magnus – 2017-05-29T13:58:36.600

@Magnus: I think wc is older (part of some older Unix systems), pv is covered in less documentation, and (quite possibly as a result) pv is pre-installed in fewer distributions. Still, nice to know about. See this conceptually beautiful picture, which comes from the "pv" ("pipe viewer") program's home page. – TOOGAM – 2017-05-29T18:41:21.590


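You can use dd for it, like this:

cat /dev/zero | dd status=progress of=/dev/null bs=4M

For the question's case, replace cat /dev/zero with mysqldump. Note that status=progress requires GNU coreutils 8.24 or later.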

This reports the amount of data passed through it, both while it runs (progress) and after it finishes (summary), for example:

$ cat /dev/zero | dd status=progress of=/dev/null
5371334656 bytes (5.4 GB, 5.0 GiB) copied, 4 s, 1.3 GB/s^C # this is progress data
12271136+0 records in # summary
12271135+0 records out # summary
6282821120 bytes (6.3 GB, 5.9 GiB) copied, 4.66683 s, 1.3 GB/s # summary
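If you only want the final number, you can filter dd's summary, which it writes to standard error. This sketch assumes the summary line starts with the byte count followed by the word "bytes", as in the GNU coreutils output above. The helper name dd_count_bytes is hypothetical:

```shell
#!/bin/sh
# dd_count_bytes: copy stdin to /dev/null with dd and print only the byte
# total from dd's summary. LC_ALL=C keeps the messages in English so that
# the word "bytes" can be matched; status=progress is left off so that no
# progress lines are mixed into the summary.
dd_count_bytes() {
    LC_ALL=C dd of=/dev/null 2>&1 | awk '/ bytes/ { print $1 }'
}

# Usage: mysqldump | dd_count_bytes
printf 'hello\n' | dd_count_bytes   # 6
```

The total is correct regardless of the block size, even though with pipes the "records in/out" counts may include partial records.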

styrofoam fly

Posted 2017-05-29T13:37:36.117

Reputation: 1 746