
Say you have data with quantities in human-readable format, such as the output of du -h, and want to operate further on those numbers; for example, you want to pipe the data through grep and sum a subset of it. You do this ad hoc on many systems you've never seen before, which have only minimal utilities. You want conversions for all the standard 10^n suffixes.

Is there a GNU/Linux utility that converts suffixed numbers to plain numbers within a pipeline? Or do you have a bash function, or some easy-to-remember Perl, that does this without a long chain of regex replacements or several sed steps?

38M     /var/crazyface/courses/200909-90147
2.7M    /var/crazyface/courses/200909-90157
1.1M    /var/crazyface/courses/200909-90159
385M    /var/crazyface/courses/200909-90161
1.3M    /var/crazyface/courses/200909-90169
376M    /var/crazyface/courses/200907-90171
8.0K    /var/crazyface/courses/200907-90173
668K    /var/crazyface/courses/200907-90175
564M    /var/crazyface/courses/200907-90178
4.0K    /var/crazyface/courses/200907-90179

| grep 200907 | <amazing suffix conversion> | awk '{s+=$1} END {print s}'


Relevant references:

beans
    You rarely need to use grep and awk together: if you are using awk, then use awk. Just add `/200907/` in front of your per-line code, e.g. `awk '/200907/{s+=$1} END {print s}'` – Tony Sep 08 '15 at 19:21
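Tony's suggestion can be wrapped in a small function. This is a sketch: `sum_matching` is a hypothetical name, and it assumes the first column already holds plain numbers (for example `du -k` output). On `du -h` output, awk would silently read `376M` as 376 and `8.0K` as 8, which is why the question needs a conversion step first.

```shell
#!/usr/bin/env bash
# sum_matching PATTERN (hypothetical helper): add up the first field of every
# stdin line matching the regex PATTERN, filtering inside awk instead of grep.
sum_matching() {
    awk -v pat="$1" '$0 ~ pat { s += $1 } END { print s + 0 }'
}
```

Usage: `du -k /var/crazyface/courses/* | sum_matching 200907` gives the total in KiB.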

4 Answers


Based on my answer at one of the questions you linked to:

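awk '{
    # position of the suffix in "KMGTPEZY": K=1, M=2, ...; 0 if there is none
    ex = index("KMGTPEZY", substr($1, length($1)))
    # awk strings start at position 1; strip the suffix only if there is one
    val = ex ? substr($1, 1, length($1) - 1) : $1

    prod = val * 10^(ex * 3)

    sum += prod
}
END {printf "%.0f\n", sum}'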
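A self-contained variant of the awk program above, wrapped in a shell function so it can sit at the end of the question's pipeline. The name `human_sum` and its optional base argument are hypothetical additions: the answer uses 10^(3n) multipliers, as the question asks, while GNU du -h actually uses powers of 1024, so the base is left as a parameter.

```shell
#!/usr/bin/env bash
# human_sum [BASE] (hypothetical helper): sum suffixed sizes in column 1.
# BASE defaults to 1000; pass 1024 for du -h style sizes.
human_sum() {
    awk -v base="${1:-1000}" '
    {
        # K=1, M=2, ... position in the list; 0 when there is no suffix
        ex = index("KMGTPEZY", substr($1, length($1)))
        val = ex ? substr($1, 1, length($1) - 1) : $1
        sum += val * base ^ ex
    }
    # %.0f avoids awk printing large totals in exponent form
    END { printf "%.0f\n", sum }'
}
```

Usage: `du -h /var/crazyface/courses/* | awk '/200907/' | human_sum 1024`. On the five 200907 lines from the question this gives 940680000 with the default base and 986357760 with 1024, still subject to the rounding already baked into the -h output.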

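Another method that's used (feed it only the size column, e.g. with cut -f1, because it rewrites every G, M and K anywhere on the line):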

sed 's/G/ * 1000 M/;s/M/ * 1000 K/;s/K/ * 1000/; s/$/ +\\/; $a0' | bc
Dennis Williamson

You can use perl regular expressions to do this. For example,

my $value = 0;
# match exactly one suffix letter right after the number (K/M/G are powers of 1024)
if ($line =~ /^(\d+(?:\.\d+)?)([KMG])?\s/) {
   my %amplifier = (K => 1024, M => 1024**2, G => 1024**3);
   $value = $1 * (defined $2 ? $amplifier{$2} : 1);
}

This is a simple script; consider it a starting point. Hope it helps!

Khaled
  • Indeed, this is one way. I've also found http://stackoverflow.com/questions/2557649/convert-memory-size-human-readable-into-actual-number-bytes-in-perl. – beans Feb 18 '11 at 19:22

Personally, I'd just not use the -h flag in the first place. The "human readable" version rounds off numbers which will need to be rounded again when you convert back, getting even less accurate. (For instance, 2.7MiB is 2831155.2 bytes. What did you do with the other 0.8th of a byte??!)
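Skipping -h can be sketched as follows, assuming the directories of interest sit directly under one parent directory; `sum_du_bytes` is a hypothetical name. `du -s -k` is POSIX and reports whole 1024-byte units, so the total needs no suffix conversion. Like du -h, it measures allocated disk space, not apparent file size.

```shell
#!/usr/bin/env bash
# sum_du_bytes DIR PATTERN (hypothetical helper): total disk usage, in bytes,
# of the entries directly under DIR whose path matches the regex PATTERN.
sum_du_bytes() {
    local dir=$1 pat=$2
    # du prints "KiB<tab>path"; -F '\t' keeps paths with spaces intact in $2
    du -sk -- "$dir"/* |
        awk -F '\t' -v pat="$pat" '$2 ~ pat { kib += $1 } END { printf "%.0f\n", kib * 1024 }'
}
```

Usage: `sum_du_bytes /var/crazyface/courses 200907`.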

Otherwise, you can ask units to convert MiB/GiB/KiB to plain "B" (bytes) and it'll handle this, but you'd have to do something like this (assuming your output is tab-separated; otherwise adjust the cut):

{your output} | cut -f1 '-d{tab}' | xargs -I {} units -1t {}iB B | awk '{s+=$1}END{printf "%d\n",s}'
DerfK
    Well noted that there is a loss of precision. Feeding the input to units also works, but I found `units` missing on my minimal distro! I think we'd all do this differently if we had full control of everything. – beans Feb 18 '11 at 19:44
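VALUE=$1

# Expand suffixes from largest to smallest: G -> *1024M, M -> *1024K, K -> *1024
VALUE=${VALUE//[gG]/*1024m}
VALUE=${VALUE//[mM]/*1024k}
VALUE=${VALUE//[kK]/*1024}

# Bash arithmetic is integer-only (no 2.7M) and reads leading zeros as octal,
# so accept only N or N*1024*... with N free of leading zeros
VALID='^(0|[1-9][0-9]*)(\*1024)*$'
if [[ $VALUE =~ $VALID ]]; then
    echo "VALUE=$((VALUE))"
else
    echo "ERROR: size invalid, please enter a correct size" >&2
fi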
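To use the same suffix expansion inside the question's pipeline, it can be applied to every input line in a loop. This is a sketch: `sum_sizes` is a hypothetical name, and because bash arithmetic is integer-only, sizes such as 2.7M or 8.0K (which du -h does print) are reported on stderr and skipped rather than converted.

```shell
#!/usr/bin/env bash
# sum_sizes (hypothetical helper): sum the first column of "SIZE PATH" lines
# from stdin, expanding K/M/G into *1024 multiplications.
sum_sizes() {
    local size rest value total=0
    local valid='^(0|[1-9][0-9]*)(\*1024)*$'
    while read -r size rest; do
        value=${size//[gG]/*1024m}
        value=${value//[mM]/*1024k}
        value=${value//[kK]/*1024}
        if [[ $value =~ $valid ]]; then
            # bash evaluates the string in $value as an arithmetic subexpression
            total=$(( total + value ))
        else
            echo "skipping non-integer size: $size" >&2
        fi
    done
    echo "$total"
}
```

Usage: `du -h /var/crazyface/courses/* | grep 200907 | sum_sizes`.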
Michael Hampton
Sundeep471