How to extract two numbers from two strings and calculate the difference in Bash?

1

I have a text file which contains (among others) the following lines:

{chapter}{{1}Einleitung}{27}{chapter.1}  
{chapter}{{2}Grundlagen}{35}{chapter.2}

How can I

  • get the 2 lines from this text file (they will always contain }Einleitung} or }Grundlagen}, respectively) and
  • extract the 2 page numbers (in this case 27 and 35),
  • calculate the difference 35-27 = 8 and
  • save the difference (8) of the two numbers in a variable

Perhaps with a bash script in Mac OS X?

MostlyHarmless

Posted 2011-12-12T11:31:21.013

Reputation: 1 708

var=$({ grep -Eo '(Einleitung|Grundlagen)\}.[0-9]+.'|sort -r|tr '\n' ' '| tr -d -c '0-9 '|awk '{print $1 - $2}'; }</tmp/inputfile) – artistoex – 2011-12-12T12:52:43.957

Answers

3

I do not know if Mac OS X has awk. If it does, this should work:

DIFFERENZ=$(awk 'BEGIN {
  FS="[{}]+"
 } {
  if ($4=="Einleitung")
   EINLEITUNG=$5
  if ($4=="Grundlagen")
   GRUNDLAGEN=$5
 } END {
   print GRUNDLAGEN-EINLEITUNG
 }' textfile)

How it works:

  • FS="[{}]+" sets the field separator to any combination of curly brackets.
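  • $4 refers to the fourth field on the line (the first field is empty because every line starts with a curly bracket); it holds the chapter title, and $5 holds the page number.
  • DIFFERENZ=$(...) evaluates the command ... and stores the output in DIFFERENZ.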
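
To see the field numbering in practice, here is a minimal sketch. The temporary file, the third sample line, and the names first, second and span are assumptions made up for illustration; only the first two input lines come from the question.

```shell
#!/usr/bin/env bash
# Hypothetical sample input; the third title contains spaces on purpose.
textfile=$(mktemp)
cat > "$textfile" <<'EOF'
{chapter}{{1}Einleitung}{27}{chapter.1}
{chapter}{{2}Grundlagen}{35}{chapter.2}
{chapter}{{3}Ergebnisse und Diskussion}{58}{chapter.3}
EOF

# Show what ends up in $4 and $5 when runs of curly brackets separate fields.
# $1 is empty because each line starts with "{".
awk 'BEGIN { FS = "[{}]+" } { printf "title=[%s] page=[%s]\n", $4, $5 }' "$textfile"

# Same lookup as in the answer, but with a title that contains spaces.
span=$(awk 'BEGIN { FS = "[{}]+" }
  $4 == "Grundlagen"                { first = $5 }
  $4 == "Ergebnisse und Diskussion" { second = $5 }
  END { print second - first }' "$textfile")
echo "$span"   # prints 23

rm -f "$textfile"
```

Because only curly brackets act as separators, the spaces inside a title stay part of $4.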

Dennis

Posted 2011-12-12T11:31:21.013

Reputation: 42 934

thanks, that works well with my example. How do I have to write a chapter title which contains a space like Ergebnisse und Diskussion? I tried with if ($3=="Ergebnisse und Diskussion"), but that does not seem to find the correct line – MostlyHarmless – 2011-12-12T12:24:09.427

@Martin: Spaces are treated as separators. if ($3=="Ergebnisse" && $4=="und" && $5=="Diskussion") should work. But the page number will no longer be stored in $4. I'll update my answer. – Dennis – 2011-12-12T12:29:56.493

thank you for your help - sorry, I should have directly asked for the more complicated string, but I did not think about this possible complication – MostlyHarmless – 2011-12-12T12:32:14.090

The problem was my gsub-hack. I have set the field separator properly, so if ($4=="Ergebnisse und Diskussion") should work now. – Dennis – 2011-12-12T12:52:42.363

@Dennis: and now your answer looks like mine :) – akira – 2011-12-12T13:41:00.850

3

calc.awk:

BEGIN {
    FS="}{";           # split lines by '}{'
    e=0;               # set variable 'e' to 0
    g=0;               # set variable 'g' to 0
}

/Einleitung/ { e=$3; } # 'Einleitung' matches, extract the page
/Grundlagen/ { g=$3;}  # 'Grundlagen' matches, extract the page

END {
    print g-e;         # print difference
}

you can call it via:

$> awk -f calc.awk < in.txt

it will print 8. you could store that number in a bash-variable like this:

$> nr=`awk -f calc.awk < in.txt` 

if you want it more compact, you could also inline calc.awk as a one-liner instead of keeping it in a separate file:

$> nr=`awk 'BEGIN{FS="}{";g=0;e=0}/Einleitung/{e=$3;}/Grundlagen/{g=$3;}END{print g-e;}' < in.txt`
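
If the chapter titles should not be hard-coded, one possible variation passes them in with awk's -v option. This is only a sketch: the function name page_diff, the variable names, and the sample file are made up, and it uses the bracket expression "[{}]+" as field separator (so the title is in $4 and the page in $5) to sidestep any question of how a given awk treats a bare { in a regular expression.

```shell
#!/usr/bin/env bash
# page_diff FILE TITLE1 TITLE2  (hypothetical helper)
# Prints page(TITLE2) - page(TITLE1). Titles are compared exactly, not as regexes.
page_diff() {
  awk -v t1="$2" -v t2="$3" '
    BEGIN { FS = "[{}]+" }
    $4 == t1 { p1 = $5 }
    $4 == t2 { p2 = $5 }
    END { print p2 - p1 }' "$1"
}

# Sample input taken from the question.
in_file=$(mktemp)
cat > "$in_file" <<'EOF'
{chapter}{{1}Einleitung}{27}{chapter.1}
{chapter}{{2}Grundlagen}{35}{chapter.2}
EOF

nr=$(page_diff "$in_file" "Einleitung" "Grundlagen")
echo "$nr"   # prints 8

rm -f "$in_file"
```

Note that awk interprets backslash escapes in values given with -v, so a title containing backslashes would need extra care.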

akira

Posted 2011-12-12T11:31:21.013

Reputation: 52 754

1

Pure bash 4.x, and shows the differences for every chapter:

unset page_last title_last page_cur title_cur
# Capture group 1 is the chapter title, group 2 is the page number.
re='\{chapter\}\{\{[[:digit:]]+\}([^}]+)\}\{([[:digit:]]+)\}'
while read -r line; do
    if [[ $line =~ $re ]]; then
        title_cur=${BASH_REMATCH[1]} page_cur=${BASH_REMATCH[2]}
        # Skip the first chapter: there is no previous page to compare with.
        if [[ -n $page_last ]]; then
            diff=$((page_cur-page_last))
            echo "${diff} pages between \"${title_last}\" and \"${title_cur}\""
        fi
        title_last=$title_cur page_last=$page_cur
    fi
done < "$myfile"
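
To check what the regular expression captures, it can be tried on a single line first. A small sketch; the sample line and the variable names title and page are made up for illustration:

```shell
#!/usr/bin/env bash
# Same pattern as in the loop: group 1 = chapter title, group 2 = page number.
re='\{chapter\}\{\{[[:digit:]]+\}([^}]+)\}\{([[:digit:]]+)\}'
line='{chapter}{{3}Ergebnisse und Diskussion}{58}{chapter.3}'

if [[ $line =~ $re ]]; then
    title=${BASH_REMATCH[1]}
    page=${BASH_REMATCH[2]}
    echo "title: $title"
    echo "page:  $page"
else
    echo "no match" >&2
fi
```

The title group is [^}]+, so spaces in a title are captured without any special handling.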

user1686

Posted 2011-12-12T11:31:21.013

Reputation: 283 655

0

$ DIFFERENCE=$(( $( cat FILENAME | grep Grundlagen | head -n1 | cut -c26-27 ) - $( cat FILENAME | grep Einleitung  | head -n1 | cut -c26-27 ) ))
$ echo $DIFFERENCE
8

This relies on cut -c26-27 picking fixed character positions, so it only works if the lines always look exactly like this (i.e. titles of the same length and two-digit page numbers).
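
A variant that does not depend on character positions could match the braces around the page number with sed instead of cut. This is a sketch under assumptions: the helper name page_of and the sample page numbers are made up, and the title is pasted into the pattern as-is, so it must not contain a slash or regex metacharacters.

```shell
#!/usr/bin/env bash
# page_of TITLE FILE  (hypothetical helper)
# Prints the page number from the first line containing "}TITLE}{<digits>}".
page_of() {
  sed -n "s/.*}$1}{\([0-9][0-9]*\)}.*/\1/p" "$2" | head -n 1
}

# Sample input with page numbers of different lengths.
f=$(mktemp)
cat > "$f" <<'EOF'
{chapter}{{1}Einleitung}{9}{chapter.1}
{chapter}{{2}Grundlagen}{120}{chapter.2}
EOF

difference=$(( $(page_of Grundlagen "$f") - $(page_of Einleitung "$f") ))
echo "$difference"   # prints 111

rm -f "$f"
```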

Daniel Beck

Posted 2011-12-12T11:31:21.013

Reputation: 98 421

It won't even work with different numbers, let's say 1 or 100 – akira – 2011-12-12T12:03:06.763

@akira If there are that many pages between introduction and fundamentals chapter headlines, he's doing something wrong :-) But you're right of course. – Daniel Beck – 2011-12-12T12:10:12.603

@DanielBeck: Thank you for your answer! As you already state (and @akira says), the usage of this solution is quite limited because the numbers have to be exactly at the same position each time. The solutions with awk are more flexible. – MostlyHarmless – 2011-12-12T12:28:40.160

@Martin While you're right, you never even hinted that e.g. you want to apply a solution to other chapter names. Quite the opposite with your first list item... – Daniel Beck – 2011-12-12T13:28:46.090

@DanielBeck: this is true - my question was incomplete. – MostlyHarmless – 2011-12-12T15:12:52.420