1

I need a bash script that takes the output of a shell command and parses it to pull out the id and website URL from each row of the table, which can then be used to execute additional bash commands.

Here's an example of the command output.

+----+-------------------------------+----------------------------------------+---------+
| id | name                          | url                                    | version |
+----+-------------------------------+----------------------------------------+---------+
| 25 | example.com                   | http://www.example.com/                | 3.8     |
| 34 | anotherexample.com            | https://anotherexample.com/            | 3.2     |
| 62 | yetanotherexample.com         | https://yetanotherexample.com/         | 3.9     |
+----+-------------------------------+----------------------------------------+---------+

Pseudo code for the script would be along the lines of:

$output = `command --list`
for each row in $output {
    $siteid = extracted_id
    $url = extracted_url

    $process_result = `new_command $siteid`
    log "$siteid, $url, $process_result" > log.txt
}

Note that the numeric id could be more than 2 digits.

Is anyone able to give me a starting point on how to parse each line of the command's output and pull the id and url into variables, while ignoring the first three lines and the last line (the table border and header)?

I can figure the rest out, it's just parsing each line that I'm stuck on.

Any suggestions / advice would be greatly appreciated.

Thanks in advance.

Phill Coxon
    That looks like a `mysql` command output. Is it actually a `mysql` command output? If so, there are command options you can give to remove the ASCII art and column headers, which will make it easier to parse. – Michael Hampton Oct 26 '18 at 18:27
  • Hi @MichaelHampton - no, it isn't mysql output. :) – Phill Coxon Oct 27 '18 at 05:28

3 Answers

2

Welcome Phill Coxon,

Method 1

This pure-bash script seems to fit your needs:

#!/usr/bin/env bash
declare id
declare name
declare url
declare version

while read -r line; do
  # skip the +---- border lines; the header row is rejected by the URL pattern below
  if [[ ! ${line} =~ ^\+ ]]; then
    if [[ ${line} =~ \|[[:space:]]*([[:digit:]]+)[[:space:]]*\|[[:space:]]+([[:alnum:].]+)[[:space:]]+\|[[:space:]]+(https?://(www\.)?[[:alnum:]]+\.[[:alpha:]]+/?)[[:space:]]*\|[[:space:]]*([[:digit:]]+(\.[[:digit:]]+)?)[[:space:]]*\| ]]; then
      id="${BASH_REMATCH[1]}"
      name="${BASH_REMATCH[2]}"
      url="${BASH_REMATCH[3]}"
      version="${BASH_REMATCH[5]}"
      echo "${id}:${name}:${url}:${version}"
    fi
  fi
done

Method 2

You can also wrap this in a bash function and use it in your script as follows:

#!/usr/bin/env bash
parse_result(){
  local id
  local name
  local url
  local version

  while read -r line; do
    # skip the +---- border lines; the header row is rejected by the URL pattern below
    if [[ ! ${line} =~ ^\+ ]]; then
      if [[ ${line} =~ \|[[:space:]]*([[:digit:]]+)[[:space:]]*\|[[:space:]]+([[:alnum:].]+)[[:space:]]+\|[[:space:]]+(https?://(www\.)?[[:alnum:]]+\.[[:alpha:]]+/?)[[:space:]]*\|[[:space:]]*([[:digit:]]+(\.[[:digit:]]+)?)[[:space:]]*\| ]]; then
        id="${BASH_REMATCH[1]}"
        name="${BASH_REMATCH[2]}"
        url="${BASH_REMATCH[3]}"
        version="${BASH_REMATCH[5]}"
        echo "${id}:${name}:${url}:${version}"
      fi
    fi
  done
}

parse_result < <(cat cmd.out)

Here I use process substitution, but you could use a pipe instead.
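To make the difference concrete, here is a minimal self-contained sketch; count_lines is a hypothetical stand-in for parse_result, and the temp file stands in for cmd.out:

```shell
#!/usr/bin/env bash
# count_lines is a hypothetical stand-in for parse_result above:
# it reads stdin line by line and prints how many lines it saw.
count_lines() {
  local n=0
  while read -r _; do n=$((n + 1)); done
  echo "$n"
}

tmp=$(mktemp)
printf 'a\nb\nc\n' > "$tmp"

count_lines < <(cat "$tmp")   # process substitution: prints 3
cat "$tmp" | count_lines      # pipe: also prints 3
```

Both forms feed the function's stdin; the only practical difference is that the pipe runs the reader in a subshell, so variables it sets won't survive past the pipeline.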

Result and discussion

In this example, cmd.out holds the command output to parse; in your case, replace cat cmd.out with your actual command.

result 1:

$ cat cmd.out | ./app.bash
25:example.com:http://www.example.com/:3.8
34:anotherexample.com:https://anotherexample.com/:3.2
62:yetanotherexample.com:https://yetanotherexample.com/:3.9

result 2:

$ bash app2.bash
25:example.com:http://www.example.com/:3.8
34:anotherexample.com:https://anotherexample.com/:3.2
62:yetanotherexample.com:https://yetanotherexample.com/:3.9
bioinfornatics
2

Thank you so much @bioinfornatics and @Jeff Schaller - I'm very appreciative of the level of detail you each provided.

I used both of your answers for my solution, shown below, where list_command generates the table output and process_command runs against each website id. I've tested it and it's working perfectly - I just need to add the logging and I'm done.

Thank you both so much!

#!/usr/bin/env bash
parse_result(){
  local id
  local name
  local url
  local version

  # pull the id, name, url and version starting from the 4th line,
  # ignoring the border lines that start with +---
  awk -F'|' 'NR > 3 && !/^\+--/ { print $2, $3, $4, $5 }' |
  while read -r id name url version; do
    RESULT="$(process_command "$id")"
    echo "result: $RESULT"
    echo "id: $id | name: $name | url: $url | version: $version"
  done
}
parse_result < <(list_command)
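One detail worth flagging for the logging step still to be added: the pseudocode's `> log.txt` would overwrite the file on every iteration, so use `>>` to append inside the loop. A minimal sketch, with process_command stubbed out purely for illustration:

```shell
#!/usr/bin/env bash
# process_command is a stub standing in for the real command from the script above.
process_command() { echo "processed-$1"; }

logfile=$(mktemp)
for id in 25 34 62; do
  result="$(process_command "$id")"
  echo "$id, $result" >> "$logfile"   # >> appends; > would keep only the last row
done
cat "$logfile"
```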
Phill Coxon
1

While you can carefully parse text with bash, sometimes it's easier to rely on a dedicated text-processing tool, such as awk:

command --list | awk -F'|' 'NR > 3 && !/^\+--/ { print $2, $3, $4 }' > log.txt

This tells awk to split up lines into fields based on a separator of |; the program code inside the single-quotes breaks down as:

  • NR > 3 -- if the number of records (lines) processed so far is greater than 3 ...
  • !/^\+--/ -- ... and if the line does not start with +-- (the + is backslash-escaped because it is a regex metacharacter)
  • ... then print fields 2, 3, and 4

... all eventually redirected to the log.txt file.
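As a self-contained illustration, the same awk program run against the sample table from the question; the here-string stands in for the real command's output:

```shell
#!/usr/bin/env bash
# The here-string reproduces (a shortened copy of) the sample table from the question.
table='+----+-------------------------------+----------------------------------------+---------+
| id | name                          | url                                    | version |
+----+-------------------------------+----------------------------------------+---------+
| 25 | example.com                   | http://www.example.com/                | 3.8     |
| 34 | anotherexample.com            | https://anotherexample.com/            | 3.2     |
+----+-------------------------------+----------------------------------------+---------+'

# skip the first 3 lines and any border line, then print id, name and url
parsed=$(awk -F'|' 'NR > 3 && !/^\+--/ { print $2, $3, $4 }' <<<"$table")
echo "$parsed"
```

The printed fields keep the table's column padding; a later `read id name url` (or a pass through awk again) collapses that whitespace when splitting.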

Jeff Schaller