Parse by Control Characters

I am trying to parse the output of a command that expects to be writing to the screen. It has data separated by move-to-origin control sequences (for the VT220, ESC[1;1H). I only need the last part (i.e. everything after the last move-to-origin sequence).

I have tried several approaches (primarily awk and sed), but the problem is always that parts of the control sequence (such as "[") have special meaning to the program, not just to the shell, and I cannot quote them when I substitute tput's output into the command.

Any suggestions?
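awk and sed generally treat a multi-character delimiter as a regular expression, and in a regex "[" opens a bracket expression, which is the likely source of the trouble. One way around it, sketched in Python and assuming the sequence is already available as a string, is to escape every metacharacter before using the sequence as a pattern. `split_pages` is an illustrative name.

```python
import re

# VT220 move-to-origin: ESC [ 1 ; 1 H (hard-coded here only for illustration;
# in practice it would come from tput)
SEQ = "\x1b[1;1H"


def split_pages(text, seq):
    """Split text on a literal control sequence, quoting regex metacharacters."""
    # re.escape turns "[" into "\[", so the sequence is matched literally
    # instead of starting a character class.
    return re.split(re.escape(seq), text)


sample = "one" + SEQ + "two" + SEQ + "three"

try:
    # Unescaped, "[1;1H" starts a character set that is never closed.
    re.split(SEQ, sample)
except re.error as exc:
    print("unescaped pattern rejected:", exc)

print(split_pages(sample, SEQ)[-1])  # three
```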

EDIT:

Here is an example of what I am looking to parse (ESC is the escape character):

Page 1; line 1
Page 1; line 2
ESC[1;1HPage 2; line 1
Page 2; line 2
ESC[1;1HPage 3; line 1
Page 3; line 2

I am looking to get the following, which is what would be on the terminal after the program has run.

Page 3; line 1
Page 3; line 2
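Taken literally, the desired result is just the text after the last occurrence of the sequence, which needs no pattern language at all. A minimal Python sketch on the example above, assuming the whole output fits in memory and that each new page fully overwrites the previous one (as it does in the example). `last_page` is an illustrative name.

```python
# The example from the question, with "ESC" written as the real escape character.
SEQ = "\x1b[1;1H"
output = (
    "Page 1; line 1\nPage 1; line 2\n"
    f"{SEQ}Page 2; line 1\nPage 2; line 2\n"
    f"{SEQ}Page 3; line 1\nPage 3; line 2\n"
)


def last_page(text, seq):
    """Return everything after the last occurrence of seq (all of text if absent)."""
    # rpartition does a plain substring search, so no character in seq is special.
    return text.rpartition(seq)[2]


print(last_page(output, SEQ), end="")
# Page 3; line 1
# Page 3; line 2
```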

user130039

Posted 2012-11-11T15:57:37.370

Could you give an example of the actual text you are trying to parse? – terdon – 2012-12-08T16:58:34.373

This sort of thing would be easiest to do in a scripting language. – dangph – 2012-12-09T05:39:16.773

I agree. My difficulty is in getting the control sequence into the script without the script trying to interpret it. – None – 2012-12-09T15:17:50.547

@bugmenot, what platform are you on? I could knock out a script for you in Python or PowerShell in two minutes. – dangph – 2012-12-09T22:06:51.320

@bugmenot, to put an escape character in your source code, you typically put "\x1b". Or "\e" for regular expressions. But it depends on the particular language. – dangph – 2012-12-09T22:14:01.490
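In Python, for instance, the escape character can be written as "\x1b" or "\033" (Python string literals do not support "\e"). A small illustration that builds the VT220 sequence from the question out of that character:

```python
# Three equivalent spellings of the ESC character (code point 27).
ESC = "\x1b"
assert ESC == "\033" == chr(27)

# The VT220 move-to-origin sequence from the question, built from ESC.
MOVE_TO_ORIGIN = ESC + "[1;1H"
print(repr(MOVE_TO_ORIGIN))  # '\x1b[1;1H'
print(len(MOVE_TO_ORIGIN))   # 6
```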

I know how to escape characters. The problem is that I need to get the string from another program (tput), so I cannot simply put it in the script literally. – None – 2012-12-09T23:16:35.350

@bugmenot, I'm not sure what you mean. You are not being very clear. – dangph – 2012-12-10T00:31:34.413

Which part do you not understand? Hopefully, this will clear it up: I am trying to get the last "page" of data, where a page is the text following a control sequence. However, I cannot hard-code the control sequence, because it can vary depending on the terminal on which the command is run. I need to get it from tput, but when I use command substitution, the escape character is not properly quoted. – None – 2012-12-10T00:40:13.773
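A script does not have to receive the sequence through shell command substitution: it can run tput itself and keep the result as raw bytes, so neither a shell nor a pattern language ever interprets it. A sketch in Python, assuming the program emits the same sequence as terminfo's cup capability at row 0, column 0 (`tput cup 0 0`, which is ESC[1;1H on a VT220). `move_to_origin` and `last_page` are illustrative names.

```python
import subprocess


def move_to_origin(term=None):
    """Return tput's cursor-to-(0,0) sequence as raw bytes."""
    cmd = ["tput"]
    if term:
        cmd.append(f"-T{term}")  # e.g. term="vt220"; otherwise tput uses $TERM
    cmd += ["cup", "0", "0"]
    # argv is passed directly (no shell), and stdout is captured as bytes,
    # so the ESC character and "[" reach Python unmodified.
    return subprocess.run(cmd, capture_output=True, check=True).stdout


def last_page(data, seq):
    """Return the bytes after the last occurrence of seq (all of data if absent)."""
    # bytes.rpartition is a literal search; nothing in seq is special.
    return data.rpartition(seq)[2]
```

With the command's output piped in, a call such as `last_page(sys.stdin.buffer.read(), move_to_origin())` would yield the last page. Working in bytes also sidesteps any text-decoding issues with other control characters in the stream.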

Answers

The easiest way to handle your problem:

  1. Convert all the control-sequence data to plain text with 'uuencode -m v' or 'xxd'.
  2. Now you can process the text as usual with awk.
  3. Afterwards, convert it back with 'uudecode -o -' or 'xxd -revert'.
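The encode/process/decode idea above can be sketched in Python, with `bytes.hex()` standing in for a plain hex dump (`xxd -p`) and `bytes.fromhex()` for converting back (`xxd -r -p`). This illustrates the technique rather than the exact pipeline; `last_page_via_hex` is a hypothetical name. One detail matters in any hex-based search: a match must start on an even offset, or it would straddle two bytes.

```python
def last_page_via_hex(data: bytes, seq: bytes) -> bytes:
    """Return the bytes after the last occurrence of seq, searching in hex."""
    if not seq:
        return data
    # 1. Encode: every byte becomes two hex digits, so ESC and "[" are now
    #    ordinary text ("1b", "5b") with no special meaning.
    hex_data, hex_seq = data.hex(), seq.hex()
    # 2. Process: find the last match that starts on a byte boundary
    #    (an even offset); odd offsets would straddle two bytes.
    pos = hex_data.rfind(hex_seq)
    while pos != -1 and pos % 2:
        # Search again, excluding the misaligned match at pos.
        pos = hex_data.rfind(hex_seq, 0, pos + len(hex_seq) - 1)
    tail = hex_data if pos == -1 else hex_data[pos + len(hex_seq):]
    # 3. Decode: turn the remaining hex digits back into bytes.
    return bytes.fromhex(tail)
```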

Alternatively, use bbe, a sed-like editor for binary files.

If you need a more specific answer, I need a more specific program sample.

sparkie

Posted 2012-11-11T15:57:37.370

Reputation: 2,110

Yeah, it's simple when you know where to look :-) I added a few things above... – sparkie – 2013-01-05T16:47:45.910