How to clean up output of linux 'script' command

36

28

I'm using the Linux 'script' command http://www.linuxcommand.org/man_pages/script1.html to record some interactive sessions. The output files it produces contain unprintable characters, including my backspace keystrokes.

Is there a way to tidy these output files up so they only contain what was displayed on screen?

Or is there another way to record an interactive shell session (input and output)?

Andrew

Posted 2011-01-24T00:49:37.153

Reputation: 482

"Or is there another way to record an interactive shell session (input and output)?" Do you know https://asciinema.org/ ?

– masterxilo – 2018-05-03T15:55:42.263

Answers

35

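If you want to view the file, send it through col -bp. The -b option applies the backspaces (col keeps only the last character written to each column), and -p passes control sequences that col doesn't recognise, such as color codes, through unchanged instead of filtering them out. You can then pipe the result through less -R, which renders those color codes, if you like.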

col -bp typescript | less -R

On some systems col won't accept a filename argument; use input redirection instead:

col -bp <typescript | less -R
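
To see what -b does to a backspace, the sketch below feeds col a line in which x was typed, erased with a backspace and replaced by s. The function name clean_typescript is hypothetical, and -x (write spaces instead of converting them to tabs) is added only to keep the output predictable; -p is left out because the sample has no color codes.

```shell
# Hypothetical wrapper: clean a typescript with col.
# -b keeps only the last character written to each column, so
#    "x, backspace, s" leaves just "s".
# -x writes runs of spaces as spaces instead of converting them to tabs.
# col reads standard input, hence the redirect instead of a filename argument.
clean_typescript() {
    col -bx < "${1:-typescript}"
}
```

Usage: clean_typescript typescript > typescript.txt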

Arcege

Reputation: 1 883

Doesn't work for me, scrambles some of the output. – Alex – 2016-12-23T23:53:35.737

On my system less -R by itself provides better output than piping through col -bp first. – Brian Hawkins – 2017-04-25T22:13:02.920

@BrianHawkins I concur. Using col -bp <typescript | less -R does not display the colorized console. Using less -R typescript does display the colorized console! – Trevor Boyd Smith – 2017-05-31T12:51:44.040

this is only good if you want to view the script interactively in less. – Trevor Boyd Smith – 2018-10-24T15:57:22.330

on my system, col wouldn't accept a filename, so I did col -bp < typescript and got what I wanted. – Andrew – 2012-03-19T12:19:29.863

18

cat typescript | perl -pe 's/\e([^\[\]]|\[.*?[a-zA-Z]|\].*?\a)//g' | col -b > typescript-processed

here's some interpretation of the string input to perl:

  • s/pattern//g deletes every match of pattern from each input line (the g flag replaces all matches instead of stopping after the first one; perl -p applies the substitution to every line and prints the result)

here's some interpretation of the regex pattern:

  • \e matches the special "escape" control character (ASCII 0x1B)
  • ( and ) are the beginning and end of a group
  • | separates alternatives, so the group matches any one of these three patterns:
    • [^\[\]] or
    • \[.*?[a-zA-Z] or
    • \].*?\a
  • [^\[\]] means
    • match any single character that is NOT [ or ] (two-character escapes such as ESC=)
  • \[.*?[a-zA-Z] means
    • match a [ followed by a non-greedy .*? up to and including the first letter (CSI sequences such as the color code ESC[01;32m)
  • \].*?\a means
    • match a ] followed by a non-greedy .*? up to the special control character called "the alert (bell) character" (OSC sequences such as ESC]0;title BEL, which sets the window title)

Peter Nore

Reputation: 1 642

I still need to figure out how, but this really works ;) – asdmin – 2016-02-10T06:21:29.820

@asdmin - Basically, this echoes the output of the typescript to a perl program that removes certain control characters from the output, then pipes the output to the unix col command, whose -b option removes any "delete" key artifacts in the transcript. It then pipes the output to a text file. – Peter Nore – 2016-02-10T09:57:17.717

This scrambles the output in the first line of the typescript for me but is the best answer. – Alex – 2016-12-23T23:58:10.630

This seems to work very well with some typescripts; it's certainly more readable than the output produced by the accepted answer. – fakedad – 2017-10-23T02:38:35.860

legendary answer! – zack – 2018-04-20T17:35:19.753

magically works for me. i'll try to include some explanation but i'm no perl guru. – Trevor Boyd Smith – 2018-10-24T15:17:59.830

2

For a large quantity of script output, I'd hack a perl script together iteratively. Otherwise hand edit with a good editor.

There is unlikely to be an existing automated method of removing control characters from script output in a way that reproduces what was displayed on the screen at certain important moments (such as when the host was waiting for that first character of some user input).

For example, the screen might be blank except for the prompt Andrew $. If you then typed rm /* and pressed backspace twelve times (far more than needed), what ends up on the screen depends on which shell was running, your current stty settings (which you might change partway through a session) and probably other factors too.

The above applies to any automated method of continuously capturing input and output. The main alternative is taking "screen shots" or cutting and pasting the screen at appropriate times during the session (which is what I do for user guides, notes for a day-log, etc).

RedGrittyBrick

Reputation: 70 632

2

An answer to the second part of my question is to use the logging facility in GNU screen: press ^A H (Ctrl-A, then capital H) from within a running screen session. The documentation is at http://www.gnu.org/software/screen/manual/screen.html#Logging

Andrew

Reputation: 482

2

If what you're after is to record your commands (e.g. to later turn them into a bash script), then a reasonable hack is to run script(1), then inside it run

bash -x

Afterwards, grep the output file (usually "typescript") for lines starting with a "+": bash -x prints each command, prefixed with "+ ", before running it. The regular expression ^\+ will do the trick.
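
A sketch of that grep step, assuming bash's default PS4 of "+ ". Top-level commands are traced as "+ command"; lines starting with "++" come from command substitutions and subshells and are skipped here. The function name is hypothetical, and the tr removes the carriage returns that script records at the end of each line.

```shell
# Hypothetical helper: list the top-level commands that `bash -x` traced
# inside a typescript (default: ./typescript).
extract_traced_commands() {
    tr -d '\r' < "${1:-typescript}" |  # script records CRLF line endings
        grep '^+ ' |                     # top-level trace lines only (BRE: + is literal)
        sed 's/^+ //'                    # drop the "+ " prefix
}
```

Usage: extract_traced_commands typescript > commands.sh (then review the result by hand before running it).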

Yaron

Reputation: 21

2

I used cat filename which removes control characters :-)

Peeyush

Reputation: 223

imo this is a nicer answer, since it really removes all the control characters. – Nathanael Farley – 2014-09-23T11:02:36.477

on OSX, cat does not remove colour control characters... – Nick – 2014-12-10T18:04:06.590

Actually cat doesn't remove the control characters at all, rather it outputs them verbatim, and the terminal then interprets them. That might work for you if your typescript is short relative to your terminal buffer and you can just copy and paste from the terminal. Not so good if your typescript is large though. – mc0e – 2016-06-05T11:51:35.810

Agreed. This doesn't remove anything. It simply allows the shell to interpret them. They are still present. – Kentgrav – 2017-08-24T16:30:11.150

2

If you want to write the output to a file:

col -bp < typescript >>newfile

Use the unix2dos command to convert the file to Windows (CRLF) line endings if you want.

amara

Reputation: 29

On Ubuntu 14.04, that leaves in a lot of junk at the start and end of lines. Quite readable, but not really clean. – mc0e – 2016-06-05T11:54:03.360

2

col -bp processes the backspaces as desired (AFAIK). But it mangles the color escape sequences. It might be good to remove the color sequences first, then process the backspaces, if possible.

This is a very common need, and I'm surprised there are not more solutions to it. It is extremely common to script a session and then have somebody need to review the procedure. You want to cut out all the little typing mistakes and color escape sequences to create a "clean" script of the procedure for future reference. Simple ASCII text is preferred. I think this is what is meant by "human readable", and it is a very reasonable thing to do.
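
A minimal sketch of the two-step idea (remove the color sequences first, then process the backspaces), using sed for both steps so the result doesn't depend on col. The function names are hypothetical, and the color pattern only covers CSI sequences (ESC [ parameters, then a final letter), not every escape a terminal may emit. If you prefer col, the second step can be replaced with col -b.

```shell
# Step 1 (hypothetical helper): remove CSI sequences such as ESC[01;32m
# (color) or ESC[K (erase to end of line): ESC, '[', optional digits,
# ';' or '?', then one final letter.
strip_color() {
    esc=$(printf '\033')
    sed "s/${esc}\[[0-9;?]*[A-Za-z]//g"
}

# Step 2 (hypothetical helper): apply backspaces by repeatedly deleting
# "a non-backspace character followed by a backspace" until none is left.
apply_backspaces() {
    bs=$(printf '\b')
    sed -e ':a' -e "s/[^${bs}]${bs}//" -e 'ta'
}
```

Usage: strip_color < typescript | apply_backspaces > typescript.txt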

Aaron

Reputation: 21

1

https://github.com/RadixSeven/typescript2txt was written to solve this problem.

It's been 4 years since I last updated/used it, but I don't remember doing anything fancy that shouldn't still work today.

Eponymous

Reputation: 111

1

I found the answer that dewtall provided to a similar question on the Unix board to be more effective at removing control characters from the output of script if you are in an environment where Perl is available to you.

dewtall's script:

#!/usr/bin/perl
while (<>) {
    s/ \e[ #%()*+\-.\/]. |
       \r | # Remove extra carriage returns also
       (?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
       (?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
       (?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
       \e.|[\x80-\x9f] //xg;
    1 while s/[^\b][\b]//g;  # remove all non-backspace followed by backspace
    print;
}

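To remove the control characters, save the script (e.g. as dewtalls-script.pl), make it executable with chmod +x dewtalls-script.pl, and run: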

./dewtalls-script.pl < output-from-script-that-needs-control-characters-removed

rynemccall

Reputation: 111

0

I found a good way to do it. On my system, long output lines are sprinkled with " ^M" (blank space followed by carriage return). The "^M" can be nicely replaced with the null character "^@", which does not display at all when you cat the file.

I capture timing too, so to replay the file faithfully I cannot simply delete every " ^M" with a command like the one below, because scriptreplay counts bytes and removing them would throw the timing off:

tr '\r' '\0' | sed 's/ \x0//g'

I run my script command like this:

script -t -f session.log 2>timing

So, what I do afterwards is:

cat session.log | tr '\r' '\0' > typescript 
scriptreplay -t timing | sed 's/ \x0//g'

The first edit (before replay) keeps the number of bytes in the file unchanged. The second edit (applied to the replayed output) removes the space + NUL pairs, which would otherwise show up as stray spaces in random places. (Note that by default scriptreplay looks for an input file named "typescript", which is why I did not provide one after "timing".)
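
The tr step is replay-safe because it swaps one byte for another. A quick check with made-up sample data (the variable names are illustrative only) compares the byte counts before and after the substitution.

```shell
# Made-up sample: a wrapped long line leaves a " \r" pair mid-line, and the
# line ends in CRLF, as script records it.
sample='first part of a long line \rrest of the line\r\n'

# tr replaces each \r with a NUL byte: one byte in, one byte out.
# (tr -d ' ' strips the padding some wc implementations print.)
bytes_before=$(printf '%b' "$sample" | wc -c | tr -d ' ')
bytes_after=$(printf '%b' "$sample" | tr '\r' '\0' | wc -c | tr -d ' ')

echo "bytes before: $bytes_before, after: $bytes_after"
```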

Khanan

Reputation: 1

-1

dos2unix on the output will also do the trick

albert

Reputation: 1

Could you explain how to use it to accomplish the task? – Ben N – 2016-01-05T23:05:13.423

-1

One other solution is to use strings, which prints only the runs of printable characters it finds in a file (or in standard input):

strings -n 1 filename

The -n 1 option sets the minimum length of the sequences to be preserved to one and thus makes sure even single printable characters surrounded by non-printable characters are preserved.

One possible downside of this approach is that strings adds line breaks between contiguous strings of printable characters. For instance, a file with the content

Foo<SOMECONTROLCHAR>Bar

(where <SOMECONTROLCHAR> is a control character or any other non-printable character) would be returned as

Foo
Bar

Another issue brought up in the comments is that some sequences of control characters consist of a combination of both printable and non-printable characters and this approach would only remove part of those.

However, strings does a good job of removing control characters like the backspace mentioned in the question.
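
If the extra line breaks are a problem, a related but different tool is tr, which deletes non-printable bytes in place instead of printing runs of printable ones. The sketch below (drop_nonprintable is a hypothetical name) keeps printable characters and newlines. Note that it also drops tabs, may drop non-ASCII text such as UTF-8, and shares the weakness raised in the comments: the printable part of an escape sequence, such as [01;52m, survives.

```shell
# tr, not strings: delete every byte that is not printable or a newline,
# so "Foo<ctrl>Bar" becomes "FooBar" on a single line.
drop_nonprintable() {
    tr -cd '[:print:]\n'
}
```

Usage: drop_nonprintable < typescript > typescript.txt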

justfortherec

Reputation: 115

strings does not remove all non-printable characters. It identifies and prints sequences of printable characters. That is not the same thing. – a CVn – 2016-04-01T15:13:59.337

@MichaelKjörling, you're right, by default strings only prints sequences of a minimum length of 4. I've corrected my answer by adding the -n 1 option which sets the minimum length to 1. Thanks for pointing this out. – justfortherec – 2016-04-02T21:30:34.707

The answer still makes the same claim that strings removes all non-printable characters, so it is still wrong in the same way it was before the edit. It's also obviously broken because "some color code" (and control codes in general) often consist of both printable and non-printable characters. For example, a control code sequence to change the text color might be ESC[01;52m where ESC is the single escape character (byte value 27). Using strings as you suggest would leave [01;52m in the output, which is meaningless. – a CVn – 2016-04-03T12:28:35.443

Good point, @MichaelKjörling. Especially the example with the color code was very unfortunate. Thanks for helping me to improve my answer. Do the edits address your concerns appropriately? strings might not do the same job as some of the other answers but IMHO it is a valid approach to solve the problem described in the question. – justfortherec – 2016-04-04T20:30:13.783