Truncate file in a pipe

4

1

Is there a simple way to truncate a file in a pipe? Specifically, I want to chop off the last four bytes of a file before feeding it into another process.

Ideally, I'd be able to write something like:

cat input.txt | some-process | truncate --size=-4 | another-process > output.txt

but it seems that the truncate command only operates "in place" on a file on disk.

kostmo

Posted 2012-02-24T04:29:14.383

Reputation: 303

cat input.txt | some-process is better written as some-process < input.txt. – Benoit – 2012-02-24T09:53:29.703

Answers

5

I feel silly after writing that Python script.

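There is a standard utility, head, that does it (it is a separate program rather than a shell built-in, and the negative byte count is a GNU extension):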

cat input.txt | some-process | head --bytes=-4 | another-process > output.txt

Edit: GNU head is implemented in a way that is conceptually similar to my Python implementation below, i.e. it is memory-efficient. One difference is that it rounds the size of the circular buffer (N, the number of elided bytes) up to a multiple of some standard buffer size.
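To illustrate the idea of reading in blocks while holding back only the tail, here is a minimal sketch. It is not GNU head's actual code; the function name drop_last_bytes and the chunk_size default are my own assumptions, chosen only for illustration.

```python
import io

def drop_last_bytes(stream, n, chunk_size=8192):
    """Yield the contents of a binary stream minus its final n bytes.

    chunk_size is an arbitrary illustrative value, not the one GNU head uses.
    """
    held = b""                      # bytes that might still belong to the tail
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:               # EOF: whatever is still held is the tail
            break
        held += chunk
        if len(held) > n:
            # Everything except the newest n bytes is safe to emit.
            yield held[:len(held) - n]
            held = held[len(held) - n:]

source = io.BytesIO(b"hello world\r\n\r\n")
print(b"".join(drop_last_bytes(source, 4)))   # prints b'hello world'
```

Memory use stays around n plus one chunk, however long the stream is. Slicing with len(held) - n rather than -n keeps the n = 0 case correct.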

kostmo

Posted 2012-02-24T04:29:14.383

Reputation: 303

Provided the head version is not too old, that will work. – Benoit – 2012-02-24T10:16:12.203

6

This is as if I told you to raise your hand as soon as I utter the fourth-from-last word of what I am about to say, without telling you beforehand how many words that will be.

A pipe is a stream. Its data has no size; it only has operations for getting the next element from it and/or inserting an element into it, and the result of a read is either a piece of data or a signal that there is no more data.

So unless you first retrieve all the data from the stream, place it into a buffer, count its length, "rewind" the stream, and then retrieve four fewer elements, it can't be done.

EDIT: I need to think things through more instead of coming up with clever analogies :) The requirement is not "stop me immediately n elements before the last", but rather "transmit all elements except the last n". By maintaining a buffer of just n elements, and waiting until the first n elements have been received before transmitting the first one, it is possible. Obviously this won't work in situations like telecommunications, where you want data to be sent immediately after being received, as you could if you only wanted the first n elements. And I assume truncate doesn't do it this way.
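The "hold back n elements" idea from the edit can be sketched in a few lines of Python. This is only an illustration of the technique; all_but_last is a hypothetical helper name, and it works on any iterable rather than on a real pipe.

```python
from collections import deque

def all_but_last(items, n):
    """Yield every element of `items` except the final n."""
    held = deque()                  # never holds more than n + 1 elements
    for item in items:
        held.append(item)
        if len(held) > n:
            # The oldest held element can no longer be among the last n.
            yield held.popleft()
    # At the end of input, the elements still held are the last n; drop them.

words = "raise your hand at the fourth from last word".split()
print(list(all_but_last(words, 4)))
# prints ['raise', 'your', 'hand', 'at', 'the']
```

Nothing is emitted until more than n elements have arrived, which is exactly the delay described above.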

(attempted downvoting self -1)

Paul Richter

Posted 2012-02-24T04:29:14.383

Reputation: 299

You have my upvote, because your answer did help me realize that file/stream length is not known at runtime, which prompted me to write the Python script. – kostmo – 2012-02-24T21:28:29.860

1

sed can operate on the last line. This assumes the last 4 chars are on a single line:

printf "%s\n" abcdef ghijkl mnopqr | sed '$s/....$//'

outputs

abcdef
ghijkl
mn

glenn jackman

Posted 2012-02-24T04:29:14.383

Reputation: 18 546

0

I'm surprised no one's mentioned dd yet.

This will read the first 1024 bytes of input (count and skip are measured in blocks of bs bytes, 512 by default, so bs=1 is needed to count in bytes):

$ dd if=inputfile of=truncated_file bs=1 count=1024

This will skip the first 2048 bytes of input:

$ dd if=inputfile of=truncated_file bs=1 skip=2048

By removing the if and/or of parameter(s), dd will read from STDIN and write to STDOUT. That means you can do stuff like this:

$ cat input.txt | dd bs=1 count=1024 | another-process > output.txt

Depending on what version of dd you are running, you can specify size units for the count and skip parameters (see the man page for more details).
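Note that count works from the start of the input, so to drop the last four bytes this way you need to know the total size first, which is only possible for a regular file and not for a pipe. A rough Python sketch of that "size minus N" approach follows; copy_without_tail, the file names, and the 65536-byte read size are all made up for illustration.

```python
import os
import tempfile

def copy_without_tail(src_path, dst_path, n):
    """Copy src_path to dst_path, leaving out the final n bytes."""
    # The size is known up front only because src_path is a regular file.
    remaining = max(os.path.getsize(src_path) - n, 0)
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while remaining > 0:
            chunk = src.read(min(remaining, 65536))
            if not chunk:           # file was shorter than expected
                break
            dst.write(chunk)
            remaining -= len(chunk)

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.bin")
    dst = os.path.join(tmp, "output.bin")
    with open(src, "wb") as f:
        f.write(b"payload!TAIL")
    copy_without_tail(src, dst, 4)
    with open(dst, "rb") as f:
        print(f.read())             # prints b'payload!'
```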

kchr

Posted 2012-02-24T04:29:14.383

Reputation: 1

0

I couldn't find any built-in shell commands to do this, so I guess that means there's no "one-liner" solution. However, I was able to write a Python script to do what I need:

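#!/usr/bin/env python3
'''
Usage:
pipetruncate.py <N>

Truncates a stream in a pipe at N bytes before the EOF.
Uses memory proportional to N.
'''

import sys

buffer_length = int(sys.argv[1])
stdin = sys.stdin.buffer    # binary streams, so that N counts bytes, not characters
stdout = sys.stdout.buffer
circular_buffer = [b'']*buffer_length
count = 0
while True:
    ch = stdin.read(1)
    if not ch: # EOF
        break

    if buffer_length == 0: # nothing to hold back; avoids a modulo by zero
        stdout.write(ch)
        continue

    # Swap the new byte into the slot of the byte read N steps earlier
    index = count % buffer_length
    nextchar = circular_buffer[index]
    circular_buffer[index] = ch

    count += 1
    if count > buffer_length:
        stdout.write(nextchar)

stdout.flush()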

Then I invoke

cat input.txt | some-process | ./pipetruncate.py 4 | another-process > output.txt

kostmo

Posted 2012-02-24T04:29:14.383

Reputation: 303

0

Spent part of the morning writing a Python script too. Of course, you'd better use your "head" instead of writing more code. Anyway, here is my version. It is ugly, but I think it is my first Python script ever:

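#!/usr/bin/python3

# stream_trunc: cut the last n bytes of a stream

import sys

if len(sys.argv) != 2:
    print('Usage: ' + sys.argv[0] + ' <number>', file=sys.stderr)
    sys.exit(1)

num = sys.argv[1]

if not num.isdigit():
    print('Argument should be a number', file=sys.stderr)
    print('Usage: ' + sys.argv[0] + ' <number>', file=sys.stderr)
    sys.exit(1)

n = int(num)
stdin = sys.stdin.buffer    # binary mode, so that n counts bytes
stdout = sys.stdout.buffer
buf = stdin.read(n)         # hold back the first n bytes
c = stdin.read(1)

while c != b'':
    buf += c                # append first, so that n == 0 also works
    stdout.write(buf[:1])   # emit the oldest held byte
    buf = buf[1:]
    c = stdin.read(1)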

Jorge Juan

Posted 2012-02-24T04:29:14.383

Reputation: 1