How to search for a pattern between lines 1500 and 2500?

I have 8 files, each containing around 2000 lines. I want to search for a particular word in these files between line numbers 1500 and 2500.

The output should look like:

sample_1.txt :

1510:declare var testing


sample_2.txt :

1610:declare var testing


sample_7.txt :

1610:declare var testing


sample_10.txt :

1710:declare var testing

Is it possible to use grep for this task?

bharanikumar

Posted 2011-01-25T07:43:03.870

Reputation: 23

Answers

1

Try this:

#!/usr/bin/awk -f
BEGIN {
    begin = ARGV[1]
    end = ARGV[2]
    pattern = ARGV[3]
    ARGV[1] = ARGV[2] = ARGV[3] = ""
}

# Skip ahead once past the end of the range.
# FNR restarts at 1 for each input file; NR would not,
# so NR-based tests would break with multiple files.
FNR > end { nextfile }

FNR == 1 {
    print FILENAME " :\n"
}

FNR >= begin {
    if ($0 ~ pattern)
        print FNR ":" $0
}

Call it like this:

./rangegrep 1500 2500 'declare var testing' sample*.txt

The search string can be a regular expression.
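For instance (assuming the script above is saved as `rangegrep` and made executable), an extended regular expression works as the pattern:

```shell
# awk's ~ operator treats the pattern as an ERE,
# so character classes and repetition are available
./rangegrep 1500 2500 'declare[[:space:]]+var' sample_1.txt
```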

Edit:

I changed from range-checking the line number to bailing out at the end of the range, as in akira's answer; stopping early saves time by not reading the rest of the lines in the file.

Paused until further notice.

Posted 2011-01-25T07:43:03.870

Reputation: 86 075

Is it possible using grep instead of a program? – bharanikumar – 2011-01-25T08:43:36.673

@Dennis Williamson: 'print FILENAME' could be put into the BEGIN { } as well... @bharanikumar: why are you so obsessed with grep? use the right tool for the problem and do not try to use the one hammer you have found somewhere and now everything you see is nails. – akira – 2011-01-25T09:21:49.307

@akira: From man gawk: "FILENAME is undefined inside the BEGIN block (unless set by getline)." @bharanikumar: You could do it with grep if you used head and tail, but you'd lose line numbering and filename labeling. – Paused until further notice. – 2011-01-25T15:43:24.743
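A sketch of that head/tail approach (assuming the sample filenames from the question; any line numbers grep reported would be relative to the extracted slice, not the original file):

```shell
# Keep lines 1500-2500 of each file, then grep the slice;
# the filename label is restored with a plain echo
for f in sample_*.txt ; do
    echo "$f :"
    head -n 2500 "$f" | tail -n +1500 | grep 'declare var testing'
done
```

If a file has no match in the range, grep simply prints nothing under its label.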

Though it's more efficient to write this as one script I can't help but thinking that this is a missing tool. As a complement to head and tail, a tool that gives you a specific range of lines or bytes out of the middle of a stream (based on offset from the start). We could call it torso or thorax. – phogg – 2011-01-25T15:56:23.890

@phogg: I like "torso"! However, "thorax" excludes the abdomen, and we can't do that because it's the "guts" of the file that we're really interested in. Having a tool like "torso" probably violates at least one principle in the Unix philosophy, since head|tail does the job adequately. – Paused until further notice. – 2011-01-25T16:22:27.970

Combining head and tail with a pipe does this quite neatly, but retaining the line numbers is tricky. sed can also do this quite simply, but it can't print the filename. – Flexo – 2011-01-25T16:59:23.990

@Alan: Also, I find line-number output in sed to be unsatisfactory. – Paused until further notice. – 2011-01-25T17:48:46.673

It's a shame you can't use it in a pattern match or substitution. – Flexo – 2011-01-25T18:09:01.977
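For what it's worth, a second sed pass can join the two-line output of sed's `=` command into the `number:line` form (a sketch using the question's sample data, not from the original thread):

```shell
# `=` prints the matching line's number on its own line;
# the second sed reads that pair (N) and replaces the newline with a colon
sed -ne '1500,2500{/declare var testing/{=;p}}' sample_1.txt | sed 'N;s/\n/:/'
```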

You're all quite right, but just for fun I added an answer using torso. I sidestepped the line number question by pretending that it doesn't exist. – phogg – 2011-01-25T19:37:52.903

3

awk does what you want:

% awk 'FNR < 1500 { next }; FNR > 2500 { nextfile }; \
    /pattern/ { printf("%s:\n%d:%s\n", FILENAME, FNR, $0); }' \
    sample_*.txt

To get as much blank space as in your desired output, just add more \n escapes to the printf format string.

akira

Posted 2011-01-25T07:43:03.870

Reputation: 52 754

I upvoted your answer earlier, but forgot to leave this comment. I like your next and exit better than my range checking. The exit can speed things up quite a bit under some circumstances. I'm going to blatantly copy those into my answer. OK, fine, with attribution. ;) – Paused until further notice. – 2011-01-25T15:47:23.520

0

Without using awk, how about some shell script + sed:

for f in sample_*.txt ; do echo "$f : " ; \
    sed -ne '1500,2500{/pattern/{=;p}}' "$f" ; \
    echo ; \
done

Flexo

Posted 2011-01-25T07:43:03.870

Reputation: 1 897

0

Purely In The Interests Of Science, I present an implementation of torso, the logical middle between head and tail.

In practice, as others have noted, this is really unnecessary since you can get the desired output yourself by a trivial combination of head and tail.

#!/bin/bash
# bash, not sh: the script uses arrays below

usage () {
    printf "$0: $0 [-c <byte> -C <byte>] [-n <line> -N <line>] file [file ... ]\n"
}

while [ $# -gt 0 ] ; do
    case "$1" in
            -c|--byte-start) shift ; start="$1" ; mode=byte ; shift ;;
            -C|--byte-end) shift ; end="$1" ; mode=byte ; shift ;;
            -n|--line-start) shift ; start="$1" ; mode=line ; shift ;;
            -N|--line-end) shift ; end="$1" ; mode=line ; shift ;;
                --) shift ;;
            -*) printf "bad option '%s'\n" "$1" ; usage ; exit 201 ;;
                *) files=("${files[@]}" "$1") ; shift ;;
         esac
done

if [ "$start" -gt "$end" ] ; then
    printf "end point cannot be before start point\n"
    usage
    exit 202
fi

head_cmd=
tail_cmd=
# Count of lines/bytes to keep, inclusive of both endpoints
end=$((end - start + 1))
if [ $mode = "line" ] ; then
    head_cmd="-n $end"
    tail_cmd="-n +$start"
elif [ $mode = "byte" ] ; then
    head_cmd="-c $end"
    tail_cmd="-c +$start"
fi

if [ ${#files[@]} -eq 0 ] ; then
    cat - | tail $tail_cmd | head $head_cmd
else
    tail $tail_cmd "${files[@]}" | head $head_cmd
fi

To keep it topical, here's how to use torso to solve the question:

torso -n 1500 -N 2500 input_file | grep -n "test"

Or, for output conforming to the requirements:

for file in sample_{1,2,7,10} ; do
     printf "\n\n%s:\n\n" "$file"
     torso -n 1500 -N 2500 "$file" | grep -n "test"
done

You may begin your criticisms... now!

phogg

Posted 2011-01-25T07:43:03.870

Reputation: 899

You could put cat -n in there somewhere to get line numbering. – Paused until further notice. – 2011-01-25T20:10:22.933
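A sketch of that suggestion (hypothetical command line, not from the thread): numbering the lines before slicing lets the original line numbers survive the head/tail pipeline:

```shell
# cat -n attaches the original line number to each line first,
# so the head/tail slicing and the grep leave the numbering intact
cat -n sample_1.txt | head -n 2500 | tail -n +1500 | grep 'declare var testing'
```

The number arrives tab-separated rather than colon-separated, but it does refer to the original file.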

I worry about the possibility of false positives in the grep match if I do that. Probably it should be an additional switch. What "line number" means when there are multiple files is also a question. – phogg – 2011-01-25T22:59:20.527

@Dennis would have made sure I saw your comment. I just happened upon it. – Paused until further notice. – 2011-01-26T01:32:27.450