Substitute large numbers of files according to style guide


I'd like to do a wholesale reformatting of our tests, and I'm cleaning up some inconsistent capitalization. I'm thinking of using awk to do this, since sed falls a little short, and since I need lookahead for my case. Specifically, for each line in a given file, I want the following to happen:

  • Look for the word it, describe, or context, followed by a space, followed by a single or double quote mark, followed by an uppercase alphabetic character.

  • If there's a match, substitute the match with the lowercased version of the entire matched string, but only the matched string (don't lowercase other things on the same line).

  • Don't match if the string after the single or double quote mark is one of "GET", "POST", "PUT", or "DELETE".

So, for example:

describe 'apple banana'       ----> (no change)
describe 'APPLE BANANA'       ----> describe 'aPPLE BANANA'
describe 'Apple Banana'       ----> describe 'apple Banana'
describe "Apple Banana"       ----> describe "apple Banana"
describe 'one TWO'            ----> (no change)

context 'POST BANANA'         ----> (no change)
context 'XPOST BANANA'        ----> context 'xPOST BANANA'

What awk arguments and/or other commands should I use? (It's okay with me if it requires more than one pass to do it.)

John Feminella

Posted 2012-10-01T16:35:13.400

Reputation: 1 582

Will it, describe, or context always be the first word on the line?  Will there always be exactly two words between the quotes? – Scott – 2012-10-01T19:54:58.900

@Scott (1) it, describe, and context will always be the first word, but they may not start that line. (2) There might be many words between the quotes. – John Feminella – 2012-10-02T02:21:15.953

@JohnFeminella Do you want me to update my answer so that it takes these scenarios into account as well? Thanks – p_strand – 2012-10-02T17:57:16.130

Answers



DISCLAIMER:

This solution will strip all "extra" whitespace on the lines that are replaced. For example...:

      describe           'Apple           Banana'

...will be replaced with:

 describe 'apple Banana'

However, the "extra" whitespace in...:

            context      "GET  BANANA"

...will not be removed.


Here's an example in awk as requested (please note that you can execute the command on one line. The line breaks are only for visual appeal here on Super User):

cat someTextFile.txt | awk '{ \
    if( \
        ($1=="describe" || $1=="it" || $1=="context") \
         && (substr($2,1,1)=="\"" || substr($2,1,1)=="'"'"'") \
         && !(substr($2,2,length($2)-1)=="POST" \
              || substr($2,2,length($2)-1)=="GET" \
              || substr($2,2,length($2)-1)=="PUT" \
              || substr($2,2,length($2)-1)=="DELETE") \
       ){ \
          subStr=substr($2,1,1); \
          subStr2=tolower(substr($2,2,1)); \
          restStr=substr($2,3,length($2)-1); \
          print $1" "subStr""subStr2""restStr" "$3 \
        }else{ \
          print \
        } \
     }' 

OUTPUT:

 describe 'apple banana'
 describe 'aPPLE BANANA'
 describe 'apple Banana'
 describe "apple Banana"
 describe 'one TWO'

 context 'POST BANANA'
 context 'xPOST BANANA'

EDIT: Here's the command without the line breaks:

cat someTextFile.txt | awk '{ if( ($1=="describe" || $1=="it" || $1=="context") && (substr($2,1,1)=="\"" || substr($2,1,1)=="'"'"'") && !(substr($2,2,length($2)-1)=="POST" || substr($2,2,length($2)-1)=="GET" || substr($2,2,length($2)-1)=="PUT" || substr($2,2,length($2)-1)=="DELETE") ){ subStr=substr($2,1,1); subStr2=tolower(substr($2,2,1)); restStr=substr($2,3,length($2)-1); print $1" "subStr""subStr2""restStr" "$3}else{print}}'

If you wish to write the output to a new file, just add > output.txt at the end of the command.

If you wish to execute this on multiple files and store the result in one text file, simply swap out cat someTextFile.txt with a cat command that runs on the files you want to format, e.g. cat *log* or cat $(find /some/path -name "*log*").
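If each file should instead be rewritten separately (rather than merged into one output file), a loop with a temporary file is one way to do it. This is only a sketch: rewrite_each and my_filter are hypothetical names, my_filter is a stand-in for the awk command, and mktemp is assumed to be available on your system.

```shell
# Stand-in filter (hypothetical): replace the body with the real awk command.
my_filter() {
  awk '{ print }'
}

# rewrite_each FILTER FILE...
# Runs FILTER over each FILE and overwrites the file with the result.
# A file is left untouched if the filter exits with an error.
rewrite_each() {
  filter=$1
  shift
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    if "$filter" < "$f" > "$tmp"; then
      # Write back through cat so the original file keeps its permissions.
      cat "$tmp" > "$f"
    fi
    rm -f "$tmp"
  done
}
```

Something like rewrite_each my_filter *log* would then process every matching file in the current directory in place, so it is worth trying it on copies first.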

EDIT EDIT: Thanks a lot Scott for the feedback!
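For reference, here is a sketch of a variant that edits the line in place with match() instead of rebuilding it from $1, $2 and $3, so the original whitespace and any number of words between the quotes are kept. The function name lowercase_first is made up for illustration, and a few things are my assumptions rather than part of the question: the keyword may be followed by one or more spaces or tabs, GET/POST/PUT/DELETE are only excluded when they form a whole word, only ASCII uppercase letters are considered, and only the first match on each line is changed.

```shell
# lowercase_first [FILE...]
# Reads the given files (or stdin) and lowercases the first letter of the
# quoted string that follows it/describe/context, except for HTTP verbs.
lowercase_first() {
  awk -v sq="'" '
  BEGIN {
    # keyword at line start or after a non-word character, then blanks,
    # then a single or double quote, then an uppercase ASCII letter
    re = "(^|[^A-Za-z0-9_])(it|describe|context)[ \t]+[\"" sq "][A-Z]"
  }
  {
    if (match($0, re)) {
      p = RSTART + RLENGTH - 1     # position of the uppercase letter
      rest = substr($0, p)         # text that follows the quote mark
      # skip lines whose quoted string starts with an HTTP verb as a word
      if (rest !~ /^(GET|POST|PUT|DELETE)([^A-Za-z]|$)/)
        $0 = substr($0, 1, p - 1) tolower(substr($0, p, 1)) substr($0, p + 1)
    }
    print
  }' "$@"
}
```

It can be used like the command above, e.g. lowercase_first someTextFile.txt > output.txt. Because only one character of $0 is replaced, leading indentation and repeated spaces should come through unchanged.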

p_strand

Posted 2012-10-01T16:35:13.400

Reputation: 679

Goodness! I wound up doing this in a Ruby oneliner instead of awk, but this solution does work. High-five! – John Feminella – 2012-10-02T02:23:13.063

Will you please post the Ruby oneliner as an alternative solution? – Clayton Stanley – 2012-10-02T02:40:25.107

The multi-line version needs \ characters at the ends of the lines that are interior to the if statement. – Scott – 2012-10-02T15:59:13.503

John says, “it, describe, and context … may not start [the] line.”  I’m not sure exactly what he means by that, but it raises the point that this solution will strip white space at the beginning of the input line (before the keyword). – Scott – 2012-10-02T16:02:10.283

John says, “There might be many words between the quotes.”  If this solution is given the input text “describe 'The quick brown fox'”, it will yield “describe 'the quick”. – Scott – 2012-10-02T16:02:42.093

In the find example, the filename pattern should be quoted: find /some/path -name "*log*". – Scott – 2012-10-02T16:04:17.410

1. Yes, you're right. Like I said, I only added the line breaks for visual appeal. 2. I'm not sure either. He added that as a comment after I had posted my answer. I'm not sure if I should update my answer now though since he accepted it... Once again, he added that as a comment after I had posted my answer. 3. You're right, should I edit my answer though since it's accepted? 4. find /some/path -name *log* works fine. It is more clear however, when you add the quotation marks. – p_strand – 2012-10-02T16:29:00.773

    @p_strand: Regarding #4: If there is one file in the current directory whose name contains “log” (say, “logical”), then “-name *log*” turns into “-name logical”.  If there are multiple files in the current directory whose names contain “log”, then “-name *log*” turns into “-name logical my_blog test.log”, which is a find syntax error.  And if there are no files in the current directory whose names contain “log”, then “-name *log*” may or may not work depending on what shell you’re using, and perhaps what shell options you have set. – Scott – 2012-10-02T17:17:28.587
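    The one-match case described above can be reproduced with a small sketch like the following. It works in a scratch directory created with mktemp (assumed to be available); the file name logical comes from the comment, while glob_demo and the sub directory are made up for the demonstration.

```shell
# Runs in a subshell (note the parentheses) so the caller's working
# directory is not changed.
glob_demo() (
  dir=$(mktemp -d) || exit 1
  cd "$dir" || exit 1
  mkdir sub
  touch logical sub/test.log

  # Unquoted on purpose: the shell expands *log* to "logical" before find
  # starts, so find only looks for that exact name.
  echo "unquoted:"
  find . -name *log* | sort

  # Quoted: find receives the pattern itself and matches it at every level.
  echo "quoted:"
  find . -name "*log*" | sort

  cd / && rm -rf "$dir"
)

glob_demo
```

    The unquoted call should list only ./logical, while the quoted call should also find ./sub/test.log.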

    @p_strand: By the way, when you respond to a comment (in a new comment), it’s conventional to mention the name of the author of the first comment, preceded by “@”, as in “@Scott”.  That way he gets notified.  See the Replying in comments paragraphs of the Comment formatting section of the Markdown Editing Help page.

    – Scott – 2012-10-02T17:18:32.597

    @Scott: Regarding #4: I'm not sure that I'm following. * is just a wildcard matcher - it doesn't expand to all of the files in that directory. I have tried this locally and it works fine. find /etc/httpd/ -name *con* returns a list of all of the files/directories under /etc/httpd that have a name containing the string con. It doesn't give me a syntax error.

    Thanks for letting me know about the annotations! I've been wondering why I sometimes get notifications from comments and sometimes I don't... – p_strand – 2012-10-02T17:32:07.240

    @Scott Also, the reason why I'm asking whether or not I should edit my accepted answer is because I'm fairly new to Super User so I'm not sure about the policies regarding editing accepted answers. – p_strand – 2012-10-02T17:34:03.963

    @p_strand: Regarding *: When you say cat *log* (in Unix/Linux, at least, and probably in any implementation of bash), the cat program doesn’t see the “*log*” argument.  It sees (in my second example) three arguments: “logical”, “my_blog”, and “test.log”, because the shell (bash, the C shell, or whatever you’re using) does the expansion.  In your example: do you have any file(s) matching *con* *in the current directory* when you run that find command? – Scott – 2012-10-02T18:41:31.313

    @p_strand: Regarding editing an accepted answer, I’m not sure of the etiquette myself.  I’d say it’s OK as long as you broaden your answer; i.e., change it to something as good as what you posted originally, but that covers more cases.  Fixing it so it can handle more than two words between quotes is an example of that.  Adding a disclaimer that your solution strips white space is certainly valid, and I would encourage you to add the backslashes. – Scott – 2012-10-02T18:43:35.180

    @Scott Oh, I see what you mean now! Thanks a lot for clearing this up! I didn't know that it behaved that way... I'll add the quotation marks to my answer. – p_strand – 2012-10-02T18:46:41.150