Mass find / replace in file using python

2

I have a file from editing in which I need to change some stuff. My knowledge of Python is very basic. It would save me hours of copy/pasting if I could find a solution.

My file contains this:

002  AS       V     C        01:00:24:14 01:00:28:18 01:00:35:01 01:00:39:05 * FROM CLIP NAME: Sq3_Sh1.jpg
003  AS       V     C        01:00:39:05 01:00:42:23 01:00:39:05 01:00:42:23 * FROM CLIP NAME: Sq3_Sh4.jpg
004  AS       V     C        01:00:42:23 01:00:45:16 01:00:42:23 01:00:45:16 * FROM CLIP NAME: Sq3_Sh5.jpg
005  BA       V     C        00:00:00:00 00:00:05:20 01:00:45:16 01:00:51:12
006  AS       V     C        01:00:24:14 01:00:29:06 01:00:51:12 01:00:56:04 * FROM CLIP NAME: Sq3_Sh14.jpg
007  AS       V     C        01:00:56:04 01:00:59:10 01:00:56:04 01:00:59:10 * FROM CLIP NAME: Sq3_Sh6.jpg

I need to do 2 things:

  1. Replace every AS with the clip name from the same line (after FROM CLIP NAME:), for example Sq3_Sh6 (without the extension)

  2. Delete every line of text that contains BA

Maybe somebody could help?

Fabian ESH

Posted 2015-08-25T19:35:37.783

Reputation: 31

Does it need to be python? Are you on Linux, where sed and awk are present? – bertieb – 2015-08-25T19:39:18.113

Sorry, I'm on Windows... preferably Python, because I know it a bit. Same for batch/cmd. – Fabian ESH – 2015-08-25T19:46:51.100

Can work with that :) – bertieb – 2015-08-25T19:58:24.810

Answers

1

Things are quite easy with a good regular expression utility. Python can certainly handle this, but JREPL.BAT can provide an even simpler solution. It is a pure script-based utility (hybrid JScript/batch) that runs natively on any Windows machine. Simply copy the script into a folder that is listed within your PATH.

I'm assuming each file name is <= 8 characters in length, and you want to preserve the existing column alignment on each line of output.

My solutions below assume you want to overwrite the original file, call it test.txt, and you have JREPL.BAT in a folder listed in your PATH.

If each line is either an AS line that should be modified and preserved, or a BA line that should be dropped, then all you need is the following (I used line continuation ^ just to make the code easier to read):

call jrepl "^(...  )AS      (.*FROM CLIP NAME: (.*?)\..*)$"^
           "$1+($3+'        ').slice(0,8)+$2"^
           /jmatch /f test.txt /o -

If your input includes additional lines that aren't AS or BA that should be preserved, then you could use:

call jrepl "^(...  )AS      (.*FROM CLIP NAME: (.*?)\..*)$|^...  (?!BA).*$"^
           "$2+($4+'        ').slice(0,8)+$3|$0"^
           /t "|" /jmatch /f test.txt /o -

Full documentation is embedded within JREPL.BAT.
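
Since the question asked for Python, here is a rough sketch of the same idea in Python, for comparison only. It reuses the regular expression from the first command and pads the clip name to 8 characters so the columns stay aligned. The helper name convert_line is made up for this example, and it expects a line without its trailing newline.

```python
import re

# Same pattern as the first JREPL command: 3-character event number, 2 spaces,
# AS plus 6 spaces (an 8-character field), then the rest of the line.
LINE_RX = re.compile(r"^(...  )AS      (.*FROM CLIP NAME: (.*?)\..*)$")

def convert_line(line):
    """Hypothetical helper: return the rewritten AS line, or None to drop the line."""
    m = LINE_RX.match(line)
    if m is None:
        return None  # not an AS line with a clip name (for example a BA line)
    prefix, rest, clip = m.group(1), m.group(2), m.group(3)
    # Pad with spaces, then cut to 8 characters, like ($3+'        ').slice(0,8)
    return prefix + clip.ljust(8)[:8] + rest

sample = "002  AS       V     C        01:00:24:14 01:00:28:18 01:00:35:01 01:00:39:05 * FROM CLIP NAME: Sq3_Sh1.jpg"
print(convert_line(sample))
```

Because the 8-character AS field is replaced by an 8-character clip name, the output line has the same length as the input line. Lines that return None would simply not be written out, which is what /jmatch does in the first command.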

dbenham

Posted 2015-08-25T19:35:37.783

Reputation: 9 212

0

Replacing parts of a line, using Python, under Windows

How I long for a quick bash one-liner [1]. However, this script should do what you want by using a simple regex to extract the filename, then passing that to the str.replace() method.

Script:

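#!/usr/bin/env python3
import sys, fileinput, re

if __name__ == "__main__" and len(sys.argv) > 1:
    # Capture the clip name without its .jpg extension
    rx = re.compile(r"FROM CLIP NAME: (\w+)\.jpg")
    # inplace=True redirects stdout into the file; the original is kept as <name>.bak
    for line in fileinput.input([sys.argv[1]], inplace=True, backup=".bak"):
        if "BA" in line:
            continue  # drop BA lines
        m = rx.search(line) if "AS" in line else None
        if m:
            # Replace only the first AS, which is the one in the second column
            line = line.replace("AS", m.group(1), 1)
        # line still ends with its newline, so write it as-is instead of print()ing it
        sys.stdout.write(line)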

Save as substitution.py (or whatever), and run as:

 D:\project\images\>python substitution.py datafile.dat

where datafile.dat is the actual file you want the script to operate on. The script will create a backup of the original file named datafile.dat.bak.

Caveats and assumptions

This assumes that the lines to change end with FROM CLIP NAME: <filename.jpg>. It omits every line containing BA, and it leaves any line that does not contain AS unchanged.

Written with python3 in mind but works with python2; tested with 2.7.10 under Cygwin, and 2.7.9 and 3.4.2 under Linux. Likely highly fragile. Back up file before use. Do not get script in eyes. If ingested, seek medical assistance. May cause cancer in the state of California.

[1] Or even an "in-place replace" using re.sub(). Sigh.
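
For what it's worth, the re.sub() variant could look roughly like the sketch below. It works on the whole file contents as one string instead of line by line. The two patterns and the function name transform are assumptions based on the sample lines in the question, not something tested against a real file.

```python
import re

# Groups: (1) event number and spaces up to AS, (2) everything after AS up to
# and including the clip name, (3) the clip name without ".jpg".
AS_RX = re.compile(r"^(\d+[ \t]+)AS\b(.*FROM CLIP NAME: (\w+)\.jpg)", re.MULTILINE)
# A whole BA line, including its newline if there is one.
BA_RX = re.compile(r"^\d+[ \t]+BA\b.*\n?", re.MULTILINE)

def transform(text):
    """Hypothetical helper: drop BA lines, then swap AS for the clip name."""
    text = BA_RX.sub("", text)
    # \1 = prefix, \3 = clip name (takes the place of AS), \2 = rest of the line
    return AS_RX.sub(r"\1\3\2", text)

if __name__ == "__main__":
    demo = (
        "002  AS       V     C        01:00:24:14 01:00:28:18 01:00:35:01 01:00:39:05 * FROM CLIP NAME: Sq3_Sh1.jpg\n"
        "005  BA       V     C        00:00:00:00 00:00:05:20 01:00:45:16 01:00:51:12\n"
    )
    print(transform(demo), end="")
```

To use it on a file you would read the contents into a string, pass it through transform(), and write the result back, ideally to a new file so the original stays intact. Unlike the script above, this only looks at AS and BA in the second column, so a clip name that happens to contain those letters is left alone.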

bertieb

Posted 2015-08-25T19:35:37.783

Reputation: 6 181

0

Consider the following piece of code. It is just a simple demo and does not handle any errors.

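# Read test.txt in text mode and write the modified lines to a fresh result.txt
with open('test.txt', 'r') as f1:
    with open('result.txt', 'w') as f2:
        for line in f1:
            cols = line.split()
            # Skip blank lines and drop any line containing BA
            if not cols or 'BA' in line:
                continue
            # Clip name: text after the last colon, without the extension
            clip = line.split(':')[-1].split('.')[0].strip()
            if len(cols) > 1 and cols[1] == 'AS':
                cols[1] = clip
            # Columns are re-joined with tabs, so the original spacing is not kept
            f2.write('{0}\n'.format('\t'.join(cols)))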
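
To make the chained split() calls easier to follow, here is the clip-name extraction on its own, one step per line. The function name clip_name is only for illustration.

```python
def clip_name(line):
    """Hypothetical helper: split-based extraction of the clip name from one line."""
    tail = line.split(':')[-1]   # text after the last colon, e.g. ' Sq3_Sh1.jpg\n'
    stem = tail.split('.')[0]    # keep everything before the first dot
    return stem.strip()          # remove the leading space and trailing newline

sample = "002  AS       V     C        01:00:24:14 01:00:28:18 01:00:35:01 01:00:39:05 * FROM CLIP NAME: Sq3_Sh1.jpg\n"
print(clip_name(sample))  # prints Sq3_Sh1
```

On a line without a clip name, the text after the last colon is just the tail of the final timecode, so the value is only meaningful for AS lines. A clip name containing a colon or more than one dot would also be cut short.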

Fei Yuan

Posted 2015-08-25T19:35:37.783

Reputation: 1