Multithreading a Bash Script


I have an 8 GB text file and I have to run a Python script for each line in the file and save part of the output.

Is there any way I can split this into several processes in Bash to make it run faster?

Here is my current Bash script:

#!/bin/bash
filename='before.txt'
while IFS= read -r p; do
    # Keep only the "qter" lines of the output, with the prefix stripped
    python py-hex.py "$p" | sed -n -e '/^qter/p' | sed 's/qter: //g' >> converted.txt
done < "$filename"

Liviu ZeJah

Posted 2015-10-06T21:49:02.867

Reputation: 35

Answers


I think you need to provide more detail on the constraints - for example, does the output in converted.txt need to be in the same order as before.txt, and how long does each invocation of the Python script take? If the output order does not need to match the input order, you may be able to do this by backgrounding the processes and launching a number of them in each loop iteration - the number depending, I guess, on how many threads your CPU will handle.
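If you want the script to pick that number automatically, nproc from GNU coreutils reports how many processing units are available:

threads=$(nproc)    # e.g. prints 8 on a 4-core CPU with hyper-threading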

Something like the following might (or might not) suit your purpose:

#!/bin/bash
filename='before.txt'
threads=4

threads=$(( threads - 1 ))
while IFS= read -r filein
do
    python py-hex.py "$filein" | sed -n -e '/^qter/p' | sed 's/qter: //g' >> converted.txt &
    # Launch the rest of the batch in the background
    for thread in $(seq "$threads")
    do
        read -r filein
        python py-hex.py "$filein" | sed -n -e '/^qter/p' | sed 's/qter: //g' >> converted.txt &
    done
    # Block until the whole batch has finished before reading the next one
    wait
done < "$filename"

Notes: this assumes your Python script can handle empty inputs (i.e. if the number of lines is not exactly divisible by the number of threads, the last batch will pass some empty lines - you could always do a check for this before executing the inner loop).
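One way to do that check, as a sketch: test whether read actually got a non-empty line before launching the job, so the final short batch does not feed blank arguments to the Python script:

for thread in $(seq "$threads")
do
    # `read -r` returns non-zero at end of file; skip the launch in that case
    if read -r filein && [ -n "$filein" ]; then
        python py-hex.py "$filein" | sed -n -e '/^qter/p' | sed 's/qter: //g' >> converted.txt &
    fi
done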

This script assumes you don't care about the output order.
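Since the order does not matter here, another option is to let GNU xargs do the batching instead of a hand-rolled loop - a sketch, reusing the file names from the question; -P caps the number of concurrent processes and -n 1 hands one input line to each process:

#!/bin/bash
# Each line of before.txt arrives in the inline script as "$1";
# at most 4 python processes run at any one time (-P 4).
xargs -d '\n' -n 1 -P 4 sh -c \
    'python py-hex.py "$1" | sed -n "/^qter/p" | sed "s/qter: //g"' _ \
    < before.txt >> converted.txt

Be aware that concurrent appends to converted.txt can interleave if the output lines are very long; with short lines each write is normally atomic.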

davidgo

Posted 2015-10-06T21:49:02.867

Reputation: 49 152

Yup, something like that. Works great. Still testing this, but it seems to work a little faster. Thanks for your help! – Liviu ZeJah – 2015-10-07T12:28:55.177