A simple loop over a large number of files runs half as fast on one system as on the other.
Using bash, I did something like:
for f in ./*
do
    : # something here, using "$f"
done
Using "time" I was able to confirm that on system2 the "something here" part runs faster than on system1. Nevertheless, the whole loop takes twice as long on system2 as on system1. Why? And how can I speed this up?
There are about 20000 (text) files in the directory. Reducing the number of files to about 6000 speeds things up significantly. These findings stay the same regardless of the looping method (replacing the "for" loop with a find command, or even putting the filenames in an array first).
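To narrow down where the time goes, the phases of the loop could be timed separately. The following is only a minimal sketch: the temporary directory and its five small files are made up for illustration, and "dir" would have to point at the real directory to get meaningful numbers.

```shell
#!/usr/bin/env bash
# Hypothetical test directory; point "dir" at the real one instead.
dir=$(mktemp -d)
for n in 1 2 3 4 5; do echo "line $n" > "$dir/file$n.html"; done

TIMEFORMAT='%R s'   # make bash's "time" keyword print only wall-clock seconds

# Phase 1: only enumerate the directory entries.
echo "glob only:"
time { files=("$dir"/*html); }

# Phase 2: loop over the names without touching the file contents.
echo "empty loop:"
time { for f in "${files[@]}"; do :; done; }

# Phase 3: loop and read every file once.
echo "loop with read:"
time { bytes=0; for f in "${files[@]}"; do bytes=$(( bytes + $(wc -c < "$f") )); done; }

count=${#files[@]}
rm -rf "$dir"
```

If phase 1 or 2 already dominates, the slowdown is in listing the directory; if only phase 3 is slow, it is in opening and reading the individual files.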
System1: Debian (in an openvz-vm, using reiserfs)
System2: Ubuntu (native, faster processor than System1, faster RAID 5 too; tested with both ext3 and ext4, results stay the same)
So far I believe I have ruled out: hardware (System2 should be way faster), userland software (bash, grep, awk, find are the same versions) and .bashrc (no spiffy config there).
So is it the filesystem? Can I tweak ext3/4 so that it gets as fast as reiserfs?
Thanks for your recommendations!
Edit: OK, you're right, I should have provided more info. Now I have to reveal my beginner's bash mumble, but here we go:
declare -a UIDS NAMES TEMPS ANGLEAS ANGLEBS
ELEM=0
for i in *html
do
    # get UID (kept in FILE_UID, since bash's own UID variable is read-only)
    FILE_UID=${i%-*html}
    UIDS[$ELEM]=$FILE_UID
    # get Name
    NAME=$(awk -F, '/"name":"/ { lines[last] = $0 } END { print lines[last] }' "$i" | awk '{ print $2 }')
    NAME=${NAME##\[*\"}
    NAMES[$ELEM]=$NAME
    # ELEMS (the total number of files) is set elsewhere in the script
    echo "getting values for [$FILE_UID] ($ELEM of $ELEMS)"
    TEMPS[$ELEM]=$(awk -F, '/Temperature/ { lines[last] = $0 } END { print lines[last] }' "$i" | sed 's/<[^>]*>//g' | tr -d '[:punct:]' | awk '{ print $3 }')
    ANGLEAS[$ELEM]=$(awk -F, '/Angle A/ { lines[last] = $0 } END { print lines[last] }' "$i" | sed 's/<[^>]*>//g' | tr -d '[:punct:]' | awk '{ print $3 }')
    ANGLEBS[$ELEM]=$(awk -F, '/Angle B/ { lines[last] = $0 } END { print lines[last] }' "$i" | sed 's/<[^>]*>//g' | tr -d '[:punct:]' | awk '{ print $3 }')
    ### about 20 more lines like these ^^^
    ((ELEM++))
done
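Each of the near-identical pipelines above re-reads the file and starts four processes. One way this might be collapsed into a single awk pass per file is sketched below. The sample lines, the helper name extract_last and the variable names are assumptions for illustration, and the character class only approximates tr -d '[:punct:]' for plain ASCII input.

```shell
#!/usr/bin/env bash
# Hypothetical sample input standing in for one of the *html files.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<td>Current Temperature: 21</td>
<td>Angle A: 45</td>
<td>Current Temperature: 23</td>
<td>Angle B: 90</td>
EOF

# One awk process remembers the last line matching each label, then
# strips tags and non-alphanumeric characters and prints the third word.
extract_last() {
    awk '
        function clean(s,   w) {
            gsub(/<[^>]*>/, "", s)        # drop HTML tags, like the sed call
            gsub(/[^A-Za-z0-9 ]/, "", s)  # keep only letters, digits, spaces
            split(s, w, " ")
            return w[3]
        }
        /Temperature/ { t = $0 }
        /Angle A/     { a = $0 }
        /Angle B/     { b = $0 }
        END { print clean(t); print clean(a); print clean(b) }
    ' "$1"
}

# Read the three result lines into shell variables in one go.
{ read -r TEMP; read -r ANGLEA; read -r ANGLEB; } < <(extract_last "$sample")
echo "$TEMP $ANGLEA $ANGLEB"
rm -f "$sample"
```

With 20 or so fields this would start one awk per file instead of roughly 80 processes, though it would not by itself explain the difference between the two systems.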
Yes, the problem is that I have to read the file 20 times, but putting the content of the file in a variable (FILE=$(cat $i)) removes the line breaks and I can't use awk anymore...? Maybe I did that wrong, so if you have a suggestion for me, I'd be grateful.
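As far as I understand it, the line breaks are only lost when the variable is later expanded without double quotes, not when the file is read. A minimal sketch of the variable approach, with a made-up sample file and variable names chosen for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for one of the *html files.
sample=$(mktemp)
printf '%s\n' 'x,Temperature: <b>21</b>' 'y,Angle A: <b>45</b>' 'x,Temperature: <b>23</b>' > "$sample"

# Read the file once. The assignment keeps the line breaks; they only
# disappear if $content is expanded later without double quotes.
content=$(cat "$sample")

# Feed the variable to awk through a here-string instead of re-reading the file.
last_temp=$(awk '/Temperature/ { last = $0 } END { print last }' <<< "$content")
line_count=$(wc -l <<< "$content")

echo "$last_temp"
rm -f "$sample"
```

This reads the file from disk only once, but each awk call is still a separate process, so it may not change the timing much.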
Still, the problem remains that reading a file in that directory just takes too long...
To the hardware question: well, system1 runs on hardware that is over 5 years old, system2 is 2 months old. Yes, the specs are quite different (other mainboards, processors, etc.), but system2 is way faster in every other respect, and raw read/write rates to the filesystem are faster too.