How to find multiple files on a Linux system

I have a massive number of files on my system, and every file has one corresponding file. For example,

test.pdf has a test-project.zip
test2.pdf has a test2-project.zip

test.pdf and test2.pdf are the original files and test-project.zip and test2-project.zip are generated by my script.

I need to find out whether every original file has a matching 'filename'-project.zip.

I can use

find /project/ -name "*.pdf" | wc -l
find /project/ -name "*-project.zip" | wc -l

to check whether the counts match, but I need to know which files have no corresponding zip file.

Can anyone help me with this? Thanks a lot!

FlyingCat

Posted 2013-05-24T19:10:33.423

Reputation: 187

Is it a single folder /project/, or are subdirectories involved? – Daniel Beck – 2013-05-24T19:13:41.083

Subdirectories are included. – FlyingCat – 2013-05-24T19:15:28.037

Are test.pdf and test-project.zip (and so on) always in the same directory? – evilsoup – 2013-05-24T19:20:55.330

Yes, they are all in the same directory. – FlyingCat – 2013-05-24T19:21:23.217

Answers

5

Quick script; adapt as you see fit:

#!/usr/bin/env bash

find /project/ -name '*.pdf' -print0 | while IFS= read -r -d $'\0' i; do
  # ${i/%.pdf/-project.zip} is the zip name expected next to this PDF
  if [ ! -e "${i/%.pdf/-project.zip}" ]; then
    echo "${i/%.pdf/-project.zip} doesn't exist!"
  fi
done

exit 0

-d $'\0' sets the delimiter for read to the null byte, and -print0 makes find terminate each name with a null byte, so this handles file names containing spaces and newlines (obviously irrelevant in this case, but useful to know in general). IFS= and -r stop read from trimming leading or trailing whitespace and from interpreting backslashes in the names. ${i/%.pdf/-project.zip} replaces the .pdf at the end of the variable $i with -project.zip. Other than that, this is all standard shell scripting stuff.
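
As a small illustration of the ${i/%.pdf/-project.zip} expansion on its own, here is a minimal sketch. The helper name expected_zip and the sample paths are made up for the example; they are not part of the script above.

```shell
#!/usr/bin/env bash
# expected_zip is a hypothetical helper name used only for this illustration.
# It prints the zip name that is expected to sit next to a given .pdf path.
expected_zip() {
  local pdf=$1
  # ${var/%pattern/replacement} replaces pattern only when it matches
  # at the very END of the value, so ".pdf" elsewhere in the path is untouched.
  printf '%s\n' "${pdf/%.pdf/-project.zip}"
}

expected_zip "/project/a/test.pdf"          # prints /project/a/test-project.zip
expected_zip "/project/my.pdf.files/x.pdf"  # prints /project/my.pdf.files/x-project.zip
```

A name that does not end in .pdf comes back unchanged, because the pattern is anchored to the end of the string.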

If you wanted to shorten it even more, you could also use

[ -e "${i/%.pdf/-project.zip}" ] || echo "${i/%.pdf/-project.zip} doesn't exist!"

...instead of the if statement. I think the if form is easier to work with once the body is more than a single, short line (you can get around this by using a function, but at that point you aren't saving any space compared with the if).
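
For reference, the function variant mentioned above might look roughly like this. It is only a sketch: the names report_missing and check_tree are invented for the example, and the message text simply mirrors the script above.

```shell
#!/usr/bin/env bash
# report_missing is a hypothetical helper name, not part of the original script.
# It prints a message for one .pdf path if its -project.zip sibling is absent.
report_missing() {
  local zip=${1/%.pdf/-project.zip}
  [ -e "$zip" ] || echo "$zip doesn't exist!"
}

# check_tree is likewise an illustrative name; it walks the directory given
# as its first argument and checks every .pdf found below it.
check_tree() {
  find "$1" -name '*.pdf' -print0 |
    while IFS= read -r -d $'\0' pdf; do
      report_missing "$pdf"
    done
}
```

It would be called as check_tree /project/. The loop body stays a single line, but the total is no shorter than the plain if version.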

Assuming you have bash 4+ (you probably do; you can check with bash --version), you can use the globstar option instead of find:

#!/usr/bin/env bash

shopt -s globstar
for f in /project/**/*.pdf; do
  if [ ! -e "${f/%.pdf/-project.zip}" ]; then
    echo "${f/%.pdf/-project.zip} doesn't exist!"
  fi
done

exit 0

This has the advantage of being pure bash, with no external find process or pipe. Whether that makes it noticeably faster depends on how many files are involved, so time both versions if speed matters.
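
One caveat with the glob version: if nothing matches, bash leaves the pattern as a literal string, so the loop runs once and reports a nonsense name. The sketch below assumes bash 4+ and adds the nullglob option to avoid that. The function name list_missing_zips is invented for the example, and it prints only the missing zip paths rather than the full message.

```shell
#!/usr/bin/env bash
# list_missing_zips is a hypothetical name used only for this sketch.
# It prints the path of every expected -project.zip that does not exist.
list_missing_zips() {
  local dir=${1%/}   # drop one trailing slash so results don't contain "//"
  local f
  # The subshell ( ... ) keeps the shopt changes from leaking into the caller.
  (
    # globstar: ** matches any depth of subdirectories (bash 4+).
    # nullglob: a pattern with no matches expands to nothing, not to itself.
    shopt -s globstar nullglob
    for f in "$dir"/**/*.pdf; do
      [ -e "${f/%.pdf/-project.zip}" ] || printf '%s\n' "${f/%.pdf/-project.zip}"
    done
  )
}
```

Called as list_missing_zips /project/, it prints nothing when every PDF has its zip. Unlike find, ** skips hidden files and directories unless dotglob is also set.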

evilsoup

Posted 2013-05-24T19:10:33.423

Reputation: 10 085

You guard against newlines, but the [ ... ] test would still break if $i had spaces. – user1686 – 2013-05-24T19:58:11.693

@grawity - I can't believe I forgot to quote the variable, fixed. – evilsoup – 2013-05-24T20:00:16.840

+1 for -d $'\0'... but do you need /usr/bin/env bash? /bin/bash is a POSIX necessity. Do you really worry about which bash you hit? – Rich Homolka – 2013-05-24T20:07:49.100

@RichHomolka AFAIK there's no actual downside to using /usr/bin/env, and while it may not be necessary in this case, I look at it as a good habit to form for working with other languages. – evilsoup – 2013-05-24T20:10:47.047

@RichHomolka: /bin/sh is a POSIX necessity. bash, on the other hand, is a third-party program, and it's frequently in /usr/bin, /usr/local/bin, or even /usr/pkg/bin. – user1686 – 2013-05-24T20:19:13.660

@grawity maybe I'm just thinking SVR4, which required it. Thanks – Rich Homolka – 2013-05-24T20:31:58.307

@RichHomolka: bash didn't exist when SVR4 was released, afaik – user1686 – 2013-05-24T23:15:05.130

0

Here are two ways you could do it. One is a godawful Bash one-liner which spawns at least one, possibly two, processes for each file it matches:

[me@box] $ for file in `find . -name '*.pdf' -exec perl -le'$f=shift(); $f =~ s@\.pdf$@@; print $f' {} \;`; do (TESTFILE="$file-project.zip"; if [ ! -f "$TESTFILE" ]; then echo "missing $TESTFILE"; fi); done

Since that's enough to make anyone's eyes bleed, here's a Perl script which does the same job, much more sanely than any Bash script ever could:

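#!/usr/bin/env perl
use strict;
use warnings;

# Directory to search, given as the first command-line argument.
my $path = shift() || die "$0 requires a path argument\n";
# Collect every .pdf under $path (one path per line of find's output).
my @files = `find "$path" -name '*.pdf'`;

foreach my $file (@files) {
  chomp $file;
  # Derive the expected zip name: foo.pdf -> foo-project.zip
  my $zip = $file;
  $zip =~ s@\.pdf$@-project.zip@;
  next if -f $zip;
  print "missing $zip\n";
}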

Copy that into, e.g., find-missing.pl, then run perl find-missing.pl /project/.

Aaron Miller

Posted 2013-05-24T19:10:33.423

Reputation: 8 849

for file in `find …` is a common anti-pattern and should be avoided. See: http://mywiki.wooledge.org/ParsingLs – slhck – 2013-05-25T09:50:44.040
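
One way to avoid the for file in `find …` pattern flagged above is to let find hand the paths directly to a small inline shell script through -exec ... {} +. This is a sketch rather than a drop-in replacement for either answer. The function name find_missing_zips is invented for the example, and the "missing" message format is borrowed from the Perl script.

```shell
#!/usr/bin/env bash
# find_missing_zips is a hypothetical name used only for this sketch.
find_missing_zips() {
  # find passes batches of matching paths to the inline sh script as "$@".
  # The names never go through word splitting, so spaces and newlines are safe.
  find "$1" -name '*.pdf' -exec sh -c '
    for pdf do
      zip=${pdf%.pdf}-project.zip   # strip the .pdf suffix, append -project.zip
      [ -e "$zip" ] || printf "missing %s\n" "$zip"
    done
  ' sh {} +
}
```

It would be invoked as find_missing_zips /project/. The inner script uses only POSIX sh features, and the + form starts one sh per batch of files instead of one process per file.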