Identifying files that changed in the last hour


Right now I use Python to figure out which files have been modified in the past hour. This is really slow on my network share (~50,000 files, and I check each one's timestamp). I pass the resulting list to a custom script that does some transfers/logging/etc. on a remote server.

I want to speed up generating the file list (it currently takes ~15-20 minutes just for that). Any suggestions?

One thing that might be helpful: the network has a NetApp filer, and that filer creates those .snapshot dirs. Can I somehow hook into the filer (through its API?) and get a list of files that changed recently?

If you're not familiar with the NetApp API, suggestions on how to generate the file list (fast!) with Unix commands would also be great! (BTW, this is a network filesystem, so multiple machines will be making changes.)
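For what it's worth, one common way to sidestep the "last hour" timestamp arithmetic entirely is to keep a marker file on the same share and list files newer than it on each run. The sketch below is a generic Unix approach, not NetApp-specific; the paths are illustrative, and the mtime comparison happens between files on the same filesystem, which sidesteps most clock-skew issues between the client machines:

```shell
#!/bin/sh
# Sketch: list files changed since the previous invocation, using a marker
# file stored on the same (network) filesystem. Paths are illustrative.
changed_since_last_run() {
    dir=$1
    marker="$dir/.last_run"
    if [ -e "$marker" ]; then
        # Only files modified after the previous run, excluding the marker.
        find "$dir" -type f -newer "$marker" ! -path "$marker"
    else
        # First run: everything counts as changed.
        find "$dir" -type f ! -path "$marker"
    fi
    touch "$marker"   # reset the marker for the next run
}

# Usage: changed_since_last_run /path/to/dir/
```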

Utkarsh Sinha

Posted 2013-07-02T13:46:35.950

Reputation: 1 327

Answers


Just use find:

find /path/to/dir/ -mmin -61

Relevant options:

   -mmin n
          File's data was last modified n minutes ago.
   Numeric arguments can be specified as

   +n     for greater than n,

   -n     for less than n,

   n      for exactly n.

So, -mmin -61 means "find files that have been modified less than 61 minutes ago", in other words, those that have been modified in the last hour.
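A quick way to see the option in action (the temporary directory is illustrative, and `touch -d` is the GNU form):

```shell
#!/bin/sh
# Demonstration of -mmin -61: only the recently modified file matches.
demo=$(mktemp -d)
touch -d '3 hours ago' "$demo/stale"   # 180 minutes old: too old to match
touch "$demo/fresh"                    # modified just now
recent=$(find "$demo" -type f -mmin -61)
echo "$recent"                         # only $demo/fresh is listed
```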

You might want to use these options as well. They will speed up the search, but whether they are appropriate depends on your setup:

  • -maxdepth 1 : Don't descend into subdirectories.
  • -type f : Look only for regular files, no directories etc.
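Since you are handing the list to a custom script, it's worth passing it NUL-delimited so filenames containing spaces or newlines survive the handoff. A sketch (the transfer script's name and interface are hypothetical; `xargs -r` is the GNU flag for skipping the command on empty input):

```shell
#!/bin/sh
# Sketch: NUL-delimit find's output before handing it to a downstream
# command, so unusual filenames don't break the pipeline.
recent_files_to() {
    dir=$1; shift
    find "$dir" -mmin -61 -type f -print0 | xargs -0 -r "$@"
}

# Usage (script name hypothetical):
# recent_files_to /path/to/dir/ ./your_transfer_script
```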

I just ran this command on my laptop (i7, 2.6 GHz) in a directory containing 78,353 randomly generated files (with random modification dates). It took less than one second to return a list of 51 files modified in the last hour.

terdon

Posted 2013-07-02T13:46:35.950

Reputation: 45 216

How long does it take if you run it again in an hour, after Windows has purged the file info from the disk cache? – Darth Android – 2013-07-02T14:31:48.910

@DarthAndroid Windows? This is on Linux, what does Windows have to do with it? I tried it again an hour or so later and it took 0.169 as opposed to 0.143 seconds. – terdon – 2013-07-02T15:08:45.720

Sorry, my brain hasn't woken up yet, though the same question still applies to Linux. – Darth Android – 2013-07-02T15:27:12.170

Running find locally is fine, but when I tried this on the NFS here it took quite a while: ~5-7 minutes. – Utkarsh Sinha – 2013-07-03T11:20:59.620

@UtkarshSinha Ah, OK, slow NFS. What exactly are you searching? Multiple folders across various filesystems, or all in one folder? Do the files have anything in common? Will they be smaller/larger than Xkb, or all have the same extension or the same permission modes or anything like that? Basically, I doubt anything will be faster than find, especially if the problem is network speed, so refining the find might speed things up. – terdon – 2013-07-03T12:45:49.393
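To illustrate terdon's suggestion: if the changed files do share traits, extra predicates let find prune work and skip irrelevant entries. The extension, size cap, and depth below are illustrative guesses, not values from the thread:

```shell
#!/bin/sh
# Sketch: combine find predicates to narrow the search. Each predicate here
# (-maxdepth 2, '*.dat', -size -500k) is a hypothetical example of the kind
# of trait the changed files might share.
refined_recent() {
    find "$1" -maxdepth 2 -type f -name '*.dat' -size -500k -mmin -61
}

# Usage: refined_recent /path/to/dir/
```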