The kernel itself can't be "in the middle" of a "move 1000 files" operation. You need to be much more specific about what operation you're proposing.
One thread can only move one file at a time with the rename(const char *oldpath, const char *newpath) or renameat system calls (and only within the same filesystem; see footnote 1). Linux also has renameat2, which takes flags like RENAME_EXCHANGE to atomically exchange two pathnames, or RENAME_NOREPLACE to not replace the destination if it exists. (That allows, for example, a mv -i implementation that avoids the race condition of stat and then rename, which would still overwrite a file created after the stat. link + unlink could also solve that, because link fails if the new name exists.)
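The link + unlink idea can be sketched in a few lines. This is only an illustration, not how any particular mv is implemented; the function name move_noreplace is made up here, and Python's os.link and os.unlink are used as thin wrappers around the system calls of the same names.

```python
import os


def move_noreplace(src: str, dst: str) -> None:
    """Move src to dst on the same filesystem without clobbering dst.

    (Hypothetical helper for illustration.)  os.link() raises
    FileExistsError if dst already exists, so there is no window between
    a separate "check" and "rename" in which a newly created dst could be
    silently overwritten.
    """
    os.link(src, dst)   # creates the new name only if it is free
    os.unlink(src)      # drop the old name; the file now lives only at dst
```

Unlike rename, this is two steps, so both names exist for a moment, and it only works for things you can hard-link (not directories).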
But each of these system calls renames only a single directory entry. Using POSIX renameat with olddirfd and newdirfd (opened with open(O_DIRECTORY)) would allow you to keep looping over files in a directory even if the source or destination directory itself had been renamed. (Using relative paths could also allow that with regular rename().)
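As a rough sketch of the directory-FD approach: Python's os.rename accepts src_dir_fd and dst_dir_fd, which correspond to renameat's olddirfd and newdirfd. The function name move_all is an assumption for this example only.

```python
import os


def move_all(src_fd: int, dst_fd: int) -> list:
    """Move every entry of the directory open at src_fd into dst_fd.

    (Hypothetical helper.)  Names are resolved relative to the open
    directory file descriptors rather than to path strings, so the loop
    keeps working even if either directory is renamed meanwhile.
    """
    names = os.listdir(src_fd)  # list of names, collected up front
    for name in names:
        # With the *_dir_fd arguments this maps onto renameat(2).
        os.rename(name, name, src_dir_fd=src_fd, dst_dir_fd=dst_fd)
    return names
```

The descriptors would come from something like os.open(path, os.O_RDONLY | os.O_DIRECTORY), opened before anything gets renamed.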
Anyway, as the other answers say, most programs that use the rename system call will figure out a list of filenames before doing the first rename. (Usually using the readdir(3) POSIX library function as a wrapper for platform-specific system calls like Linux getdents.)
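To make the "list first, rename afterwards" pattern concrete, here is a minimal sketch. The two function names are invented for this example; os.listdir stands in for the readdir loop a C program would use.

```python
import os


def snapshot_names(src_dir: str, suffix: str) -> list:
    """Return the matching names that exist right now (hypothetical helper)."""
    return sorted(n for n in os.listdir(src_dir) if n.endswith(suffix))


def move_names(names: list, src_dir: str, dst_dir: str) -> None:
    """Rename each previously listed name, one rename call per file."""
    for name in names:
        os.rename(os.path.join(src_dir, name), os.path.join(dst_dir, name))
```

A file created in src_dir after snapshot_names returns is simply not in the list, so move_names never touches it.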
But if you're talking about find -exec ... {} \; to run one command per file, or the more efficient -exec ... {} + with so many files that they don't fit on one command line, then you can certainly have renames happening while still scanning. For example:
find . -name '*.txt' -exec mv -t ../txtfiles {} \; # Intentionally inefficient
If you created some new .txt files while this was running, you might see some of them in ../txtfiles. But internally find(1) will have used open(O_DIRECTORY) and getdents on the current directory (.).
If one system call was enough to return all the directory entries in . (which find will loop over one at a time, only making further system calls if needed for -type or to recurse, or fork+exec on a match), then the list is a snapshot of the directory entries at one point in time. Further changes to the directory can't affect what find does, because it already has a copy of the directory listing that it will loop over. (Probably it internally uses readdir(3), which returns one entry at a time, but from running strace find . we know that glibc makes a getdents64 system call with a buffer size of count=32768 bytes.)
But if the directory is huge and/or the kernel doesn't fill find's buffer, it will have to make a second getdents system call after looping over what it got the first time. So it could maybe see new entries after doing some renames.
But see discussion in comments under other answers: the kernel might have snapshotted for us, because (I think) getdents isn't allowed to return the same filename twice. Different filesystems use different sorting / indexing mechanisms for making access to an entry in a huge directory more efficient than a linear search. So adding or removing a directory entry might possibly have other effects on the order of the remaining entries. Hmm, probably it's more likely that filesystems keep a stable order, and just update a separate index (like the EXT4 dir_index feature), so a directory FD's position can just be a directory entry to resume from? I really don't know how the telldir(3) library interface maps onto lseek, or if that's purely a user-space thing for looping over the buffer obtained by user-space. But multiple getdents calls can be needed to get all the entries from a huge directory, so even if seeking isn't supported, the kernel needs to be able to record a current position.
Footnote 1:
To "move" between filesystems, it's up to user-space to copy and unlink. (e.g. with open and either read+write, mmap+write, sendfile(2), or copy_file_range(2), the latter two totally avoiding bouncing the file data through user-space.)
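A hedged sketch of that copy-and-unlink fallback, using the plain read+write variant because it works everywhere (sendfile or copy_file_range could replace the loop on Linux). The function names are made up for illustration, and this ignores permissions, timestamps and error cleanup that a real mv handles.

```python
import errno
import os


def copy_then_unlink(src: str, dst: str, bufsize: int = 1 << 16) -> None:
    """Copy src to dst with read+write, then remove src (not atomic)."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(bufsize)
            if not chunk:        # empty read means end of file
                break
            fout.write(chunk)
    os.unlink(src)               # only reached if the copy succeeded


def move(src: str, dst: str) -> None:
    """Try rename first; fall back to copy + unlink across filesystems."""
    try:
        os.rename(src, dst)
    except OSError as e:
        if e.errno != errno.EXDEV:   # EXDEV: different filesystems
            raise
        copy_then_unlink(src, dst)
```

The fallback path is where a "move" stops being a single atomic step: dst exists, partly written, while src is still there.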
This is not a direct answer, which seems to be well provided by @Eugene-Rieck. But you might find it interesting/useful to read about race conditions (https://en.wikipedia.org/wiki/Race_condition). They seem to be relevant to your question. In effect, if the specific commands you use to do the moving and adding of files create a race condition, then unusual things will happen.
– user02814 – 2019-02-27T05:13:01.353
@user02814: The problem with race conditions is that unusual things might happen. When you're looking for them or writing tests, they usually don't happen. When you're putting code in production, they will surely happen. :) – Eric Duminil – 2019-02-27T12:58:35.813
As an anecdotal case, I was moving a directory (mv dir/ other/) during which I added files to it. At the end of the move the directory was deleted and the uncopied files disappeared with it. – The Vee – 2019-03-01T06:33:11.910
To my above comment: across filesystems, that is. – The Vee – 2019-03-01T07:00:22.677