Should I worry about race conditions with SFTP transfer/processing?

1

1

Scenario:

I'm using SFTP to automatically transfer files between two systems, A and B.

System A runs an SFTP server. System A will periodically (say once a minute) poll its local SFTP directory for the existence of *.dat files, and if found, import and delete them.

System B generates *.dat files, and as they are generated, sends them to System A by connecting to that SFTP host and uploading them.

Questions:

  1. Is it possible System A will see and begin processing a file before System B has finished uploading it? Or will SFTP prevent this somehow, such as not depositing files in the folder until the network transfer finishes?

  2. Is it reasonable/recommended for System B to upload under another filename such as *.locked or *.part, then rename to *.dat after the network transfer is complete? Or is there a better way of handling this?

Sam Jones

Posted 2019-06-28T14:45:10.303

Reputation: 203

Please clarify which client/OS you are using. – slhck – 2019-06-28T15:05:50.050

Answers

1

Whether or not you call it a race condition by the strict definition, yes: System A can open a partially uploaded file for reading and end up importing incomplete data. System A could check each file for consistency, test for a fixed size if appropriate, or test for certain file permissions (which you set after the upload finishes). If the conditions aren't met, it skips the file and tries again on the next iteration.
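As a rough illustration of the permission check, here is a minimal Python sketch of what System A's poller might do. The convention itself is an assumption, not something SFTP provides: the uploader is assumed to create the file without the group-read bit and to chmod it once the transfer is done. The function name is hypothetical.

```python
import os
import stat
from pathlib import Path


def is_marked_complete(path: Path) -> bool:
    """Return True if the (assumed) 'upload finished' marker is set.

    Assumed convention: the uploader sets the group-read bit only
    after the whole file has been transferred.
    """
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IRGRP)
```

The poller would call this for every *.dat file it finds and leave any file that returns False for the next pass.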

I'd upload to a temporary filename or location and then rename or move the file to the folder/extension your program expects. For example, upload to filename.part and then rename to filename.dat, or upload to pending/filename.dat and then move it out of the pending folder. That avoids the problem: on UNIX/Linux and Windows systems a rename or move within the same filesystem is atomic, so System A never sees a partial file under the final name. (If the pending folder is on a different filesystem, the move becomes a copy and that guarantee is lost.)
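A minimal sketch of the upload-then-rename pattern, simulated on a local directory in Python so it can be run without an SFTP server. Over SFTP the same two steps would be an upload to the .part name followed by a rename request. The function names and the .part suffix are illustrative assumptions.

```python
import os
from pathlib import Path
from typing import List


def deliver(data: bytes, inbox: Path, name: str) -> Path:
    """Write data under a temporary .part name, then rename it into place."""
    final_path = inbox / name
    part_path = inbox / (name + ".part")
    with open(part_path, "wb") as handle:
        handle.write(data)
        handle.flush()
        os.fsync(handle.fileno())  # push the bytes to disk before renaming
    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(part_path, final_path)
    return final_path


def ready_files(inbox: Path) -> List[Path]:
    """What System A's poller would pick up: only names ending in .dat."""
    return sorted(inbox.glob("*.dat"))
```

Because an in-progress file is named something like report.dat.part, the *.dat pattern never matches it, so the poller needs no extra logic.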

There isn't really a better way of handling this. You need to tell System A that the file is not yet complete, and you don't have any inter-process communication set up between the systems. Your options are to use a lock file that stops your program from opening the data file (and remove the lock afterwards), to use a temporary file (and then rename/move it to the appropriate name), or to do some sort of integrity checking (which is probably a waste of resources).
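For the lock-file option, System A's side could look roughly like the Python sketch below. It assumes a convention that is not part of SFTP: System B uploads name.dat.lock first, then name.dat, and deletes the lock only after the data file is complete. The suffix and function name are hypothetical.

```python
from pathlib import Path
from typing import List

LOCK_SUFFIX = ".lock"  # assumed naming convention for lock files


def unlocked_files(inbox: Path) -> List[Path]:
    """Return the *.dat files that have no accompanying lock file."""
    ready = []
    for path in sorted(inbox.glob("*.dat")):
        lock = path.with_name(path.name + LOCK_SUFFIX)
        if lock.exists():
            continue  # still being uploaded; retry on the next poll
        ready.append(path)
    return ready
```

This only works if the lock is created before the data file appears; otherwise the poller could see the data file in the gap between the two uploads.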

You could also consider, depending on how often this takes place, triggering System A from System B. If there is a file there 99% of the time, the approach you already propose (with a lock) is probably the most efficient. On the other hand, if you will only occasionally find data, polling may be a waste of resources (and requires a long-running or cron-triggered program). If you have SFTP, you may also have SSH access. In that case, set up key-based authentication between the systems (to avoid the need for passwords, see ssh-copy-id) and run some modified version of:

ssh system_a.yourdomain.com 'processfile /home/user/data/*.dat'

michaelkrieger

Posted 2019-06-28T14:45:10.303

Reputation: 159

That's indeed the only practical solution. See also SFTP file lock mechanism.

– Martin Prikryl – 2019-07-01T07:36:55.433