A cronjob runs every 3 hours to download a file using SFTP. The scheduled program is written in Perl and the module used is Net::SFTP::Foreign
.
Can the Net::SFTP::Foreign
download files that are only partially uploaded using SFTP?
If so, do we need to check the SFTP file modified date to check copy process completion?
Suppose a new file is uploading by someone in SFTP and he file upload/copy is in progress. If a download is attempted at the same time, do I need to code for the possibility of fetching only part of a file?
It's not a question of the SFTP client you use, that's irrelevant. It's how the SFTP server handles the situation.
Some SFTP servers may lock the file being uploaded, preventing you from accessing it, while it is still being uploaded. But most SFTP servers, particularly the common OpenSSH SFTP server, won't lock the file.
There's no generic solution to this problem. Checking for timestamp or size changes may work for you, but it's hardly reliable.
There are some common workarounds to the problem:
- Have the uploader upload "done" file once upload finishes. Make your program wait for the "done" file to appear.
- You can have dedicated "upload" folder and have the uploader (atomically) move the uploaded file to "done" folder. Make your program look to the "done" folder only.
- Have a file naming convention for files being uploaded (".filepart") and have the uploader (atomically) rename the file after upload to its final name. Make your program ignore the ".filepart" files.
See (my) article Locking files while uploading / Upload to temporary file name for example of implementing this approach.
- A gross hack is to periodically check for file attributes (size and time) and consider the upload finished, if the attributes have not changed for some time interval.
- You can also make use of the fact that some file formats have clear end-of-the-file marker (like XML or ZIP). So you know, when you download an incomplete file.
For details, see my answer to SFTP file lock mechanism.
The easiest way to do that when the upload process is also under your control, is to upload files using temporal names (for instance, foo-20170809.tgz.temp
) and once the upload finishes, rename then (Net::SFTP::Foreign::put
method supports the atomic
option which does just that). Then on the download side, filter out the files with names corresponding to temporal files.
Anyway, Net::SFTP::Foreign
get
and rget
methods can be instructed to resume a transfer passing the option resume => 1
.
Also, if you have full SSH access to the SFTP server, you could check if some other process is still writing to the file to be downloaded using fuser
or some similar tool (though, note that even then, the file may be incomplete if for instance there is some network issue and the uploader needs to reconnect before resuming the transfer).
You can check the size of the file.
- Connect to SFTP.
- Check file size.
- Sleep for 5/10 seconds.
- Check file size again.
- If size did not change, download the file, if the size changed do step 3.