I have a poller running on a directory every 35s. Files are placed in this directory through an SFTP server. The problem arises whenever a poll coincides with a file being copied: the poller also picks up the incomplete file, which has not been fully copied yet.
Is there a way to know whether a file is still being copied or has been copied completely?
There are several common strategies for a file watcher to "know" that a file has been completely transferred:
1. Poll at a time interval, and treat the file as completely transferred if its size has not changed within an interval. E.g. check for the file's existence every minute. Once you see that the file exists, check its size every 5 seconds. If the size stays constant for 30 seconds, treat the file as completely transferred.
2. Have the transfer process create a tag file after the transfer. E.g. after it has finished transferring the file FOO.txt, it creates an empty FOO.txt.tag. Your file watcher checks for the existence of FOO.txt.tag, and once it exists, you know FOO.txt has been completely transferred.
3. In special cases where the file has a special format (e.g. a special footer line), your file watcher can poll the file, read its last lines, and check whether they match the expected pattern.
Each method has its pros and cons:
- Method 1 affects the transfer process the least. Sometimes files are transferred by a third party, and you have almost no way to make them create the tag file required by method 2. However, this method is not 100% reliable, especially over a poor network, where a stalled transfer can look like a finished one.
- Method 2 is the most reliable. However, as noted above, there are cases where you have no control over the transfer process.
- Method 3 is only applicable to special cases.
Choose the one that suits your needs.
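As an illustration of method 3, here is a minimal Python sketch of a footer check. The function name `has_footer`, the footer value `#EOF`, and the `tail_bytes` limit are all assumptions made up for this example; the real footer depends entirely on the file format you receive.

```python
import os


def has_footer(path, footer=b"#EOF", tail_bytes=4096):
    """Return True if the last non-empty line of the file equals `footer`.

    `footer` is a hypothetical marker; substitute whatever your format uses.
    """
    try:
        with open(path, "rb") as f:
            # Read only the tail so that large files stay cheap to check.
            f.seek(0, os.SEEK_END)
            size = f.tell()
            f.seek(max(0, size - tail_bytes))
            tail = f.read()
    except OSError:
        # The file vanished or cannot be read yet: treat it as not ready.
        return False
    lines = [line.strip() for line in tail.splitlines() if line.strip()]
    return bool(lines) and lines[-1] == footer
```

The watcher would call `has_footer(path)` on each poll and skip the file until it returns `True`.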
Have the poller note file sizes. If a file's size did not change from one round to the next, the file is done downloading.
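A rough Python sketch of this idea, assuming the poller can keep state between rounds. The class name `SizeTracker` and its method are hypothetical, not part of any existing poller API.

```python
import os


class SizeTracker:
    """Remembers file sizes between polling rounds."""

    def __init__(self):
        self._last_sizes = {}  # path -> size seen in the previous round

    def stable_files(self, directory):
        """Return paths whose size is unchanged since the previous round."""
        current = {}
        stable = []
        with os.scandir(directory) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                size = os.path.getsize(entry.path)
                current[entry.path] = size
                # A file seen for the first time is never reported as stable.
                if self._last_sizes.get(entry.path) == size:
                    stable.append(entry.path)
        self._last_sizes = current
        return stable
```

Call `stable_files(directory)` once per polling round (every 35s in the question) and process only the returned paths. A processed file should be moved or deleted, otherwise it is reported again on the next round. The check is only as reliable as the assumption that a transfer never stalls for longer than one round.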
Can you influence the SFTP server? Can it create a marker file once the download is complete (e.g. '.thisIsAFile.doc.done')?
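If the server can do that, the poller side might look like the following Python sketch. The function name `completed_files` is hypothetical, and the marker naming (a leading dot plus a `.done` suffix) simply follows the example above; adjust `prefix` and `suffix` to whatever the uploader actually writes.

```python
import os


def completed_files(directory, prefix=".", suffix=".done"):
    """Return data files in `directory` that have a matching marker file."""
    names = set(os.listdir(directory))
    ready = []
    for name in sorted(names):
        # Skip the marker files themselves.
        if name.startswith(prefix) and name.endswith(suffix):
            continue
        path = os.path.join(directory, name)
        if os.path.isfile(path) and prefix + name + suffix in names:
            ready.append(path)
    return ready
```

For this to be safe, the marker must be created only after the data file has been fully written. After processing a file, remove or move both the data file and its marker so that it is not picked up again.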