Is FTP reliable for use with an automated data-exc

Posted 2019-07-02 12:45

Question:

I've been asked to design a batch application that would retrieve data (specifically, a detailed list of transactions) from an external vendor on a periodic basis. We have agreed to use XML for the data exchange, but are investigating different methods/protocols to facilitate the actual data transfer. The vendor suggested email or FTP as a means to transfer the data, but we rejected the first option outright due to logistics and reliability concerns.

As for the second, FTP, I have always been hesitant to use FTP in a production environment where reliability is a concern. A design whereby a vendor publishes files to an FTP server to be periodically pulled down seems unreliable and error-prone. My initial reaction would be to gravitate towards something like a web service (which this particular vendor may or may not even be able or willing to provide), where the data could be queried, as needed, for a specific time period.

In general, what is the best approach to use in a situation such as this? Is FTP (or SFTP) generally considered to be an acceptable option, or is there something better? Is a web service overkill for such a simple exchange of data? Are there other viable options that I am completely overlooking?

Answer 1:

File transfer presents a number of complications.

I would prefer a web service, or just HTTPS access to the file with digest/basic auth, but for very large files that may not be practical for the vendor.
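As a rough illustration of the plain HTTPS option, here is a minimal Python sketch using only the standard library. The URL, credentials, and function names are hypothetical, and it assumes the vendor exposes the file behind HTTP Basic auth; digest auth would need a different handler.

```python
import base64
import urllib.request


def build_basic_auth_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    if not url.lower().startswith("https://"):
        # Basic auth is only base64-encoded, not encrypted, so refuse plain HTTP.
        raise ValueError("Basic auth should only be sent over HTTPS")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


def download(request: urllib.request.Request, destination: str, timeout: float = 60.0) -> None:
    """Stream the response to disk in chunks so large files are not held in memory."""
    with urllib.request.urlopen(request, timeout=timeout) as response, open(destination, "wb") as out:
        while True:
            chunk = response.read(64 * 1024)
            if not chunk:
                break
            out.write(chunk)


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    req = build_basic_auth_request("https://vendor.example/export/transactions.xml", "user", "pass")
    print(req.get_header("Authorization"))
```

Streaming in chunks keeps memory use flat, but it does not by itself solve the large-file concern above; resuming an interrupted download would need extra work.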

Another option is a shared bucket on Amazon S3, where you have read access and they have write access. I have used that a couple of times as a poor man's secure file transfer.

I have used flavors of FTP in this way. Here are some tips if you do:

1) Use a secure version like SFTP. Plain FTP protects neither the credentials nor the data.

2) Use a semaphore file to indicate when the latest file is complete and available, or make sure that when they write the file to the FTP directory, they move it in whole, so you do not access incomplete files.

3) Make sure each file has a unique file name (timestamp, sequence number, etc.) so you can keep track of which files you have processed and which you haven't. Do not reuse a file name: you cannot tell which version you have already processed, and you could hit a race condition if the file is updated while you are accessing it.

4) Use a hash value to check for a successful transfer. They could provide an MD5 hash for the file, and you could then check it against your copy once you have finished downloading. I have often used the MD5 file as a semaphore as well, both to indicate that a file is available and to provide a means to check that the transfer was complete and correct.
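Tips 2 to 4 can be combined on the receiving side. The following Python sketch assumes a hypothetical convention where the vendor uploads `name.xml` first and then `name.xml.md5` as the semaphore; the function names and the in-memory `processed` set are illustrative, not part of any library. Note that MD5 here only detects a corrupted or incomplete transfer, not deliberate tampering.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(64 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ready_files(inbox: Path, processed: set) -> list:
    """Return data files that have a matching .md5 sidecar and are not yet processed."""
    ready = []
    for sidecar in sorted(inbox.glob("*.md5")):
        data_file = sidecar.with_suffix("")  # "tx_001.xml.md5" -> "tx_001.xml"
        if data_file.name in processed or not data_file.exists():
            continue
        parts = sidecar.read_text().split()
        if not parts:
            continue  # empty sidecar: treat the file as not ready yet
        expected = parts[0].lower()  # tolerates the "hash  filename" layout too
        if md5_of(data_file) == expected:
            ready.append(data_file)
    return ready
```

After handling each returned file, add its name to `processed` (in practice, persisted in a database or a log file) so a unique file name is never processed twice.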



Answer 2:

You could use AS2.

However, this is a push mechanism. One free gateway is mendelson AS2. You would set up a "channel" and everything would be transferred to you without any coding. If problems pop up, you should receive notifications.

FTP is pretty insecure. It should be reliable though.



Answer 3:

Well, I'm coming late to the party, but for what it's worth I've implemented all of the above, and so far AS2 (using mendelson) has been the easiest and least error-prone.

My observations:

  • Implementing SFTP/FTPS is straightforward and fairly reliable with a low barrier to entry, but you end up needing to write your own polling methods (as Andrew mentioned).
  • Web services are great, but only if the vendor properly designs and documents them. I've found that smaller partners tend to whip an API together and then break it when adding functionality, or add information to the transfer based on other customers' requests without updating the documentation to reflect the change. In one case this precipitated our move to SFTP.
  • AS2 is nice, as it's secure and pretty low-maintenance with mendelson. Add a directory watcher on the server's output folders and you end up with realtime [1] processing.
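
The directory watcher mentioned in the last point, like the polling loop needed for SFTP, can be very small. Below is a polling sketch in Python using only the standard library; the folder path, the `handle` callback, and the function names are hypothetical, and it assumes files appear in the watched folder only once they are complete.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional, Set


def scan_once(folder: Path, seen: Set[str]) -> List[Path]:
    """Return files not seen on earlier scans and record their names in `seen`."""
    new_files = [p for p in sorted(folder.iterdir()) if p.is_file() and p.name not in seen]
    seen.update(p.name for p in new_files)
    return new_files


def watch(folder: Path, handle: Callable[[Path], None],
          interval: float = 5.0, max_polls: Optional[int] = None) -> None:
    """Poll `folder` every `interval` seconds and pass each new file to `handle`."""
    seen: Set[str] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in scan_once(folder, seen):
            handle(path)
        polls += 1
        time.sleep(interval)
```

Calling something like `watch(Path("/data/as2/inbox"), process_file)` (both names are placeholders) would run until interrupted; how close that gets to realtime depends on the polling interval.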

Of course at the end of the day, your vendor is going to dictate how far they're willing to go with providing connection methods and you'll need to choose the best method that they provide.

[1] Realtime processing here is not actually realtime processing, but a management-acceptable approximation of it. Your managers may differ from mine.