I've been asked to design a batch application that would retrieve data (specifically, a detailed list of transactions) from an external vendor on a periodic basis. We have agreed to use XML for the data exchange, but are investigating different methods/protocols to facilitate the actual data transfer. The vendor suggested email or FTP as a means to transfer the data, but we rejected the first option outright due to logistics and reliability concerns.
As for the second, FTP, I have always been hesitant to use FTP in a production environment where reliability is a concern. A design whereby a vendor publishes files to an FTP server to be periodically pulled down seems unreliable and error-prone. My initial reaction would be to gravitate towards something like a web service (which this particular vendor may or may not even be able or willing to provide), where the data could be queried, as needed, for a specific time period.
In general, what is the best approach to use in a situation such as this? Is FTP (or SFTP) generally considered to be an acceptable option, or is there something better? Is a web service overkill for such a simple exchange of data? Are there other viable options that I am completely overlooking?
You could use AS2.
However, this is a push mechanism. mendelson AS2 is free gateway software: you would set up a "channel" and everything would be transferred to you without any coding. If any problems pop up, you should receive notifications.
FTP is pretty insecure. It should be reliable though.
Well, I'm coming late to the party, but for what it's worth, I've implemented all of the above, and so far AS2 (using mendelson) has been the easiest and least error-prone.
My observations:
Of course, at the end of the day, your vendor is going to dictate how far they're willing to go in providing connection methods, and you'll need to choose the best method they offer.
¹ Realtime processing is not actually realtime processing, but a management-acceptable approximation of it. Your managers may differ from mine.
File transfer presents a number of complications.
I would prefer a web service, or just HTTPS access to the file with digest/basic auth, but for very large files, that may not be practical for them.
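A minimal sketch of that HTTPS approach, using Python's requests library. The URL, credentials, and date-range parameters are hypothetical; substitute whatever the vendor actually exposes:

```python
import requests

# Hypothetical vendor endpoint for illustration only.
VENDOR_URL = "https://vendor.example.com/exports/transactions.xml"

resp = requests.get(
    VENDOR_URL,
    params={"from": "2013-01-01", "to": "2013-01-31"},  # query a specific period
    auth=("batch_user", "s3cret"),  # basic auth; requests.auth.HTTPDigestAuth for digest
    stream=True,                    # avoid loading a large file into memory
    timeout=60,
)
resp.raise_for_status()

# Stream the body to disk in chunks, which keeps large files practical.
with open("transactions.xml", "wb") as out:
    for chunk in resp.iter_content(chunk_size=64 * 1024):
        out.write(chunk)
```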
Another answer could be to use a shared bucket on Amazon S3, where you have read access and they have write access. I have used that a couple of times as a poor man's secure file transfer.
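A sketch of the shared-bucket idea with boto3, assuming the vendor writes exports under a prefix you can read; the bucket and key names here are made up:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "vendor-exchange"  # hypothetical shared bucket

# List the vendor's exports; unique, timestamped names make it easy to
# track which files you have already processed.
listing = s3.list_objects_v2(Bucket=BUCKET, Prefix="exports/")
for obj in listing.get("Contents", []):
    key = obj["Key"]
    s3.download_file(BUCKET, key, key.rsplit("/", 1)[-1])
```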
I have used flavors of FTP in this way. Here are some tips if you do (a sketch that pulls them together follows the list):
1) Use a secure variant like SFTP; plain FTP protects neither the credentials nor the data.
2) Use a semaphore file to indicate when the latest file is complete and available, or make sure that when they write the file to the FTP dir, they move it into place as a whole (e.g. write to a temporary name, then rename), so you do not access incomplete files.
3) Make sure each file has a unique file name (timestamp, sequence number, etc.) so you can keep track of which files you have processed and which you haven't. Do not reuse file names: you will not know which you have already processed, and you could get a race condition if a file is updated while you are accessing it.
4) Use a hash value to check for a successful transfer. They could provide an MD5 hash for each file, which you then check against your copy once the download completes. I have often used the MD5 file as the semaphore as well: it both indicates that a file is available and provides a means to check that the transfer was complete and correct.
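A sketch combining these tips, using paramiko for SFTP. The host, credentials, and naming scheme are assumptions for illustration, with the vendor's `.md5` file serving as both semaphore and checksum:

```python
import hashlib
import paramiko

# Hypothetical connection details and file naming for illustration.
HOST, USER, PASSWORD = "sftp.vendor.example.com", "batch_user", "s3cret"
REMOTE_DIR = "/outgoing"
FILENAME = "transactions-20130131.xml"  # unique name per export (tip 3)

# SFTP, not plain FTP (tip 1).
transport = paramiko.Transport((HOST, 22))
transport.connect(username=USER, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)

# The .md5 file acts as the semaphore (tip 2): if it exists, the data file
# is complete. Its contents give us the expected hash (tip 4).
with sftp.open(f"{REMOTE_DIR}/{FILENAME}.md5") as f:
    expected = f.read().decode().split()[0]

sftp.get(f"{REMOTE_DIR}/{FILENAME}", FILENAME)
sftp.close()
transport.close()

# Verify the transfer completed correctly (tip 4).
with open(FILENAME, "rb") as f:
    actual = hashlib.md5(f.read()).hexdigest()
if actual != expected:
    raise RuntimeError(f"checksum mismatch for {FILENAME}")
```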