
We've got several processes that move files across servers - SFTP, FTP, SCP; Windows, Linux, AIX. There is also a workflow component: moving a batch of related files usually requires a control file listing filenames and hash values. The transfer is often initiated from our servers, which pull the files, so we need to make sure the files are done being written before we grab them.

We have some homegrown scripts to do this, but they don't always work properly, and troubleshooting, maintenance, and log review are not easy this way. There are a lot of servers, and our scripts have no centralized logging, dashboard, console, etc.

We're looking into commercial products to do this. Has anyone used MQ File Transfer Edition? Another team in our company is using Aspera; does anyone have thoughts on that, or on other favored products?

I have no idea what our budget is for this, yet. Just trying to get a handle on the product space from the perspective of other admins.


/edit - In my situation, we're moving 2-file payloads (one binary, one metadata) of scanned images from different sources to different destinations. We wait until a 3rd control file is written with the checksums - when the move is complete, the control file is deleted.

The sources are primarily a handful of Windows file servers, or Windows SFTP servers, that receive these files from scanning processes. We also have sources that are FTP or SFTP servers that receive the same payloads from external parties. The destination is a set of AIX servers that ingest the images into an archive, so the files don't remain in the destination either. Robustness is definitely our major concern.

We move a few GB every day, I guess. (Without centralized logging, I can't give a better number.) The binary files probably average around 100 MB, the metadata quite a bit smaller.
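
For context, the core of what our homegrown scripts do looks roughly like the Python sketch below. The control-file format and the use of MD5 here are assumptions for illustration, not exactly what we run:

```python
import hashlib
import os
import shutil

def file_hash(path, algo="md5"):
    """Hash a file in chunks so large binaries don't blow out memory."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def move_batch(control_path, dest_dir):
    """Verify every file listed in the control file, move the batch,
    then delete the control file to signal that the move is complete."""
    src_dir = os.path.dirname(control_path)
    entries = []
    with open(control_path) as ctl:
        for line in ctl:
            if not line.strip():
                continue
            name, expected = line.split()          # assumed format: "<filename> <hash>"
            entries.append((name, expected.lower()))

    # Verify everything first -- move nothing if the batch isn't fully written.
    for name, expected in entries:
        if file_hash(os.path.join(src_dir, name)) != expected:
            raise ValueError("checksum mismatch for %s" % name)

    for name, _ in entries:
        shutil.move(os.path.join(src_dir, name), os.path.join(dest_dir, name))

    os.remove(control_path)                        # control file gone = batch complete
```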

mfinni

3 Answers


I have implemented WMQ FTE for several customers and it would definitely meet the requirements you described. You can configure it to watch for the control file, then move the data files and delete the control file. It can also be driven by an MQ message sent by whatever creates the files. FTE agents can connect to WMQ as clients, so you need only one WMQ server in a small deployment, and the FTE agents can run on all the platforms you mentioned. The only exception is that a z/OS FTE agent must have a local queue manager (because there is no WMQ client for z/OS). Of course, it supports ad-hoc, user-driven transfers as well.

FTE uses non-persistent messages throughout, plus a lightweight control flow between the two agents (over WMQ, of course) that acks the data stream. Assuming both sides are up, the entire transfer happens in memory with nothing written to disk on the queue manager, so it's screaming fast. If one side goes down, the transfer picks up where it left off as soon as service is restored. Both agents checksum the data and files, so if either the source or target file changes during the outage or during transmission, the transfer aborts with an appropriate error message.
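
Conceptually, stripping out all the MQ plumbing (and pretending for a moment that both files are reachable from one process), that end-to-end verification boils down to something like this Python sketch. This is illustrative only, not FTE code:

```python
import hashlib

def sha256_of(path):
    """SHA-256 of a file, computed in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return h.hexdigest()

def verified_transfer(src_path, dest_path, send):
    """Checksum the source, move the bytes with `send`, then re-check the
    source (to catch a file that changed mid-transfer) and verify the copy
    that landed at the destination; abort loudly on any mismatch."""
    before = sha256_of(src_path)
    send(src_path, dest_path)                       # the transfer itself
    if sha256_of(src_path) != before:
        raise RuntimeError("source changed during transfer - aborting")
    if sha256_of(dest_path) != before:
        raise RuntimeError("destination copy does not match source - aborting")
    return before
```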

Any sort of automation you might like to script can be done with Ant or any executable that you want to call, either on the sender or receiver side, either before or after the transfer. For example, I have one client who encrypts files outbound to their customers' SFTP servers and then decrypts the files on arrival. This is done by calling Ant to run GPG before outbound transfers and after inbound transfers.
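
The Ant script is the supported route; purely to illustrate the pre/post-processing hook pattern, here is a rough Python equivalent that shells out to GPG. The recipient key and file naming are made up, and it assumes gpg is on the PATH:

```python
import subprocess

def encrypt_before_transfer(path, recipient="customer-key@example.com"):
    """Pre-transfer hook: encrypt the outbound file with GPG.
    Produces <path>.gpg, which is what actually gets sent."""
    subprocess.run(
        ["gpg", "--batch", "--yes", "--encrypt",
         "--recipient", recipient, "--output", path + ".gpg", path],
        check=True,
    )
    return path + ".gpg"

def decrypt_after_transfer(path):
    """Post-transfer hook on the receiving side: decrypt <path>.gpg."""
    clear = path[:-4] if path.endswith(".gpg") else path + ".decrypted"
    subprocess.run(
        ["gpg", "--batch", "--yes", "--decrypt", "--output", clear, path],
        check=True,
    )
    return clear
```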

T.Rob

I have not used MQ File Transfer Edition, so I cannot comment on it. I have done a lot of file transfers, including EDI, FTP, AS2, FTPS, SFTP, rsync, SCP, Aspera, SVN, etc. Ultimately my answer depends on your exact requirements, but from the sound of it, the most important thing you are after is reliability of file transfers.

First, I would recommend some standardization of platforms, maintenance, and management, which is what it sounds like you are looking to do. Make every server, regardless of OS or configuration, use the same process to get files to and from other nodes. Multiplying the troubleshooting effort across different configurations can make simple tasks very frustrating. When I think of reliability I do not think of Windows, but a lot of the time there simply isn't a way to avoid it.

While I do not know your exact requirements, I will outline some possible solutions; if you can clarify your needs (WAN or LAN, file sizes, number of transfers per day, importance of the transfers, etc.), I can give a more accurate answer. The transfers I have set up in the past range from small <1 KB files to hundreds of GB of data, from "people don't get paid if the transfer doesn't happen" to data that may never even be used, and from open-internet transfers to encrypted data sent over encrypted VPNs.

What you are really after is covered by a fairly new term in the industry: Managed File Transfer. http://en.wikipedia.org/wiki/Managed_file_transfer

At the end of the day, get the Gartner Magic Quadrant report for this space, review it, and choose a vendor that meets your needs. You'll notice Aspera in the list, but also consider CFI. Since you are specifically looking for a commercial product, this is your best bet. Private message me or comment if you want any more input from my research in this sector.

Here is my personalized input.

Centralized FTP:

This is good because FTP is universal; it's used in so many places and has broad support across systems. Many popular FTP servers support a wide range of authentication methods and protocols. If you can centralize the server for all the nodes, troubleshooting becomes a lot easier: when something goes wrong you check the server log (or, ideally, have the logs reported to you automatically via email), and if nothing is wrong there, it's pretty clear the problem is the client or the network. The problem is that FTP isn't perfect: it can easily fail, and it is particularly slow when dealing with large numbers of small files. Across operating systems you may also run into file-naming issues and more.

If you are going to consider this solution, use clients and a server that support Simple File Verification: http://en.wikipedia.org/wiki/Simple_file_verification. The mechanism used to check the files is, as the name says, simple, and it works across multiple platforms. A number of servers support checking files as they are uploaded and can automatically report when a file fails its check, along with checking full file sets rather than individual files and reporting what percentage of the full structure has been uploaded. glftpd is a popular one; keep in mind it is a bear to configure, but once you have it set up you may never need to touch it again: http://www.glftpd.com/. Gene6 is pretty popular as well.
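
If you want a sense of how little is involved, here is a minimal Python sketch that verifies a standard .sfv file (one "filename CRC32" pair per line, ';' for comments); purely illustrative:

```python
import os
import zlib

def crc32_of(path):
    """CRC32 of a file, computed in chunks."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF

def verify_sfv(sfv_path):
    """Return {filename: True/False} for every entry in the .sfv file."""
    results = {}
    base = os.path.dirname(sfv_path)
    with open(sfv_path) as sfv:
        for line in sfv:
            line = line.strip()
            if not line or line.startswith(";"):    # SFV comment lines start with ';'
                continue
            name, expected = line.rsplit(None, 1)   # "<filename> <crc32 in hex>"
            target = os.path.join(base, name)
            results[name] = (os.path.exists(target)
                             and crc32_of(target) == int(expected, 16))
    return results
```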

Rsync the files

I've used rsync with scripts a fair amount and found it very reliable and pretty robust once you account for error checking; you'll find rsync popular in backup scripts for exactly this reason. I do not know of many off-the-shelf products built around rsync, so you are looking at coding up a solution, and once again you will be without centralized logging and may run into a lot of the same issues. Honestly, though, I found rsync reliable enough, and with delta transmission for large file sets and built-in integrity checking it is a pretty quick and dirty way to get things done.
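
A bare-bones wrapper is really just a retry loop around the rsync binary, something like this Python sketch (host, paths, and retry policy are made up):

```python
import subprocess
import time

def rsync_with_retries(src, dest, retries=5, wait=30):
    """Push src to dest with rsync over ssh, retrying on failure.
    --partial lets an interrupted transfer resume instead of restarting."""
    cmd = ["rsync", "-a", "--partial", "--checksum", "-e", "ssh", src, dest]
    for _ in range(retries):
        if subprocess.run(cmd).returncode == 0:
            return True
        time.sleep(wait)                            # back off, then try again
    return False

# e.g. rsync_with_retries("/data/outbound/", "archive01:/ingest/incoming/")
```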

Aspera

Aspera is great technology at its core for high-latency, high-bandwidth transfers. If you are not moving large data sets across a WAN, I would not recommend it. I run a large Aspera deployment and it is littered with transfer problems and software bugs. If you are looking for very basic functionality it is a pretty good solution, but when it comes to more advanced processing, be prepared to write your own scripts to move the data. The software seems to be focused on a small niche of businesses, and they seem to struggle with enterprise deployments. The centralized logging in one of their products would cover your centralized-logging needs, and their pre- and post-processing would work for you as well, but keep in mind you may end up spending a fair amount of money for a half-working solution. I mentioned CFI above; their product is much more enterprise-oriented, but they struggle to deliver a single coherent experience. In any case, don't just take my word for it: get trials of the products and evaluate them for yourself.

Version Control System

I'll say up front that this doesn't seem to fit your requirements, but it is another option. If the files you are transferring are not transactional, consider storing them in a version control system. In this scenario, when a file needs to be transferred it is checked in to the repository, and the remote end syncs it down when needed. If you need version history, files that may relate to each other, and a centralized server, this may be a good option.
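
As a rough sketch of the flow (hypothetical repository layout, and assuming Git for concreteness, though Subversion or anything else would do):

```python
import subprocess

def check_in(repo_dir, filename, message="add transfer payload"):
    """Source side: commit the new file and push it to the central repo."""
    subprocess.run(["git", "add", filename], cwd=repo_dir, check=True)
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    subprocess.run(["git", "push"], cwd=repo_dir, check=True)

def sync_down(repo_dir):
    """Destination side: pull whatever has been checked in since the last sync."""
    subprocess.run(["git", "pull"], cwd=repo_dir, check=True)
```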

As a final side note, check out what Twitter uses to push config files across their many, many nodes: http://engineering.twitter.com/2010/07/murder-fast-datacenter-code-deploys.html

Once again I cannot stress enough that the correct answer is based on your exact requirements.

Hope this helps you.

pablo
  • Excellent write-up. I edited my question. – mfinni Feb 20 '11 at 00:12
  • Very good writing. Just a note that storing files in version control such as Git is usually considered bad practice. The repository grows really fast, since it keeps every version (new and old) of each file. Some services, such as GitHub, also provide a blob API to handle binary objects: https://developer.github.com/v3/git/blobs/ – Anssi Jan 04 '18 at 09:41

When I was working for a large insurance company, we used Connect:Direct to automate and manage file transfers (most of them over SSL/TLS) between various Windows/Linux/AIX/mainframe servers.

John Galt