The scenario is as follows:

Copying and then syncing data from a live mail server to another server, over the network only.

The mail server is live, meaning that lots of files (mails) are constantly being altered, deleted and created. I have tried rsync, but it is extremely slow and after some time I get:

warning: some files vanished before they could be transferred (code 24) at main.c(1040) [sender=3.0.5]

Since the server is live, I would prefer not to significantly increase its load.

Which option is best? Please include the pros and cons of each approach.

Important facts:

  • 15 million email files (mostly small sized)
  • 1.45 TB of data

Update

Purpose: Migrating to a new server

ETA: ASAP

Update 2

Server limitation: The live mail server runs on old software and hardware, so I wouldn't risk installing anything on it.

Update 3

I would prefer open-source solutions.

pl1nk
  • You forgot to specify the time requirements and usage. Are you doing this for backup purposes or to create another instance of the email server? – Khaled Aug 15 '12 at 12:30

4 Answers

3

One approach would be to use Perdition for POP/IMAP connection handling and then set up Postfix to route SMTP to the old or the new server, depending on where the user's mailbox is located. This way you can migrate your server live, one mailbox at a time, without any downtime.

Of course, you can instead set up a scheduled maintenance break and then just rsync the files. Copying 15 million files WILL take a while, though. Depending on your server and, mostly, its I/O system, it might help to run several rsync processes in parallel: one copying files/dirs starting with [a-e], a second one with [f-j], a third one with [k-p] and so on.
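A minimal sketch of that parallel split, assuming the mailboxes are entries in a single spool directory whose names start with a lowercase letter or a digit. The function name `parallel_copy`, the extra `q-z` and `0-9` ranges, and the `RSYNC` override are illustrative assumptions, not something from an existing tool.

```shell
#!/bin/sh
# parallel_copy SRC_DIR DEST
# Starts one background rsync per first-letter range and waits for all of them.
# Set RSYNC=echo to preview the commands instead of running rsync.
parallel_copy() {
    src=$1
    dest=$2
    for range in a-e f-j k-p q-z 0-9; do
        # Expand the glob for this range into the positional parameters.
        set -- "$src"/[$range]*
        # If nothing matched, the pattern is left unexpanded: skip this range.
        [ -e "$1" ] || continue
        ${RSYNC:-rsync} -a "$@" "$dest" &
    done
    wait   # block until every background copy has finished
}

# Hypothetical usage (host and paths are placeholders):
# parallel_copy /var/spool/mail newserver:/var/spool/mail/
```

Entries whose names start with anything outside those ranges (uppercase letters, dots) are not covered and would need their own range. Whether more parallel processes actually help depends on the disks; on a loaded live server it is worth starting with two or three.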

But having done a similar thing twice, I would recommend the Perdition approach. After initial setup it truly takes the migration pain away.

EDIT: You asked for more info about Perdition setup, you got it.

You need to have some central place where you have your user account information stored. That can be MySQL, PostgreSQL, OpenLDAP or something else. I have always used OpenLDAP with great success. Anyway, you need to have a database table / LDAP schema which contains the user name and the server name where user mailbox is located. There are Perdition migration utils available which will help you in the initial setup.

Then Perdition receives the POP/IMAP connections, looks up the user's location from LDAP (or whatever you use), and transparently proxies the traffic between the user's mail client and the actual server. Postfix can also look up this actual server location from LDAP/SQL and deliver the mail there.

Here is a PDF about Perdition + LDAP setup and here is the Postfix LDAP manual.

Next, just create a migration script which copies the mailboxes one by one over IMAP with imapsync or a similar utility; after each successful mailbox migration, it should update OpenLDAP (or whatever central store you use) with the user's new mailbox location.

EDIT #2: The imapsync I'm talking about is free software, and available in most Linux distributions from their package repositories. You asked me to elaborate on the rsync approach; it does not matter whether you choose imapsync or rsync, the basic principles are the same. You just create a script in bash, Perl, or some other language you feel comfortable with. Here's some pseudo code.

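# Pseudo code: both helper functions are placeholders for your own LDAP/SQL tooling.
for user in $(fetch_all_the_account_names_from_ldap); do
    # Repoint the user only after the mailbox has been copied successfully.
    rsync -avP "/var/spool/mail/$user" "$newserver:/var/spool/mail/" &&
        update_user_location_in_ldap "$user" "$newserver"
done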
Janne Pikkarainen
  • Could you provide more info in using Perdition (install/config/deployment)? – pl1nk Aug 15 '12 at 12:47
  • pl1nk: Did my extended reply help you at all? – Janne Pikkarainen Aug 15 '12 at 12:55
  • +1 for your update but the deployment of perdition is a bit of a hassle. – pl1nk Aug 15 '12 at 13:00
  • Even though Perdition seems to help with the downtime, the main issue (the major scope of this question) still exists: `How to copy the data?` – pl1nk Aug 15 '12 at 13:22
  • You copy the data with `imapsync`. Or, if you want, of course you can use `rsync` one user directory/mailbox file at a time, and after each user just update the LDAP/SQL user information to point the user to the new server. In my case (50,000 to 100,000 user accounts) the fastest accounts had almost no e-mail and were copied in seconds; the more active accounts with much more e-mail took a few minutes at most. – Janne Pikkarainen Aug 15 '12 at 14:24
  • It seems that I need to buy imapsync. Could you elaborate on the rsync approach? – pl1nk Aug 15 '12 at 15:23
  • p|1nk: Any better now? – Janne Pikkarainen Aug 15 '12 at 18:33
  • The imapsync in your link is neither free (first line on the website: `Buy imapsync source code`) nor included in most Linux distributions (e.g. Ubuntu). I don't get your pseudo-code idea or how it could improve the overall rsync copying performance (repeating myself, the major scope of this question). – pl1nk Aug 16 '12 at 10:35
  • My point is that with one-user-at-a-time approach the overall rsync performance does not matter at all! Even if the migration would take one year, it still would not matter because the whole operation can be done on-line without an actual maintenance break. – Janne Pikkarainen Aug 16 '12 at 10:42
  • I understand your point, but you still don't focus on the main issue of this migration. As I mentioned, rsync is too slow and breaks; beyond that, I would prefer the initial copy to be faster, maybe by going with tar over ssh. – pl1nk Aug 16 '12 at 11:25
  • I do focus, but I am suggesting a completely different approach for you. If your rsync for _per-user_ transfers is 1) slow and/or 2) unreliable, then you have some other problems with your setup. This is my suggestion, take it or leave it. :-) – Janne Pikkarainen Aug 16 '12 at 11:29
  • Could you please send me a link of the open-source version of imapsync? – pl1nk Aug 21 '12 at 11:23
  • OK, they seem to have changed their license a little bit. Anyway, two secs of googling revealed this: https://fedorahosted.org/imapsync/ ... and the package is available at least in the Fedora, Debian and RHEL/CentOS EPEL repositories, so it's just an `apt-get install imapsync` or `yum install imapsync` command away in your distribution of choice. It also seems to be available in ArchLinux, Gentoo and FreeBSD. What distro are you using if you truly cannot install imapsync out-of-the-box with your package manager? – Janne Pikkarainen Aug 21 '12 at 11:39
  • Finally you admit it! I'm using Gentoo, I'll search for a source version. – pl1nk Aug 21 '12 at 11:46
  • What's wrong with `emerge imapsync` (or possibly, `emerge imap-sync`)? :) – Janne Pikkarainen Aug 21 '12 at 11:49
  • Very old system I'm not risking on installing anything there. It seems that imapsync needs a bunch of dependencies. – pl1nk Aug 21 '12 at 12:00
  • So then install `imapsync` to your new server and let it handle the migration. – Janne Pikkarainen Aug 21 '12 at 12:02
  • Well, I just checked the requirements to configure imapsync and, sorry, it's too complicated... – pl1nk Aug 21 '12 at 12:12
1

You could look at replicating the storage itself with DRBD, which mirrors a block device (and the filesystem on it) over the network. Your current server would act as the primary, with the new server as the secondary. If the primary fails, the secondary can be promoted to primary. You can set up DRBD on the current server, and the initial sync to the secondary (the new server) will happen transparently in the background, with no downtime. There are no files for you to copy manually. See http://www.drbd.org/

Chida
  • Thanks for your idea, however the server is quite old and I cannot take the risk of installing things so implementing DRBD is not an option. – pl1nk Aug 15 '12 at 14:07
0
  1. Route email to the new server by changing the MX record(s) for the domain(s) in question to point to the new email server.

  2. Move all user mailbox content and direct all email clients to the new server.

  3. Transfer any remaining email on the old server to the new server by whatever means you wish.

joeqwerty
  • You are not describing the most important step: how to move all user mailbox content to the new server. – pl1nk Aug 15 '12 at 15:26
  • Is downtime acceptable? If yes, how much? – Chida Aug 15 '12 at 15:48
  • Downtime is not acceptable. After the first big copy/sync of files, all mail requests will be routed to the new server. At this point some users will not see some of their emails until the last sync is completed. – pl1nk Aug 15 '12 at 16:40
0

This recipe worked nicely for me:

1. Copying the initial bulk of files, for example:

tar c dir/* |gzip - | ssh user@host 'cd /dir/ && tar xz'

In gzip you can have different compression levels, where -1 indicates the fastest compression method (less compression) and -9 or --best indicates the slowest compression method (best compression). The default compression level is -6 (that is, biased towards high compression at expense of speed). - gzip man page.
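Since the mails are small and CPU time on the old server is scarce, the fastest level is a reasonable starting point. A minimal sketch of the same pipeline with `gzip -1`; the function name `stream_tree` is a hypothetical helper, and packing `.` from inside the source directory (rather than `dir/*`) is my own variation so that dotfiles are included.

```shell
#!/bin/sh
# stream_tree SRC_DIR SINK...
# Packs the contents of SRC_DIR (dotfiles included) with tar, compresses the
# stream with the fastest gzip level, and pipes it into the SINK command,
# which is expected to unpack a gzip-compressed tar stream from stdin.
stream_tree() {
    src=$1
    shift
    tar -C "$src" -cf - . | gzip -1 | "$@"
}

# Hypothetical usage (host and paths are placeholders):
# stream_tree /dir ssh user@host 'cd /dir && tar -xzf -'
```

Passing the receiving command as arguments keeps the sender side unchanged if you later swap ssh for something else on a trusted network.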

2. Using rsync daemon

Once the bulk of the data has been copied, the rsync job becomes much lighter, and using the rsync daemon (assuming you are in a controlled environment, since the data is not encrypted) gives far better overall performance.

Since I had to deal with lots of small files, I disabled rsync compression; the runs were ~40% faster without it.
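A minimal sketch of the receiving side, assuming the daemon runs on the new server so that nothing has to be installed on the old one. The helper name `write_rsyncd_conf`, the `uid`/`gid` values, the auth user `user` and the secrets path are assumptions for illustration; only the module name `ModuleName` matches the cron line in this answer.

```shell
#!/bin/sh
# write_rsyncd_conf CONF_FILE MODULE_PATH
# Writes a minimal rsyncd.conf exposing one writable module named ModuleName.
write_rsyncd_conf() {
    cat > "$1" <<EOF
# Assumed values: adjust uid/gid, the auth user and the secrets path.
uid = root
gid = root
use chroot = yes

[ModuleName]
    path = $2
    read only = no
    auth users = user
    secrets file = /etc/rsyncd.secrets
EOF
}

# Hypothetical usage on the new server:
# write_rsyncd_conf /etc/rsyncd.conf /dest_dir && rsync --daemon
```

The secrets file holds `user:password` lines and should not be readable by other users; the sender's `--password-file` contains just the password.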

3. Making a cron job that runs every x hours to keep an up-to-date copy on the remote server.

0 */03 * * * flock -n /Any_Dir/rsync.lock -c "nice -n 19 rsync --password-file=/rsync.passwd --delete-during -ra /source_dir/ user@rsync_server::ModuleName > /var/log/rsync_cgp.log 2>&1"

In my example I start an rsync process every 3 hours, using flock to create a lock file so that a second rsync cron job will not start while the first one is still running. Additionally, since I don't want to hammer the server, I set the scheduling priority of rsync to 19 (least favorable). Finally, I redirect the rsync output, overwriting the log file on each run to keep it small. Caution: using -v with rsync could lead to a huge log file.

Each rsync run takes roughly 16-30 minutes, depending on the load of the server.
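Because the source is a live mail server, a run can still end with the "some files vanished" warning quoted in the question, which rsync reports as exit code 24. A minimal sketch, assuming you want the job to treat that one code as success; `run_sync` is a hypothetical helper name, not part of rsync.

```shell
#!/bin/sh
# run_sync CMD...
# Runs CMD and maps rsync's exit code 24 (source files vanished during the
# transfer) to success; every other exit code is passed through unchanged.
run_sync() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 24 ]; then
        echo "note: some source files vanished during transfer; ignoring" >&2
        return 0
    fi
    return "$status"
}

# Hypothetical usage inside a script called from cron:
# run_sync nice -n 19 rsync --password-file=/rsync.passwd --delete-during \
#     -ra /source_dir/ user@rsync_server::ModuleName
```

Vanished files are expected here: mails get delivered, moved and deleted between the moment rsync builds its file list and the moment it transfers them, and the next run picks up the current state.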

pl1nk