First, a quick overview of the environment:
NetBackup running on Windows Servers (6.5.4 if you care) with LTO3 drives.
The backup target used to be a Solaris 9 server, on Sun hardware, with Veritas Volume Manager.
Rebuilt as a RHEL5 box using LVM to manage the volumes, now on a Xiotech SAN, with a large number of volumes.
The nature of the data and the application the box runs (Optix) is such that it used to write to a volume until it reached a certain size, and then that volume was locked forevermore. Hence we have /u01 /u02 /u03 ... /u50. A while back (still on the Solaris build) we expanded those volumes and opened them back up for writing, so on any given day any or all of them might change. Backup throughput used to average 40MB/sec.
In the new Linux build we're averaging something closer to 8MB/sec. Given that there is 2.1TB of data here, that's sort of wildly unacceptable; even running 4 streams it is taking 48+ hours to complete. I/O on the server is pegged. I am pretty sure it's not the SAN, because other clients using the same class of storage and similar server hardware are backing up at a pokey but tolerable 20MB/sec.
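For reference, here is roughly how we are watching the disks during a backup window; nothing fancy, just iostat from the sysstat package (5 is the sample interval in seconds):

    # Extended per-device stats in kB. %util near 100 combined with a
    # small average request size (avgrq-sz) suggests we are seek-bound
    # on lots of small I/Os rather than bandwidth-bound.
    iostat -xk 5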
I'm looking for ideas on improving throughput. The Solaris guys in the office next door are blaming LVM on Linux. Nobody thinks it's the backup environment, because that's still performing as expected everywhere else. The admin of the now-very-slow box says "I don't know, it's not me; the users say it's fine." Which is probably true, because it's a document management system and they're reading and writing small files.
Troubleshooting ideas? Has anybody seen LVM trash backup or other I/O performance, especially given a largeish number of volumes holding a very large number (10 million, maybe) of small files?
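One isolation test we can run, if it helps frame answers: a raw sequential read through an LV versus the underlying SAN LUN, which should show whether LVM's mapping itself costs anything before the filesystem and the small files even enter the picture. Device paths below are placeholders for our volume group and PV:

    # Readahead can differ between the LV and the raw device, which is
    # enough on its own to gut sequential throughput.
    blockdev --getra /dev/VolGroup00/u01
    blockdev --getra /dev/sdb

    # Sequential read off the LV, bypassing the page cache.
    dd if=/dev/VolGroup00/u01 of=/dev/null bs=1M count=2048 iflag=direct

    # Same read against the underlying SAN LUN for comparison.
    dd if=/dev/sdb of=/dev/null bs=1M count=2048 iflag=direct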
Edited to correct units.
Edited to add:
NIC is at 1000/Full (checked from both the OS and the switch)
Filesystem is EXT3 (how we verified both is sketched below).
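For the record, this is roughly how both were checked; interface, device, and mount names are ours, substitute accordingly:

    # Speed/duplex from the OS side; the switch port was checked separately.
    ethtool eth0

    # Does EXT3 have dir_index? Without it, lookups in big directories
    # full of small files get expensive.
    tune2fs -l /dev/VolGroup00/u01 | grep -i features

    # Are the volumes mounted with atime updates? Every file read during
    # a backup then rewrites inode metadata unless noatime is set.
    mount | grep /u0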
More new information....
The performance hit appears to be happening on several boxes running LVM and EXT3: basically all of the new RHEL5 boxes we built this summer.
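To take NetBackup itself out of the equation on these boxes, one quick check is timing a streaming read of a sample of the small files straight off the filesystem; if that crawls too, the problem is in the OS/filesystem stack rather than the backup environment. The path is a placeholder for one of our document volumes:

    # Drop the page cache first so we measure real disk reads.
    sync; echo 3 > /proc/sys/vm/drop_caches

    # Pipe through wc rather than writing the archive to /dev/null,
    # since GNU tar skips reading file data when the archive is
    # /dev/null; wc -c also gives a byte count for an MB/sec figure.
    time tar cf - /u01/sample/dir | wc -c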