3

We are an animation studio and we have an intranet that constantly needs to move large files (2 GB+) around the network. We have the following config:

Three Netgear 24 port unmanaged switches. Switch 1 connected to Switch 2 connected to Switch 3 and Switch 3 connected to router. There is a NAS connected to Switch 2.

50 PC's connected to the various switches. 20 workstations and 30 render nodes.

Things work smoothly until we send out a render job. When we do, around 30 machines attempt to copy the same file from the NAS at once and the network slows to a crawl.

Managed switches are out of the budget. Upgrading to two 48 port unmanaged stacked switches might be feasible.

I have the following questions:

  1. Is the way I have connected the switches incorrect? Should Switch 1 and Switch 2 both connect to Switch 3?
  2. Will moving to two 48 port stacked unmanaged switches relieve this congestion?
  3. Do 100 Mbps devices on the switches affect the gigabit connectivity of other devices on the same gigabit switch?
  4. Is there a way to break up the network? Say, 20 PC's and one NAS on one switch, and 20 PC's and a second NAS on the other switch, in case the NAS is the bottleneck. In this case, should I connect both switches to the router? Will they still be able to see each other? The router is a standard router supplied by the ISP.
  5. Is the bandwidth simply not enough for this kind of work?

Thanks for the help. This network stuff is really confusing...

Jeff Atwood
Andy

7 Answers

6

I am sorry to say, but SOME of the stuff you say makes no sense at all. Really. You pretty much have to be a North Korean rendering company for it to make sense.

50 PC's connected to the various switches. 20 workstations and 30 render nodes.

And:

Managed switches are out of the budget

Really. You can pay 20 people, plus all the licenses, and lack the money for a managed switch, which is in the range of around 100 USD for each of your workstations. Congratulations. As I said - it makes sense when you are a North Korean company where your employees earn something like 1.5 USD per hour. Managed switches are not that expensive. Especially if you need them.

Ok, here we go.

  • You are much better off running it all from one switch.
  • Go for a 48 port switch with the capability to take 1-2 10gbit uplinks.
  • Attach your storage infrastructure to the 10gbit ports.

No money? Bad news - no luck.

Also make sure you can actually feed 1 gbyte per second of data from the storage, which is another budget item. What performance you can actually get from the discs depends on the hardware and disc layout.

Anyhow, your problem is twofold. First, 1gbit is roughly 100mbyte per second. This sounds nice (20 seconds for a 2gb file), but if 30 nodes pull at once this is multiplied by 30, because you ALSO only have 1gbit to the storage. Bottleneck, here we come. Second, everything from Switch 1 to Switch 2 also passes through the 1gbit bottleneck.

When you move this amount of stuff around, you simply NEED a lot more bandwidth on the storage than on the workstations. 10gbit here allows you to basically have only a factor 3 reduction when the nodes pull, instead of a factor 30.

This is pretty much a case of requirements not meeting reality - with a 1gbit network, copying 2gb x 30 = 60gb of data around IS going to take a long time, and you basically tunnel it all through one 1gbit connection, so the switching does not give you additional bandwidth. So, get a budget, upgrade to equipment that is good enough for what you want to do, and then the problems will disappear. This IS among the larger problems, though - someone should have put in a budget when planning 30 render nodes.
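
To put rough numbers on this, here is a back-of-the-envelope sketch (assuming roughly 100 MB/s of usable throughput per 1gbit link, and ignoring protocol overhead and disc limits):

```python
# Rough transfer-time estimates; the ~100 MB/s per 1 Gbit link figure is an
# assumption, not a measurement, and disc speed is ignored entirely.
FILE_GB = 2       # size of the render asset
NODES = 30        # render nodes pulling it at the same time
LINK_MB_S = 100   # usable throughput of one 1 GbE link, in MB/s

total_gb = FILE_GB * NODES                             # 60 GB leaving the NAS
one_node_s = FILE_GB * 1000 / LINK_MB_S                # one node, uncontended: ~20 s
all_nodes_1g_s = total_gb * 1000 / LINK_MB_S           # 30 nodes behind one 1 GbE NAS port: ~600 s
all_nodes_10g_s = total_gb * 1000 / (LINK_MB_S * 10)   # same pull with a 10 GbE NAS uplink: ~60 s

print(f"one node alone:          ~{one_node_s:.0f} s")
print(f"30 nodes, 1 GbE on NAS:  ~{all_nodes_1g_s / 60:.0f} min")
print(f"30 nodes, 10 GbE on NAS: ~{all_nodes_10g_s:.0f} s (each node still capped by its own 1 GbE)")
```

That is the factor 3 versus factor 30 difference mentioned above.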

TomTom
  • 1
    +1 Stop being cheap. If this was my network, I'd probably be speccing some 48 port GigE switches, 10Gbit uplinks. Or screw the budget and have a nice low-latency Force10 core. – Tom O'Connor Dec 11 '10 at 23:29
  • Yep, the problem was that when this was designed, there weren't experienced network people around, so nobody anticipated that the network would become a choke point. Will need management to take a hard look at this now. Any recommendations for those 48 port switches with 10Gbit uplinks? – Andy Dec 12 '10 at 04:23
  • Depends. You don't need managed there. I have had some good experiences with Netgear. Otherwise, especially if you are possibly getting into routing there, get something at the Extreme Networks Summit 450 level, refurbished with warranty. Not cheap (3000-4000 USD), but that is a router that can handle all ports at full speed thanks to the routing being done in hardware. In between these two extremes, all is ok. – TomTom Dec 12 '10 at 14:14
3

My read of it is that the NAS is the main bottleneck. 30 nodes pulling the same (presumably large) file is a great way to fully saturate the NAS's network link for anything that hits it. I imagine that as the render nodes finish they upload their completed work to the NAS, which probably causes its own slowdowns as well. An additional bottleneck is probably in your switch config.

Switch 1 -> Switch 2 (NAS) -> Switch 3 -> [Router] -> Internet

Unless your router is more than a normal SOHO router you should be able to do this for no cost:

Switch 1 ->  [          ]
Switch 2 ->  [  router  ] -> Internet
Switch 3 ->  [          ]

That by itself may help your problem, and the best part is it doesn't cost anything but a bit of downtime.

You really do want to have your render nodes and NAS on the same switch if possible. Since you have 30 render nodes, you can't put all of them and the NAS on the same switch with your current config, so new hardware will be needed. You can help alleviate stress on your uplinks by going with stackable switches like you said. That should provide better bandwidth between the switches. Both steps should leave your uplink port(s) less saturated, though that doesn't affect load on the NAS.

30 nodes pulling files at high rate can fully saturate 1GbE, so fixing that requires one of several steps:

  • Bonding multiple NICs on the NAS, but that requires managed switches
  • Using a 2nd network port on the NAS as a dedicated share-point for the render nodes. The network will still bog down for them, but normal usage elsewhere on the network should not be terribly affected (disk I/O notwithstanding).
  • Getting a second NAS device for general office usage.
sysadmin1138
3

Perhaps you can switch to something like flamethrower, the multicast file distribution program, so that instead of having to send 60GB (2GB to 30 machines) you only send 2GB across the network...
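
For reference, the core idea behind multicast distribution looks roughly like this (a toy sketch, not flamethrower's actual protocol; real tools add rate control, retransmission and checksums, and the group address here is made up):

```python
# Minimal sketch of multicast file distribution: one sender transmits each
# chunk once to a multicast group; every node that joined the group gets it.
# No error correction or flow control here - this only illustrates the idea.
import socket
import struct

GROUP, PORT = "239.1.1.1", 5007   # hypothetical multicast group for the render LAN

def send(path):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local subnet
    with open(path, "rb") as f:
        while chunk := f.read(1400):          # keep datagrams under a typical MTU
            sock.sendto(chunk, (GROUP, PORT))

def receive(path):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", PORT))
    mreq = struct.pack("4sl", socket.inet_aton(GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)  # join the group
    with open(path, "wb") as f:
        while True:                            # toy loop: no end-of-file handling
            data, _ = sock.recvfrom(2048)
            f.write(data)
```

On unmanaged switches without IGMP snooping the multicast traffic will simply be flooded to every port, but each link still carries only one copy of the file instead of thirty.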

If that's not an option, you're probably going to have to upgrade the bottlenecks in your network. From your description, it sounds like the NAS is the primary bottleneck, so your first step will be to see if you can bond multiple interfaces. That depends on the NAS. A 48 port switch is going to have very high backplane bandwidth, but it doesn't matter if that's 40Gbps when your NAS is the choke point at 1Gbps.

If you have 2 switches connected to another switch via 1Gbps links, those links are going to be choke points. So going up to a 48 port switch and getting as much as possible onto it will help remove those choke points.

Beyond that, perhaps you can use a distributed, parallel file system like PVFS? Think of it kind of like a bittorrent file-system, where many machines share the load of distributing the file, instead of the single central NAS. In that way the NAS does not become the bottleneck.

Sean Reifschneider
2

It would definitely make sense to tackle this from a software perspective as well as a hardware perspective. I'm sure you can find software out there that will multicast the file out to the rendering machines.

Jason Berg
2

Great answers from the others here. You should have enough there to give you all the ideas you need. However, here are a couple of thoughts from outside the box:

  1. Set up a local bit-torrent tracker on your network (visible ONLY to your network), and share your large files among your workstations via bittorrent.

  2. Does every render station need the entirety of every file? If not, then share out portions of files. Either split your large files into smaller chunks, or serve your files with an HTTP server that understands range requests, and have your clients download only the byte ranges they need.
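
For the range-request idea, a minimal sketch of the client side (assuming the NAS files are exposed by an HTTP server that honours Range headers; the URL, file name and byte offsets are made up):

```python
# Fetch only part of a large file over HTTP using a Range header.
# The server must support partial content (it should answer 206).
import urllib.request

url = "http://nas.example.local/renders/scene42.exr"   # hypothetical URL
start, end = 0, 50 * 1024 * 1024 - 1                   # first 50 MB only

req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
with urllib.request.urlopen(req) as resp:
    if resp.status != 206:
        raise RuntimeError("server ignored the Range header")
    with open("scene42.part", "wb") as out:
        while chunk := resp.read(1 << 20):             # stream in 1 MB chunks
            out.write(chunk)
```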

Steven Monday
  • 3
    Heh. How about using USB sticks to transfer large files around.. You can also throw them across the office. Improves transfer times, and improves workers' hand-eye coordination. – Tom O'Connor Dec 12 '10 at 01:10
  • That's the spirit! – Steven Monday Dec 12 '10 at 01:42
  • +1 For p2p. If he wants to use the full internal bandwidth of the switches he needs p2p. Each switch has 24*100*2Mbps internal bandwidth. And, of course, rearrange the switches so that one of them is the backbone with the storage connected to it. – Mircea Vutcovici Dec 14 '10 at 01:14
2

A peer-to-peer solution like AeroFS may help you. Everyone is a file server! :-)

Mark Henderson
The Unix Janitor
0

Here is what I would do.

1 - Take a baseline measurement of performance.

2 - Fix your current config (see below) by adding a new switch.

PC’s-->[ SW1 ]-->[          ]   
                 [          ]   
                 [          ]   
PC’s-->[ SW2 ]-->[  SW New  ]-------------->[ Router ]
                 [          ]   
                 [          ]   
PC’s-->[ SW3 ]-->[          ]--->NAS

3 - Take a new measurement.

4 - Install software that pushes (multicasts) the file from the NAS to the PC's. (see Sean's post)

5 - Take a new measurement.

At each measurement point run the same test multiple times.
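
For the measurements themselves, even something as crude as timing a read of a big test file from the NAS and converting it to MB/s gives you comparable numbers (a rough sketch; the share path is hypothetical, and OS caching will flatter repeat runs unless each run reads a different file):

```python
# Time a read of a large file from the NAS and report effective throughput.
# Run it on one node alone, then on all render nodes at once, and compare.
import time

SRC = r"\\nas\renders\testfile_2gb.bin"   # hypothetical NAS share path

start = time.monotonic()
total = 0
with open(SRC, "rb") as f:
    while chunk := f.read(8 * 1024 * 1024):   # 8 MB reads
        total += len(chunk)
elapsed = time.monotonic() - start

print(f"{total / 1e6:.0f} MB in {elapsed:.1f} s -> {total / 1e6 / elapsed:.0f} MB/s")
```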

Also, if you can tell us the Model / Mfg. of the NAS we might be able to provide other answers.

dbasnett