What does bandwidth of a device mean?

4

2

I am presently learning CUDA and I keep coming across phrases like

"GPUs have dedicated memory which has 5–10X the bandwidth of CPU memory"

See here for reference on the second slide

Now what does bandwidth really mean here? Specifically, What does one mean by

  • bandwidth of the CPU
  • bandwidth of the GPU
  • bandwidth of the PCI-E slot into which the GPU is fitted on the motherboard.

My background in computer architecture is very poor, so if someone can give a very simple explanation of these terms, it will be really helpful.

My very limited understanding of bandwidth is that it is the highest possible number of gigabytes that can be transferred per second from the CPU to the GPU. But that does not explain why we need to define three types of bandwidth.

smilingbuddha

Posted 2012-02-08T16:51:10.550

Reputation: 1 591

Not to sound snarky, but learning to deal with CUDA is something that a good background in computer architecture is helpful for (really, any sort of programming whatsoever, but this even more so because you have to worry about the underlying devices more than you would with a lot of other stuff). I'd encourage you to take a detailed course; it's good knowledge to have, although I suppose it's not especially a requirement. In general, as a programmer, having that sort of knowledge gives you a leg up on the other guys. – Shinrai – 2012-02-08T18:16:42.073

Answers

2

Analogy:

A common analogy for bandwidth is the highway. The more lanes there are, and the faster the cars go, the more cars can travel down the highway at a time.

Using trucks instead of cars, imagine that you need to transport a load of merchandise or mail from one city to another. If you have a single-lane highway, then only one line of trucks can travel down it, which reduces how much merchandise can be moved at once and makes the whole delivery take longer. Conversely, if you have a 10-lane road but each truck travels at a very slow speed, it still takes a long time to get everything delivered.

Now imagine that instead of delivering to the next town, you need to deliver farther. To get the goods from this city to another country, they have to travel through several cities, and between each pair of cities there is a different highway; some narrow, some wide, some uphill, some downhill, etc. If the road from city 1 to city 2 is ideal, as is the road from city 3 to city 4 (the destination), but the road between cities 2 and 3 is terrible, then it becomes a bottleneck. This leaves the road between 3 and 4 underutilized and slows the total delivery time.

Application:

Back in computerland, getting data from one place to another is the same situation. You have devices like a CPU, a GPU, and RAM (the cities), and cables and busses (the roads, not to be confused with buses, though that still kind of works). A device like the CPU can process data in and out at a certain rate, which could be called a bandwidth, though nobody really calls it that. Rather, bandwidth usually refers to the pathways that data takes: when one device spits out data that needs to go somewhere else, the bandwidth is the amount (number of lanes) and the speed (speed of the trucks) at which the data can be passed.
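In rough terms, the analogy maps onto a simple formula: peak bandwidth is the width of the path (lanes) times the rate at which data moves down it (truck speed). A minimal sketch, with made-up numbers:

```python
def peak_bandwidth_gbps(bus_width_bits: int, transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s: lanes (bits per transfer) times truck speed (transfers/s)."""
    bytes_per_transfer = bus_width_bits / 8  # 8 bits per byte
    return bytes_per_transfer * transfers_per_sec / 1e9

# A hypothetical 64-bit bus moving 1.6 billion transfers per second:
print(peak_bandwidth_gbps(64, 1.6e9))  # 12.8 (GB/s)
```

Widening the bus (more lanes) or raising the transfer rate (faster trucks) both increase the result; a bottleneck anywhere along the route caps the whole trip.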

Question-specific Explanation:

In the case you are referring to, what they mean by GPUs have dedicated memory which has 5–10X the bandwidth of CPU memory is not the bandwidth of the CPU or GPU themselves, but rather the bandwidth through which data passes back and forth between those devices and their associated memories. Specifically, the bus through which data passes between the CPU and the main system RAM has a lower bandwidth than the bus through which data passes between the GPU and the RAM on the video card. This is due to two factors: the width and the speed.

GPU side:

The bus between the GPU and the video-RAM is wide, often 256-bit or more on performance cards, because the video-RAM is integrated into the same adapter as the GPU. Because both components are assembled by the same company, the GPU and the video-RAM can be tightly integrated in a way that allows extremely high transfer rates between them. Also, the video-RAM tends to be specialized GDDR memory (e.g., GDDR5), which can access (read/write) data extremely fast. Finally, the GPU is a specialized processor that, due to the nature of graphics programming, can do all kinds of crazy arithmetical operations at blazingly fast speeds.
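To put numbers on this, here is a back-of-the-envelope calculation using hypothetical but plausible figures for a mid-range card (a 256-bit bus and GDDR5 running at an effective 4 GT/s; real cards vary):

```python
# Hypothetical GPU memory bandwidth: 256-bit bus, GDDR5 at 4 GT/s effective.
bus_width_bytes = 256 / 8          # 32 bytes moved per transfer
effective_rate = 4e9               # 4 billion transfers per second
gpu_bandwidth = bus_width_bytes * effective_rate / 1e9
print(gpu_bandwidth)  # 128.0 (GB/s)
```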

CPU side:

On the other side, the bus between the CPU and the system RAM is usually only 64 bits wide per channel. There are numerous reasons for that, but compatibility tends to be the primary limiting factor. Remember that systems are usually built with components from various sources: one manufacturer makes the CPU, another (or two or three) makes the RAM, and yet another makes the motherboard on which the bus lies. There is no way to know in advance what components will be present in the system (which CPU? what kind(s) of RAM? which of countless models of mobo?), so they have to comply with standards, which frequently reduce to the lowest common denominator.

If one of the RAM modules is slower than the others, they all have to reduce their speed to accommodate the slow one. If the motherboard can only handle 400 MHz RAM, then 800 MHz memory has to run at half speed, and so on. All these factors limit the total bandwidth between the CPU and the system RAM. The RAM itself, even if it is DDR3, will still likely be slower than the specialized video-RAM. Finally, the CPU is a general-purpose processor compared to a GPU, and so it will tend to be slower overall.
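The same arithmetic on the CPU side, assuming a dual-channel DDR3-1600 setup (two 64-bit channels at 1600 MT/s; again, illustrative numbers, not a measurement of any particular system):

```python
# Hypothetical CPU memory bandwidth: dual-channel DDR3-1600.
channels = 2
bus_width_bytes = 64 / 8           # 8 bytes per transfer per channel
transfers_per_sec = 1.6e9          # 1600 mega-transfers per second
cpu_bandwidth = channels * bus_width_bytes * transfers_per_sec / 1e9
print(cpu_bandwidth)  # 25.6 (GB/s)
```

Compared to a hypothetical 128 GB/s on the GPU side, that ratio is right in the "5–10X" range the slide mentions.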

Summary:

In summary, the slide is alluding to the fact that the video card's processor and memory, as a unit, far outperform the system's processor and memory as a unit.

Synetech

Posted 2012-02-08T16:51:10.550

Reputation: 63 242

0

Think of bandwidth as the diameter of a water pipe. The wider the water pipe, the more water can flow through a section per time unit. Thus, the more bandwidth a device has, the more data it can transfer per unit of time.

m0skit0

Posted 2012-02-08T16:51:10.550

Reputation: 1 317

-1

Bandwidth just means how much data your device can transfer per unit of time.

You don't have to define three different bandwidths; it's just that every connection in your system has a bandwidth, and any of those connections can be a bottleneck for something.

For example:

A bandwidth of 242 MB/s means that the device can transfer 242 MB per second. Now, if you have an application which needs to do a lot of memory operations on loads of RAM, you are bottlenecked by your memory bandwidth, even though your CPU could probably do the computation faster.
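To see why a figure like 242 MB/s can dominate the total runtime, compare the time spent moving data with the time spent computing on it (the workload sizes here are invented for illustration):

```python
# Invented workload: stream 968 MB through a 242 MB/s memory bus while
# the CPU could process the same data in 1 second.
data_mb = 968
bandwidth_mb_s = 242
compute_time_s = 1.0

transfer_time_s = data_mb / bandwidth_mb_s           # 4.0 s just to move the data
total_time_s = max(transfer_time_s, compute_time_s)  # the slowest stage wins
print(transfer_time_s, total_time_s)  # 4.0 4.0 -> memory-bound
```

The CPU sits idle most of the time: the app is memory-bound, and a faster processor would not help until the transfer side speeds up.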

inf

Posted 2012-02-08T16:51:10.550

Reputation: 2 735

That would mean the effective bandwidth of my system is the minimum of the bandwidths of the GPU, CPU and the PCI-E slot? – smilingbuddha – 2012-02-08T16:56:29.677

@smilingbuddha I can only assume what you mean, but your system doesn't have one bandwidth. It is very application-specific: if your app needs a lot of RAM or GPU memory, then your RAM or your GPU memory is your bottleneck. If your app just loads something into your GPU RAM and then calculates some hefty stuff, it is probably the caches/calculation units of your GPU that are going to limit your speed, not the PCI-E lanes. – inf – 2012-02-08T16:58:40.283