What makes the GPU so much faster than the CPU?
The GPU is not faster than the CPU. The CPU and the GPU are designed with two different goals and different trade-offs, so they have different performance characteristics. Certain tasks are faster on a CPU, while other tasks are computed faster on a GPU. The CPU excels at doing complex manipulations on a small set of data; the GPU excels at doing simple manipulations on a large set of data.
The GPU is a special-purpose processor, designed so that a single instruction works over a large block of data (SIMD, Single Instruction Multiple Data), with the same operation applied to all of it. Working on blocks of data is certainly more efficient than working on a single cell at a time because the overhead of decoding instructions is much reduced. However, working on large blocks means more parallel working units, so implementing a single GPU instruction takes many more transistors (causing physical size constraints, using more energy, and producing more heat).
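For illustration, here is a minimal CUDA sketch of that model (the kernel name `scale` and the sizes are arbitrary): every thread executes the same simple operation on its own element of a large array.

```
#include <cuda_runtime.h>

// Every thread runs the same instruction stream on its own element:
// one simple operation, applied to a large block of data in parallel.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                    // ~1 million elements
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Launch enough 256-thread blocks to cover the whole array.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```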
The CPU is designed to execute a single instruction on a single datum as quickly as possible. Since it only needs to work with a single datum, the number of transistors required to implement a single instruction is much smaller, so a CPU can afford to have a larger instruction set, a more complex ALU, better branch prediction, a better virtualized architecture, and more sophisticated caching/pipelining schemes. Its instruction cycles are also faster.
The reason we are still using CPUs is not that x86 is the king of CPU architectures and Windows is written for x86; the reason we are still using CPUs is that the kind of work an OS needs to do, i.e. making decisions, runs more efficiently on a CPU architecture. An OS needs to look at hundreds of different types of data and make various decisions that all depend on each other; this kind of job does not parallelize easily, at least not onto a SIMD architecture.
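For contrast, a tiny sketch of decision-style code (a hypothetical state machine, all names arbitrary): every step branches unpredictably and depends on the previous result, so there is no large uniform block of data to hand to a SIMD unit.

```
#include <vector>
#include <cstdio>

// Hypothetical OS-style control logic: each decision depends on the previous
// outcome and branches unpredictably, so the work cannot be expressed as one
// big block of data processed by the same instruction.
enum class State { Idle, Running, Blocked };

State step(State s, int event)
{
    switch (s) {
    case State::Idle:    return event > 0  ? State::Running : State::Idle;
    case State::Running: return event < 0  ? State::Blocked : State::Running;
    case State::Blocked: return event == 0 ? State::Idle    : State::Blocked;
    }
    return s;
}

int main()
{
    std::vector<int> events = {1, 3, -2, 0, 5, -1};
    State s = State::Idle;
    for (int e : events)          // serial: iteration i needs the result of i-1
        s = step(s, e);
    std::printf("final state: %d\n", static_cast<int>(s));
    return 0;
}
```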
In the future, what we will see is a convergence between the CPU and GPU architectures: CPUs are acquiring the capability to work over blocks of data, e.g. SSE, and as manufacturing technology improves and chips get smaller, GPUs can afford to implement more complex instructions.
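As a rough sketch of that convergence on the CPU side (assuming an x86 compiler and `<immintrin.h>`), a single SSE instruction already operates on four packed floats at a time:

```
#include <immintrin.h>   // SSE intrinsics (x86/x86-64 only)
#include <cstdio>

// One SSE multiply handles four packed floats at once -- a small-scale
// version of the block-of-data model that the GPU takes to an extreme.
void scale4(float *data, float factor, int n)
{
    __m128 f = _mm_set1_ps(factor);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);          // load 4 floats
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));  // multiply and store 4 at once
    }
    for (; i < n; ++i)                              // scalar tail
        data[i] *= factor;
}

int main()
{
    float a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    scale4(a, 2.0f, 10);
    std::printf("%g %g\n", a[0], a[9]);             // prints: 0 18
    return 0;
}
```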
@vartec as a seasoned CUDA developer, I think that may be the most accurate analogy I have ever seen, hands down. I'm saving that one :) – Breakthrough – 2014-07-12T07:11:57.933
@vartec: I think a slightly better analogy might be between buses and taxicabs. If there are forty people who all want to go from the same place to the same place, a bus will be much more efficient. If there are forty people whose desired origins and destinations are widely scattered, even a single taxicab may be just as good as a bus, and for the cost of the bus one could have multiple taxicabs. – supercat – 2015-01-19T21:39:01.893
As with all important technical questions, Mythbusters have addressed this (and it's not a bad analogy) – Basic – 2015-06-06T14:57:43.933
Related: The difference between GPU and CPU – Ƭᴇcʜιᴇ007 – 2016-01-03T18:14:42.040
how do I know which answers contain correct information? Should I wait till others up/down vote answers? I think I was too hasty in accepting an answer :O – ell – 2011-07-10T17:29:04.707
There are some recent answers @ell now, which do not contain "misinformation". They are gradually rising to the top with up votes due to the efficient market mechanism of the wonderfully designed StackExchange ;-) I'd suggest waiting a little longer before accepting an answer. Looks like you very prudently are doing just that. This is a good question, by the way. Might seem obvious, but it isn't at all. Thank you for asking it! – Ellie Kesselman – 2011-07-10T20:43:01.033
There's no reason why one couldn't, e.g., create a Java JITC for a GPU, from a code-generation point of view. And most OS code is now written in C/C++ which can be easily retargeted. So one is not tied to the x86 heritage in any really significant way (unless you're running Windoze). The problem is that few (if any) GPUs are at all good at general-purpose processing. – Daniel R Hicks – 2011-07-10T20:48:24.130
@DanH Except that Java is a bad language, specifically for creating programs which have a high level of parallelism. We need mainstream languages, like those for functional programming, where parallelism is the natural way of expressing any program -- furthermore, the programming languages have to be well suited to operating on a very small amount of memory for each unit of computation, as that is when the GPU operates efficiently. As mentioned in the question, there are only a few problems, such as AI and the like, which do this naturally without a new programming language – Soren – 2011-07-10T21:40:23.960
But you don't need to run Java. The point is that you're not chained to a processor architecture. As to a new language for parallel processing, people have been trying to invent one for maybe 30 years now, and not made significant progress. Whereas after 30 years of developing sequential programming languages we had Fortran, COBOL, Modula-2, C, Pascal, Ada, PL/I, C++, and a host of others. – Daniel R Hicks – 2011-07-11T04:03:23.270
Related question from Stack Overflow: Why aren't we programming on the GPU? – Kobi – 2011-07-11T04:21:20.197
Kind of like asking "If Boeing 747 is faster and more fuel efficient, why do we still drive cars?" – vartec – 2011-07-11T09:35:10.770
Does this sound familiar (RISC vs. CISC)? – Aki – 2011-07-12T06:14:16.013
No, because it's not RISC versus CISC. It's one of the other computer science fundamentals, slightly disguised. It's "Why do we offload work from the central processor onto I/O processors?". – JdeBP – 2011-07-12T12:26:19.997