What makes the GPU so much faster than the CPU?
The GPU is not faster than the CPU. The CPU and the GPU are designed with two different goals and different trade-offs, so they have different performance characteristics. Certain tasks are faster on a CPU, while other tasks are computed faster on a GPU. The CPU excels at doing complex manipulations on a small set of data; the GPU excels at doing simple manipulations on a large set of data.
The GPU is a special-purpose processor, designed so that a single instruction works over a large block of data (SIMD, Single Instruction Multiple Data), with the same operation applied to all of it. Working on blocks of data is certainly more efficient than working on a single cell at a time because the overhead of decoding instructions is much reduced. However, working on large blocks means more parallel working units, so implementing a single GPU instruction takes many more transistors (causing physical size constraints, using more energy, and producing more heat).
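For illustration, here is a minimal CUDA sketch of that model (the kernel name `scale` and the sizes are arbitrary): every thread executes the same simple operation on its own element of a large array.

```
#include <cuda_runtime.h>

// Every thread runs the same instruction stream on its own element:
// one simple operation, applied to a large block of data in parallel.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;                    // ~1 million elements
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));

    // Launch enough 256-thread blocks to cover the whole array.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);
    cudaDeviceSynchronize();

    cudaFree(d);
    return 0;
}
```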
The CPU is designed to execute a single instruction on a single datum as quickly as possible. Since it only needs to work with a single datum, the number of transistors required to implement a single instruction is much smaller, so a CPU can afford to have a larger instruction set, a more complex ALU, better branch prediction, a better virtualized architecture, and more sophisticated caching/pipelining schemes. Its instruction cycles are also faster.
The reason we are still using CPUs is not that x86 is the king of CPU architectures and Windows is written for x86; the reason we are still using CPUs is that the kind of work an OS needs to do, i.e. making decisions, runs more efficiently on a CPU architecture. An OS needs to look at hundreds of different types of data and make various decisions that all depend on each other; this kind of job does not parallelize easily, at least not onto a SIMD architecture.
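For contrast, a tiny sketch of decision-style code (a hypothetical state machine, all names arbitrary): every step branches unpredictably and depends on the previous result, so there is no large uniform block of data to hand to a SIMD unit.

```
#include <vector>
#include <cstdio>

// Hypothetical OS-style control logic: each decision depends on the previous
// outcome and branches unpredictably, so the work cannot be expressed as one
// big block of data processed by the same instruction.
enum class State { Idle, Running, Blocked };

State step(State s, int event)
{
    switch (s) {
    case State::Idle:    return event > 0  ? State::Running : State::Idle;
    case State::Running: return event < 0  ? State::Blocked : State::Running;
    case State::Blocked: return event == 0 ? State::Idle    : State::Blocked;
    }
    return s;
}

int main()
{
    std::vector<int> events = {1, 3, -2, 0, 5, -1};
    State s = State::Idle;
    for (int e : events)          // serial: iteration i needs the result of i-1
        s = step(s, e);
    std::printf("final state: %d\n", static_cast<int>(s));
    return 0;
}
```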
In the future, what we will see is a convergence between the CPU and GPU architectures: CPUs are acquiring the capability to work over blocks of data, e.g. SSE, and as manufacturing technology improves and chips get smaller, GPUs can afford to implement more complex instructions.
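As a rough sketch of that convergence on the CPU side (assuming an x86 compiler and `<immintrin.h>`), a single SSE instruction already operates on four packed floats at a time:

```
#include <immintrin.h>   // SSE intrinsics (x86/x86-64 only)
#include <cstdio>

// One SSE multiply handles four packed floats at once -- a small-scale
// version of the block-of-data model that the GPU takes to an extreme.
void scale4(float *data, float factor, int n)
{
    __m128 f = _mm_set1_ps(factor);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(data + i);          // load 4 floats
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));  // multiply and store 4 at once
    }
    for (; i < n; ++i)                              // scalar tail
        data[i] *= factor;
}

int main()
{
    float a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    scale4(a, 2.0f, 10);
    std::printf("%g %g\n", a[0], a[9]);             // prints: 0 18
    return 0;
}
```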
@vartec as a seasoned CUDA developer, I think that may be the most accurate analogy I have ever seen, hands down. I'm saving that one :) – Breakthrough – 2014-07-12T07:11:57.933
@vartec: I think a slightly better analogy might be between buses and taxicabs. If there are forty people who all want to go from the same place to the same place, a bus will be much more efficient. If there are forty people whose desired origins and destinations are widely scattered, even a single taxicab may be just as good as a bus, and for the cost of the bus one could have multiple taxicabs. – supercat – 2015-01-19T21:39:01.893
As with all important technical questions, Mythbusters have addressed this (and it's not a bad analogy) – Basic – 2015-06-06T14:57:43.933
Related: The difference between GPU and CPU – Ƭᴇcʜιᴇ007 – 2016-01-03T18:14:42.040
how do I know which answers contain correct information? Should I wait till others up/down vote answers? I think I was too hasty in accepting an answer :O – ell – 2011-07-10T17:29:04.707
There are some recent answers @ell now, which do not contain "misinformation". They are gradually rising to the top with up votes due to the efficient market mechanism of the wonderfully designed StackExchange ;-) I'd suggest waiting a little longer before accepting an answer. Looks like you very prudently are doing just that. This is a good question, by the way. Might seem obvious, but it isn't at all. Thank you for asking it! – Ellie Kesselman – 2011-07-10T20:43:01.033
There's no reason why one couldn't, e.g., create a Java JITC for a GPU, from a code-generation point of view. And most OS code is now written in C/C++ which can be easily retargeted. So one is not tied to the x86 heritage in any really significant way (unless you're running Windoze). The problem is that few (if any) GPUs are at all good at general-purpose processing. – Daniel R Hicks – 2011-07-10T20:48:24.130
@DanH Except that Java is a bad language, specifically for creating programs which have a high level of parallelism. We need mainstream languages, like those for functional programming, where parallelism is the natural way of expressing any program -- furthermore, the programming languages have to be well suited to operating on a very small amount of memory for each unit of computation, as that is when the GPU operates efficiently. As mentioned in the question, there are only a few problems, such as AI and the like, which do this naturally without a new programming language – Soren – 2011-07-10T21:40:23.960
But you don't need to run Java. The point is that you're not chained to a processor architecture. As to a new language for parallel processing, people have been trying to invent one for maybe 30 years now, and not made significant progress. Whereas after 30 years of developing sequential programming languages we had Fortran, COBOL, Modula-2, C, Pascal, Ada, PL/I, C++, and a host of others. – Daniel R Hicks – 2011-07-11T04:03:23.270
Related question from Stack Overflow: Why aren't we programming on the GPU? – Kobi – 2011-07-11T04:21:20.197
Kind of like asking "If Boeing 747 is faster and more fuel efficient, why do we still drive cars?" – vartec – 2011-07-11T09:35:10.770
Does this sound familiar (RISC vs. CISC)? – Aki – 2011-07-12T06:14:16.013
No, because it's not RISC versus CISC. It's one of the other computer science fundamentals, slightly disguised. It's "Why do we offload work from the central processor onto I/O processors?". – JdeBP – 2011-07-12T12:26:19.997