There are two main factors.
First, you are quite right that RAM is the biggest one. Because an integrated GPU has to share RAM bandwidth with the CPU, it gets far less bandwidth than a dedicated card's VRAM would give it. Worse, it is using RAM that is not optimized for GPU access patterns, so the CPU, GPU, and RAMDAC (display scanout) all fight over the same precious bandwidth, and the path between the GPU and RAM is much less direct.
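To make the bandwidth squeeze concrete, here is a rough back-of-the-envelope comparison. All the numbers below are illustrative assumptions for a typical 2014-era machine, not measurements of any particular system:

```python
# Illustrative back-of-the-envelope figures; real numbers vary by platform.

# Dual-channel DDR3-1600 system RAM: 2 channels * 8 bytes/transfer * 1600 MT/s.
# This pool is shared by the CPU, the integrated GPU, and display scanout.
system_ram_gbps = 2 * 8 * 1600 / 1000          # 25.6 GB/s

# Display scanout alone (1920x1080, 4 bytes/pixel, 60 Hz refresh)
# comes off the top of that shared budget before anything else runs.
scanout_gbps = 1920 * 1080 * 4 * 60 / 1e9      # ~0.5 GB/s

# A hypothetical mid-range dedicated card: 256-bit GDDR5 bus at an
# effective 5 GT/s, and the GPU has it all to itself.
vram_gbps = 256 // 8 * 5                       # 160 GB/s

print(f"shared system RAM: {system_ram_gbps:.1f} GB/s")
print(f"display scanout:   {scanout_gbps:.2f} GB/s")
print(f"dedicated VRAM:    {vram_gbps:.0f} GB/s")
```

Even with generous assumptions, the integrated GPU is fighting the CPU for a fraction of the bandwidth the dedicated card has exclusively.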
Second, a dedicated GPU can have more compute units. A single die can only hold so many transistors, and a dedicated GPU can devote more of that space to GPU compute units instead of sharing it with CPU cores and caches.
I'm not sure what you mean by "less latency". If you think it means communication between the CPU and GPU is more efficient, it basically isn't. Modern graphics cards have an excellent path that allows the CPU to write directly into the GPU (and its RAM) through fast write buffers. A dedicated GPU has more room for these kinds of buffers precisely because it's not sharing die space with the CPU and its caches.
Lacking dedicated GPU RAM, integrated solutions typically have to route "bulk" CPU/GPU communication through regular system RAM, which is less efficient. The CPU can't hand bulk data directly to the GPU: that would require the two to run in lockstep, which would waste resources because they never run at exactly the same speed. And what could the GPU do with such bulk data other than write it to RAM anyway? It has no place else to keep it while it processes it.
CPU-to-GPU communication basically consists of writing the information somewhere both components can reach it, and then telling the GPU to process it. With an integrated solution, that somewhere has to be regular RAM, which is already the limiting factor. With a dedicated solution, it can be the GPU's own RAM, which is much more efficient.
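That write-then-notify pattern can be sketched as a toy model. The names here (`SharedMemory`, `ToyGPU`, `doorbell`) are my own inventions for illustration; real drivers use DMA and memory-mapped doorbell registers, not Python objects:

```python
from collections import deque

class SharedMemory:
    """Stands in for whichever RAM both sides can reach:
    system RAM for an integrated GPU, VRAM for a dedicated one."""
    def __init__(self):
        self.buffers = {}
    def write(self, addr, data):
        self.buffers[addr] = data
    def read(self, addr):
        return self.buffers[addr]

class ToyGPU:
    def __init__(self, mem):
        self.mem = mem
        self.ring = deque()   # command queue ("ring buffer")
        self.results = []
    def doorbell(self, cmd):
        # The CPU "rings the doorbell": it passes only a tiny command
        # descriptor, never the bulk data itself.
        self.ring.append(cmd)
    def run(self):
        while self.ring:
            op, addr = self.ring.popleft()
            data = self.mem.read(addr)   # GPU pulls the bulk data from RAM
            if op == "sum":
                self.results.append(sum(data))

# CPU side: write bulk data where both components can reach it,
# then tell the GPU where to find it and what to do with it.
mem = SharedMemory()
gpu = ToyGPU(mem)
mem.write(0x1000, [1, 2, 3, 4])   # step 1: bulk write into shared memory
gpu.doorbell(("sum", 0x1000))     # step 2: small command names the location
gpu.run()
print(gpu.results)                # [10]
```

The point of the model: the expensive part is step 1, the bulk write. On an integrated GPU that write lands in the already-contended system RAM; on a dedicated card it lands in VRAM, off the critical path.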
"with the GPU built into the chip that means less latency so better performance" Huh? Less latency between the CPU and GPU? Why would that matter? It's not like one waits for the other. – David Schwartz – 2014-04-25T04:18:24.330