The situation can be confused, so let's set things right.
RSA is two algorithms, one for asymmetric encryption, and one for digital signatures. These are two distinct beast; although they share the same core mathematical operation and format for keys, they do different things in different ways. Diffie-Hellman is a key exchange algorithm, which is yet another kind of algorithm. Since the algorithms don't do the same thing, you could prefer one over the other depending on the usage context.
Asymmetric encryption and key exchange are somewhat equivalent: with asymmetric encryption, you can do a key exchange by virtue of generating a random symmetric key (a bunch of random bytes) and encrypting that with the recipient's public key. Conversely, you can do asymmetric encryption with key exchange by using the key resulting from the key exchange to encrypt data with a symmetric algorithm, e.g. AES. Moreover, Diffie-Hellman is a one-roundtrip key exchange algorithm: recipient sends his half ("DH public key"), sender computes his half, obtains the key, encrypts, sends the whole lot to the recipient, the recipient computes the key, decrypts. This is compatible with a one-shot communication system, assuming a pre-distribution of the public key, i.e. it works with emails.
So for the rest of this answer, I assume we are talking about RSA encryption.
Perfect Forward Secrecy is a nifty characteristic which can be summarized as: actual encryption is done with a key which we do not keep around, thus immune to ulterior theft. This works only in a setup in which we do not want to keep the data encrypted, i.e. not for emails (the email should remain encrypted in the mailbox), but for data transfer like SSL/TLS.
In that case, to get PFS, you need to generate a transient key pair (asymmetric encryption or key exchange) for the actual encryption; since you usually also want some sort of authentication, you may need another non-transient key pair at least on one side. This is what happens in SSL with the "DHE" cipher suites: client and server use DH for the key exchange, with newly generated DH keys (not stored), but the server also needs a permanent key pair for signatures (of type RSA, DSA, ECDSA...).
There is nothing which intrinsically prohibits generating a transient RSA key pair. Indeed, this was supported in older versions of SSL; see TLS 1.0, section 7.4.3. In that case, use of an ephemeral RSA key was mandated not for PFS, but quite the opposite: so that encryption keys, while not stored, could be broken afterwards, even if the server's permanent key was too large to be thus brutalized.
There is, however, an advantage of DH over RSA for generating ephemeral keys: producing a new DH key pair is extremely fast (provided that some "DH parameters", i.e. the group into which DH is computed, are reused, which does not entail extra risks, as far as we know). This is not a really strong issue for big servers, because a very busy SSL server could generate a new "ephemeral" RSA key pair every ten seconds for a very small fraction of his computing power, and keep it in RAM only, and for only ten seconds, which would be PFSish enough.
Nevertheless, ephemeral RSA has fallen out of fashion, and, more importantly, out of standardization. In the context of SSL, if you want PFS, you need to use ephemeral DH (aka "DHE"), because that's what is defined and supported by existing implementations.
If you do not want PFS, in particular if you want to be able to eavesdrop on your own connections or the connections of your wards (in the context of a sysadmin protecting his users through some filters, or for some debug activities), you need non-ephemeral keys. There again, RSA and DH can be used. However, still in the context of SSL, non-ephemeral DH requires that the server's key, in its X.509 certificate, contains a DH public key.
DH public keys in certificates were pushed by the US federal government back in the days when RSA was patented. But these days are long gone. Moreover, DH support was never as wide as RSA support. This is indeed an interesting example: DH was government approved, and standardized by an institutional body (as ANSI X9.42); on the other hand, RSA was standardized by a private company who was not officially entitled in any way to produce standards. But the RSA standard (PKCS#1) was free for anyone to read, and though there was a patent, it was valid only in the USA, not the rest of the world; and in the USA, RSA (the company) distributed a free implementation of the algorithm (free as long as it was for non-commercial usages). Amateur developers, including Phil Zimmerman for PGP, thus used RSA, not DH. The price of the standard is nothing for a company, but it can mean a lot for an individual. This demonstrates the impetus that can originate, in the software industry, from amateurs.
So that's one advantage of RSA over DH: standard is freely available.
For security, RSA relies (more or less) on the difficulty of integer factorization, while DH relies (more or less) on the difficulty of discrete logarithm. They are distinct problems. It so happens that the best known breaking algorithms for breaking either are variants of the General Number Field Sieve, so they both have the same asymptotic complexity. From a high-level view, a 1024-bit DH key is as robust against cryptanalysis as a 1024-bit RSA key.
If you look at the details, though, you may note that the last part of GNFS, the "linear algebra" part, which is the bottleneck in the case of large keys, is simpler in the case of RSA. That part is about reducing a terrifyingly large matrix. In the case of RSA, the matrix elements are just bits (we work in GF(2)), whereas for DH the matrix elements are integer modulo the big prime p. This means that the matrix is one thousand times bigger for DH than for RSA. Since matrix size is the bottleneck, we could state that DH-1024 is stronger than RSA-1024.
So that's one more advantage of DH: it can be argued that it gives some extra robustness over RSA keys of the same size.
Still for security, DH generalizes over other groups, such as elliptic curves. Discrete logarithm on elliptic curves is not the same problem as discrete logarithm modulo a big prime; GNFS does not apply. So there is not one Diffie-Hellman, but several algorithms. "Cryptodiversity" is a good thing to have because it enables us to switch algorithms in case some researcher finds a way to easily break some algorithms.
As for performance:
- RSA encryption (with the public key) is substantially cheaper (thus faster) than any DH operation (even with elliptic curves).
- RSA decryption (with the private key) entails more or less the same amount of work as DH key exchange with similar resistance. DH is a bit cheaper if it uses a permanent key pair, but a bit more expensive if you include the cost for building an ephemeral key pair.
- In the case of SSL and DHE_RSA, the server must generate a DH key pair and sign it, and the signature includes the client and server random values, so this must be done for each connection. So choosing "DHE_RSA" instead of "RSA" kind-of doubles the CPU bill on the server for SSL -- not that it matters much in practice, though. It takes a very busy server to notice the difference.
- A DH public key is bigger to encode than a RSA public key, if the DH key includes the DH parameters; it is smaller otherwise. In the case of SSL, using DHE_RSA instead of RSA means exchanging one or two extra kilobytes of data -- there again, only once per client (because of SSL session reuse), so that's hardly a crucial point. In some specialized protocols, ECDH (with elliptic curves) gets an important edge because the public elements are much smaller.
If you are designing a protocol in a constrained situation (e.g. involving smart cards and I/O over infrared or anything similarly low-powered), ECDH will probably be more attractive than RSA.
Summary: you will usually prefer RSA over DH, or DH over RSA, based on interoperability constraints: one will be more supported than the other, depending on the context. Performance rarely matters (at least not as much as is often assumed). For SSL, you'll want DH because it is actually DHE, and the "E" (as ephemeral) is nice to have, because of PFS.