What type of distributed/parallel computing infrastructure you build depends a lot on the problem being worked on. The easiest workloads to distribute are those that subdivide cleanly: carve the problem set into 4 chunks, farm the chunks out to 4 machines, and stitch the results back together once processing is done. Workloads that are poor candidates for subdivision are those with strong dependencies on previously processed data or on data currently being processed elsewhere.
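As a rough illustration of that carve/farm/stitch pattern, here is a minimal single-machine sketch in C where fork() stands in for the remote machines and a pipe stands in for the network; the chunk count and the workload (summing a range of integers) are made up purely for illustration:

```c
/* Sketch of the "carve, farm out, stitch" pattern on one machine.
 * Each child sums one quarter of the range 1..N; the parent stitches
 * the partial sums back together.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

#define NCHUNKS 4
#define N 1000000LL

int main(void)
{
    int fds[2];
    if (pipe(fds) != 0) { perror("pipe"); return 1; }

    for (int i = 0; i < NCHUNKS; i++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: process chunk i (sum its slice of 1..N). */
            long long lo = (long long)i * (N / NCHUNKS) + 1;
            long long hi = (long long)(i + 1) * (N / NCHUNKS);
            long long partial = 0;
            for (long long x = lo; x <= hi; x++)
                partial += x;
            write(fds[1], &partial, sizeof partial);   /* report result */
            _exit(0);
        }
    }

    /* Parent: collect the partial results and stitch them together. */
    long long total = 0;
    for (int i = 0; i < NCHUNKS; i++) {
        long long partial;
        read(fds[0], &partial, sizeof partial);
        total += partial;
    }
    while (wait(NULL) > 0)
        ;
    printf("sum of 1..%lld = %lld\n", (long long)N, total);
    return 0;
}
```

In a real distributed setup the interesting (and hard) part is everything the pipe hides here: shipping the chunks to other machines, handling slow or dead workers, and merging results that arrive out of order.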
For workloads that can't be subdivided, your best bet is to look into one of the single-system-image (SSI) frameworks out there (see link for a list). These make multiple systems emulate a single larger system. Even then, care must be taken to design the processing so that inter-system communication is minimized. Setups like these are where low-latency interconnects such as InfiniBand really earn their keep.
For workloads that can be subdivided, you have a lot more options. Perhaps the largest framework is BOINC, which is designed around very high-latency work-unit reporting (hours, days, or even weeks). I've heard of private BOINC clusters out there as well.
One I used back in college is PVM (Parallel Virtual Machine). It's a C library (a Perl wrapper now exists as well) that enables message passing between systems over a variety of transports.
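To give a feel for the programming model, here is a minimal, hypothetical master-side sketch using the PVM 3 C API as I remember it; the "worker" binary name, message tags, and worker count are made up for illustration:

```c
/* master.c -- hypothetical PVM master/worker sketch.
 * Spawns NWORKERS copies of a "worker" binary, sends each a chunk index,
 * and collects one integer result from each. Compile with: cc master.c -lpvm3
 */
#include <stdio.h>
#include <pvm3.h>

#define NWORKERS   4
#define TAG_WORK   1
#define TAG_RESULT 2

int main(void)
{
    int tids[NWORKERS];
    pvm_mytid();                          /* enrol this process in the virtual machine */

    /* Spawn the worker executable on whatever hosts PVM picks. */
    int n = pvm_spawn("worker", NULL, PvmTaskDefault, "", NWORKERS, tids);
    if (n < NWORKERS) {
        fprintf(stderr, "only spawned %d of %d workers\n", n, NWORKERS);
        pvm_exit();
        return 1;
    }

    /* Scatter: send each worker the index of the chunk it should process. */
    for (int i = 0; i < NWORKERS; i++) {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&i, 1, 1);
        pvm_send(tids[i], TAG_WORK);
    }

    /* Gather: receive one partial result from each worker and combine. */
    int total = 0;
    for (int i = 0; i < NWORKERS; i++) {
        int partial;
        pvm_recv(-1, TAG_RESULT);         /* -1 = accept from any task */
        pvm_upkint(&partial, 1, 1);
        total += partial;
    }
    printf("combined result: %d\n", total);

    pvm_exit();                           /* leave the virtual machine */
    return 0;
}
```

The worker side would be the mirror image: receive the chunk index, do its share of the work, and send the partial result back with the result tag.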
Whatever you pick, you'll still have to redesign how your computation framework works. It'll be a lot of work, but at least you'll be able to throw more resources at your problems. It is vanishingly unlikely that you can just drop your existing code into a distributed computing framework and have it all work; just getting the distributed framework up and running will be a challenge in itself.