I thought I would post a follow-up to consolidate the information into a single post for others (based on what I have learnt and the information already posted).
PAE:
PAE will allow a 32bit Windows server to make use of more than 4GB RAM, with the maximum being dependent on the version of Windows you are running (Wikipedia has a nice reference here).
One thing to note: if you have Data Execution Prevention (DEP) or No-Execute (NX) turned on, then this will effectively enable PAE without you having to explicitly enable it in the boot.ini.
Bottom line, PAE has no effect on the amount of memory a single 32bit process can access. It only affects the total amount of memory Windows can 'see' and make use of (so on a 6GB system you can have 2 processes each using 2GB, with Windows using the remaining 2GB).
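To make the distinction concrete, here is a small Python sketch of the arithmetic. The 36-bit physical address width is the classic x86 PAE figure and is my assumption here (the usable maximum still depends on the Windows edition), and the function name is made up for illustration.

```python
GB = 1024 ** 3

def addressable_bytes(address_bits):
    """Bytes reachable with an address of the given width."""
    return 2 ** address_bits

# A 32bit process always works with 32-bit virtual addresses,
# whether or not PAE is enabled.
per_process_virtual = addressable_bytes(32)

# Assumption: classic x86 PAE widens *physical* addresses from 32 to 36 bits.
physical_without_pae = addressable_bytes(32)
physical_with_pae = addressable_bytes(36)

print(per_process_virtual // GB)   # 4
print(physical_without_pae // GB)  # 4
print(physical_with_pae // GB)     # 64
```

Only the physical figure changes with PAE; the per-process virtual figure stays at 4GB.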
3GB:
Firstly, when I am speaking about the memory available to a single 32bit process, I am referring to the process's virtual address space. For a 32bit process on a 32bit Windows OS this is limited to 4GB.
On a system without the /3GB switch, the 4GB of virtual address space is split 2GB / 2GB between the running process and the Windows kernel.
When you enable the 3GB switch, the split in virtual address space is changed to 3GB / 1GB. This allows the process to use more memory, at the expense of the kernel memory. Please note: Windows will only allow a process to use more than 2GB of memory if the executable has been built with the IMAGE_FILE_LARGE_ADDRESS_AWARE flag set.
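If you are unsure whether a given executable carries that flag (spelled IMAGE_FILE_LARGE_ADDRESS_AWARE in the Windows headers, value 0x0020 in the COFF Characteristics field), a rough Python sketch along these lines can check it. The function names are my own, and this only reads the header bit; it is not a substitute for the vendor confirming that the application actually copes with addresses above 2GB.

```python
import struct

IMAGE_FILE_LARGE_ADDRESS_AWARE = 0x0020  # bit in the COFF Characteristics field

def is_large_address_aware(image: bytes) -> bool:
    """Return True if the PE image bytes have the large-address-aware bit set."""
    if image[:2] != b"MZ":
        raise ValueError("not a PE image: missing MZ signature")
    # e_lfanew (offset 0x3C) holds the file offset of the PE signature.
    (pe_offset,) = struct.unpack_from("<I", image, 0x3C)
    if image[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("not a PE image: missing PE signature")
    # Characteristics is the last 2 bytes of the 20-byte COFF header,
    # which starts right after the 4-byte PE signature.
    (characteristics,) = struct.unpack_from("<H", image, pe_offset + 4 + 18)
    return bool(characteristics & IMAGE_FILE_LARGE_ADDRESS_AWARE)

def user_address_space_gb(boot_3gb: bool, large_address_aware: bool) -> int:
    """User-mode address space (GB) for a 32bit process on 32bit Windows."""
    # The extra 1GB is only granted when BOTH conditions hold.
    return 3 if (boot_3gb and large_address_aware) else 2
```

Usage would be something like `is_large_address_aware(open(path, "rb").read())`, where `path` points at the executable you want to inspect.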
Now, as has been mentioned in other posts, the penalty of using the 3GB switch is that the kernel has less memory to work with. One of the main casualties of the reduced memory is the number of Page Table Entries (PTEs) available. A page table is the data structure used by the Windows Virtual Memory Manager to store the mapping between virtual addresses and physical addresses in memory. If you have insufficient free PTEs, then Windows may fail to allocate memory to a process when requested, even if the process has not yet exhausted its address space.
The free PTE count can be measured using perfmon (\Memory\Free System Page Table Entries). Anything under 5000 is considered critical by Microsoft. As an example, on the servers mentioned in the original post, without the 3GB switch and with the process running, the free PTE count was around 160k. After enabling 3GB but before the process had started, Windows was reporting 3.5k free PTEs (a dramatic reduction). This number would have dropped quickly if we had started the process.
The way to compensate for this dramatic change is to enable the USERVA switch in the boot.ini. Setting USERVA=2800 moves the 3GB / 1GB split in memory and gives approximately 250MB back to the kernel for its use. By way of example, after setting USERVA=2800 in the boot.ini on our system, the free PTE count now sits around 60k with the process running (a lot better than the 3.5k we were seeing).
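For anyone wanting to try other USERVA values, the split can be sketched with some simple Python arithmetic. This assumes the USERVA value is the user-mode portion in MB out of a 4096MB address space; the function names are made up for illustration.

```python
TOTAL_VA_MB = 4096     # 32-bit virtual address space
USER_VA_3GB_MB = 3072  # user-mode portion with /3GB alone

def kernel_va_mb(userva_mb):
    """Kernel share of the address space for a given user-mode size."""
    return TOTAL_VA_MB - userva_mb

def returned_to_kernel_mb(userva_mb):
    """How much a USERVA value hands back compared with plain /3GB."""
    return USER_VA_3GB_MB - userva_mb

print(kernel_va_mb(3072))           # 1024
print(kernel_va_mb(2800))           # 1296
print(returned_to_kernel_mb(2800))  # 272
```

Plain subtraction gives 272MB for USERVA=2800, which is in the same ballpark as the roughly 250MB quoted above.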
More information on the USERVA switch can be found in the Microsoft KB article.
It is also worth mentioning that enabling PAE has an impact on the free PTE count. With PAE enabled, each PTE is 8 bytes rather than 4, so the page tables take up twice as much of the kernel's virtual address space.
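A quick back-of-the-envelope sketch of that doubling in Python, assuming the standard x86 figures of 4KB pages, 4-byte PTEs without PAE and 8-byte PTEs with PAE. It counts only the lowest-level page table entries and ignores page directories, so treat it as an illustration rather than an exact accounting.

```python
PAGE_SIZE = 4096  # bytes, standard x86 small page
MB = 1024 * 1024

def page_table_bytes(mapped_mb, pte_size):
    """Bytes of PTEs needed to map mapped_mb of memory with 4KB pages."""
    pages = mapped_mb * MB // PAGE_SIZE
    return pages * pte_size

without_pae = page_table_bytes(1024, 4)  # 4-byte PTEs
with_pae = page_table_bytes(1024, 8)     # 8-byte PTEs

print(without_pae // MB)  # 1
print(with_pae // MB)     # 2
```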
Hopefully this provides a nice concise summary of the information for anyone looking at a later date.
Cheers
Sam