It seems to me you're discussing two things: sandboxing on the desktop and then strategies for user content access in sandboxed applications.
Sandboxing
There are many sandboxing models out there, including the ones used by OSes:
- Windows 8 WinRT Store Apps
- OS X Sandboxed Store Apps
- iOS Apps
- Android Apps
Some apps ship sandboxed, for instance Chromium (using seccomp-BPF on Linux and integrity levels on Windows), and Adobe Reader and Microsoft Office (via Protected View, see below). There are also tools that force specific apps into a sandbox, such as Sandboxie (Windows) and mbox (Linux). Add to that a plethora of more or less functional research prototypes...
Sandboxes essentially limit the ability of apps to perform IPC, access user data, inspect other processes, make potentially dangerous system calls, and reach system resources, and they restrict a few extra APIs such as access to device capabilities. They almost always rely on system-call interposition (seccomp-BPF, Capsicum), on building/linking against specific APIs (e.g., WinRT), or on mandatory access control (SELinux sandbox, Android?), combined with capability/permission systems for access to dangerous interfaces (device capabilities, some forms of IPC or file access, network access...). These capabilities can be obtained at install time (bad: people don't always review them and usually can't revoke them, e.g. on Android), at runtime through prompts (bad: nobody reads them), or through trusted UI elements (good! see powerboxes, UDAC and security by designation; this is now partly implemented on Windows 8 and OS X, and will be part of the GNOME sandboxing model on Linux). Sometimes file-system access is mediated through a layered file system so that apps don't notice the restrictions imposed on them (e.g., mbox, Linux namespaces).
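To make the "trusted UI" idea concrete, here is a minimal sketch of the powerbox / security-by-designation pattern: the sandboxed app never opens paths itself; a trusted broker (standing in for the OS-owned file chooser) hands it a capability only for the file the user actually picked. All class and file names here are made up for illustration; real powerboxes live in the OS, not in the app's process.

```python
# Hypothetical sketch of a powerbox: the act of picking a file in trusted
# UI *is* the permission grant -- no separate prompt, no ambient authority.

class AccessDenied(Exception):
    pass

class FileCapability:
    """An unforgeable handle to one file, not to the whole filesystem."""
    def __init__(self, path):
        self._path = path
    def read(self):
        return f"<contents of {self._path}>"   # stub instead of real I/O

class TrustedBroker:
    """Plays the role of the OS-owned file chooser (trusted UI)."""
    def __init__(self, user_choice):
        # In a real powerbox the user picks interactively; we simulate it.
        self._user_choice = user_choice
    def request_file(self, app):
        cap = FileCapability(self._user_choice)
        app.grant(cap)                         # the app only ever sees caps
        return cap

class SandboxedApp:
    def __init__(self):
        self._caps = []
    def grant(self, cap):
        self._caps.append(cap)
    def open_path(self, path):
        # Direct path access is exactly what the sandbox forbids.
        raise AccessDenied("sandboxed apps cannot open arbitrary paths")

app = SandboxedApp()
broker = TrustedBroker(user_choice="/home/alice/report.odt")
cap = broker.request_file(app)
print(cap.read())                              # works: user designated it
try:
    app.open_path("/home/alice/.ssh/id_rsa")   # never granted
except AccessDenied as e:
    print("denied:", e)
```

The point of the pattern is that no permission dialog ever appears: the user's ordinary act of choosing a file doubles as the security decision.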
This being said, the main issues for how well sandboxes perform in practice are whether developers can build apps that still work while sandboxed, and whether sandboxing models can accurately limit applications to the least privileges they actually need.
For instance, on all the existing OS sandboxes (Windows, OS X, iOS, Android), apps have permanent access to every user file they have ever opened (e.g., futureAccessList in WinRT, which lets an app reopen every single file/folder it has previously seen). This means that compromising a legitimate app potentially gives you access to all the files it has ever touched. This is in sharp contrast with a same-origin policy, where an app would only be able to access content related to what it has been manipulating: compromising an app with, e.g., a crafted file or packet should then only give you access to data accessed in the same session, or coming from the domain the data was downloaded from or the network packet was sent from.
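The difference in exposure can be sketched in a few lines. This is a toy model, not the real WinRT API: "persistent" mimics a futureAccessList-style grant store that remembers every file ever opened, while the session-scoped store forgets grants when the session ends, so a compromise only reaches what the current session touched.

```python
# Hypothetical sketch: permanent vs. session-scoped file grants.
# File names and the GrantStore class are invented for illustration.

class GrantStore:
    def __init__(self, persistent):
        self.persistent = persistent
        self.grants = set()
    def add(self, path):
        self.grants.add(path)
    def end_session(self):
        # A session-scoped store forgets everything at session end;
        # a persistent one (the current OS model) keeps grants forever.
        if not self.persistent:
            self.grants.clear()
    def reachable_after_compromise(self):
        # What an attacker who takes over the app can read right now.
        return set(self.grants)

persistent = GrantStore(persistent=True)    # futureAccessList-style
scoped = GrantStore(persistent=False)       # SOP-like, session-scoped

for store in (persistent, scoped):
    store.add("taxes-2013.pdf")
    store.add("passwords.txt")
    store.end_session()
    store.add("crafted-malicious.doc")      # the file that compromises the app

print(persistent.reachable_after_compromise())  # everything ever opened
print(scoped.reachable_after_compromise())      # only the current session
```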
Some models close to SOP in desktop sandboxing
Windows Apps' Protected View: some apps, e.g. Office and Adobe Reader, can create restricted compartments in which they open files deemed suspicious. Because such compartments are not allowed to interact with the main UI, most of the app's features are disabled: the file is read-only, not always printable, and so on. Users have to click a button in the main UI to make the file read-write and re-enable those features, which essentially removes all protection. The assumptions of this model are that:
- content rarely needs to be edited and often needs to be consulted (which is probably true in many cases)
- users will notice by themselves that a malicious file is malicious, just by looking at it (which is not true at all :) )
Files are deemed suspicious if they were downloaded from a non-whitelisted domain, or attached to emails from people who are not in your contacts. This works well for some corporate users but not for all, and some users have been asking how to disable Protected View.
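The trust decision described above can be sketched as a small policy function. The domain and contact lists here are invented for illustration, and the real heuristics (Mark of the Web, Trusted Locations, etc.) are more involved; this only captures the two rules just mentioned.

```python
# Hypothetical sketch of the Protected View trust decision: open a file
# read-write only if its origin is trusted, otherwise open it in a
# restricted, read-only compartment.

TRUSTED_DOMAINS = {"intranet.example.com"}   # whitelisted download sources
CONTACTS = {"alice@example.com"}             # known email senders

def open_mode(origin_domain=None, sender=None):
    """Return 'read-write' for trusted origins, 'protected-view' otherwise."""
    if origin_domain is not None and origin_domain in TRUSTED_DOMAINS:
        return "read-write"
    if sender is not None and sender in CONTACTS:
        return "read-write"
    # Unknown provenance: restricted compartment, features disabled.
    return "protected-view"

print(open_mode(origin_domain="intranet.example.com"))  # read-write
print(open_mode(origin_domain="files.example.org"))     # protected-view
print(open_mode(sender="mallory@evil.example"))         # protected-view
```

Note how binary the decision is: one click on "Enable Editing" flips a file from the restricted to the trusted case, which is exactly the weakness discussed above.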
Content-Based Isolation: this is, for now, a theoretical model for a service-oriented OS developed by Microsoft Research. There is fairly little information on what is being developed and for what purpose, so it is probably not what you're interested in. Still, their paper gives very clear arguments for why content-based isolation is due to become a hot topic and what it can be expected to provide.
Qubes OS: Qubes OS is an activity-isolation OS based on the Xen VMM. It allows users to create and name VMs in which they can run sets of apps, and merges the interfaces of the different VMs in a custom-made desktop environment. It provides very few of the typical features of a modern desktop, but has a very strong security model, especially wrt. hardware and low-level attacks on OS components.
Its main issue lies in the fact that it is modal: users have to remember at all times which VM each app is running in, and make sure never to mix VMs together, to avoid breaking the security model. They also have to manage syncing files between app VMs themselves. This is a typical no-no in HCI for most users, and the Qubes OS developers advertise their OS only to people who are motivated about security and willing to maintain its security model. Qubes OS is the most advanced prototype of an OS that isolates content rather than just apps.
More experimental models include PIGA OS, which uses a SELinux-based MAC enforcement system to prevent unwanted information flows. It can be reconfigured on the fly with new policies thanks to a component called ContextD, assuming someone wrote policies in PIGA's own MAC language for each activity that must be available to the user. Policies cannot be updated by users, and in fact users can't even see what policy is being enforced. They can only perform one activity at a time on the whole OS, rather than having activities per app. PIGA OS is not usable in production; it was developed as a research prototype for a contest on OS security. I was involved in developing some tools around it, and the team was more interested in developing the technical infrastructure than the user interface. The threat model explicitly included users as adversaries. I don't think the project is being continued.
Finally, another research team is currently developing Bubbles for Android. If I remember correctly, it uses Linux namespaces to isolate applications or groups of applications. It provides some UIs for managing different "social contexts" and sharing data with contacts participating in those social contexts. The model is very much oriented towards collaborative information sharing and communication, which is a typical workflow of mobile devices rather than desktops (where more complex workflows exist). What I regret about it is that there's no vision of how those social contexts emerge and are managed over time, which takes us to...
Why exactly is it so hard?
The main issues surrounding the identification of content that can be grouped and isolated in the same container relate to:
- people's awareness of the structure(s) and context(s) of their activities
- how able and willing they are to make an effort to maintain accurate representations of their work (and what can be provided, in the way they perform computing, to compensate for those costs)
- how computers can manage meaning on par with human beings
- how mode can be handled in complex interfaces
- how to account for the situatedness of human action and users' need to routinely cross security boundaries in their daily use of computers
- how to handle ephemeral or emerging activities
and so on.
These are all hard, complex HCI research topics. Some relevant readings include Bardram's Activity-Based Computing project, theories such as Activity Theory, and pretty much all of the phenomenology-inspired theories and studies in HCI (especially Situated Action and Embodied Interaction). The overwhelming majority of security research (including "usable security" research) assumes that users are willing and able to focus on security-related tasks and to maintain accurate models of security, and that the existing issues can all be solved with some UI fixes and better "security communication". That research relies on different theories of human action, and different research methods, than those needed to understand how people create and maintain meaning for what they do on computers, and how such meaning can be represented within a computer and transformed into security boundaries. To the best of my knowledge, nobody else is investigating the user experience and appropriation of isolation mechanisms at the moment, and I'm afraid I'm not progressing particularly fast :) I don't have any published reports/papers on the topic yet, but feel free to ask for clarifications/extensions if you're interested in the topic.