I imagine that if someone suspected an anonymous online persona of being a specific real person who also had writing samples available online (like a blog or social media) they'd be able to compare the styles of writing. What are some ways to prevent this? Should techniques be done on your anonymous account or your identifiable one?
-
If you are going to use writing style obfuscation, use it on an (ideally newly created) anonymous account and never use those techniques for any other account (especially your public profile). Spinbot is one simple scripted example of obfuscation of writing style – DarkMatter Nov 29 '18 at 20:21
-
One thing you can do to hide yourself is to add misleading clues by copying someone else's writing style and including details that imply a different background. (Consider how Satoshi Nakamoto used a Japanese pseudonym. In my opinion, he's probably not Japanese, but it probably distracted a lot of amateur investigators.) – Macil Nov 29 '18 at 22:43
-
There are a number of research papers on anonymizing writeprint (unique writing style) to avoid stylometric analysis. If I recall, one common conclusion is that writeprint actually drifts over a period of years, so documents you have made from 5 years ago are going to be hard to link to documents you make today. – forest Nov 30 '18 at 01:30
-
Make sure not to use the word "lodestar", unless you want to frame the vice president. – Anders Dec 03 '18 at 12:38
-
Develop multiple personalities and you will write differently for each of them. – Overmind Feb 12 '19 at 09:02
3 Answers
There is a program written in Java called Anonymouth which assists with this:
Anonymouth is a Java-based application that aims to give users the tools and knowledge needed to begin anonymizing documents they have written. It does this by firing up JStylo libraries (an author detection application also developed by PSAL) to detect stylometric patterns and determine features (like word length, bigrams, trigrams, etc.) that the user should remove/add to help obscure their style and identity.
On its own, Anonymouth is of limited value. Anonymizing your writeprint requires a basic understanding of linguistics and stylometry. Basic stylometry involves the so-called 5-feature analysis, where five major writing style features are analyzed (paraphrased from Wikipedia):
lexical features - The analysis of the lexicon, the author's choice of vocabulary. Different people use different words at different rates, which can make them quite unique. I, for example, tend to use the word "tend" a lot. I would need to avoid that if I wanted to hide my writeprint. Using simple, short, and common words can reduce the potential of this feature.
syntactic features - The analysis of the author's writing style and sentence structure, such as punctuation, use of passive voice, and sentence complexity. Using sentences that are as simple as possible with a standard writing style can help weaken this feature.
structural features - The analysis of the author's organization of the work. These include paragraph length, spacing, indentation, use of Oxford commas, etc. Just as with syntactic features, this feature can be made less useful by following standard writing styles rather than using one that naturally evolved with you.
content-specific features - The analysis of the language that is contextually significant to the subject of the written work. Examples include the use of slang or acronyms that may be shibboleths. E.g. a set of botnet owners can easily be divided into those who say "C&C", "CnC", and "C2".
idiosyncratic features - The analysis of errors and other ungrammatical elements that may be unique to the author. This is by far one of the most damning features and has led to numerous people being deanonymized by their writing style. Small mistakes made on one non-anonymous identity can carry on to anonymous identities, potentially linking them.
Many of these features can be anonymized by using standard English with completely proper grammar and spelling, and short, simple sentences. Idioms and the like should be avoided. It's also possible to emulate other authors in order to fool analysis. If all else fails, simply waiting can help. A person's writing style drifts over the years, so it's unlikely that a document authored by you several years ago will be easy to tie to what you write now. This does not mean it is impossible, just harder.
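To get a feel for what such an analysis measures, here is a minimal Python sketch that computes a handful of crude lexical and syntactic features from a text sample. It is not how Anonymouth or JStylo work internally; the function name `style_profile`, the short `FUNCTION_WORDS` list, and the choice of features are all assumptions made for illustration. Real tools use far larger feature sets.

```python
import re
from collections import Counter

# Illustrative list only (an assumption); real stylometry uses much larger sets.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in", "that", "i", "it", "is")


def style_profile(text):
    """Return a few crude stylometric features for one text sample."""
    words = re.findall(r"[a-z']+", text.lower())
    # Naive sentence split on runs of terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    counts = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    total = len(words) or 1  # avoid division by zero on empty input
    return {
        # Lexical: word length and favourite words.
        "avg_word_length": sum(len(w) for w in words) / total,
        "top_words": counts.most_common(5),
        "top_word_bigrams": bigrams.most_common(5),
        "function_word_rates": {w: counts[w] / total for w in FUNCTION_WORDS},
        # Syntactic: sentence length and comma usage.
        "avg_sentence_length": len(words) / (len(sentences) or 1),
        "commas_per_word": text.count(",") / total,
    }


profile = style_profile("I tend to write. I tend to read, too.")
```

Running this on your anonymous drafts and on your public writing shows which habits stand out in both, such as a favourite word or a consistently high comma rate. Those are the ones to change first.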
It would take a lot of writing samples for this, but it is theoretically possible to some degree of certainty (I doubt it would be admissible in court, for example, but they could convince themselves at least). Preventing this would require you to intentionally write completely differently than you normally would (or use a script that makes your writing look different, as @DarkMatter mentioned). Trying to emulate the writing style of an author you like might be one way of manually doing this (it is certainly possible and common for writers to emulate the writing style of H.P. Lovecraft, for example).
You should definitely alter the writing of your anonymous persona rather than that of your real one: you have undoubtedly been writing since a very early age, so a lot of data already exists tied to your real identity.
-
It in fact was used to arrest a child pornography website operator from Australia. If I recall, he had used some rare idiosyncratic phrase both on his illicit website, and on public forums attached to his real name. – forest Nov 30 '18 at 01:17
A couple of rounds of Google Translate seems like a good bet. E.g. plaintext (English) --> Chinese --> English --> German --> Italian --> English.
You would probably lose a lot of meaning, too, due to imperfect translation at each step, but you could proofread to make sure the essential message is the same.
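The chaining itself is easy to script. The sketch below assumes a hypothetical `translate(text, source, target)` callable that you would supply yourself, for example a wrapper around whatever translation software you use; it is not the API of any real service. The `tagging_translator` stand-in only records each hop so the control flow can be seen without a real backend.

```python
from typing import Callable, Sequence

# Hypothetical signature (an assumption): translate(text, source, target) -> text.
Translator = Callable[[str, str, str], str]


def round_trip(text: str, languages: Sequence[str], translate: Translator) -> str:
    """Pass text through each language in order; languages[0] is the original."""
    current = text
    for source, target in zip(languages, languages[1:]):
        current = translate(current, source, target)
    return current


def tagging_translator(text: str, source: str, target: str) -> str:
    # Stand-in backend for demonstration: records the hop, does not translate.
    return f"{text} [{source}->{target}]"


chain = ["en", "zh", "en", "de", "it", "en"]
result = round_trip("hello", chain, tagging_translator)
```

Keeping the backend as a parameter means the same loop works whichever translator you plug in, and the final output still needs the manual proofread described above.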
-
Depending on your threat model, this may actually be worse, since you would be giving the original, unedited copy to Google. – forest Nov 30 '18 at 01:08
-
@forest Fair point. I think the principle still stands though, if you replace "Google translate" with "generic, locally-hosted translation software" – loneboat Nov 30 '18 at 01:10
-
But of the five main writeprint features, that would only anonymize lexical features and, to some limited extent, syntactic features. It wouldn't help with the others (which are important). – forest Nov 30 '18 at 01:15
-
More good points. Look, I didn't say in my answer "this is a surefire, bulletproof method." Was just throwing it out there as an idea. I see your own answer in here is much more lengthy and well thought-out. Just let the community's upvotes/downvotes do their job now. – loneboat Nov 30 '18 at 01:27
-
Oh I'm not criticizing you, just pointing out a potential issue with an otherwise good idea. – forest Nov 30 '18 at 01:28
-
No worries. Just feels like you were camping on my answer wanting a fight. – loneboat Nov 30 '18 at 01:30