In my work with online orders, I started noticing an extreme anomaly in a few orders. One field that has no length restriction contained a string of over 3 million characters of total gibberish, consisting mostly of Cyrillic characters. On closer examination using Python, it turned out to be a list of over a thousand such gibberish strings. I dug deeper and found more instances, the worst being a string of over 58 million characters made up of over 18,000 list elements.
So we have a string that consists of several lists of strings, and each of those strings in turn consists of several gibberish words separated by non-breaking spaces.
An example (I added linebreaks for readability):
'Р В Р’ВР
’ Р В РІР‚в„ўР вР
‚™Р’В Р В Р’В Р Р
 вЂ Р В РІР‚љРІвЂћСћР Р
’ РІР‚™Р’ВР
’ Р В Р’ Р’РВ
’ Р Р†Р РР
†Р вЂљРЎв„ўР В Р вЂ Р Р†Р вЂљРЎвЂєР Р
ЋРЎвЂєР В Р’ Р’ РІРР
ІР‚љРІвЂћСћР В РІРВ
‚™Р’В РРвЂ
The following is a count of the 10 most common words in the 58 million character string:
Р 2453256
В 1926812
Р’В 895699
’В 822674
ІР399677
РІР‚в„ўР 382349
†235180
‚Р185503
‚в„ўР177792
†109266
ІвЂћСћР101490
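For reference, a count like the one above can be produced with something along these lines. This is only a minimal sketch, not necessarily the exact code I ran: the names `top_words` and `SEPARATORS` are made up for illustration, and it assumes the words are separated by non-breaking spaces (U+00A0) or ordinary whitespace.

```python
import re
from collections import Counter

# Assumption: words are delimited by non-breaking spaces (U+00A0)
# and/or ordinary whitespace; runs of separators count as one.
SEPARATORS = re.compile(r"[\s\u00a0]+")


def top_words(text, n=10):
    """Return the n most common separator-delimited words in text."""
    # Drop empty strings produced by leading/trailing separators.
    words = [w for w in SEPARATORS.split(text) if w]
    return Counter(words).most_common(n)


if __name__ == "__main__":
    # Tiny made-up sample in the same style as the real data.
    sample = "Р\u00a0В\u00a0Р’В\u00a0Р\u00a0В"
    for word, count in top_words(sample):
        print(word, count)
```

On the real field value you would pass the whole string (or each list element in turn) to `top_words` instead of the small sample.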
Now take, for example, the string "РІР‚в„ўР" and put it into Google. I get over a million seemingly random sites that have such strings embedded in their source code.
I have absolutely no idea what to make of this. Does anyone know what it is?