
I'm required to audit various communications (email, SMS, messenger, social media) for keywords relating to financial data, HIPAA, and other PII.

Is there any rational reason I should extend my current audit log support to also index (perhaps as a synonym) the various emoticons that exist?

The reason I ask is that emoticons represent special characters that will affect current compliance infrastructures in the following ways:

  1. Searching for emoticons is hard: the standard text delimiters (comma, bracket, etc.) are interpreted by the on-demand query engines as AND, NOT, OR, a phrase, or order of precedence
  2. Parsing emoticons is hard: the backing store may mangle an individual emoticon into more than one character, or drop it completely
  3. Storing emoticons is hard: spurious tuples result in invalid joins and queries

I suppose that emoticon support might qualify as a "foreign language", except that a "character" is not just a single UTF-32 character; rather, it is a "character phrase".

This requires a bit of engineering, planning, and load on the back end databases/indexes.

Question

  • Is there a line of business, or regulation that implies that special emoticon handling should be included?

Edit: because this question has both close votes and upvotes, I'm adding the following:


Examples

  • It is now illegal in many states to discriminate against candidates based on criminal record. HR groups are heavy emoticon users and may use the following characters: [image not shown]

  • Drug usage (PII medical) may be indicated with:

[image not shown]

  • PII such as sexual orientation, or disparaging remarks, could be indicated with:

[images not shown]

  • HIPAA medical issues sent to an insurer (that denies coverage) could be indicated with:

[image not shown]

The examples above use only single-character expressions; more creative expressions are certainly possible.

I'm expecting strong opinions on the relevance of this question to IT compliance based on people who have interacted with millennials and those who have not.... and the demographic of 35 and older, vs 34 and younger.

My assumption is that emoticon support places an unfair burden on the business, and that we would not be required to support it. Another argument is that younger businesses (by average age) who use emoticons would need to have such auditing in place.

Arguments either way are appreciated.

makerofthings7
  • If none of your keywords relate to emoticons then it would seem it's an unnecessary requirement? Otherwise can't it be simply treated as a word comprising a number of characters? – Andy Boura Oct 27 '15 at 17:05
  • @AndyBoura Telling a health or life insurer that the person has pre-existing medical conditions can be indicated in a number of ways. Keywords may need to be augmented as illustrated above. – makerofthings7 Oct 27 '15 at 17:09
  • Yes ok, fair enough. Have you got some sort of legacy reason it's not straightforward? In most emoji systems the icon is represented by some sort of plain text character combination e.g. Cisco Messenger has coffee: c(_), cake: (^) etc... enumerating them all could be tricky though if that were required. Also these days people send pictures...so you have a bigger problem! I just tried a google reverse image search on a few of your examples and it actually did a good job of identifying and naming them which may be a way forward... – Andy Boura Oct 27 '15 at 17:16
  • You'll run into two issues: emoticons are not standardized, and there's still no legal precedent for compliance to include emoticons. You could create rules to flag odd character set behaviors just in case, but trying to keep up with ever-changing emoticons would be a futile practice, and one not required to comply with federal regulations. –  Oct 28 '15 at 19:56
  • Really interesting idea of how easy it is to get around HIPAA/HITECH etc.! Don't over-achieve on compliance... but if you're interested, perhaps just search for odd characters, e.g. instead of looking at every emoticon or image or slang word, just look for patterns that break the norm and investigate those individually. – Dave Oct 29 '15 at 21:36

1 Answer

This is an interesting question which, to be honest, I'd not considered. However, now that you raise it, I think you may be onto something. While I don't believe there is anything in any of the regulations which explicitly requires you to consider this sort of information, I suspect this is simply due to the letter of the law sometimes lagging behind social evolution/development. In most cases, the intent of regulations is fairly clear, and for some of these, the use of emoji as your examples indicate would be considered unacceptable. It could be argued that all /reasonable/ attempts should be made to monitor for such practices and that appropriate action should be taken. Of course, /reasonable/ is somewhat debatable, but such debates are usually expensive and come down to the sort of legal arguments companies are reluctant to pursue unless forced to, because of their unpredictability.

With respect to the question of whether you should or should not search for emoji in audits etc., I think this depends on your position and the level of executive support. If you are an external auditor, especially a government-based auditor who checks for compliance with legislation etc., then you probably do need to consider searching for these /patterns/. If a company is using emoji to communicate information in a way that breaches anti-discrimination law etc., what they are doing is illegal and probably needs to be addressed. As an auditor, knowing this sort of practice occurs probably creates a moral duty if not a legal one. Unfortunately, resourcing, budgets and political pressures can come into play in such situations, so executive support/decision is probably required.

If, on the other hand, you are an internal auditor, then things may be a lot more complicated. Executive support would be even more important and a lot will likely depend on the philosophy or moral position of the company. For example, if you work for an insurance company and this information is actively used to assess premiums, or if you work in a company which has an 'unofficial' policy against employing people based on their sexual orientation, drug history, criminal convictions etc. and that company's HR department is using emoji to communicate such information in an attempt to avoid possible auditing issues, raising this issue will likely be career limiting and unwelcome. On the other hand, if such use of emoji is unofficial and not supported by the executive, then you may get a pat on the back for raising the issue so that the executive is aware of the risks/practices and can make an informed decision on what to do. Regardless of the 'official' or 'unofficial' corporate position, this also has the potential of opening a can of worms, which is rarely welcomed by the executive, so there is also the danger of 'shooting the messenger' syndrome. Once the executive is made aware of something like this, it isn't easy for them to just ignore it, and if it should lead to some sort of legal or regulatory issue, you have removed their 'we were unaware of the issue, but will now take action to address it' argument.

Then of course, there is your own personal moral position and it may be necessary to consider whether this is the sort of company you want to work for.

From a technical standpoint, I don't think you have much of an argument. Including the ability to search and analyse for the use of emoji may well be a PITA, but that is not sufficient reason to not do it. Let's face it, auditing in general is often a PITA and there are very few people I know who really welcome the growing amount of compliance we need to meet and monitor for.

Searching for emoticons is hard: The standard text delimiters (comma, bracket, etc) are interpreted by the on demand query engines as AND, NOT, OR, a phrase, or order of precedence

Yep, it will likely be harder to define the searches, but most search systems do have the ability to escape special characters or otherwise encode them to use them in searches. Hard is usually not sufficient justification for not doing something.
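As a rough illustration of the escaping point, here is a minimal Python sketch that treats text emoticons as literal strings rather than query operators. The watch list and the function names are made up for the example, and it uses Python's `re.escape` as a stand-in; a real query engine will have its own escaping syntax.

```python
import re

# Hypothetical watch list of text emoticons (illustrative only).
EMOTICONS = [":-)", "c(_)", "(^)"]

def build_pattern(tokens):
    # Escape each token so brackets, carets, etc. are matched literally
    # instead of being interpreted as operators. Longest tokens go first
    # so a longer emoticon is preferred over a shorter one it contains.
    ordered = sorted(tokens, key=len, reverse=True)
    return re.compile("|".join(re.escape(t) for t in ordered))

PATTERN = build_pattern(EMOTICONS)

def find_emoticons(text):
    # Return every watched emoticon found in the text, in order.
    return PATTERN.findall(text)
```

For example, `find_emoticons("coffee c(_) and cake (^) :-)")` finds all three entries even though each contains characters a query parser would normally treat as syntax.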

Parsing emoticons is hard: The backing store may mangle an individual emoticon into more than one character, or be ignored completely

Well, I'm not sure it is that hard. Let's face it, the interfaces which display these emoji are able to deal with them. Character-based emoji are not too difficult to handle, although the lack of standardisation does mean you may need multiple, possibly per-application, encoding sets. Image-based emoji are much harder; it is likely possible using various image matching algorithms, but this would be a lot of work. With respect to the backing store, this would be a limiting factor. If the store ignores emoji or encodes them into a form you cannot process, then you can't do it; it is a limitation of the technology. However, if it just encodes them into a form which is more difficult to handle, that may not be sufficient justification, because "it is hard" is probably not a sufficient excuse.
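To make the backing store point concrete, here is a small Python sketch, assuming the store's behaviour can be approximated by a text encoding. The function names are invented for the example. It shows that one emoji can occupy more than one UTF-16 code unit, and checks whether a value survives a round trip through a given encoding.

```python
def utf16_units(text):
    # Number of 16-bit code units the text needs in UTF-16.
    # Characters outside the Basic Multilingual Plane need two each.
    return len(text.encode("utf-16-le")) // 2

def survives_store(text, encoding):
    # Simulate a store that silently replaces characters it cannot
    # represent, then check whether the original text came back intact.
    stored = text.encode(encoding, errors="replace")
    return stored.decode(encoding, errors="replace") == text

# U+1F48A is used here only as an arbitrary example emoji.
sample = "\U0001F48A"
```

Here `sample` is one code point but two UTF-16 code units, and it survives a UTF-8 round trip while a Latin-1 store would replace it. Running a check like this against the real store would tell you which of the two cases above you are in.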

Storing emoticons is hard: Spurious Tuples result in invalid joins and queries

This one probably doesn't stand up. You don't have to store the actual emoji. Once you have identified it, you can encode it in any number of ways which work with whatever store you are using.
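One possible encoding, sketched in Python under my own assumptions: replace each emoji-like character with an ASCII token built from its code point before it reaches the store. The token format `[U+XXXX]` and the rule for what counts as "emoji-like" (any character above U+FFFF, or in the Unicode "So" category) are choices made for this example, not a standard.

```python
import unicodedata

def is_emoji_like(ch):
    # Rough heuristic (an assumption for this sketch): characters beyond
    # the Basic Multilingual Plane, or in the "Symbol, other" category.
    return ord(ch) > 0xFFFF or unicodedata.category(ch) == "So"

def to_storable(text):
    # Replace each emoji-like character with an ASCII token such as
    # [U+1F48A]; all other characters pass through unchanged.
    parts = []
    for ch in text:
        if is_emoji_like(ch):
            parts.append("[U+{:04X}]".format(ord(ch)))
        else:
            parts.append(ch)
    return "".join(parts)
```

With this, `to_storable("take \U0001F48A now")` gives `"take [U+1F48A] now"`, which any store that handles ASCII can index, and the token itself becomes the searchable keyword.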

This is a good question and should generate some debate. There are probably some real opportunities for research in this area.

Tim X