8

After some problems with UTF8 and none-UTF8 strings, we're standardising on UTF8. One thing I need to do is check that everything is in UTF8 in the MySQL database? What do I need to check?

  • Server default characterset
  • Default character set of each database
  • Does each text column have a character set? How do I check for that?

I was thinking of putting all this in a nagios plugin to check that everything is in UTF8. Suggestions?

Amandasaurus
  • 30,211
  • 62
  • 184
  • 246

5 Answers5

9

Rory, First of all, you are correct for wanting to monitor what gets created in you databases. While we all implement steps to prevent mistakes, you cannot assume that the mistakes won't creep in. I do a very similar thing as most of our infrastructure demands UTF8.

The following queries are good for checking stats:

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.SCHEMATA GROUP BY DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME;
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_COLLATION, COUNT(0) AS COUNT FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL GROUP BY TABLE_COLLATION;
# to filter schema use TABLE_SCHEMA in the where clause
SELECT CHARACTER_SET_NAME, COLLATION_NAME, COUNT(0) AS COUNT FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME IS NOT NULL AND COLLATION_NAME IS NOT NULL GROUP BY CHARACTER_SET_NAME, COLLATION_NAME;
# to filter schema use TABLE_SCHEMA in the where clause

The following queries would be good from a cron job that emails you if any results are found:

SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA WHERE DEFAULT_CHARACTER_SET_NAME NOT LIKE '%utf8%' OR DEFAULT_COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use SCHEMA_NAME in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_COLLATION IS NOT NULL AND TABLE_COLLATION NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_SET_NAME, COLLATION_NAME FROM information_schema.COLUMNS WHERE CHARACTER_SET_NAME NOT LIKE '%utf8%' OR COLLATION_NAME NOT LIKE '%utf8%';
# to filter schema use TABLE_SCHEMA in the where clause

These queries work for MySQL >= 5.0. If you need help writing the cron job, let me know.

I also have a problem with my developers (mostly the Ruby on Rails team) creating InnoDB tables when they don't need them. To keep this under wraps I monitor the staging database with a cron that calls:

SELECT TABLE_SCHEMA, TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE ENGINE = 'InnoDB';
Bruno Bronosky
  • 4,429
  • 3
  • 24
  • 32
1

I'd do something like mysqldump --all-databases --no-data | grep -i character

Jure1873
  • 3,692
  • 1
  • 21
  • 28
  • That's a bit raw. I have no way of knowing what database / table / column definition isn't utf8. – Amandasaurus Jun 26 '09 at 15:55
  • Yeah, you'd need to use something like awk instead of grep. Because I cannot get legible formatting in a comment, I'll post it in another answer. – Bruno Bronosky Jul 09 '09 at 15:36
1

Jure1873 suggests using grep, and I commented that awk was more appropriate. Here is what I threw together.

mysqldump --all-databases --no-data | awk '{gsub("[`;]","");};/^USE /{db=$2;};/CREATE TABLE /{table=$3;};/^\) ENGINE/{sub(".*CHARSET=","");if(!match($1,"utf8")){print db ":" table ":" $1;}}'

Here it is with increased readability:

mysqldump --all-databases --no-data | \
awk '
    {
        gsub("[`;]","");
    };
    /^USE /{
        db=$2;
    }; \
    /CREATE TABLE /{
        table=$3;
    }; \
    /^\) ENGINE/{
        sub(".*CHARSET=","");
        if(!match($1,"utf8")){
            print db ":" table ":" $1; 
        }   
    }   
'
Bruno Bronosky
  • 4,429
  • 3
  • 24
  • 32
0

Nagios is really the wrong sort of tool for this. Nagios is more for service monitoring, not checking configuration is wrong. A better tool would be some sort of configuration management tool like puppet or cfengine to make sure that the configuration files have the right settings.

As far as actually configuration, the MySQL manual has a very comprehensive chapter on character sets and encoding.

David Pashley
  • 23,151
  • 2
  • 41
  • 71
  • Well I'd like to get emailed when some developer makes something that's not UTF8. – Amandasaurus Jun 26 '09 at 15:54
  • 2
    Don't let developers change production systems. – David Pashley Jun 26 '09 at 16:25
  • In which case write a commit hook on your SCM which detects when tables aren't utf8. Or wait until the developer creates a software release, install it on a non-production (staging/test/validation) system, then check it. Seriously, just introduce a process in the software release system where this is checked before a release is done. – MarkR Jun 29 '09 at 05:49
0
dbs=$(echo 'show databases;' | mysql | grep -v '^Database$' | tr "\n" " ")

for d in ${dbs}; do
 tables=$(echo 'show tables' | mysql $d | grep -v '^Tables' | tr "\n" " ")
 for t in ${tables}; do
  echo "**************** DB ${d} TABLE ${t}"
  mysqldump --no-data ${d} ${t} | grep  COLLATE | grep -v utf && echo "DB ${d} TABLE ${t} !!!!"
  mysqldump --no-data ${d} ${t} | grep  CHARSET | grep -v utf && echo "DB ${d} TABLE ${t} !!!!"
 done
done
Jure1873
  • 3,692
  • 1
  • 21
  • 28
  • While I have great respect for bash-fu, these are inappropriate uses. You are obviously skilled and would benefit from getting familiar with http://dev.mysql.com/doc/refman/5.0/en/information-schema.html and the uber-useful http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat The GROUP_CONCAT is very useful for generating valid SQL with SQL which is then piped to a secondary mysql call. I demonstrate this here: http://serverfault.com/questions/8860/how-can-i-export-the-privileges-from-mysql-and-then-import-to-a-new-server/13050#13050 – Bruno Bronosky Jun 29 '09 at 16:23
  • I know that solving that problem in mysql is more appropriate I just wrote the first thing that came to my mind – Jure1873 Jun 29 '09 at 21:20