0

So my solution to anonymize data on our backup server where we store backups of MySQL databases in .sql files is the following:

Created a script that copies the .sql file from the backup server to my local workstation, automatically imports it into my database server, straight anonymizes the sensitive data, exports the database into another .sql file, removes the original .sql and also drops the database. Thus I end up with an anonymized .sql file. However, I wish to make it faster by directly re-writing the values in the original .sql file it self.

What do you think, what is the easiest method for this or is my solution already the best there is? Was thinking about using the sed command someway, but I just have no clue ATM how to tell the script what to replace, where, how many times and with what. :/

Bert
  • 984
  • 1
  • 11
  • 29
  • 1 sql script to anonymize and another sql script to restore. What to replace (like names, e-mails, phone numbers, etc) is strictly your problem to determine, noone else can help you with that. – Overmind Jul 18 '19 at 12:47
  • Skip the duct tape and bailing wire solutions, can almost guarantee that a search/replace system will break stuff. Store your backups encrypted, either individual file level or in a encrypted volume like truecrypt or equivalent. – ivanivan Jul 18 '19 at 13:01
  • The backup is encrypted, that is checked. Also, after anonymization, the data is unrestorable. That is the main goal. I just don't know how to search for the specific values I need to replace. I was thinking, you Lads have a hint or two. :) – Bert Jul 18 '19 at 13:24

3 Answers3

0

With Mysql ist easy to export and import data as csv.

So it would be possible to manipulate the data maybe with simple shell commands in a Loop. Much easier than to parse a sql.

Just as simple example to work on:

for row in $(cat test.csv);
  col1 = (echo $row | cut -d ',' -f 1);
  col2 = (echo $row | cut -d ',' -f 2);
  col3 = "Value to change"
echo "$col1,$col2,$col3" >> outpu.csv
done
0

Things to consider:

  • How many times do you "hit the disk". Writing a dump file or a .csv costs something because of the disk hits. Sometimes a workaround is to "pipe" steps together.
  • UPDATE in place goes through all the transactional stuff; this leads to extra hits. So, perhaps UPDATEing is not the best approach.
  • CREATE TABLE and DROP TABLE take some effort, but not nearly as much as reading or writing an entire table.

I would consider the following (if it works for your situation):

CREATE DATABASE AnonDB ...;
USE AnonDB;
CREATE TABLE t1 LIKE RealDB.t1;
INSERT INTO t1
    SELECT id, ok_data, Anonymizer(sensitive_data), ...
        FROM RealDB.t1;
... t2 ...
etc.

Notes...

  • Need DROP RealDB at the end?
  • You need the function(s) Anonymizer.
  • Everything is manual (which table to convert; which columns; etc).
  • No CSV or sed.
Rick James
  • 2,058
  • 5
  • 11
0

At the end, all your answers were good, but did not increase the performance and reduce the time that is needed to serve an anonymous database for the developers. Here is my bash script first:

#!/bin/bash

if [ -z "$1" ]; then
    echo ""
    echo "No anonymization PHP file present."
    echo "Please create first a project specific anonymization PHP file!"
    echo ""
    exit
fi

if [ -z "$2" ]; then
    echo ""
    echo "You have not specified where the backup SQL files are located"
    echo "Example: /backup/daily-raid/websites/projectnumberone/sqldump/"
    echo ""
    exit
fi

phpconfig=$1
route=$2

file=$(ssh backup "ls -t $route | head -1 && exit")
dbname=$(echo $file | head -c -11)
dbslug=$(echo $dbname | sed -e "s/-/_/g")
createdb="create database $dbslug"
dropdb="drop table $dbslug"


scp backup:$route$file /home/sysadmin/dumps

mysql -u root -e "$dropdb"
mysql -u root -e "$createdb"
mysql -u root $dbslug < /home/sysadmin/dumps/$file

rm -rf /home/sysadmin/dumps/$file

php $phpconfig $dbslug

exit

Please ignore the fact that the local DB servers root has no password.

The PHP file is based on these GitHub projects:

Just create a php file that knows what data needs to be anonymized and the bash script will run it. OFC you need PHP to be installed on your local workstation and MySQL server.

Bert
  • 984
  • 1
  • 11
  • 29