PostgreSQL RULES for pseudonymisation
Pseudonymisation; a use case
Data has value and, as stated in GDPR Unplugged, can be quite risky to handle.
Keeping it safe by itself is not a problem, but for data to be valuable, it needs to be accessed. The process of accessing it introduces the risk: data may leak. Leaking riskful data is bad for continuity in many ways.
One solution to make riskful data harmless, is pseudonymisation. This word is invented by the EU and mentioned in the GDPR document.
What they mean by this is a 1-to-1 function (a bijection in mathematical terms) where going one way is easy, but going back is terribly hard.1 The terribly then is documented by the EU to need additional information for attributing the personal data to a specific data subject which is to be kept separately (see pretext item (29)).
So going back needs some extra information only authorised people have access to.
In Anonymise and tag simultaneously I describe how this could work. In this title I will show exactly how. The performance impact and the way to get back from a scrambled field will be dealt with in subsequent titles.
Riskful data displayed in a harmless way
The riskfulness in employee data are names, date-of-births, acountnames, e-mailaddresses and roles. Let's assume that the name of someone's department is not a risk.
Making the employees table harmless, I need to provide for a RULE which pseudonymises these columns based on the access-level of the current user.
To make things simple, I assume that for personnel data, HR and no others have full access to the riskful data (except the director of the board of course).
Before being able to present a new table for employees, I need to move the old one out of the way. It is called employeesold for now.
A rule then could be:
CREATE OR REPLACE RULE "_RETURN" AS ON SELECT TO employees DO INSTEAD SELECT id, pn, pseudonymiseifarisk( name ) as name, pseudonymiseifarisk( prefix ) as prefix, pseudonymiseifarisk( firstname ) as firstname, NULL::date as dateofbirth, sex, department, primaryrole, ssn, pseudonymiseifarisk( accountname ) as accountname, pseudonymiseifarisk( email ) as email FROM employeesold;
Where the function pseudonymiseifarisk returns the column untouched in case the current user is not a risk and the pseudonymised one in case the current user is a risk.2
The predicate function to decide this is called isarisk and contains all business logic needed:
CREATE OR REPLACE FUNCTION isarisk( ) RETURNS boolean as $$ SELECT count(*) = 0 FROM employeesold e JOIN departments d ON d.id = e.department JOIN jobroles j ON j.id = e.primaryrole WHERE pseudonymise( accountname ) = pseudonymise( CURRENT_USER ) AND (d.name = 'human resources' OR (d.name = 'board' AND j.name = 'director' )) $$ LANGUAGE 'sql' STABLE;
Now let's see whether this gives us what we need:
set role jjansen; select id, name from employees;
SET | |
---|---|
id | name |
1 | 1e3de977b838c5361b45f5da310103df |
2 | 54de3e9b7d1ceead9523ab02b302306f |
3 | 498c57aa172b093bf9984aa8a4bdd33b |
4 | 1548cb8abd1a0fbce372a7da4c1acc94 |
5 | 8e4b6fe638e62db9c4efae442b80b1b2 |
6 | b7af03c1e63f54a05377b0cbb3ddfae5 |
set role acompy; select id, name from employees;
SET | |
---|---|
id | name |
1 | Harfort |
2 | Orquort |
3 | Compy |
4 | Jansen |
5 | Jooks |
6 | Müller |
This is only part of the work, for INSERT and UPDATE, rules must be created as well. This is to avoid pseudonymised data ends up being inserted in the database. These rules are pretty straightforward, ignoring all riskful columns.
As this is a lot of work and error-prone, my proposal is to change column-level security to cater for this in a single statement.
The function for pseudonymise
DROP FUNCTION pseudonymise( text ); CREATE FUNCTION pseudonymise( text ) RETURNS text as $$ select case when length( $1 ) = 32 then $1 else md5( $1||to_char(CURRENT_DATE,'yyyymmdd')||CURRENT_USER ) end $$ LANGUAGE 'sql';
Note that I make this function idempotent. This is crucial as this particular table contains information about the authorization level and has harmful data at the same time.
DROP FUNCTION pseudonymiseifarisk( text ); CREATE FUNCTION pseudonymiseifarisk( text ) RETURNS varchar as $$ select case when isarisk() then pseudonymise( $1 ) else $1 end $$ LANGUAGE 'sql';
Used datamodel and content
DROP TABLE employees; DROP TABLE employeesold; DROP TABLE departments; DROP TABLE jobroles; CREATE TABLE departments ( id serial PRIMARY KEY, name varchar(32) UNIQUE ); CREATE TABLE jobroles ( id serial PRIMARY KEY, name varchar(32) UNIQUE ); CREATE TABLE employees ( id serial, pn integer, name varchar(24), prefix varchar(6), firstname varchar(24), dateofbirth date, sex char, department integer, primaryrole integer, ssn varchar(14), accountname varchar(14), email varchar(40) ); CREATE TABLE employeesold ( id serial PRIMARY KEY, pn integer, name varchar(24), prefix varchar(6), firstname varchar(24), dateofbirth date, sex char, department integer NOT NULL REFERENCES departments( id ), primaryrole integer NOT NULL REFERENCES jobroles( id ), ssn varchar(14), accountname varchar(14) UNIQUE, email varchar(40) );
Filling the tables is easy; the dataset for employees is small for now.
DELETE FROM employees; DELETE FROM employeesold; DELETE FROM jobroles; DELETE FROM departments; INSERT INTO departments (name) VALUES ('finance'), ('human resources'), ('operations'), ('board'), ('sales'); INSERT INTO jobroles ( name ) VALUES ('account manager'), ('systems integrator'), ('controller'), ('manager'), ('director'); INSERT INTO employeesold ( pn, name, prefix, firstname, dateofbirth, sex, department, primaryrole, ssn, accountname, email ) VALUES ( 6, 'Harfort', null, 'Jane', '2001-01-02', 'f', 1, 4, '', 'jharfort', 'jane.harfort@thecomp.eu' ), ( 8, 'Orquort', 'd''', 'Jeff', '1997-10-02', 'm', 2, 4, '', 'jdorquort', 'jeff.d.orquort@thecomp.eu' ), ( 1, 'Compy', null, 'Anne', '1999-10-12', 'f', 4, 5, '', 'acompy', 'anne@thecomp.eu' ), (14, 'Jansen', null, 'Jake', '2000-4-5', 'm', 1, 3, '', 'jjansen', 'jake.jansen@thecomp.eu' ), (15, 'Jooks', null, 'Peter', '1999-04-05', 'm', 3, 4, '', 'pjooks', 'peter.jooks@thecomp.eu' ), (16, 'Müller', null, 'Elke', '2001-07-05', 'f', 3, 2, '', 'emueller', 'elke.mueller@thecomp.eu' );
SELECT e.id, e.pn, e.firstname||' '||coalesce(e.prefix, '')||e.name as fullname, e.dateofbirth, e.sex, d.name AS dep, j.name AS primaryrole, e.ssn, e.accountname, e.email FROM employeesold e JOIN departments d ON d.id = e.department JOIN jobroles j ON j.id = e.primaryrole
id | pn | fullname | dateofbirth | sex | dep | primaryrole | ssn | accountname | |
---|---|---|---|---|---|---|---|---|---|
1 | 6 | Jane Harfort | 2001-01-02 | f | finance | manager | jharfort | jane.harfort@thecomp.eu | |
2 | 8 | Jeff d'Orquort | 1997-10-02 | m | human resources | manager | jdorquort | jeff.d.orquort@thecomp.eu | |
3 | 1 | Anne Compy | 1999-10-12 | f | board | director | acompy | anne@thecomp.eu | |
4 | 14 | Jake Jansen | 2000-04-05 | m | finance | controller | jjansen | jake.jansen@thecomp.eu | |
5 | 15 | Peter Jooks | 1999-04-05 | m | operations | manager | pjooks | peter.jooks@thecomp.eu | |
6 | 16 | Elke Müller | 2001-07-05 | f | operations | systems integrator | emueller | elke.mueller@thecomp.eu |
about this title
The document to generate and run the scripts has the same source as the document you are reading now.
Most scripts are bare bone, the amount of fancy stuff is kept to an absolute minimum in order to present only the concepts at hand and only that.
This title was written between 26th and 28th of September 2017