To fully appreciate the scope of the GDPR and your duties under it, you need to understand both the personal data you are collecting and the personal data you already hold. This is especially important for DPIAs, because a DPIA relies on a comprehensive understanding of the data lifecycle.
There’s no explicit requirement for data mapping in the Regulation, but it would be extraordinarily difficult to meet all of the GDPR’s requirements without establishing the lifecycle of personal data in your organisation.
Data mapping is generally considered best practice for any data protection or privacy compliance programme, because you can’t protect your information if you don’t know:
a) that it exists,
b) where it is, and
c) the conditions under which it is kept.
Although the concept is simple, data mapping can prove challenging for organisations that haven’t examined their processes before, that work with a great deal of personal information, or that rely on data held in a variety of formats. Even well-organised businesses rarely keep a centralised map of all their data collection and processing activities; instead, the relevant information is usually scattered across process documentation and contracts.
Regular data mapping exercises are essential to protecting personal data in line with the GDPR. They give the organisation a clear overview of its data processing activities, which it can also use to drive continual improvement in other areas of the business.
Objectives and outcomes
As with any new process, you need to identify the objectives and desired outcomes of data mapping before you begin. The overall objective of data mapping as part of GDPR compliance is to identify and address potential privacy issues.
Data mapping is not always as simple as working out where the data is and what it’s used for; in many cases, it also involves analysis as you go. That is, while you work through the data flow, you identify the issues relevant to the data at each point. For instance, if the data passes through a storage phase, you might discover that the server where it resides isn’t behind a locked door.
The output of the data mapping should record the key aspects of each data workflow that will inform the measures you take to comply with the GDPR. Your primary interest in this activity will be personal data, including your employees’ personal data.
You’re also aiming to identify the specific risks to personal data, so your data mapping process should help you to spot unforeseen or unintended uses of the data. Because you generally need to inform data subjects about what you’re doing with their data, any uses you haven’t told them about are likely to breach the Regulation.
The data mapping process can also be rewarding for the organisation. As well as showing where efficiency can be improved, it can draw your attention to potentially lucrative or useful processing opportunities.
Finally, and quite significantly, the data mapping process should help you to recognise who is involved at each stage in data processing activities and who should be involved. This will ensure that the people who will be using the information can be consulted on the practical implications of compliance with the Regulation (including the impact controls or other measures might have).
Four elements of data flow
There are four key elements in data mapping: data items, formats, transfer methods and locations. These elements are essentially all that you need to build your data map.
Data items are the information itself. An individual item could be a single data point (e.g. a name) or a collection of related data (e.g. all of the information the organisation holds about a data subject). You’ll typically define the data items on the basis of the process itself. If the process only uses a person’s address, for instance, then that would be the data item for that process.
Formats are the state in which the data items are stored. For an increasingly large proportion of organisations, this will be digital information stored in remote locations (e.g. in the Cloud), but you should aim to identify all of the formats that you actually use, including paper, photographs, backup tapes, USB drives, etc.
Transfer methods are the explicit methods by which the data items move from one location to another, whether the locations and the means of transport are physical or electronic. A process might include carrying physical files from a filing cabinet to a fax machine so that they can be faxed to another location. In this instance, both the act of carrying the files and faxing them are transfer methods, and a new data item (the printed fax at the other end) is created when the physical file is faxed.
Locations are the sites where data items are stored and where processing happens. Depending on the complexity of your processes, it may be useful to identify several locations at different levels of granularity for each step. For instance, you might specify that information is stored at the head office site, in the secure office and in the locked filing cabinet. This approach allows you to define varying levels of precision, which may be useful if some of your data items are spread across physical or digital locations.
Smaller organisations may prefer to simplify these elements as appropriate to their business. If you’re a business based in one office and you do all processing on site, for instance, you might declare that all locations are that office – there’s no real need to be more specific because the data can be easily located when needed. Any transfers that aren’t into or out of the organisation might also be ignored.
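To make the four elements concrete, the sketch below records each step of a flow as a small Python structure. It is a minimal illustration, not a standard schema: the `FlowStep` class, its field names and the office names are hypothetical. Locations are stored from coarsest to finest, so you can report them at whatever level of granularity suits the purpose, and the fax example shows a transfer creating a new data item.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FlowStep:
    """One step in a data flow (hypothetical structure, not a standard schema)."""
    data_item: str             # the information itself, e.g. "employee file"
    data_format: str           # e.g. "paper", "cloud database", "backup tape"
    transfer_method: str       # how the item arrived at this step
    location: Tuple[str, ...]  # ordered from coarsest to finest granularity

    def location_at(self, depth: int) -> str:
        """Describe the location to the requested level of detail."""
        # Slicing past the last recorded level just returns every level we have.
        return " > ".join(self.location[:depth])


# The fax example: carrying the file and faxing it are both transfer methods,
# and the fax creates a new data item (the printout) at the receiving end.
fax_flow = [
    FlowStep("employee file", "paper", "filed on receipt",
             ("Head office", "Secure office", "Locked filing cabinet")),
    FlowStep("employee file", "paper", "carried by hand",
             ("Head office", "General office", "Fax machine")),
    FlowStep("printout of employee file", "paper", "fax",
             ("Branch office", "Fax machine")),
]

for step in fax_flow:
    print(f"{step.data_item} [{step.data_format}] via {step.transfer_method}: "
          f"{step.location_at(2)}")
```

A single-office business could simply record `("Main office",)` as the location for every step, in line with the simplification described above.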
Data mapping, DPIAs and risk management
Data mapping is an important part of the risk management process. Your data map doesn’t need extreme granularity or detail to be accurate; the level of detail should simply be relevant to the purpose. By identifying all of your collection and processing activities, you gain an overview of the risks to personal data and a relatively easy way to pick out activities that merit closer inspection as part of a DPIA or risk management process.
It should be fairly obvious from a data map which areas could cause privacy issues. Examples include points where data is transferred to a third party, or where it passes through the hands of several different individuals or entities, any of which could damage or inappropriately modify it. These are risks, and they should be assessed and treated under your risk management methodology.
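Once flows are recorded, the two warning signs mentioned above (transfers to third parties and data handled by several parties) can be picked out mechanically. The following is a hedged sketch: the `Transfer` record, the `flag_risks` function, the `max_parties` threshold and the example parties are all hypothetical choices, and any flagged item still needs human judgement under your risk methodology.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    data_item: str
    source: str
    destination: str
    to_third_party: bool  # True if the destination is outside the organisation


def flag_risks(transfers, max_parties=2):
    """Return (data_item, reason) pairs that may deserve closer inspection.

    max_parties is an illustrative threshold, not a regulatory figure.
    """
    flags = []
    parties = defaultdict(set)
    for t in transfers:
        parties[t.data_item].update((t.source, t.destination))
        if t.to_third_party:
            flags.append((t.data_item, f"transferred to third party: {t.destination}"))
    # Items that pass through many hands have more points where they
    # could be damaged or inappropriately modified.
    for item in sorted(parties):
        if len(parties[item]) > max_parties:
            flags.append((item, f"handled by {len(parties[item])} parties"))
    return flags


transfers = [
    Transfer("order", "Web shop", "Fulfilment database", False),
    Transfer("postal address", "Web shop", "Fulfilment database", False),
    Transfer("postal address", "Fulfilment database", "Courier", True),
]

for item, reason in flag_risks(transfers):
    print(f"{item}: {reason}")
```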
What you want to collect
You need a good understanding of your processing activities to develop a data map. To build that understanding, you can ask the following questions about each process:
- How is the personal data collected? Personal data can be collected in a number of ways: paper forms, web applications, call centres, etc. Each of these methods also has a location – paper forms are often completed outside the business, for instance.
- Who is accountable for the personal data? Each processing activity should have an owner who is responsible for the data being processed. It’s possible you’ll also assign other people to be responsible for various elements of the processing, and they may have varying levels of responsibility for the data at different points.
- Where is the personal data stored? As described earlier, personal data can be stored in both digital and analogue form, and in several locations simultaneously, so it’s important to track every site that stores it.
- Who has access to the personal data? The answer may include employees who do not need access, whose access should therefore be reviewed. It may also include the data subjects themselves, or their friends and family.
- Is the personal data disclosed or shared with anyone? This includes third parties such as suppliers and data processors.
- Does the system share data with other systems? Sharing data between systems can lead to unintended or excessive processing, but can also offer significant business benefits.
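One way to make sure every question is answered for every processing activity is to record the answers in a simple structure and check for gaps. This sketch assumes hypothetical field names (`collection`, `owner` and so on) and an illustrative "Online orders" activity; it distinguishes a question that has not yet been asked from one answered with "none".

```python
# The six questions from the list above, keyed by a short field name
# (the keys are illustrative, not a standard schema).
QUESTIONS = {
    "collection": "How is the personal data collected?",
    "owner": "Who is accountable for the personal data?",
    "storage": "Where is the personal data stored?",
    "access": "Who has access to the personal data?",
    "disclosure": "Is the personal data disclosed or shared with anyone?",
    "system_sharing": "Does the system share data with other systems?",
}


def unanswered(activity: dict) -> list:
    """List the questions not yet answered for one processing activity.

    A missing key means "not yet asked"; an empty list is a recorded answer
    of "none", so it does not count as a gap.
    """
    return [question for key, question in QUESTIONS.items() if key not in activity]


# Hypothetical example activity; "system_sharing" has not been answered yet.
online_orders = {
    "collection": ["web application"],
    "owner": "Head of e-commerce",
    "storage": ["fulfilment database", "cardholder data environment"],
    "access": ["customer service team", "warehouse staff"],
    "disclosure": ["courier"],
}

print(unanswered(online_orders))
```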
Methods of data mapping
One of your first considerations in approaching data mapping will be creating a schedule of data mapping exercises. Initially, you should run a data mapping exercise covering all of your processing activities to ensure that you’re complying with the Regulation. Following on from this, later data mapping exercises should coincide with your regular risk assessments. Beyond this, as with any other risk management process, you should map information flows when there is a significant change or a new processing activity has been designed.
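The scheduling rules in the previous paragraph can be expressed as a simple check. This is a sketch under stated assumptions: it supposes you record the dates of your last mapping exercise and last risk assessment, and the function name and parameters are hypothetical.

```python
from datetime import date
from typing import List, Optional


def mapping_triggers(last_mapping: Optional[date],
                     last_risk_assessment: Optional[date],
                     significant_change: bool = False,
                     new_processing_activity: bool = False) -> List[str]:
    """Return the reasons (if any) to run a data mapping exercise now."""
    reasons = []
    if last_mapping is None:
        # No mapping yet: cover every processing activity first.
        reasons.append("initial mapping of all processing activities")
    elif last_risk_assessment is not None and last_risk_assessment > last_mapping:
        # Keep mapping in step with the regular risk assessment cycle.
        reasons.append("align with the latest risk assessment")
    if significant_change:
        reasons.append("significant change to processing")
    if new_processing_activity:
        reasons.append("new processing activity designed")
    return reasons


print(mapping_triggers(date(2024, 1, 10), date(2024, 6, 1)))
```

An empty list means no trigger has occurred since the last exercise.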
You also need to identify the scope of the mapping. As with risk assessments, the first time you conduct a mapping exercise, you should aim to map the lifecycle of all of the personal data your organisation holds. Depending on the size and complexity of this data, it may be worth breaking the exercise into phases, perhaps splitting it by division or prioritising processes according to their value and significance.
With the context of the mapping pinned down, you can start the data mapping exercise. The method you use for tracking information flows will depend on the complexity of the organisation’s processes and on its preferences.
It is simplest to begin by creating a visual representation of the information flow. You can use almost anything that lets you draw or arrange information, such as a whiteboard, Post-it notes, or mind-mapping or diagramming software.
There are several ways to map information flows. You might prefer to focus on how data moves between specific sites. A very simple example is shown in Figure 10.
This method shows how Department A uses data, including its relationships with internal and external bodies, indications of data that have been processed, where data is stored within the department, and so on. For many organisations, this mapping method may be entirely adequate, or could form a high-level view of the range of processes within the organisation or department.
A process map, on the other hand, focuses on how data is used in the process itself. An example can be seen in Figure 11.
This is a simple example of what an online retailer might be doing with personal data, including payment card data. The retailer takes in the customer’s name, postal address and credit card details. The credit card details go to the segregated cardholder data environment (CDE), which is subject to a different process and to specific contractual, regulatory and legal requirements. Meanwhile, the customer’s order is separated from the postal address and both are sent to the fulfilment database. The database processes this information and sends the order to the warehouse, while the postal address is sent to be attached to the packaging. The customer’s address and order details are then reunited for release.
A more complex version of this diagram might show that the order and customer data are also sent to a separate process that tracks the customer’s purchasing habits, or might list the specific personal data collected (street address, postcode, etc.). Alternatively, you could record those details in a spreadsheet and use the map as a quick reference or overview.
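The retailer flow described above can also be held as a directed graph, which lets you trace where each data item travels. In this sketch the node names (`Checkout`, `Packaging`, `Dispatch`) are hypothetical labels for the stages in the description, and only the items whose route the description gives (order, postal address and card details) are included. In this example, the trace shows card details going only to the segregated CDE and never reaching the fulfilment database.

```python
from collections import deque

# (source, destination, data items carried); node names are illustrative.
EDGES = [
    ("Customer", "Checkout", {"order", "postal address", "card details"}),
    ("Checkout", "CDE", {"card details"}),
    ("Checkout", "Fulfilment database", {"order", "postal address"}),
    ("Fulfilment database", "Warehouse", {"order"}),
    ("Fulfilment database", "Packaging", {"postal address"}),
    ("Warehouse", "Dispatch", {"order"}),
    ("Packaging", "Dispatch", {"postal address"}),
]


def trace(item, start="Customer"):
    """Breadth-first search over edges that carry the given data item."""
    seen = {start}
    queue = deque([start])
    reached = []
    while queue:
        node = queue.popleft()
        for src, dst, items in EDGES:
            if src == node and item in items and dst not in seen:
                seen.add(dst)
                reached.append(dst)
                queue.append(dst)
    return reached


for item in ("card details", "postal address", "order"):
    print(item, "->", trace(item))
```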
Whichever method you choose, you will need to be able to transfer the information in your data map over to your risk management process. In many cases, this means your data map needs to be converted to something tabular like an Excel spreadsheet.
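A data map held as records converts easily to a table. The sketch below uses Python’s standard `csv` module to produce CSV text, which Excel and other spreadsheet tools can open; the column names and example rows are illustrative assumptions, so adapt them to your own risk register.

```python
import csv
import io

# Column names are illustrative; adapt them to your risk register.
FIELDS = ["process", "data_item", "format", "transfer_method", "location", "owner"]

data_map = [
    {"process": "Online orders", "data_item": "postal address",
     "format": "database record", "transfer_method": "web form",
     "location": "Fulfilment database", "owner": "Head of e-commerce"},
    {"process": "Online orders", "data_item": "card details",
     "format": "database record", "transfer_method": "web form",
     "location": "Cardholder data environment", "owner": "Head of finance"},
]


def to_csv(rows) -> str:
    """Flatten data-map rows into CSV text that spreadsheet tools can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(data_map))
```

To save the output to a file, open it with `newline=""` (as the `csv` documentation recommends) and, if the file will be opened in Excel, `encoding="utf-8-sig"` so that non-ASCII characters are recognised.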
If you are looking for further guidance on the data mapping process, the UK’s NHS has published a particularly useful and practical guide to information mapping123, which was written to support a focused information mapping tool. This document was developed to help healthcare organisations comply with a number of legal and ethical obligations relating to the handling of personal and sensitive information. While it’s quite likely that your organisation isn’t subject to the same obligations, the process and fundamental concerns that the tool presents are fairly universal.
Vigilant Software, the IT Governance software development subsidiary, has a data flow mapping tool that allows this exercise to be simplified and supported over time. More information can be obtained from: