The case for open data vs concerns for privacy

24 February 2022

Publicly available data can save us time and money - and even save lives - but we have to consider what we make available and how.

Over the course of a few weeks in the late summer of 1854, more than 600 people died from cholera in a small part of Soho, in London. This was just part of a wider epidemic that saw 23,000 people die from cholera in Britain that year, and the prevailing theory was that the disease was spread through the air.

One doctor, John Snow, had suspected for several years that the disease was spread in water. He plotted the Soho deaths on a map and concluded that the source of the outbreak was a single water pump, on Broad Street. He convinced parish authorities to remove the pump’s handle, taking it out of action, and the epidemic quickly came under control.

Snow’s theories were not immediately accepted and much more research was required before the causes of cholera were widely understood. However, his work is an important milestone in the development of epidemiology and public health management. It is also an excellent example of how data can change the way we think about a problem and improve people’s lives.

To collect the information he needed, John Snow went door to door and then plotted his findings on a map. Today’s John Snows could investigate, say, the Covid-19 pandemic using a wealth of publicly available data to track, for example, how many people have been hospitalised in their local area, or where vaccine uptake is low. We now have publicly available data sources for many aspects of our lives, often provided by government as part of a commitment to ‘open data’. People are constantly finding innovative ways to use these sources. However, we must still give thought to what we share and how.

Faster and more efficient services

Not every use of open data needs to be a matter of life and death. Fitting an internet-connected tracker to every bus and every bus stop makes the entire system part of the Internet of Things and allows people to see accurate, real-time information on when the next bus is coming. This is handy if you don’t want to wait in the cold, but it might also be the information an enterprising analyst needs to work out how to make bus routes more efficient, or that an innovator needs to come up with a new app.

Improved efficiency could make it cheaper to run the bus service or make bus travel a more appealing option and therefore reduce car journeys, which would be good for the environment.

Likewise, being able to track parcel couriers on a map provides an easy way to predict when your delivery will arrive, while opening your banking data to trusted third parties lets you use apps that provide AI-driven, personalised financial advice. As the Internet of Things expands, so do the potential data sources. John Snow, for example, would have been amazed by the ease with which we can monitor air and water quality today.

Protecting the environment

Using the Internet of Things to monitor water pipes can help us to detect leaks more quickly, reducing potential damage from burst pipes and saving water. Similar sensors can be used to track how fast a river is flowing and give early warning of blockages or potential floods.

More data is being made available all the time, particularly from government sources. On the Isle of Man, for example, the 2021 Census information has just been published, with further information to follow in May. This data plays an important role in allocating funding to services, setting policy, and so on, which is why Census participation is so important.

Making data available immediately reduces a barrier for someone who has a good idea for how that data might be used. They no longer have to find the person who holds the data, ask for access and then wait for it to be delivered. That lowers the cost of pursuing an idea and makes it more likely that people will experiment, plugging an open data source into a software model just to test a hunch and then moving on if it doesn’t work.

Expanding open data

However, organisations do not always supply data in the most usable way. You might not think there is a difference between publishing data in a PDF, say, and publishing the exact same data in a spreadsheet, but the latter is much more useful because software can read it directly. For a programmer, a live data feed is better still, as it allows them to start working with the data even more quickly.
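
To give a sense of why format matters, here is a minimal sketch in Python of what working with a spreadsheet-style file can look like. It assumes the data is published as CSV, a plain-text format that spreadsheets can export. The column names and figures are invented for illustration and are not real statistics; the function name total_by_area is also just a placeholder.

```python
import csv
import io

# Made-up sample rows standing in for a published open data file.
# The column names and numbers are illustrative only, not real statistics.
SAMPLE_CSV = """area,week,admissions
North,1,12
North,2,9
South,1,20
South,2,25
"""


def total_by_area(csv_text):
    """Add up the 'admissions' column for each area in the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        area = row["area"]
        totals[area] = totals.get(area, 0) + int(row["admissions"])
    return totals


totals = total_by_area(SAMPLE_CSV)
print(totals)
```

Getting the same totals out of a PDF would usually mean retyping the table or using extraction tools that can misread rows and columns, which is the extra effort that puts people off testing an idea.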

Also, the thinking isn’t always joined up. One organisation might publish as much open data as possible, while another publishes the bare minimum. This might not be because they are opposed to open data; it just might not be a priority. It helps if there is a champion in place to drive open data efforts and maintain momentum.

Some organisations worry about releasing ‘bad news’ and would rather keep the data private than run the risk of bad publicity. In the long run, though, it is likely that the truth would come to light anyway. It is usually better to release the data and say that you are working on improvements.

Privacy is another concern. It might be useful to monitor vulnerable adults in their homes, perhaps to check that they have not had a fall or that they are remembering to take their medicines. But releasing this data publicly would invade their privacy. It is vital that any data sets that contain personal information, such as health, education, or employment data, be properly stripped of identifying information before they are released.
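
As a rough illustration of what stripping identifying information can involve, the Python sketch below drops fields that identify a person directly and replaces an exact age with a ten-year band. The field names, the sample record and the choice of band width are all assumptions made for this example. Steps like these are a starting point only: combinations of the remaining fields can still single someone out, so a real release would need a proper review.

```python
# Hypothetical field names; a real data set would need its own review
# to decide which fields identify a person.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def age_band(age, width=10):
    """Turn an exact age into a coarser band, e.g. 83 becomes '80-89'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def strip_identifiers(record):
    """Return a copy of the record without direct identifiers or exact age."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS
    }
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned


# An invented record, not real personal data.
record = {
    "name": "A. Example",
    "address": "1 Example Street",
    "phone": "000",
    "age": 83,
    "falls_this_year": 2,
}
cleaned = strip_identifiers(record)
print(cleaned)
```

The original record is left unchanged, so the organisation holding the data can keep the full version for care purposes while publishing only the reduced one.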

We should, of course, be cautious and respect the people that open data is designed to help. But in an overwhelming number of cases, the benefits of open data vastly outweigh the risks. We should be looking to make the most of the opportunity to share as much data as possible in the pursuit of improving our lives and enabling innovation.