In Data Management, Context is King

By Research Associate Randy Clopton

I have long been a self-professed champion of the power of data, and I’m definitely not alone in this sentiment. The promises of big data are incredibly appealing. With enough information from a wide variety of data sources, you can feed practically anything into an algorithm and extract a myriad of insights. But while data holds big promise, its malicious or poorly considered use presents a conundrum just as big. Big data that is stripped of context and poorly managed not only invites incorrect conclusions, but may also lead to actively harmful ones, even when no harm is intended. Effective data management and privacy practices are the guardrails that prevent this from happening, and these guardrails are the driving force behind much of my work at the Possibility Lab.

Throughout my professional life, I have always focused on how to ensure that the ever-increasing speed of technological development doesn’t leave underprivileged individuals in the dust. This interest has taken many forms over the years, from examining the economic reasons behind online media piracy to exploring the effective use of new technologies to aid English Language Learners in the classroom. As of late, my personal crusade has been data management and privacy. As a graduate student at Berkeley’s Goldman School, I spent a significant amount of time thinking about the best ways to embed a culture of data privacy in local government. Now, at the Possibility Lab, this has expanded into broader considerations for how large public data systems can produce a useful product without compromising citizens’ privacy.

If there’s one thing I’ve learned working with data, it’s that context is king. Data devoid of context may tell you something, but you’d be hard-pressed to get anything truly useful out of it. Context here means more than just where data was collected. It also depends on the questions being asked. “How might users be able to extract value from this particular data?” has been the driving question in my work assisting governments in creating data systems. All other questions fold into this one, ranging from “What sorts of data may be useful to pair together?” to “How much data is actually needed to answer this user’s question?” This even extends to questions that might seem like nitpicking, such as “Does this particular phrase in the data dictionary accurately describe the variable in question to the layperson?”

But data users may not always be asking benevolent questions. What about people who look to use data to do harm? This is where privacy really enters the conversation. As mentioned above, I spent a great deal of time thinking about privacy within local governments. Through this, I aided in developing a set of key considerations for how organizations need to think about data privacy. My full thoughts are included in a piece I published in the UC Berkeley Public Policy Journal, but here’s a quick summary.

Data privacy is a full-organization imperative. Not only should those collecting and managing data be aware of it, but every individual within a government system must bear some responsibility for keeping citizens’ data from harm. A culture of privacy needs to start from both the top and the bottom, from advocates at the executive level to practitioners within the rank and file.

In my view, there is one key takeaway here: your data is in these databases, both in the public sector and in the private sector. You deserve to know how it’s being used and protected, and you deserve to know how people like me think about keeping your data safe while we use it in our attempts to do some public good. So please, hold those of us who use your data accountable. You deserve to enjoy the fruits of your own information and to be kept safe from harm.
