Data Ethics: New Frontiers in Data Governance

Would you feel comfortable serving as a Data Governance consultant for an organized crime family … but not for a brokerage with tax fraud in its past? Could the use of ransomware be considered socially acceptable if its demands benefitted needy children? These hypotheticals might sound outrageous – and indeed, they are meant to be provocative – but they serve to illustrate the new and largely uncharted territory of data ethics within today’s high-tech corporate culture.

Data evangelist Karen Lopez and academic/data consultant Peter Aiken shared these and other thought experiments during their presentation at DATAVERSITY’s Data Governance & Information Quality Conference. Lopez, who serves as a data consultant for InfoAdvisors, and Aiken, a Ph.D., associate professor, and former advisor for DAMA and several government agencies, are both passionate about updating the domain of data ethics for the 21st century.

As Lopez and Aiken see it, the new realities of multi-platform, data-driven business continue to create new gray areas and blind spots for ethical behavior, codes of conduct, and best practices, and these challenges are only compounded by the lack of ethics training within data science – and business culture in general.

“We haven’t taught businesspeople much about ethics for the past 30 years,” lamented Aiken. “So we need to be educating upwards as well as down.”

Data Ethics 101: Philosophy in Cyberspace

While Lopez and Aiken are wary of considering data ethics from a purely theoretical framework – arguing against “how many angels can dance on the head of a pin type stuff,” Aiken quipped – they stressed that the elementary ethics concepts you might learn in a philosophy class are still worth clarifying. Lopez pointed out that many professionals she encounters conflate morality, legality, and ethics indiscriminately, though these three concepts are distinct. While morals concern subjective notions of good and bad, and laws concern the limits of what is socially acceptable, Aiken and Lopez define ethics as “the difference between what you have the right to do and what is the right thing to do.”

Navigating that crucial difference is rarely cut and dried, even in simple, day-to-day personal interactions. Within the world of data, however, ethical questions can quickly take on multiple dimensions and present challenges unique to the field. Assessing data ethics can be decidedly confusing, for as Lopez pointed out, “Not all things that are bad for data are actually bad for the world … and vice versa.”

Whereas the ethical actions and judgments that we make as private individuals tend to play out within a limited set of factors, the implications of even the most innocuous events within large-scale Data Management can be huge. Company data exists in “space,” potentially flowing between departments and projects, but privacy agreements and other safeguards that apply for some purposes may not apply for others. Data from spreadsheets authored for in-house analytics, for example, might violate a client privacy agreement if it migrates to open cloud storage.

Even a small modification in hardware such as “adding a patch cord from one server to another could be violating data ethics because it’s violating privacy laws,” noted Aiken. “Or it’s confidential data that should never be matched up to a customer data field, even if it’s a patient piece of data.”

Similarly, just as data must be consistently governed across the entire business, data may also have a life that extends beyond the life of products and projects, as well as across generations of employees and managers. As a result, knowledge workers can commit an ethical violation without even realizing it. A GPS tracking app deployed by a national fast-food chain, for example, may still be running years after installation, monitoring customers for purposes they never consented to.

While Aiken and Lopez do not overlook intentional, criminal violations of data ethics, they assert that the more insidious threat comes from employees who are unaware of violations for which they may nevertheless be held individually accountable under the law. Lopez sees this pitfall as illustrative of the need for revised data ethics protocols among Data Governance and Data Stewardship managers.

Lopez recounted, for instance, an incident in which a company she advised was found negligent after an IT team, believing it was carrying out a standard practice, transferred the vital personal information of thousands of clients to an unsecured server. “It became unethical because we didn’t give the frontline workers any tools to remind them that they were about to break the law,” concluded Lopez. “And that didn’t happen at a big credit card company; it happens everywhere.”
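The kind of reminder Lopez describes could take many forms. As one minimal sketch, a pre-transfer check in Python might compare a dataset's recorded constraints against the destination and the stated purpose before any data moves. Every name here (Dataset, Destination, transfer_warnings, and the example purposes) is a hypothetical illustration, not a tool the speakers described, and a real check would draw on an organization's actual privacy agreements and legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical metadata a governance team might record for a dataset."""
    name: str
    contains_personal_data: bool
    permitted_purposes: frozenset = frozenset()


@dataclass(frozen=True)
class Destination:
    """Hypothetical description of where the data is about to be sent."""
    name: str
    is_secured: bool


def transfer_warnings(dataset, destination, purpose):
    """Return plain-language warnings; an empty list means no rule was tripped."""
    warnings = []
    if dataset.contains_personal_data and not destination.is_secured:
        warnings.append(
            f"{dataset.name} holds personal data but {destination.name} is unsecured."
        )
    if purpose not in dataset.permitted_purposes:
        warnings.append(
            f"'{purpose}' is not a permitted purpose for {dataset.name}."
        )
    return warnings


# Example: an in-house analytics dataset about to be copied to open storage.
client_records = Dataset(
    name="client_records",
    contains_personal_data=True,
    permitted_purposes=frozenset({"in_house_analytics"}),
)
open_bucket = Destination(name="open_cloud_bucket", is_secured=False)

for message in transfer_warnings(client_records, open_bucket, "marketing"):
    print("WARNING:", message)
```

The check only warns and never blocks. That is a deliberate simplification here: the point is to put the reminder in front of the frontline worker at the moment of transfer, not to replace human judgment.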

Automated Misconduct: Ethical Concerns in New Technology

In addition to surveying how Data Governance culture and conduct need to be revised at the level of practices – “people problems” – Aiken and Lopez sketched several salient trends in which ethical gray areas are essentially baked into the latest technology itself. Consider Moore’s Law, the long-standing observation that the number of transistors on a microchip doubles roughly every two years, steadily driving down the cost of computing power.

This ongoing expansion in computing power, coupled with ever-cheaper storage, is a blessing to businesses. But the migration of more and more critical corporate functions to computers – and an unprecedented consolidation of personal information in centralized databases – also creates greater opportunities for identity theft, fraud, and other cybercrimes that data scientists must vigilantly track.

Recent strides in the processing and transferring of data also bring with them novel ways of abusing private information. High-speed networking and other online peer-to-peer innovations allow for remote sharing and copying, opening the door to unwanted or even accidental data transfers, in addition to large-scale cyber heists.

Perhaps more insidious are the latest developments in personal profiling, some of which can weave seemingly harmless data points from multiple sources into alarmingly intrusive dossiers on almost anyone. Take non-obvious relationship awareness (NORA) technologies, which, according to the presenters, can track 95% of Americans using only three pieces of ostensibly public data (gender, birthdate, and birth zip code). “That’s more than a little scary,” Aiken remarked. NORA is a clear example of how violable our privacy has become.
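To see why a handful of harmless-looking fields can single people out, consider a toy calculation. The sketch below is not how NORA systems work internally; it only measures, for a small made-up set of records, what share of people are the sole match for a given combination of fields. The function name unique_fraction and all of the sample records are assumptions invented for illustration.

```python
from collections import Counter


def unique_fraction(records, fields):
    """Share of records whose combination of `fields` appears exactly once."""
    if not records:
        return 0.0
    keys = [tuple(record[f] for f in fields) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Entirely fictional records, for illustration only.
people = [
    {"gender": "F", "birthdate": "1980-01-02", "zip": "23220"},
    {"gender": "F", "birthdate": "1980-01-02", "zip": "23220"},
    {"gender": "M", "birthdate": "1975-06-30", "zip": "23220"},
    {"gender": "F", "birthdate": "1991-11-15", "zip": "10001"},
]

print(unique_fraction(people, ["gender"]))                      # 0.25
print(unique_fraction(people, ["gender", "birthdate", "zip"]))  # 0.5
```

Even in this tiny example, adding fields raises the share of people who can be picked out uniquely. On real populations, the same measurement is one way a data steward might gauge re-identification risk before releasing a supposedly anonymous dataset.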

While these issues are generally more the domain of law enforcement, tracking new developments in how data can be compromised is crucial to building safer ways of managing it.

The Way Ahead

Aiken and Lopez are united in their belief that today’s corporate data culture needs a radical overhaul of its ethical practices and institutions.

“We both agree that knowledge workers have to be trained in the area of data ethics,” said Aiken. “Not just IT workers, not just managers, but everybody who touches data in the organization.”

While broad ethics training has for too long been neglected in most business programs, an even bigger challenge is mapping how and where today’s data science presents challenges, dilemmas, and liabilities that can be invisible.

In simple terms, the two speakers see communication as the biggest data ethics challenge for Data Governance leaders, who are tasked with creating shared definitions and clarifying roles within corporate culture. At a legal level, data science can sometimes be a Wild West of open-ended questions: Do property rights always apply to data? In our globalized economy, how can one track such rights across borders?

Within corporate structures and team dynamics, ethical ambiguities – and snafus – often come down to misunderstandings in the roles of governance and operations. Strong Data Governance, for Lopez, depends on ensuring that all employees connected to the life of data know their responsibilities and secure appropriate protection in every changing context.

Finally, data managers must monitor operations with the understanding that in the domain of data, conduct violations may not come with a smoking gun – or even appear to be violations. As Lopez affirmed, “Most ethics decisions are not that clear at all, right? They’re more subtle because people are complex.”