Every agency generates data, and it is best used when it is shared and combined for new insights and applications. But data sharing is not as simple as it sounds. New research from the industry-supported Center for Data Innovation outlines six possible strategies for sharing your data. For more, Federal Drive with Tom Temin spoke with Daniel Castro, the Director of the Center for Data Innovation.
Tom Temin All right. Well, tell us about this research, because data sharing is a term that’s used glibly and frequently. Let’s share data. We need better data sharing. If we knew this, if we shared this data, we’d have this application. But tell us about some of the subtleties you have outlined in a new white paper about data sharing.
Daniel Castro Well, what we tried to do here is really outline how, as much as we talk about the importance of data, the United States is still so far behind in terms of actually creating this, you know, rich data ecosystem where the right data can get to the right person at the right time. And there’s a lot of reasons for that. And some of it is kind of societal and social, and some of it’s really technical and economic. And so in this paper that we put out, we really try and go through and look at where the barriers are right now today in terms of why we aren’t sharing data as much as we could be or probably should be to get use of all of this information that’s out there. And then what we can do to start addressing those problems.
Tom Temin Often federal agencies state that there are statutory prohibitions on this agency sharing data with that agency, and that’s the problem. But if the law doesn’t say you can’t, then it strikes me that, well, you can.
Daniel Castro Well, that’s one of the issues that comes up again and again. I mean, there are some specific laws that say you can’t share certain data and that becomes an issue. There’s also just a reluctance on the part of many government agencies to share data in many cases or to sometimes collect data that they might otherwise be able to collect from either the private sector or industry. And this isn’t always personal information that we’re talking about. Sometimes this is sensor data, it’s business data, it’s other data that’s out there that has just enormous positive social value. And we need to have somebody creating the technical infrastructure and the economic incentives to allow this data sharing to happen.
Tom Temin All right. And you also have outlined like six basic regimes for data sharing or six strategies, I guess you’d call them. Maybe briefly, what are they?
Daniel Castro Sure. So the first one is fixing these data protection laws. We have to reduce these legal barriers. So just get that out of the way. Most of the laws were written at a time when there wasn’t a lot of data collection going on. You know, if you go back to 1974, the Privacy Act for the federal government, you know, these laws were intended for kind of small data where there was a little bit of data collected and we knew we didn’t want to share all that data back out. That’s changed. Now, many agencies are collecting data. They should be collecting more data, and it’s often good that they can share data across agency boundaries. And that’s right now where there’s a lot of barriers. In Europe and other places, they have this idea of collect once, where the individual or business doesn’t have to be asked for the same data multiple times. So that’s one area. Another issue is trying to figure out if we can create some model data sharing contracts in these different sectors like health care or financial services, where we want to have the government encourage more data sharing. They can pave the way by basically handing a legal document out to the private sector and saying, if you want to share data under these existing laws, here’s how you can do it. So just lowering those barriers. There’s also, if you think about it, the consumer side. Every year I get a notice from my bank or credit card saying this is how we protect your privacy. And if you don’t want to share data, here’s how you can opt out. You get lots of those notices. You get those at the doctor’s office. You never get the opposite, you know, where they’re saying, hey, if you want to share data for research, for other kind of beneficial purposes, here’s a reminder of how you can share data. If you want to donate your data, here’s how you can do that. So we need to kind of flip the mental model of how we’re approaching this.
And it’s not that everyone has to share their data, it’s that it can be voluntary and it can go either way. Some people don’t want to share the data. Some people do. Let’s give them the choice and let’s empower them. So there’s a number of areas like that. And, you know, I’ll just mention also, you know, data standards, particularly in high-impact areas. This is something that’s really hard to do. I mean, it’s an area where government can do a lot to really streamline how data sharing occurs, because so much of this is just being put on the private sector. But we’re ultimately talking about a public interest, and that’s where the government can help make sure we’re investing to its potential.
Tom Temin We’re speaking with Daniel Castro, director of the Center for Data Innovation, part of the Information Technology and Innovation Foundation. And could it be, though, that there’s maybe a technical problem here? And that is, very often agencies want to share data, but they don’t want the personally identifiable part of it to be shared. And there are methodologies and technologies that can anonymize data. Could it be that that’s simply not a widely enough adopted discipline, and therefore it gets in the way of sharing the rest of the data, which might be all you really need?
Daniel Castro That’s right. There’s a lot of technical solutions, whether it’s anonymizing data or doing federated learning, where instead of collecting all the data in one central place and doing the analytics, you do the analytics where the data is stored and then just bring in the results so that you never have to give up that personal information or have it leave a device or an agency. There’s a number of technical solutions, and a lot of federal agencies don’t have that expertise in-house, or there’s just reluctance to do it because if they do it, they’re kind of taking on the risk if they do it incorrectly. And so, you know, there’s this kind of reluctance to do anything in that space. And that’s where we’re saying, you know, we need to keep pushing and we need to make it clear that this is the expectation, that agencies share data and address these problems. And you can’t just kind of hide behind these outdated rules and regulations.
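The "do the analytics where the data is stored and then just bring in the results" idea can be illustrated with a minimal sketch. This is not from the white paper or the interview: the names `LocalSummary`, `summarize_locally`, and `combine`, and the two sample agencies, are hypothetical. It also shows the simplest form of the idea, a federated average, rather than full federated learning, which trains a model this way.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    """The only thing that leaves a site: aggregates, no individual records."""
    count: int
    total: float


def summarize_locally(records: List[float]) -> LocalSummary:
    # Runs inside the agency (or device) that holds the raw records.
    return LocalSummary(count=len(records), total=sum(records))


def combine(summaries: List[LocalSummary]) -> float:
    # Runs at the coordinator, which sees only per-site aggregates.
    total_count = sum(s.count for s in summaries)
    if total_count == 0:
        raise ValueError("no records reported by any site")
    return sum(s.total for s in summaries) / total_count


# Hypothetical raw values held separately by two agencies.
agency_a = [10.0, 20.0, 30.0]
agency_b = [40.0, 50.0]

summaries = [summarize_locally(agency_a), summarize_locally(agency_b)]
overall_mean = combine(summaries)
print(overall_mean)
```

The coordinator gets the same mean it would have computed from the pooled records, but the raw values never leave either agency. Aggregates over very small groups can still reveal information, so a real deployment would likely pair this with minimum group sizes or similar safeguards.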
Tom Temin And what about the question of data literacy? Because very often to get a result you want you need a data expert to tell you, well, this is the data you actually need, whereas someone that just knows what their outcome is desired may not really understand the data implications of getting to that outcome.
Daniel Castro That’s right. I think it’s underappreciated how important data is to solving different problems, whether it’s, you know, something like education, where there are so many different initiatives that are out there. And, you know, a simple question is, are these initiatives effective? You know, you need to know if you do an intervention in preschool or an intervention to provide free lunches in middle school, you know, what does that mean in terms of outcomes ten years down the road? That takes an enormous amount of data collection because you have to link somebody’s receiving this intervention in middle school to ten years down the road when they’re in a workforce database. Or maybe you’re looking at the impact it has on prison and the judicial system. There’s so many interactions here and there’s so much that we should be gleaning from all these different programs to see what’s effective and what’s not. Let’s have a well-functioning government, and you need data to do all that. And that takes data literacy, that takes skills. And, you know, data literacy is on both sides of that equation, right? It’s skills in the government so that they’re thinking about how to use this. It’s also data literacy just in our communities so that we understand why data is being collected about us and how it’s coming back to help us in the end.
Tom Temin And the result of all the barriers and the alphabetically named prohibitions like HIPAA. I mean, HIPAA gets spread on things like peanut butter, and that’s probably not nearly as dangerous a prohibition as it’s made out to be. But that’s where we are. What that’s all led to is agencies simply reluctant to share their data because it’s mine and not yours. Is that also part of the issue? And I think you addressed that in the paper.
Daniel Castro That is. I mean, so many of these laws are out there and there’s not good guidance. Even a law like HIPAA, which allows data sharing, is still mostly seen as a law that doesn’t enable data sharing. And that’s what the average person knows. That’s what the average person in government thinks as well. And it’s almost like a game of telephone where, you know, the lawmakers originally wrote the law saying you could do some data sharing, but, you know, there’s all these restrictions and then people just kept repeating the restrictions. And now the end result is, if you have health data, your main concern is how do I not get fined for violating HIPAA? And that’s the model. And that’s where, you know, we have to really think about what’s the end goal. It’s not just protecting patients’ privacy, it’s also protecting patients, giving them better health care. And the only way to do that is by figuring out which interventions are effective and how we can, you know, recall dangerous drugs quickly and do all these things that you do with data.