Agencies across the government, including the White House, are planning for, procuring, and building integrated data systems, attempting to scale impact and improve performance through data integration and analytics. I am an advocate for the proposition that government reform starts with data and evidence. However, these large investments, which underpin even larger transformation ambitions, are at risk of failure without a primary focus on the human, organizational and cultural barriers to success.
Primacy of federated data governance
The Foundations for Evidence-Based Policymaking Act established a network of chief data officers across agencies. These individuals are responsible for much of the successful deployment of agency operational decision support and policy analysis systems across the federal government, work done under the Federal Data Strategy. As the inaugural CDO at the Department of Veterans Affairs, I led deployment of a common operating picture (COP) into initial operations, spanning over a thousand datasets, approximately 18,000 transformations and about 5,000 data pipelines. After my departure, the VA COP was used to expand veteran access to services through personalized outreach and to save $90 million in six months through better acquisition.
This success was enabled by many people working on twin pillars: advancing enterprise data management and literacy through data governance aligned under the department’s data strategy, and using an integrated software-as-a-service technical solution. The two were symbiotic. The critical path to enterprise impact ran through maturing mostly federated data governance. As chair of the Large Agency Committee of the Federal CDO Council, I heard the same dynamic from my peers across government.
Planning for integrated data systems must include requirements for interoperable data management, analytic model management, and automation of federated governance and data management workflows. These requirements should come from establishing, iterating and maturing federated data governance. Further, the scope of data governance must drive synchronization between the information resource management policies and processes anchored with agency CDOs and those anchored with chief information officers, chief technology officers and chief artificial intelligence officers.
Federated governance and semantic interoperability
A useful federated governance model that has stood the test of time is the one used by NIEMOpen, the leading public sector semantic interoperability framework. NIEMOpen sets common minimum standards for public sector domains in a way that accommodates federal diversity, and it already has federal sponsorship over many of its domains.
NIEMOpen also offers alignment on open standards to improve data readiness for AI-based transformation. NIEMOpen 6.0, combined with entity resolution technology, supports faster cross-boundary data integration for creating open, standards-based ontologies and knowledge graphs that are ready for techniques such as GraphRAG.
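To make that concrete, the following is a minimal sketch in Python, assuming the rdflib library is installed; the namespace, record fields and matching rule are illustrative assumptions, not NIEMOpen 6.0 artifacts or a production-grade entity resolution method. It shows person records from two notional systems being resolved and loaded into a small knowledge graph with provenance back to each source record.

# Illustrative sketch: resolve person records from two notional source systems
# and load them into a small RDF knowledge graph. Namespace and field names
# are assumptions, not NIEMOpen 6.0 artifacts.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.gov/records/")

system_a = [{"id": "A-100", "name": "Jane Q. Veteran", "dob": "1980-05-14"}]
system_b = [{"id": "B-7", "name": "JANE Q VETERAN", "dob": "1980-05-14"}]

def resolution_key(record):
    """Naive entity resolution key: normalized name plus date of birth."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return name + "|" + record["dob"]

graph = Graph()
graph.bind("ex", EX)

resolved = {}  # resolution key -> canonical graph node
for source, records in (("systemA", system_a), ("systemB", system_b)):
    for rec in records:
        key = resolution_key(rec)
        if key not in resolved:
            resolved[key] = URIRef(EX["person/" + str(len(resolved))])
        person = resolved[key]
        graph.add((person, RDF.type, EX.Person))
        graph.add((person, EX.name, Literal(rec["name"])))
        graph.add((person, EX.dateOfBirth, Literal(rec["dob"])))
        # Keep provenance: link the canonical node back to each source record.
        graph.add((person, EX.sourceRecord, Literal(source + ":" + rec["id"])))

print(graph.serialize(format="turtle"))

Once data is resolved and expressed in this graph-shaped, standards-based form, it can be indexed for retrieval techniques such as GraphRAG.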
Trusted policy enforcement and data stewardship
Modern technical solutions offer strong capability for “come as you are” data integration and curation. But technical capability does not eliminate the need to integrate data stewards into the business processes that make the required data integration possible. Indeed, overseeing the data management activities that identify and address data quality and metadata issues is a key data governance function. Improving data quality is always a central consideration in data analytics efforts because of the focus on secondary use beyond transactional workflows. Data stewards are key to ensuring that use, and reuse, of data is policy compliant, aligned with intended purpose and ready for reuse.
The best way to integrate data stewards into required federated data management is to lead with data governance as a first-rank consideration in enabling solution delivery. The tone from the top, and the accompanying guidance, must encourage the chain of command to task key individuals to participate, and must help those individuals see and buy into the purpose. This only works if the chain of command and data stewards believe that their legitimate concerns will be authentically considered, and that they have some ownership of, or at least appreciation for, the targeted analytic requirements.
Shaping and strengthening key analytic questions to be addressed
Many data analytics practitioners have made learning agendas and key analytic questions (KAQs) central to addressing enterprise policy analysis and operational decision support requirements. KAQs articulate the targeted insight, how it might be used to improve agency operations, and which agency datasets and analytical models, including AI techniques, are expected to produce that result. Refining KAQs is an iterative process, innovating based on the art of the possible.
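Purely as an illustration, and not a standard schema, a KAQ could be recorded as a simple structure like the following Python sketch, with fields drawn from the description above:

# Hypothetical KAQ record; names and values are invented for illustration.
kaq = {
    "question": "Which veterans are overdue for outreach after a benefits change?",
    "intended_use": "Prioritize personalized outreach by regional offices",
    "datasets": ["benefits_changes", "outreach_events"],  # hypothetical dataset names
    "analytic_models": ["overdue-contact business rules", "propensity-to-respond model"],
    "risks_to_review": ["privacy", "security", "appropriate secondary use"],
    "status": "draft",  # refined iteratively with information policy and analytics leaders
}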
A critical aspect of vetting KAQs is data governance that includes agency information policy and analytics leaders. These individuals, engaged up front, have the expertise to ensure appropriate use of data while managing privacy, security and other risks. Proper vetting also allows for better consideration of reuse and a “data as a product” sensibility. With an articulated and prioritized mission requirement, there is in most cases a way to address issues that arise; when there is not, there are valid, clear and compelling reasons why.
Using appropriate transparency to align stakeholder incentives
Many integrated data systems aim to support policy analysis and operational decisions based on understanding performance gaps and risk assessments. In “Moving federal enterprise risk management beyond compliance theatre,” I highlighted that federal agencies mostly don’t manage risk in a repeatable way. Critical to the success of targeted use cases is tackling the cultural barriers to working from a single source of truth. Appropriate transparency can shift incentives: ensuring decision support is responsibly synchronized across all levels of the agency, creating an immutable log of past decisions, and enforcing accountability for learning and improvement over time.
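One way to make such a decision log tamper evident, offered here only as a minimal Python sketch with assumed record fields and not as any agency’s actual mechanism, is a hash chain in which each entry’s hash covers the hash of the entry before it:

# Minimal sketch of a tamper-evident, append-only decision log using a hash
# chain. The record fields are illustrative, not an agency schema.
import hashlib
import json
from datetime import datetime, timezone

def append_decision(log, decision, made_by):
    """Append a decision entry whose hash covers the previous entry's hash."""
    previous_hash = log[-1]["entry_hash"] if log else "0" * 64
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "decision": decision,
        "made_by": made_by,
        "previous_hash": previous_hash,
    }
    entry["entry_hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return entry

def verify(log):
    """Recompute the chain; editing any past entry breaks verification."""
    previous_hash = "0" * 64
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if body["previous_hash"] != previous_hash or recomputed != entry["entry_hash"]:
            return False
        previous_hash = entry["entry_hash"]
    return True

decisions = []
append_decision(decisions, "Accepted elevated supply-chain risk for one quarter", "ERM board")
print(verify(decisions))  # True until any past entry is altered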
Moving away from data calls
Too much data is maintained in spreadsheets, presentations or other documents and correlated via point-in-time data calls; risk, program and agency performance data are examples. In many cases, programs and agencies use such data to “grade their own homework,” with limited data integrity at the moment of collection and even less over time. Key to success is normalizing agencies toward use of authoritative data, including the potential use of agency integrated data systems as its source.
Integrate and curate once, reuse many times
If the data governance function is robust and trusted, then integrated and curated data can be reused many times with straightforward data governance decisions. This dramatically lowers the cost of reuse, drives compounding value, and strengthens trust in and the effectiveness of data governance, setting up a virtuous cycle.
Interoperability, vendor lock-in, and a model for deprecating legacy technology
With the VA COP, avoiding vendor lock-in, maintaining the ability to integrate with existing data stores and managing total cost of ownership were all concerns. VA addressed these issues by demonstrating technology-agnostic, full-stack interoperability across data, metadata and data pipeline transformations. VA demonstrated export of those elements from the technology used for its COP: Palantir Foundry on Amazon Web Services. VA then imported and precisely recreated the curated data objects and metadata in a second technology stack: Databricks and other technologies on Microsoft Azure. VA demonstrated this on person-centric data objects with about 100 datasets and 900 transformation points, using two distinct technology teams.
The key is establishing a data lake that is independent of any specific data integration or data analytics solution, and having the chosen solution read and write data, metadata and transformations to it. For example, once a pipeline is established and vetted, it might be substantially cheaper to run and maintain it on a commodity, open technology stack. This approach to data architecture requires close collaboration and alignment among agency data and information technology leaders. It also establishes technology-agnostic control over federal data and its quality and use, including with AI, enabling strategic deprecation of legacy technology and introduction of new technology. Demonstrating this capability will generate substantial leverage with the vendor community, lowering cost and risk for the government.
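As a rough sketch of that pattern, assuming pandas with a Parquet engine installed, and with paths and field names invented for illustration, a curated data object might be written to the lake in open formats: Parquet for the data, with JSON sidecars for the metadata and a declarative pipeline definition:

# Sketch: persist a curated data object, its metadata and its pipeline
# definition in open formats so another platform can recreate it.
import json
from pathlib import Path

import pandas as pd

lake = Path("data-lake/curated/veteran_contacts")  # hypothetical lake location
lake.mkdir(parents=True, exist_ok=True)

# Curated data produced by whichever integration platform is currently in use.
df = pd.DataFrame(
    {"veteran_id": [1, 2], "last_outreach": ["2024-01-15", "2024-02-03"]}
)
df.to_parquet(lake / "data.parquet", index=False)

# Descriptive metadata the next platform needs to recreate the object.
metadata = {
    "object": "veteran_contacts",
    "schema": {col: str(dtype) for col, dtype in df.dtypes.items()},
    "steward": "enterprise data office",
    "quality_checks": ["veteran_id is unique", "last_outreach is ISO 8601"],
}
(lake / "metadata.json").write_text(json.dumps(metadata, indent=2))

# Declarative pipeline definition, so the transformation itself, not just its
# output, can be re-implemented on another technology stack.
pipeline = {
    "inputs": ["raw/outreach_events"],
    "steps": [
        {"op": "filter", "expr": "event_type == 'contact'"},
        {"op": "aggregate", "group_by": "veteran_id", "agg": {"last_outreach": "max"}},
    ],
    "output": "curated/veteran_contacts",
}
(lake / "pipeline.json").write_text(json.dumps(pipeline, indent=2))

Because the data, metadata and pipeline definition all live in open formats in the lake, a second platform can recreate the object without depending on the first platform’s proprietary storage.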
Kshemendra Paul is a seasoned leader, recognized for his pioneering results and solutions using data, architecture, and information sharing and safeguarding to improve government on behalf of the American people. He started his career in the private sector as a technology leader and entrepreneur. In the second half of his career, he served in a variety of federal agencies and the White House in roles such as assistant inspector general, governmentwide lead for information sharing, federal chief architect, program manager, and chief data officer. His focus now is policy advocacy, including advising leaders and organizations, in his personal capacity.
This document is authored by Kshemendra Paul in his personal capacity. The views expressed are his and not those of the U.S. Government or any of its agencies. Mentions of specific products and vendors are not endorsements and are provided for context.