Privacy-Protected 2010 Census Demonstration Data
The IPUMS NHGIS Privacy-Protected Demonstration Data link together two versions of 2010 Census summary tables:
- Original tables from the 2010 Census Summary Files
- New tables derived from a trial run of the Census Bureau's 2020 Disclosure Avoidance System (DAS) applied to the original 2010 Census responses
With the 2020 DAS, the Census Bureau is experimenting with a major change to its framework for protecting privacy. The Privacy-Protected Demonstration Data facilitate assessments of the new framework, enabling a broad range of users to investigate and provide feedback about the quality of data produced by different DAS versions before the Census Bureau makes final design decisions for publishing 2020 data.
- Feedback and Questions
- Technical Details
- Citation and Use
To protect the confidentiality of 2020 Census respondents, the U.S. Census Bureau plans to use a framework termed "differential privacy". In October 2019, the Census Bureau released a demonstration data product to help users assess the impact of differential privacy on the utility and accuracy of decennial census data. This product was a differentially private version of the 2010 Decennial Census. Several assessments of the demonstration data were presented at the Workshop on 2020 Data Products (December 11-12, 2019) organized by the Committee on National Statistics. These assessments identified limitations in the differentially private data, particularly for low-population geographic units, for which there are no other sources of complete, reliable population data. Workshop participants urged the Bureau to release additional demonstration data as it works to improve utility by refining the differentially private algorithm.
In June 2020, the Census Bureau announced plans to release a Privacy-Protected Microdata File (PPMF) after each programming sprint for which the Bureau generates a corresponding set of quality metrics. The Bureau is continually modifying its differentially private algorithm, and each version of the PPMF will reflect those modifications. Data users may use the PPMF to track changes in accuracy and utility. To make these data more user-friendly, IPUMS NHGIS is creating a Privacy-Protected Summary File (PPSF) from each version of the PPMF. Our PPSF consists of tabulations where each row represents a geographic unit and each column represents a summary statistic (e.g., the count of females age 0-4).
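To illustrate what a summary file row is, the minimal sketch below tabulates a few person records into one "wide" row per geographic unit, with one column per summary statistic. The record layout (`geo_id`, `sex`, `age`), the column names, and the records themselves are hypothetical; they are not the PPMF schema or NHGIS variable codes.

```python
from collections import defaultdict

# Hypothetical person-level microdata: (geographic unit ID, sex, age).
records = [
    ("G001", "F", 3),
    ("G001", "M", 40),
    ("G001", "F", 1),
    ("G002", "F", 70),
    ("G002", "M", 2),
]

def tabulate(records):
    """Return {geo_id: {statistic: count}}, one wide row per geographic unit."""
    rows = defaultdict(lambda: {"total_pop": 0, "female_age_0_4": 0})
    for geo_id, sex, age in records:
        row = rows[geo_id]
        row["total_pop"] += 1
        # Example summary statistic: count of females age 0-4.
        if sex == "F" and 0 <= age <= 4:
            row["female_age_0_4"] += 1
    return dict(rows)

for geo_id, row in sorted(tabulate(records).items()):
    print(geo_id, row)
```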
To facilitate comparisons, we link comparable data from the PPSF and original 2010 Census Summary File 1. These linked files comprise the IPUMS NHGIS Privacy-Protected 2010 Census Demonstration Data product.
Feedback and Questions
You may direct any comments or questions about these files to email@example.com.
Technical Details
- The data for different geographic summary levels are in separate files
- The data files include standard NHGIS "GISJOIN" identifiers and NHGIS variable codes for both the original and privacy-protected data
- The data are stored in CSV (comma-separated values) files within ZIP archives
- The ZIP archives include human-readable codebooks describing the contents of the data files
- The data files use a "wide" record layout, with each data variable in a separate column
- Data for census blocks are in separate files for each state or state equivalent
- Geographic levels: 18 commonly used levels, including census blocks
- Tables: 22 tables
  - P1. Total Population
  - P3. Race [7 categories]
  - P4. Hispanic or Latino Origin
  - P5. Hispanic or Latino Origin by Race
  - P6. Race (Total Races Tallied)
  - P7. Hispanic or Latino Origin by Race (Total Races Tallied)
  - P8. Race [63 categories]
  - P9. Hispanic or Latino, and Not Hispanic or Latino by Race
  - P10. Race [63 categories] for the Population 18 Years and Over
  - P11. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
  - P12. Sex by Age
  - P14. Sex by Age for the Population Under 20 Years
  - P42. Group Quarters Population by Group Quarters Type
  - P12A. Sex by Age (White Alone)
  - P12B. Sex by Age (Black or African American Alone)
  - P12C. Sex by Age (American Indian and Alaska Native Alone)
  - P12D. Sex by Age (Asian Alone)
  - P12E. Sex by Age (Native Hawaiian and Other Pacific Islander Alone)
  - P12F. Sex by Age (Some Other Race Alone)
  - P12G. Sex by Age (Two or More Races)
  - P12H. Sex by Age (Hispanic or Latino Origin)
  - P12I. Sex by Age (White Alone, Not Hispanic or Latino)
- We plan to add more tables and possibly more levels in future versions of the files. If you have a specific request, please email it to firstname.lastname@example.org.
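Given the layout described above (CSV files inside ZIP archives, a `GISJOIN` identifier per row, and original and privacy-protected values as separate columns), a comparison can be done with Python's standard library alone. The sketch below is an illustration under assumptions: the CSV name, the column codes `ORIG_P1` and `PP_P1`, the `GEX…` identifiers, the counts, and UTF-8 encoding are all hypothetical placeholders. The actual file names and NHGIS variable codes are listed in each archive's codebook. The demo builds a tiny archive in memory so it runs without real data.

```python
import csv
import io
import zipfile

def read_differences(zip_source, csv_name, orig_col, pp_col):
    """Return {GISJOIN: privacy-protected count minus original count}."""
    diffs = {}
    with zipfile.ZipFile(zip_source) as zf:
        # zf.open yields a binary stream; wrap it as text for csv.DictReader.
        with io.TextIOWrapper(zf.open(csv_name), encoding="utf-8", newline="") as text:
            for row in csv.DictReader(text):
                diffs[row["GISJOIN"]] = int(row[pp_col]) - int(row[orig_col])
    return diffs

# Hypothetical sample file: made-up identifiers and counts.
sample_csv = "GISJOIN,ORIG_P1,PP_P1\nGEX0001,120,118\nGEX0002,45,47\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("sample_level.csv", sample_csv)
buffer.seek(0)

diffs = read_differences(buffer, "sample_level.csv", "ORIG_P1", "PP_P1")
print(diffs)
```

For census blocks, which come in one file per state or state equivalent, the same function could be called once per state archive and the results merged.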
We derive this version of the Privacy-Protected 2010 Census Demonstration Data from the 20200527 vintage of the PPMF. This file was produced by a new version of the Bureau's differentially private TopDown Algorithm (TDA). Instead of post-processing all noisy measurements at once, the new version of TDA uses a multipass approach. For this vintage, the first pass processed total population counts and counts by relationship to householder or residence in a type of group quarters. The second pass processed counts required for the PL 94-171 redistricting dataset. The third pass processed counts required for the Population Estimates program, and the final pass processed all remaining counts. In the multipass version of TDA, the output of each pass is constrained to be consistent with the counts from prior passes. For example, the sum of the counts for the 63 race categories from pass two will equal the total population count generated in pass one.
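The between-pass constraint can be illustrated with a toy problem: given an exact total from an earlier pass and noisy category counts from a later pass, find nonnegative integer counts that sum to that total and stay close to the noisy values. The sketch below uses a Euclidean projection onto nonnegative vectors with the required sum, followed by largest-remainder rounding. This is a simplified stand-in chosen for illustration, not the Census Bureau's TDA post-processing, which solves larger constrained optimization problems level by level down the geographic hierarchy. The total and the four category counts are made up, and the total is assumed to be a nonnegative integer.

```python
from math import floor

def project_to_total(noisy, total):
    """Euclidean projection of `noisy` onto {x : x >= 0, sum(x) == total}."""
    if total == 0:
        return [0.0] * len(noisy)
    ordered = sorted(noisy, reverse=True)
    running_sum = 0.0
    theta = 0.0
    for j, value in enumerate(ordered, start=1):
        running_sum += value
        candidate = (running_sum - total) / j
        # Keep the shift from the largest j whose entry stays positive.
        if value - candidate > 0:
            theta = candidate
    return [max(v - theta, 0.0) for v in noisy]

def round_preserving_total(values, total):
    """Largest-remainder rounding: integers that still sum to `total`."""
    floors = [floor(v) for v in values]
    remainder = total - sum(floors)
    # Give leftover units to the entries with the largest fractional parts.
    order = sorted(range(len(values)), key=lambda i: values[i] - floors[i], reverse=True)
    for i in order[:remainder]:
        floors[i] += 1
    return floors

# Made-up example: exact total from pass one, noisy category counts from pass two.
PASS_ONE_TOTAL = 100
pass_two_noisy = [52.4, 30.9, -3.2, 18.6]

counts = round_preserving_total(project_to_total(pass_two_noisy, PASS_ONE_TOTAL), PASS_ONE_TOTAL)
print(counts, sum(counts))  # [52, 30, 0, 18] 100
```

The negative noisy count is clipped to zero and the other categories absorb the difference, so the pass-two counts sum exactly to the pass-one total.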
Due to the particular circumstances of programming Sprint II, on which the 20200527 vintage is based, this release includes no housing unit Privacy-Protected Microdata File and hence no housing tables. Housing data are expected in subsequent releases, as in the first demonstration data products.
Vintage 20200527: Parameters
The privacy loss budget assigned to person-level counts in the 20200527 vintage was 4.0, which was allocated to geographic levels and queries as follows:
| Query | Allocation fraction |
|---|---|
| Relationship to Householder or Residence in Group Quarters | 0.15 |
| Voting Age * Hispanic * Race | 0.29 |
| Age * Sex * Hispanic * Race | 0.25 |
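The allocation arithmetic can be sketched as follows, assuming each listed query's share is simply the person-level budget of 4.0 multiplied by its allocation fraction. The split across geographic levels is not modeled here, and the function name is hypothetical.

```python
# Person-level privacy loss budget for vintage 20200527.
TOTAL_PERSON_BUDGET = 4.0

# Query allocation fractions from the table for this vintage.
QUERY_FRACTIONS = {
    "Relationship to Householder or Residence in Group Quarters": 0.15,
    "Voting Age * Hispanic * Race": 0.29,
    "Age * Sex * Hispanic * Race": 0.25,
}

def query_budgets(total, fractions):
    """Budget share per query, assuming share = total * fraction."""
    return {query: total * fraction for query, fraction in fractions.items()}

budgets = query_budgets(TOTAL_PERSON_BUDGET, QUERY_FRACTIONS)
for query, share in budgets.items():
    print(f"{query}: {share:.2f}")

# Portion of the budget accounted for by the queries listed above.
print(f"Listed fractions sum to {sum(QUERY_FRACTIONS.values()):.2f}")
```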
Vintage 20200527: Data Files
Citation and Use
Use of the IPUMS NHGIS Privacy-Protected Demonstration Data is subject to the same conditions as for all NHGIS data:
- You will not redistribute the data without permission.
- You will cite the source appropriately.
In publications or research reports, we request a product-specific citation following this general form:
David Van Riper, Tracy Kugler, and Jonathan Schroeder. IPUMS NHGIS Privacy-Protected 2010 Census Demonstration Data, version YYYYMMDD [Database]. Minneapolis, MN: IPUMS. 2020.
... with YYYYMMDD replaced by the data vintage, corresponding to the PPMF version published by the Census Bureau. A complete recommended citation is also provided in the codebooks that accompany the data files.
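If you generate reports programmatically, the template above can be filled in with a small helper. This is a hypothetical convenience function that follows the general form literally, including the fixed year 2020; the citation in each data file's codebook remains the authoritative version.

```python
from datetime import datetime

def nhgis_ppdd_citation(vintage):
    """Fill the product-specific citation template with a YYYYMMDD vintage."""
    # Raises ValueError if `vintage` is not a valid YYYYMMDD date.
    datetime.strptime(vintage, "%Y%m%d")
    return (
        "David Van Riper, Tracy Kugler, and Jonathan Schroeder. "
        "IPUMS NHGIS Privacy-Protected 2010 Census Demonstration Data, "
        f"version {vintage} [Database]. Minneapolis, MN: IPUMS. 2020."
    )

print(nhgis_ppdd_citation("20200527"))
```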
Our tabulation of the Census Bureau's Privacy-Protected Microdata Files and the construction of the Privacy-Protected Demonstration Data are supported, in part, by funding from the Sloan Foundation (G-2019-12589). The Minnesota Population Center and IPUMS NHGIS have also provided key resources and support, with funding from the Eunice Kennedy Shriver National Institute of Child Health and Human Development and the National Science Foundation.