Overview
NHGIS annual census tract estimates provide population estimates for each year from 2000 to 2019, broken down by sex, age, and race. The estimates use 2010-vintage census tracts as the reporting unit.
Users may need census tract-level population estimates for non-decennial years. They may have disease counts, school enrollment data, or other numerators for census tracts, and they require an appropriate denominator to calculate rates. These annual population estimates fulfill such requirements for the 2000-2019 time period.
We provide annual estimates of 2010-vintage census tracts for all unique combinations of sex (two categories), race (six categories), and age (18 categories).
Sex, Age, and Race Categories
Sex | ||
---|---|---|
Female | Male | |
Race | ||
White alone | Black or African American alone | American Indian or Alaska Native alone |
Asian alone | Native Hawaiian or Pacific Islander alone | Two or more races |
Age | ||
0-4 | 5-9 | 10-14 |
15-19 | 20-24 | 25-29 |
30-34 | 35-39 | 40-44 |
45-49 | 50-54 | 55-59 |
60-64 | 65-69 | 70-74 |
75-79 | 80-84 | 85 and older |
Technical Details
Basics
The tract estimates are provided through the links below as comma-separated values (CSV) files within Zip archives.
Each Zip file includes a codebook that describes the contents of the data files.
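As a rough illustration of working with this packaging, the Python sketch below reads one CSV member of a Zip archive using only the standard library. The member name, the column names, and the UTF-8 encoding are assumptions made for the example; the codebook in each Zip file is the authoritative description of the actual contents.

```python
import csv
import io
import zipfile


def read_tract_csv(zip_source, member_name):
    """Read one CSV member of a Zip archive into a list of dict rows.

    All values stay strings, so identifier columns keep their leading zeros.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            # Assumption: the file is UTF-8 encoded.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.DictReader(text))


# Build a tiny in-memory archive so the sketch runs without a download.
# The member name and column names here are hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "example_tracts.csv",
        "GEOID,YEAR,POP\n01001020100,2005,1912\n",
    )

rows = read_tract_csv(buffer, "example_tracts.csv")
print(rows[0]["GEOID"])  # 01001020100
```

Keeping identifiers as strings matters because state and county codes often begin with zero.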
Methodology
The annual census tract estimates are primarily based on the methodology described in Swanson (2010) and Strate et al. (2016).
Data sources
We integrate several data sources to create the annual census tract estimates.
Decennial Census of Population and Housing
The 2000 and 2010 Summary File (SF) 1 datasets provide sex by age by race counts for census blocks (2000), census tracts (2010), and counties (2000 and 2010).
Population Estimates
The US Census Bureau's Population Estimates Program publishes annual county-level estimates of the population for sex by age by race. The reference date for each year's estimate is July 1. We use the county-level 2000-2010 intercensal estimates and the Vintage 2019 estimates in our processing pipeline. The Vintage 2019 estimates cover the 2010-2019 time period.
Modified Race Summary File
The Modified Race Summary File (MRSF) is produced by the US Census Bureau's Population Estimates Program and serves as the basis for its annual sex by age by race estimates. We use the 2000 and 2010 MRSF files in our processing pipeline. The MRSF is derived from Summary File 1 counts, and the key difference between the two products is their race categories.
Race Categories in the Decennial and Modified Race Summary Files
Decennial | Modified Race Summary File |
---|---|
White alone | White alone |
Black or African American alone | Black or African American alone |
American Indian or Alaska Native alone | American Indian or Alaska Native alone |
Asian alone | Asian alone |
Native Hawaiian or Other Pacific Islander alone | Native Hawaiian or Other Pacific Islander alone |
Some other race alone | Not available |
Two or more races | Two or more races |
Summary File 1 (and all other decennial Summary Files) includes a "some other race" category when reporting counts of persons by race, but this category is not included in the MRSF or annual population estimates. The MRSF and annual population estimates use the Office of Management and Budget (OMB) standard race and ethnicity categories. "Some other race" is not a standard OMB category but is used by the Census Bureau on the decennial questionnaires. The MRSF is created by allocating all persons who identify as "some other race" to one of the OMB standard race categories or to the "two or more races" category. In producing the MRSF, all respondents in SF1 who had indicated a specific race in addition to "some other race" were allocated to their listed specific race category. Respondents in SF1 who had indicated "some other race" alone were allocated using a donor within their household who matched on Hispanic origin. If no such donor was available, a hot deck matrix in which donor and donee matched on Hispanic origin was used to impute a reallocated race for the respondent. More details can be found in the MRSF methodology statement.
NHGIS Geographic Crosswalks
Census tract boundaries frequently change from decade to decade. In order to standardize data collected for the 2000 decennial census onto the units used in the 2010 census, we use the 2000-2010 census block geographic crosswalks developed by NHGIS.
Race Category Reallocation
We begin by standardizing the race categories in the 2000 and 2010 Summary File 1 (SF1) datasets to the categories in the Modified Race Summary File (MRSF) and population estimates products. We must reallocate the SF1 "some other race" counts for 2010 census tracts and 2000 blocks among the other six categories. We compare the county-level counts in the SF1 data to the counts in the MRSF for each sex, age, and race group. The MRSF counts for each of its race categories reflect the original counts plus the number of persons identifying as "some other race" who were reclassified to one of the above categories in the creation of the MRSF. Our reallocation approach differs slightly for the 2010 and 2000 data.
2010 SF1 Data
For each tract within a county, we reallocate the tract's SF1 "some other race" count for a given age and sex among the MRSF race categories. Each category receives a share equal to its share of the total increase (for that age and sex) between the SF1 and MRSF counts in the county containing the tract.
For example, if 50 "some other race" persons (for a given age and sex) were recorded in the county-level decennial data and the MRSF counts show an increase of 20 people listed as White alone, 20 people listed as Black alone, and 10 people listed as Asian alone, then 40% of each tract's "some other race" count would be reallocated to that tract's White alone count, another 40% to its Black alone count, and the remaining 20% to its Asian alone count.
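The proportional step in this example can be sketched in Python as follows. This is a minimal illustration, not the NHGIS processing code: the function name and dictionary layout are hypothetical, and it assumes the simple case in which no category decreases between SF1 and the MRSF.

```python
def reallocate_some_other_race(tract_counts, county_sf1, county_mrsf):
    """Spread a tract's "some other race" count over the MRSF categories.

    Each argument maps race category -> count for one age and sex group.
    Shares come from each category's county-level increase between SF1
    and the MRSF. Assumes no category decreased (hypothetical simple case).
    """
    increases = {race: county_mrsf[race] - county_sf1[race] for race in county_mrsf}
    total_increase = sum(increases.values())
    other = tract_counts["some_other"]
    result = {race: float(tract_counts[race]) for race in county_mrsf}
    if total_increase > 0:
        for race, gain in increases.items():
            result[race] += other * gain / total_increase
    return result


# County-level counts mirroring the example: 50 "some other race" persons,
# with MRSF increases of 20 (White), 20 (Black), and 10 (Asian).
county_sf1 = {"white": 500, "black": 300, "asian": 100, "some_other": 50}
county_mrsf = {"white": 520, "black": 320, "asian": 110}
tract = {"white": 40, "black": 30, "asian": 10, "some_other": 10}

adjusted = reallocate_some_other_race(tract, county_sf1, county_mrsf)
print(adjusted)  # {'white': 44.0, 'black': 34.0, 'asian': 12.0}
```

The tract's total population is unchanged: the 10 reallocated persons are split 4, 4, and 2.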
In general, the count of persons listed as "two or more races" in the decennial data is equal to or smaller than the count of persons listed as "two or more races" in the MRSF. In these cases, we treat the "two or more races" category similarly to the single-race categories, reallocating "some other race" counts to it in proportion to the increase in the "two or more races" count at the county level.
However, in some cases, the count for the "two or more races" category is smaller in the MRSF than in SF1 (in addition to "some other race" records, "two or more races" records may be reclassified to single-race categories in the process of making the MRSF file). In these cases, there are more individuals that need to be reallocated than are listed solely in the "some other race" category. Yet, we cannot exactly determine how many people need to be reallocated, since the counts in the MRSF's "two or more races" category reflect both reclassifications of multi-race individuals to single-race categories as well as the reclassification of "some other race" individuals to the multi-race category.
In these cases, for each age and sex, we first reduce each tract's multi-race population in proportion to the decrease in multi-race population at the county level between the SF1 and MRSF data. The amount of decrease in the multi-race population count is then added to the SF1's "some other race" count to determine the tract's final count for reallocation. The reallocation process then proceeds as described above, except that the final reallocation count is distributed solely among single-race categories, not the multi-race category (which has already been adjusted).
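A hedged Python sketch of this two-stage adjustment is shown below. The function name, argument layout, and sample numbers are hypothetical, and the sketch assumes the county has a nonzero SF1 multi-race count and a positive total increase across the single-race categories.

```python
def reallocate_with_multirace_decrease(tract, county_sf1, county_mrsf, single_races):
    """Handle a county whose multi-race count is smaller in the MRSF than in SF1.

    Step 1: shrink the tract's multi-race count by the county-level ratio.
    Step 2: add the removed persons to the tract's "some other race" count.
    Step 3: distribute that pool among single-race categories only.
    """
    multi_ratio = county_mrsf["two_or_more"] / county_sf1["two_or_more"]
    new_multi = tract["two_or_more"] * multi_ratio
    pool = tract["some_other"] + (tract["two_or_more"] - new_multi)

    gains = {race: county_mrsf[race] - county_sf1[race] for race in single_races}
    total_gain = sum(gains.values())
    result = {race: tract[race] + pool * gains[race] / total_gain for race in single_races}
    result["two_or_more"] = new_multi
    return result


# Hypothetical county: the multi-race count drops from 100 to 90.
county_sf1 = {"white": 500, "black": 300, "two_or_more": 100, "some_other": 50}
county_mrsf = {"white": 540, "black": 320, "two_or_more": 90}
tract = {"white": 50, "black": 30, "two_or_more": 10, "some_other": 5}

adjusted = reallocate_with_multirace_decrease(
    tract, county_sf1, county_mrsf, single_races=["white", "black"]
)
print(adjusted)
```

In this example the pool is 6 persons (5 "some other race" plus 1 removed from the multi-race count), and the tract total of 95 is preserved.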
2000 SF1 Data
The 2000 SF1 data require the same processing as above, but also need to be standardized to 2010 census tract boundaries. Therefore, we first reallocate the 2000 SF1 race categories at the block level rather than the tract level. We still use the reallocation proportions from the county containing each block.
We then convert 2000 block counts onto 2010 blocks using the 2000-2010 block-to-block crosswalk provided by NHGIS. We then aggregate these counts from the 2010 blocks to 2010 tracts to obtain reallocated 2000 data for 2010 tract boundaries.
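The conversion and aggregation can be sketched in Python as below. The crosswalk is represented as hypothetical (2000 block, 2010 block, weight) tuples with made-up identifiers; the real NHGIS crosswalk files should be consulted for their actual field names. The sketch relies on the fact that the first 11 characters of a 15-character block GEOID identify its tract.

```python
from collections import defaultdict


def blocks_2000_to_tracts_2010(block_counts, crosswalk):
    """Allocate 2000 block counts to 2010 blocks, then sum to 2010 tracts.

    block_counts: dict of 2000 block GEOID -> count.
    crosswalk: iterable of (block_2000, block_2010, weight) tuples, where
    weight is the share of the 2000 block assigned to the 2010 block.
    """
    tract_totals = defaultdict(float)
    for block_2000, block_2010, weight in crosswalk:
        # State (2) + county (3) + tract (6) = first 11 characters.
        tract_2010 = block_2010[:11]
        tract_totals[tract_2010] += block_counts.get(block_2000, 0) * weight
    return dict(tract_totals)


# Hypothetical data: the second 2000 block is split evenly across two 2010 tracts.
block_counts = {"010010201001000": 10, "010010201001001": 8}
crosswalk = [
    ("010010201001000", "010010201001000", 1.0),
    ("010010201001001", "010010201001003", 0.5),
    ("010010201001001", "010010202001000", 0.5),
]

tracts = blocks_2000_to_tracts_2010(block_counts, crosswalk)
print(tracts)  # {'01001020100': 14.0, '01001020200': 4.0}
```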
Population Projections
Producing annual estimates requires interpolating between starting and ending points. We currently provide data up to 2019, as this is the most recent year for which we have county-level sex by age by race population estimates. Unfortunately, we do not have tract-level data for 2019; therefore, we must generate projected 2019 population counts for sex by age by race at the tract level.
We follow the Hamilton-Perry method described in Swanson (2010) to generate these projections. The method is designed to produce projections in 10-year increments. Because we use 2010 decennial data as an input, the projected counts are for 2020. Therefore, while we only distribute annual estimates up to 2019, we first generate projections up to 2020 before truncating the resulting time series.
Cohort Change Ratios (CCRs)
We use 2000 and 2010 SF1 counts, standardized to 2010 tracts, to generate cohort change ratios (CCRs) for each sex by age by race group. CCRs are the ratio of the population of a given sex by age by race group in a given decennial census to the population of those in the same sex and race group who are in the 10-years-younger age group in the previous decennial census.
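A minimal Python sketch of the CCR calculation and its use in a 10-year projection follows. The function names and sample counts are hypothetical, and the sketch does not handle the open-ended 85 and older group.

```python
def cohort_change_ratio(pop_2010_older, pop_2000_younger):
    """Ratio of a group's 2010 count to the 2000 count of the same sex and
    race group that was 10 years younger. Returns None when undefined."""
    if pop_2000_younger == 0:
        return None
    return pop_2010_older / pop_2000_younger


def project_cohort(pop_2010_younger, ccr):
    """Project the 2020 count of a group from the 2010 count of the
    10-years-younger group, following the Hamilton-Perry idea."""
    return pop_2010_younger * ccr


# Hypothetical tract, one sex and race group:
# 120 persons aged 20-24 in 2000, 150 persons aged 30-34 in 2010.
ccr_30_34 = cohort_change_ratio(150, 120)
print(ccr_30_34)  # 1.25

# 100 persons aged 20-24 in 2010 give the projected 30-34 count for 2020.
print(project_cohort(100, ccr_30_34))  # 125.0
```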
In cases where counts are too small to enable the calculation of a CCR (CCRs are undefined when any cohort's starting population in 2000 is 0), we aggregate across race categories to the tract by sex by age level and calculate CCRs for each tract by sex by age cohort. For cases where CCRs are still undefined, we aggregate further to the county by sex by age level before calculating CCRs. There are a small number of remaining cases where CCRs are still undefined even at the county by sex by age level (e.g., Loving County, Texas, and Kalawao County, Hawaii). In these cases, we artificially set the undefined CCRs to 1.0.
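The fallback order can be expressed compactly in Python. This is an illustrative sketch with a hypothetical function name and input layout: the caller supplies (2010 count, 2000 count) pairs ordered from the most specific level to the least specific.

```python
def first_defined_ccr(level_pairs, default=1.0):
    """Return the CCR from the first aggregation level with a nonzero start.

    level_pairs: (pop_2010, pop_2000) pairs ordered as
    tract by sex by age by race, tract by sex by age, county by sex by age.
    Falls back to `default` when every level is undefined.
    """
    for pop_2010, pop_2000 in level_pairs:
        if pop_2000 > 0:
            return pop_2010 / pop_2000
    return default


# Undefined at both tract levels, defined at the county level.
print(first_defined_ccr([(3, 0), (0, 0), (450, 400)]))  # 1.125

# Undefined everywhere, so the ratio is set to 1.0.
print(first_defined_ccr([(0, 0), (0, 0), (0, 0)]))  # 1.0
```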
CCRs for small-population areas may be highly variable. Following Swanson's recommendation, we place a floor of 0.82 (2% annual reduction rate over 10 years) and a ceiling of 1.63 (5% annual growth rate over 10 years) on all CCRs. Any CCRs lower than the floor value or higher than the ceiling value are replaced with the floor or ceiling value, respectively.
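The bounds follow from compounding the annual rates over 10 years, which the short Python sketch below checks before applying the clamp. The constant and function names are hypothetical.

```python
CCR_FLOOR = 0.82    # about 0.98 ** 10, a 2% annual decline for 10 years
CCR_CEILING = 1.63  # about 1.05 ** 10, a 5% annual growth for 10 years


def clamp_ccr(ccr):
    """Replace values outside [CCR_FLOOR, CCR_CEILING] with the nearest bound."""
    return min(max(ccr, CCR_FLOOR), CCR_CEILING)


print(round(0.98 ** 10, 2), round(1.05 ** 10, 2))  # 0.82 1.63
print(clamp_ccr(0.40), clamp_ccr(1.10), clamp_ccr(3.00))  # 0.82 1.1 1.63
```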
Child-to-Woman Ratios (CTWs)
CCRs are not defined for age groups under 10, because these cohorts were not yet born at the previous census. In these cases, we calculate the ratio of the 0-4 and 5-9 year-old counts to the females of child-bearing age counts at the tract by sex by race level. These child-to-woman (CTW) ratios are multiplied by the projected 2020 females of child-bearing age population (obtained using CCRs) to generate a projection for the 2020 population of the 0-4 and 5-9 age groups.
For projecting the population of children aged 0-4, we use the count of all females of the given race between the ages of 20-45. For projecting the population of children aged 5-9, we use the count of all females of the given race between the ages of 30-50. While mothers obviously do not need to be the same race as children, this restriction was necessary to enable the calculation of approximate rates at the sex by age by race level.
In cases where the CTW is undefined (because there are no females of child-bearing age for a given tract and race), we use the CTWs calculated for tract by sex by age groups or, if still undefined, county by sex by age groups.
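As a sketch of the CTW step, the Python below computes a ratio and applies it to a projected count of women. The function names and numbers are hypothetical, and the sketch assumes the ratio is measured from 2010 counts, which is an assumption of this example rather than a detail stated above.

```python
def child_to_woman_ratio(children, women):
    """Ratio of children in one age, sex, and race group to females of
    child-bearing age of the same race. Returns None when undefined."""
    if women == 0:
        return None
    return children / women


def project_children(ctw, projected_women_2020):
    """Projected 2020 count of children from the CTW and the CCR-based
    projection of females of child-bearing age."""
    return ctw * projected_women_2020


# Hypothetical tract and race group: 60 boys aged 0-4 and 300 women of
# child-bearing age in the base year; 350 women projected for 2020.
ctw = child_to_woman_ratio(60, 300)
print(ctw)  # 0.2
print(round(project_children(ctw, 350), 6))  # 70.0
```

A `None` result signals that the caller should move to the next aggregation level, mirroring the fallback described for CCRs.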
We combine the 2020 population projections obtained via CCR and CTW into a final dataset for use in our annual census tract estimate interpolation procedure.
Annual Census Tract Estimate Interpolation
Finally, we generate annual population estimates at the tract level by interpolating between either the modified 2000 and 2010 SF1 counts or the modified 2010 SF1 and projected 2020 counts for each sex by age by race group.
For 2000-2010, we linearly interpolate between the 2000 and 2010 count for each tract by sex by age by race combination, accounting for the fact that annual estimates' reference date is July 1 while the reference date for decennial counts is April 1.
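One way to account for the differing reference dates is to interpolate on elapsed days, as in the Python sketch below. This is an assumption about how the offset could be handled, not a description of the exact NHGIS calculation; the function name and sample counts are hypothetical.

```python
from datetime import date


def interpolate_count(start_count, end_count, start_date, end_date, target_date):
    """Linear interpolation between two dated counts, evaluated at target_date."""
    fraction = (target_date - start_date).days / (end_date - start_date).days
    return start_count + (end_count - start_count) * fraction


CENSUS_2000 = date(2000, 4, 1)
CENSUS_2010 = date(2010, 4, 1)

# Hypothetical tract group: 1,000 persons in the 2000 census, 1,200 in 2010.
for year in (2000, 2005, 2009):
    estimate = interpolate_count(1000, 1200, CENSUS_2000, CENSUS_2010, date(year, 7, 1))
    print(year, round(estimate, 1))
```

The July 1, 2000 estimate sits 91 days after the April 1, 2000 census, so it is already slightly above the census count when the population is growing.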
The interpolated values for all tracts in a given county will not necessarily sum to the annual estimate for that county. Therefore, we adjust all tracts within a county upward or downward to ensure that county-level counts are consistent with the counts of the tracts contained within the county. This adjustment also causes tract-level estimates to follow the general trajectory of the county-level trend.
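A simple way to make tracts sum to the county estimate is proportional scaling, sketched below in Python. The function name and tract labels are hypothetical, and the sketch assumes a single uniform factor per county and sex by age by race group.

```python
def scale_to_county_total(tract_values, county_estimate):
    """Scale interpolated tract values so that they sum to the county estimate.

    tract_values: dict of tract ID -> interpolated count for one group.
    Returns the values unchanged when they sum to zero, since no scaling
    factor exists in that case.
    """
    total = sum(tract_values.values())
    if total == 0:
        return dict(tract_values)
    factor = county_estimate / total
    return {tract: value * factor for tract, value in tract_values.items()}


# Interpolated tracts sum to 100, but the county estimate is 110.
scaled = scale_to_county_total({"tract_a": 30.0, "tract_b": 70.0}, 110.0)
print({tract: round(value, 2) for tract, value in scaled.items()})
# {'tract_a': 33.0, 'tract_b': 77.0}
```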
In some cases, the annual estimates for a county suggest that some persons of a given sex, age, and race group were counted in that county, but no tracts within that county have any counts for that group after interpolation. In these cases, we allocate the county-level counts for a given sex, age, and race group to the tracts relative to the distribution of the total population of that race (disregarding sex and age) across the county's tracts.
In the cases where no persons of a given race were recorded for any sex and age groups, we use the total population of each tract within the county to re-distribute the recorded county-level count.
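The two fallback rules can be sketched together in Python. The function name and inputs are hypothetical, and the sketch assumes that the county's tracts have a nonzero total population.

```python
def allocate_county_count(county_count, race_totals, tract_totals):
    """Distribute a county-level count to tracts that have no interpolated
    value for the group.

    race_totals: tract ID -> total population of the given race.
    tract_totals: tract ID -> total population of the tract.
    Uses the race distribution when available, otherwise total population.
    """
    weights = race_totals if sum(race_totals.values()) > 0 else tract_totals
    total = sum(weights.values())
    return {tract: county_count * weight / total for tract, weight in weights.items()}


tract_totals = {"tract_a": 100, "tract_b": 300}

# The race is present in the county's tracts: follow its distribution.
print(allocate_county_count(10, {"tract_a": 30, "tract_b": 10}, tract_totals))
# {'tract_a': 7.5, 'tract_b': 2.5}

# The race is absent from every tract: follow total population instead.
print(allocate_county_count(10, {"tract_a": 0, "tract_b": 0}, tract_totals))
# {'tract_a': 2.5, 'tract_b': 7.5}
```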
For 2010-2019, we use the projected 2020 population as our endpoint for the linear interpolation. We then truncate the interpolated time series to 2019 before proceeding as described above.
Decennial counts are not included in the output files, even in years where both a decennial and annual estimate are recorded. For instance, data listed as originating from 2000 represent the estimated tract-level annual estimate for July 1, 2000, not the count recorded in the 2000 decennial census. The reference date for the 2000 decennial census is April 1, 2000.
We combine the 2000 to 2010 and 2010 to 2019 time series for each tract into a single time series spanning 2000 to 2019.
Geographic Coverage
Each tract estimate file includes all census tracts in a particular state. To make the files more useful, we currently provide state-level CSV files.
Geographic Identifiers
The files contain two unique census tract identifiers:
- GISJOIN identifiers match the identifiers used in NHGIS data tables and boundary files. A census tract GISJOIN concatenates these codes:
Component | Notes |
---|---|
"G" prefix | This prevents applications from automatically reading the identifier as a number and, in effect, dropping important leading zeros |
State NHGIS code | 3 digits (FIPS + "0"). NHGIS adds a zero to state FIPS codes to differentiate current states from historical territories. |
County NHGIS code | 4 digits (FIPS + "0"). NHGIS adds a zero to county FIPS codes to differentiate current counties from historical counties. |
Census tract code | 6 digits for 2000 and 2010 tracts |
- GEOID identifiers correspond to the codes used in most current Census sources (American FactFinder, TIGER/Line, Relationship Files, etc.). A census tract GEOID concatenates these codes:
Component | Notes |
---|---|
State FIPS code | 2 digits |
County FIPS code | 3 digits |
Census tract code | 6 digits |
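Because both identifiers are built from the same codes, one can be derived from the other. The Python sketch below illustrates the component layout described above for 2010-vintage tracts; the function names are hypothetical and the sample tract code is only an example.

```python
def geoid_to_gisjoin(geoid):
    """Convert an 11-digit tract GEOID to a 14-character GISJOIN."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError("expected an 11-digit tract GEOID")
    state, county, tract = geoid[:2], geoid[2:5], geoid[5:]
    # "G" prefix, then state and county FIPS codes each followed by "0".
    return "G" + state + "0" + county + "0" + tract


def gisjoin_to_geoid(gisjoin):
    """Convert a 14-character tract GISJOIN back to an 11-digit GEOID."""
    if len(gisjoin) != 14 or not gisjoin.startswith("G"):
        raise ValueError("expected a 14-character tract GISJOIN")
    # Skip the prefix and the two appended zeros.
    return gisjoin[1:3] + gisjoin[4:7] + gisjoin[8:]


print(geoid_to_gisjoin("01001020100"))     # G0100010020100
print(gisjoin_to_geoid("G0100010020100"))  # 01001020100
```

Both values should be stored as strings so that leading zeros are preserved.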
Download
The annual tract estimates are available to registered NHGIS users through the links below.
API Access: Users who would like to access the tract estimates directly from within a programming environment (R, Python, etc.) may use the IPUMS API. The IPUMS Developer Portal provides complete details on the IPUMS API. The API for IPUMS NHGIS page describes which NHGIS supplemental data resources are available through the API (including annual tract estimates) and identifies how to construct a valid API URL for these resources. The Workflows & Code pages include some example code for accessing NHGIS supplemental data.
Tract Estimates by State | ||
---|---|---|
Citation and Use
Use of the NHGIS tract estimates is subject to the same conditions as for all NHGIS data. See NHGIS Citation and Use.
References
- Swanson, D. A., Schlottmann, A., & Schmidt, B. (2010). "Forecasting the population of census tracts by age and sex: An example of the Hamilton–Perry method in action." Population Research and Policy Review 29(1), 47–63.
- Strate, S., Renski, H., Peake, T., Murphy, J. J., & Zaldonis, P. (2016). "Small Area Population Estimates for 2011 through 2020." UMass Donahue Institute.