NHGIS geographic crosswalks describe how U.S. census summary data from one year correspond to geographic units for another year. The crosswalks are designed to support high-quality tabulations of one year's data for another year's geographic units (e.g., 1990 foreign-born population for 2010 census tracts).
NHGIS crosswalks are similar to the U.S. Census Bureau's Relationship Files, but NHGIS crosswalks include interpolation weights derived from advanced models. Each interpolation weight indicates the proportion of a source zone's characteristics to allocate to a specific target zone. We use these same weights to produce NHGIS geographically standardized time series tables.
- Available Crosswalks
- Selecting a Crosswalk
- How to Use the Crosswalks
- Technical Details
- Download
- Citation and Use
- References
Available Crosswalks
Source Zones | Target Zones | ||||
---|---|---|---|---|---|
Blocks | Blocks, | X | X | X | X |
Block Group Parts | Block Groups, | X | X | ||
Block Groups | Block Groups, | X | X | ||
Census Tracts | Census Tracts, | X | X | X | X |
Selecting a Crosswalk
Important: Start from the Lowest Level You Can
To transform summary data from one census's geographic units to another's, it's important to start from the lowest possible level.
Let's say you'd like "harmonized tract data": multiple years of data for spatially consistent census tracts. So you obtained 1990, 2000, 2010, and 2020 tract data, and now you'd like to use a tract crosswalk to standardize the geographic units. But there are lots of data available for units smaller than tracts, like blocks and block groups. And these smaller units fit much better within other years' tracts, so why not start with data from the smaller units? If you allocate data from blocks or block groups to a single year's tract boundaries, you'll still produce "harmonized tract data," but it'll be more accurate than if you started with tract data, and in some cases much more.
For example, there are thousands of cases where a single 1990 census tract corresponds to multiple 2010 tracts. This is especially common in fast-growing areas on the outskirts of cities, as in Figure 1. To tabulate 1990 data for the 2010 tracts in these cases, starting from tract counts requires disaggregating counts from the larger 1990 tracts down to their component parts, which can result in substantial errors. Disaggregation models commonly assume that population characteristics are uniformly distributed within source zones, resulting in inaccurately uniform distributions among the target zones. This is evident in the tract-based estimates of the 1990 Black population distribution in Figure 1, which indicate that this area was moderately well integrated.
Figure 1. Computing 1990 rates of Black residents in 2010 tracts from different source levels, on the southwest outskirts of Charlotte, NC. Click for larger version. Source for tract-based estimates: Longitudinal Tract Database (LTDB).
In this case, there is no need to start with tract data. The Census Bureau also published 1990 race data for blocks, block groups, and "block group parts" (intersections between block groups, places, county subdivisions, and some other levels). Allocating data from these levels requires less disaggregation, and in many cases it requires only aggregation, resulting in exact 1990 counts for the 2010 tracts. In Figure 1, the allocations from the lower levels reveal that this area's population was in fact strongly segregated in 1990, with high rates of Black population in the eastern 2010 tracts and low rates in the western 2010 tracts.
Options & Recommendations
Which source level is "lowest" depends on whether the data are from the full-count, short-form census or from a sample-based, long-form survey.
From 1970 to 2000, the decennial census used two questionnaires:
- The short-form questionnaire, to be completed for all population and housing units, covering subjects such as age, sex, race, household size, and housing tenure.
- The long-form questionnaire, sent only to a sample of households, covering subjects such as income, employment, education, nativity, migration, and commuting.
After the 2000 census, the Census Bureau stopped including the long form in decennial census operations and has instead collected comparable information through the year-round American Community Survey (ACS). The Bureau publishes summary tables from the ACS annually. To generate a large enough sample to report data for small areas, it pools 5 years of samples together, producing 5-year summary data.
- Blocks are the lowest level for which the Census Bureau tabulates full-count, short-form summary data
- Block group parts are the lowest level for which the Census Bureau tabulated sample-based, long-form summary data in 1990 and 2000
- Block groups are the lowest level in ACS 5-year summary tables*
- Census tracts are the lowest level for some detailed census and ACS summary tables**
*The ACS also provides data for many entities that are smaller than block groups, such as small villages and tribal areas, but unlike these areas, block groups cover the entire U.S. and are uniformly small, with no great exceptions.
**Sex by single year of age and detailed place of birth are example tabulations for which there is usually no data available for levels lower than census tract. In decennial census summary files, such tables have codes that include "CT" (e.g., "PCT12" for the 2020 Sex by Single-Year Age table). ACS summary tables do not use a distinct code for these cases, but NHGIS distinguishes them by grouping them in separate datasets. (E.g., the tables in the NHGIS 2016_2020_ACS5a dataset provide data for block groups, and the tables in 2016_2020_ACS5b do not.)
For optimal accuracy, we recommend using:
- Crosswalks from blocks for short-form decennial census data
- Crosswalks from block group parts for long-form 1990 and 2000 census data
- Crosswalks from block groups for ACS 5-year data
- Crosswalks from census tracts only for data that are not available for lower levels
How to Use the Crosswalks
Using Block Crosswalks
In a block crosswalk, each record identifies a possible intersection between a single source block and a single target zone, along with an interpolation weight (ranging between 0 and 1) identifying approximately what portion of the source block's population and housing units were located in the intersection. These weights can be used to estimate how any counts available for source blocks (e.g., females age 75 and over, single-member households, owner-occupied housing units, etc.) are distributed among target zones.
For example, to interpolate count data from 1990 blocks to 2010 block groups:
- Obtain data of interest for 1990 blocks
- E.g., using the NHGIS Data Finder, find and download tables for the Block geographic level for the year 1990 (dataset 1990_STF1)
- Join the 1990-block-to-2010-block-group crosswalk to the 1990 block data of interest
- Multiply the 1990 block counts by the crosswalk's interpolation weights, producing estimated counts for all intersections (or "atoms") between 1990 blocks and 2010 block groups
- Sum these atom counts for each 2010 block group
Using Block-to-Block Crosswalks to Generate Data for Any Larger Units
Since 1990, every census geographic unit corresponds exactly to a set of blocks from the same census year. If you have census counts for blocks, you can compute counts for any larger census units (places, county subdivisions, ZIP Code Tabulation Areas, etc.) by summing all the counts for blocks in that unit. In this way, the NHGIS block-to-block crosswalks can be used to allocate data from 1990, 2000 or 2020 to any 2010 census units, or from 2010 to any 2020 census units, not just to blocks.
For example, to compute 2000 counts for 2010 school districts:
- Obtain 2000 block counts of interest and use the 2000-to-2010 block crosswalk to generate 2000 data for 2010 blocks (following steps similar to those above)
- Obtain a 2010 block data table from NHGIS and join it to the 2000 counts from step 1
- You can find codes for most 2010 census units, including school districts, in 2010 block-level table files
- Sum the 2000 counts for all 2010 blocks in each 2010 school district
Using Crosswalks from Block Group Parts, Block Groups, & Census Tracts
The steps for using crosswalks from block group parts (BGPs), block groups, and tracts mirror those for crosswalks from blocks.
The main difference is that crosswalks from block groups and BGPs include separate interpolation weights for different census counts, as laid out in the Technical Details section. To generate the weights in these crosswalks, we compute the proportion of each source zone's characteristics in each target zone based on block-level characteristics, using the block-to-block crosswalks. The separate interpolation weights are based on different block-level characteristics (total population, total housing units, etc.), enabling crosswalk users to choose the weights that are most suitable for their data of interest.
For example, a user interpolating counts of foreign-born persons may choose to use the weights based on total populations, assuming that the distribution of foreign-born persons corresponds closely to the general population's distribution. A user interpolating counts of high-income households may instead choose to use the weights based on total households, etc.
Finding Data for Block Group Parts
Finding source data for 1990 and 2000 BGPs through the NHGIS Data Finder involves an extra step. In NHGIS terminology, BGPs are a type of compound level. To find them in a selection window for Geographic Levels, you will need to click on the "Show Compound Geographic Levels" option in the upper right. You will then find BGP levels under the "BLOCK GROUP" heading.
Summary files for 1990 and 2000 use different versions of BGPs, each combining different sets of geographic units. The crosswalks use these levels:
Year | NHGIS ID | Label |
---|---|---|
1990 | blck_grp_598 | Block Group [1990 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Congressional District (1987-1993, 100th-102nd Congress)--American Indian/Alaska Native Area/Remainder--Reservation/Trust Lands/Remainder--Alaska Native Regional Corporation/Remainder--Urbanized Area/Remainder--Urban/Rural) |
2000 | blck_grp_090 | Block Group [2000 & 2010 partition] (by State--County--County Subdivision--Place/Remainder--Census Tract--Urban/Rural) |
Using Crosswalks with ACS Data for Non-Census Years
The Census Bureau generally makes no changes to its definitions of census tracts or block groups between censuses, and changes to county boundaries are also uncommon in recent decades. As such, it should be possible to use the crosswalks for 2010 units with any of the 2010-2019 releases of ACS Summary Files, and to use the crosswalks for 2020 units with any of the 2020-2029 releases of ACS Summary Files.
There are a few exceptions:
- 2010-2019
- Prior to the 2011 ACS data release: Tract numbering corrections and minor geographic definition changes in Madison County (FIPS: 053), Oneida County, (065) and Richmond County (085), New York (36)
- Prior to the 2012 ACS data release: Tract numbering corrections in Pima County (FIPS: 019), Arizona (04), and the restoration of a deleted tract in Los Angeles County (037), California (06)
- In 2013, 2015, and 2019: A few changes to the boundaries and/or codes of counties or county equivalents in Alaska, Virginia, and South Dakota. See the ACS Geography Boundaries by Year to determine which ACS data releases were affected by these changes.
- 2020-2029
- Prior to the 2022 ACS data release: The Census Bureau began using nine Connecticut Planning Regions to report county-level data in Connecticut, replacing Connecticut's historical set of eight counties. This change results in new numbering in the county component of identifiers for all census tracts and block groups in Connecticut.
We hope to extend our crosswalks in the future to incorporate these discrepancies and facilitate allocations to and from census units for any year since 2010 for anywhere in the U.S. Until then, to handle the 2022 change in county codes in Connecticut, we recommend using Geocorr 2022 to obtain a relationship file between 2020 census units and the newly identified Connecticut planning regions.
Technical Details
Basics
The crosswalks are provided through the links below as comma-separated values (CSV) files within Zip archives.
Each Zip file includes a "README" text file (also available here) that describes the content of the crosswalk file in detail.
Methodology
The interpolation weights in NHGIS crosswalks from blocks to blocks are primarily based on "target-density weighting" (TDW) (Schroeder 2007). TDW assumes that characteristics within each source zone have a distribution proportional to the densities of another characteristic among target zones. For example, if a 2020 block intersects two 2010 blocks, one of which was 10 times as dense as the other in 2010, then TDW assumes that the same 10:1 ratio holds within the 2020 block in 2020.
The interpolation weights in the crosswalks from 1990 and 2000 blocks to 2010 blocks involve some more advanced modeling as documented in these pages:
To generate weights in crosswalks from blocks to larger areas, we sum the weights from the block-to-block crosswalks for each pairing of source block and target zone. For example, if a single 1990 block intersects three 2010 blocks, two of which were in a single 2010 tract, then for the 1990-block-to-2010-tract crosswalk, we sum the two weights for the intersections between the 1990 block and the two 2010 blocks in the same 2010 tract.
To generate all other crosswalks, we use block-level data and the block-to-block crosswalks to compute the proportion of each source zone's characteristics that were located in each target zone. For example, to compute the household interpolation weights in crosswalks from 2010 block groups (BGs) to 2020 census tracts, we:
- Apply the weights in the 2010-to-2020-block crosswalk to 2010 block-level household counts to compute counts of 2010 households in all intersections between 2010 and 2020 blocks
- Sum the block-level-intersection counts for each unique combination of 2010 BG ID and 2020 tract ID to compute counts of 2010 households in each intersection between 2010 BGs and 2020 tracts
- Divide each of these "intersection counts" by the total household count for the corresponding 2010 BG to compute the expected proportion of the 2010 BG's households that are located in each 2020 tract
Geographic Coverage
Each crosswalk file is complete for the entire U.S. or for an entire state. State-level files include all target zones for the state as well as any source zones that intersect any of those target zones, including some source zones from neighboring states in cases where the Census Bureau adjusted state boundary lines between censuses.
Users interested in producing a complete set of data for a single state may need to obtain source data for both the state of interest and its neighboring states to ensure they have the required input data to allocate to all target zones in the state.
The block crosswalks and the nationwide BGP crosswalks can include millions of records and may therefore be too large to open in some applications.
Geographic Identifiers
The crosswalk files include two types of zone identifiers:
- GISJOIN
- Standard identifier used in NHGIS data tables and boundary files
- Always begins with a "G" prefix*
- The NHGIS state and county codes in GISJOIN identifiers are based on FIPS codes with one digit added to differentiate historical areas
- For current states and counties, the NHGIS code matches the FIPS code with a "0" appended
- GEOID
- Standard identifier used in recent Census Bureau source files
- May begin with a leading zero, which software applications commonly drop when reading the data*
For specifications of how these identifiers are constructed for different geographic levels and years, see the Zone Identifiers section in the crosswalk README file.
*Leading zeros & storing identifiers as text vs. numbers: By default, many software applications read and store numeric codes as numbers, which drops any leading zeros. E.g., the state FIPS code for Colorado officially consists of two digits, "08", but applications will commonly read and store this as the number 8. The purpose of the "G" prefix in GISJOIN identifiers is to ensure applications store identifiers as text strings.
Preserving leading zeros is important when concatenating codes to create a unique identifier. E.g., to uniquely identify Adams County, Colorado, the standard approach is to concatenate the Colorado code (FIPS "08" or NHGIS "080") with the Adams County code (FIPS "001" or NHGIS "0010"), yielding "08001" (the GEOID) or "G0800010" (the GISJOIN), which is unique among all U.S. counties. If the codes were stored as numbers (e.g., 8 for Colorado and 1 for Adams County), proper concatenation would require additional processing to re-insert leading zeros. Preserving leading zeros is also helpful when parsing concatenated identifiers to extract a single geographic level's code (e.g., to obtain the state code "08" from "08001").
How identifiers are stored is important when joining two data tables. Applications are typically not able to directly match numeric identifiers in one table with text identifiers in another table.
We recommend using GISJOIN identifiers not only for joining across various NHGIS data types (crosswalks, tables, boundary files) but also to help ensure that applications properly store the identifiers as text without dropping leading zeros.
Interpolation Weights
The block crosswalks include a single interpolation weight (labeled "WEIGHT") representing the expected proportion of the source block's population and housing units located in each target zone. These are the weights used for geographically standardized time series tables.
The crosswalks from larger units (block group parts, block groups, and census tracts) include multiple interpolation weights, as discussed above:
Field Name | Description |
---|---|
wt_pop | Expected proportion of source zone's population located in target zone |
wt_adult | Expected proportion of source zone's adult population (18 years and over) located in target zone. Currently provided only in crosswalks from block groups. |
wt_fam | Expected proportion of source zone's families located in target zone. Currently not provided in crosswalks from 2020 block groups. |
wt_hh | Expected proportion of source zone's households located in target zone. Note: household counts are equal to counts of occupied housing units and of householders. |
wt_hu | Expected proportion of source zone's housing units located in target zone |
wt_ownhu | Expected proportion of source zone's owner-occupied housing units located in target zone |
wt_renthu | Expected proportion of source zone's renter-occupied housing units located in target zone |
Blank/Missing Identifiers
In crosswalks with 1990 source zones, the 1990 zone identifier fields may contain blank values. Blank values are given in cases where a 1990 zone lies entirely offshore in coastal or Great Lakes waters. In these cases we are unable to use NHGIS boundary files, which exclude offshore areas, to determine relationships between 1990 and later census zones. (For censuses after 1990, we use block relationship files from the Census Bureau to identify intersections in offshore areas.) It is safe to assume that the target zones that lack a matching 1990 zone contain no 1990 population or housing units. We include the records with blank 1990 identifiers to ensure that all target zones are represented in the file.
Download
Crosswalks from 1990 to 2010
From Blocks – Optimal for short-form 1990 census data
1990 Blocks → 2010 Blocks
1990 Blocks → 2010 Block Groups
1990 Blocks → 2010 Census Tracts
1990 Blocks → 2010 Counties
From Block Group Parts – Optimal for long-form 1990 census data
1990 Block Group Parts → 2010 Block Groups
1990 Block Group Parts → 2010 Census Tracts
1990 Block Group Parts → 2010 Counties
From Census Tracts – Recommended ONLY for data not available at lower levels
1990 Census Tracts → 2010 Census Tracts
1990 Census Tracts → 2010 Counties
Crosswalks from 2000 to 2010
From Blocks – Optimal for short-form 2000 census data
2000 Blocks → 2010 Blocks
2000 Blocks → 2010 Block Groups
2000 Blocks → 2010 Census Tracts
2000 Blocks → 2010 Counties
From Block Group Parts – Optimal for long-form 2000 census data
2000 Block Group Parts → 2010 Block Groups
2000 Block Group Parts → 2010 Census Tracts
2000 Block Group Parts → 2010 Counties
From Census Tracts – Recommended ONLY for data not available at lower levels
2000 Census Tracts → 2010 Census Tracts
2000 Census Tracts → 2010 Counties
Crosswalks from 2010 to 2020
From Blocks – Optimal for 2010 census data
2010 Blocks → 2020 Blocks
2010 Blocks → 2020 Block Groups
2010 Blocks → 2020 Census Tracts
2010 Blocks → 2020 Counties
From Block Groups – Optimal for 2010-2019 ACS 5-year data
2010 Block Groups → 2020 Block Groups
2010 Block Groups → 2020 Census Tracts
2010 Block Groups → 2020 Counties
From Census Tracts – Recommended ONLY for data not available at lower levels
2010 Census Tracts → 2020 Census Tracts
2010 Census Tracts → 2020 Counties
Crosswalks from 2020 to 2010
From Blocks – Optimal for 2020 census data
2020 Blocks → 2010 Blocks
2020 Blocks → 2010 Block Groups
2020 Blocks → 2010 Census Tracts
2020 Blocks → 2010 Counties
From Block Groups – Optimal for most 2020-2029 ACS 5-year data
2020 Block Groups → 2010 Block Groups
2020 Block Groups → 2010 Census Tracts
2020 Block Groups → 2010 Counties
From Census Tracts – Recommended ONLY for data not available at lower levels
2020 Census Tracts → 2010 Census Tracts
2020 Census Tracts → 2010 Counties
Citation and Use
Use of NHGIS crosswalks is subject to the same conditions as for all NHGIS data. See Citation and Use of NHGIS Data.
References
- ^ Schroeder, J. P. (2007). "Target-density weighting interpolation and uncertainty evaluation for temporal analysis of census data." Geographical Analysis 39(3), 311–335. http://dx.doi.org/10.1111/j.1538-4632.2007.00706.x