IPUMS NHGIS environmental summaries provide land cover data summarized over census tracts, county subdivisions, counties, and places, and climate data summarized over counties.
Overview
The NHGIS environmental summary data files provide land cover data from the National Land Cover Database (NLCD) summarized over census tracts, county subdivisions, counties and places, and climate data from the PRISM Climate Group summarized over counties. This section describes the geospatial processing used to generate the environmental summaries.
Land Cover
Land cover describes the visible features that cover the Earth's surface, and land cover data are usually derived from satellite imagery or aerial photography. Each pixel in an image is assigned to a land cover class (e.g., water, vegetation, bare rock, wetlands) through visual interpretation or, more commonly, automated classification algorithms. Researchers may then use the data to quantify the amount of land covered by one or more classes or to track changes in land cover over time.
NHGIS land cover summaries use data from the National Land Cover Database, which is produced by a consortium of federal agencies. The NLCD data have a 30-meter spatial resolution and were derived from a decision-tree classification of Landsat satellite imagery. NHGIS provides environmental summaries from the 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019 and 2021 versions of the NLCD. These nine NLCD versions, collectively known as the legacy NLCD datasets, were created using a consistent methodology and are comparable over time.
The NLCD data contain sixteen land cover classes:
Class value | Class name | Description |
---|---|---|
11 | Open water | Areas of open water, generally with less than 25% cover of vegetation or soil. |
12 | Perennial ice/snow | Areas characterized by a perennial cover of ice and/or snow, generally greater than 25% of total cover. |
21 | Developed, open space | Areas with a mixture of some constructed materials, but mostly vegetation in the form of lawn grasses. Impervious surfaces account for less than 20% of total cover. These areas most commonly include large-lot single-family housing units, parks, golf courses, and vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes. |
22 | Developed, low intensity | Areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 20% to 49% of total cover. These areas most commonly include single-family housing units. |
23 | Developed, medium intensity | Areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 50% to 79% of the total cover. These areas most commonly include single-family housing units. |
24 | Developed, high intensity | Highly developed areas where people reside or work in high numbers. Examples include apartment complexes, row houses and commercial/industrial. Impervious surfaces account for 80% to 100% of the total cover. |
31 | Barren land (rock/sand/clay) | Areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, sand dunes, strip mines, gravel pits and other accumulations of earthen material. Generally, vegetation accounts for less than 15% of total cover. |
41 | Deciduous forest | Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75% of the tree species shed foliage simultaneously in response to seasonal change. |
42 | Evergreen forest | Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75% of the tree species maintain their leaves all year. Canopy is never without green foliage. |
43 | Mixed forest | Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. Neither deciduous nor evergreen species are greater than 75% of total tree cover. |
52 | Shrub/scrub | Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically greater than 20% of total vegetation. This class includes true shrubs, young trees in an early successional stage or trees stunted from environmental conditions. |
71 | Grassland/herbaceous | Areas dominated by graminoid or herbaceous vegetation, generally greater than 80% of total vegetation. These areas are not subject to intensive management such as tilling, but can be utilized for grazing. |
81 | Pasture/hay | Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for greater than 20% of total vegetation. |
82 | Cultivated crops | Areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, and also perennial woody crops such as orchards and vineyards. Crop vegetation accounts for greater than 20% of total vegetation. This class also includes all land being actively tilled. |
90 | Woody wetlands | Areas where forest or shrubland vegetation accounts for greater than 20% of vegetative cover and the soil or substrate is periodically saturated with or covered with water. |
95 | Emergent herbaceous wetlands | Areas where perennial herbaceous vegetation accounts for greater than 80% of vegetative cover and the soil or substrate is periodically saturated with or covered with water. |
NHGIS used the following process to create the land cover summaries:
1. Download the 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019 and 2021 versions of the NLCD as georeferenced raster files.
2. Download the IPUMS NHGIS GIS file(s) for the geographic units for which environmental summaries will be calculated.
3. Calculate the proportion of each geographic unit's area covered by a given land cover class using functions from the sf and exactextractr packages in R.
4. Repeat steps 2 and 3 for each geographic level.
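The proportion calculation described above can be illustrated without any GIS libraries. The sketch below is not the NHGIS code, which uses sf and exactextractr in R. It is a pure-Python stand-in that assumes each pixel intersecting a geographic unit has already been reduced to a (class value, coverage fraction) pair, where the coverage fraction is the share of the pixel that lies inside the unit. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def class_proportions(pixels):
    """Return {class_value: proportion of covered area} for one geographic unit.

    `pixels` is an iterable of (class_value, coverage_fraction) pairs. The
    coverage fraction (0 to 1) is the share of the pixel inside the unit.
    """
    covered = defaultdict(float)
    for class_value, fraction in pixels:
        covered[class_value] += fraction
    total = sum(covered.values())
    if total == 0:
        return {}  # no pixels intersect the unit
    return {cls: area / total for cls, area in covered.items()}


# Hypothetical unit: 2.5 pixels of deciduous forest (41), 1 pixel of open
# water (11), and half a pixel of pasture/hay (81) along the boundary.
sample = [(41, 1.0), (41, 1.0), (11, 1.0), (81, 0.5), (41, 0.5)]
proportions = class_proportions(sample)
```

Weighting edge pixels by their coverage fraction, rather than counting them as fully inside or outside, keeps the class proportions for a unit summing to 1.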
NHGIS has produced 2001, 2004, 2006, 2008, 2011, 2013, 2016, 2019 and 2021 land cover summaries for the contiguous United States for these areas:
Vintage | Geographic units | TIGER/Line version |
---|---|---|
2010 | Census Tracts, County Subdivisions, Counties, Places | 2020 TIGER/Line Shapefiles |
2020 | Census Tracts, County Subdivisions, Counties, Places | 2020 TIGER/Line Shapefiles |
2022 | Census Tracts, County Subdivisions, Counties, Places | 2022 TIGER/Line Shapefiles |
We created environmental summaries for 2022-vintage geographic units because 2022 was the year the Census Bureau began identifying planning regions in Connecticut in place of Connecticut's historical counties, which have long had no official administrative function. These new planning regions changed the GISJOIN / GEOID code for census tracts, counties, and county subdivisions. While the new planning regions do not impact the codes for places, we include 2022 places for completeness.
Climate
Climate data describe the weather conditions for a given area over a long time period. These data are often published as raster files where each pixel contains an estimate of temperature or precipitation for a given time period derived from data collected at nearby monitoring stations. Scientists use modeling techniques to convert station-level data to raster datasets.
NHGIS climate summaries use data from the PRISM Climate Group at Oregon State University and the Northwest Alliance for Computational Science and Engineering. The PRISM data have a 4-kilometer spatial resolution and daily, monthly, annual, and 30-year normal temporal resolutions. Available climate variables include precipitation; minimum, maximum, and mean temperature; mean dew point; minimum and maximum vapor pressure deficit; and vapor pressure.
PRISM temporal resolutions and time periods summarized in NHGIS files
Temporal resolution | Availability | Description |
---|---|---|
Monthly | 1895-2014 | For each pixel, PRISM reports the total amount of precipitation received during the month and the average maximum and minimum daily temperatures for the month. PRISM takes a pixel's maximum and minimum daily temperatures and averages them over the entire month. |
Annually | 1895-2014 | For each pixel, PRISM reports the total amount of precipitation received during the year and the average maximum and minimum daily temperatures for the year. PRISM takes a pixel's maximum and minimum daily temperatures and averages them over the entire year. |
30-year normals | 1981-2010 | This resolution describes the average monthly and annual climatic conditions over a 30-year time period. There are 12 time steps in the 30-year normal - one for each month (January - December). |
The 30-year normals resolution may be unfamiliar to some users, so we describe it in more detail here. Normals are baseline datasets describing the long-run average climatic conditions for an area. There are typically 13 time steps in a normals dataset: one for each month and one annual. For example, to create a January 30-year normal maximum temperature, PRISM computes the average of the 30 January monthly average maximum temperatures. To create a July 30-year normal precipitation, PRISM computes the average of the 30 July monthly total precipitation measurements. Finally, to compute an annual normal precipitation, PRISM computes the average of 30 yearly total precipitation measurements.
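The averaging behind a normal can be sketched in a few lines. This is an illustration of the arithmetic only, not PRISM's processing code. The function name is hypothetical, and the January values are synthetic numbers invented for the example, not PRISM data.

```python
from statistics import mean


def thirty_year_normal(values_by_year, start_year=1981, end_year=2010):
    """Average one value per year over the normal period (inclusive)."""
    years = range(start_year, end_year + 1)
    missing = [year for year in years if year not in values_by_year]
    if missing:
        raise ValueError(f"missing years: {missing}")
    return mean(values_by_year[year] for year in years)


# Synthetic January monthly average maximum temperatures (degrees Celsius)
# for a single pixel, one value per year from 1981 through 2010.
january_tmax = {year: 10.0 + 0.1 * (year - 1981) for year in range(1981, 2011)}
january_normal = thirty_year_normal(january_tmax)
```

The same function would apply to a July precipitation normal or an annual normal: only the 30 input values change (monthly totals for July, or yearly totals).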
PRISM climate variables summarized in NHGIS files
Variable | Units | Description |
---|---|---|
Precipitation | Millimeters | Total precipitation (rain + melted snow) received by a pixel in a month or year. |
Minimum temperature | Degrees Celsius | Average daily minimum temperature in a pixel over a month or year. |
Maximum temperature | Degrees Celsius | Average daily maximum temperature in a pixel over a month or year. |
Mean temperature | Degrees Celsius | Mean of the minimum and maximum temperature variables. |
Process NHGIS used to create the climate summaries
- Download monthly and 30-year normal PRISM climate data as georeferenced raster files.
- Resample the original PRISM raster files from 4-kilometer to 2-kilometer spatial resolution so that every county in the contiguous U.S. has at least one pixel fall inside its boundary.
- For each resampled PRISM raster file, execute the ArcGIS 10.4.1 Zonal Statistics As Table tool with two input datasets: the raster file and an NHGIS boundary file representing all units of a census geographic level as polygons. For each polygon, the tool calculates summary statistics (minimum, maximum, mean, standard deviation) from the pixels that fall inside the polygon.
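The resampling step can be pictured as splitting each 4-kilometer pixel into four 2-kilometer pixels. The source does not state which resampling method NHGIS used, so the sketch below assumes the simplest one (nearest neighbor, where each new pixel inherits the value of the pixel it came from). The function name and grid values are hypothetical.

```python
def upsample(grid, factor=2):
    """Split every cell into factor x factor cells carrying the same value."""
    fine_grid = []
    for row in grid:
        # Repeat each value `factor` times along the row...
        wide_row = [value for value in row for _ in range(factor)]
        # ...then repeat the widened row `factor` times down the grid.
        fine_grid.extend(list(wide_row) for _ in range(factor))
    return fine_grid


# Hypothetical 2 x 2 grid of 4-kilometer pixels.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = upsample(coarse)  # 4 x 4 grid of 2-kilometer pixels
```

Under this assumption the pixel values are unchanged. Only the grid gets denser, which gives small counties more candidate pixels to fall inside their boundaries.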
Areas covered in NHGIS climate summaries
- 2015 counties (excluding Alaska, Hawaii, and Puerto Rico)
Summary statistics provided for each climate variable for each county
- MIN - this statistic is based on the county's pixel that has the smallest value for a climate variable. For example, the minimum precipitation value in December 2014 in Autauga County, Alabama, is 120.5 millimeters. One pixel in Autauga had that estimated amount of precipitation in December 2014.
- MAX - this statistic is based on the county's pixel that has the largest value for a climate variable. For example, the maximum precipitation value in December 2014 in Autauga County, Alabama, is 162.1 millimeters. One pixel in Autauga had that estimated amount of precipitation in December 2014.
- MEAN - this statistic is the average of all pixels in a county for a climate variable. For example, the average precipitation value in December 2014 in Autauga County, Alabama, is 132.7 millimeters. The mean is computed by summing the precipitation value of all pixels in Autauga County for that month and dividing by the number of pixels in the county.
- STD - this statistic is the standard deviation for a climate variable among all pixels in a county. For example, the standard deviation for precipitation in December 2014 in Autauga County, Alabama, is 9.2 millimeters.
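The four statistics above can be reproduced for any list of pixel values. The sketch below is a plain-Python illustration, not the ArcGIS tool. It assumes the pixels inside a county have already been identified, and it uses the population standard deviation, which is an assumption about how STD is computed. The pixel values are hypothetical.

```python
from statistics import mean, pstdev


def summarize_pixels(values):
    """Return MIN, MAX, MEAN, and STD for the pixel values in one county."""
    if not values:
        raise ValueError("a county needs at least one pixel")
    return {
        "MIN": min(values),
        "MAX": max(values),
        "MEAN": mean(values),
        "STD": pstdev(values),  # population standard deviation (assumed)
    }


# Hypothetical monthly precipitation values (millimeters) for four pixels.
county_pixels = [120.0, 130.0, 140.0, 150.0]
stats = summarize_pixels(county_pixels)
```

The empty-list check mirrors the reason for the resampling step: a county with no pixels inside its boundary would have no statistics to report.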
Technical Details
The environmental summaries are provided through the links below as comma-separated values (CSV) files within Zip archives.
Each Zip file includes a "README" text file that describes the contents of the data files.
Environmental summaries are provided in up to three layouts, depending on the data type:
- Time varies by column: Data for different times are placed in separate columns within one file. The rows correspond to geographic units, and the columns correspond to particular time - land cover combinations. E.g., one column reports deciduous forest area in 2001 and another column reports deciduous forest area in 2006.
- Time varies by row: Data for different times are placed in separate rows within one file. Each row represents a geographic unit - time combination (e.g., Autauga County, Alabama - December 2014 mean precipitation). Each column corresponds to an environmental data summary.
- Time varies by file: Data for different times are placed in different files. Within each file, the rows correspond to geographic units, and each column corresponds to an environmental summary at a single time.
Land cover summaries are available in the time varies by column and time varies by file layouts. Climate summaries are available only in the time varies by row layout.
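Users who need a layout that is not offered for a given data type can reshape a file themselves. The sketch below converts records from the time varies by column layout to the time varies by row layout using only the Python standard library. The column naming pattern (a variable name, an underscore, then a four-digit year) and the values are hypothetical; the README in each Zip file describes the actual column names.

```python
def columns_to_rows(records, id_field="GISJOIN"):
    """Turn one row per unit into one row per unit-year combination.

    Assumes every non-identifier column is named VARIABLE_YEAR.
    """
    rows = {}
    for record in records:
        unit = record[id_field]
        for name, value in record.items():
            if name == id_field:
                continue
            variable, year = name.rsplit("_", 1)
            row = rows.setdefault((unit, year), {id_field: unit, "YEAR": year})
            row[variable] = value
    return list(rows.values())


# Hypothetical wide record with made-up deciduous forest proportions.
wide = [{"GISJOIN": "G0100010", "DECID_2001": "0.21", "DECID_2006": "0.20"}]
long_rows = columns_to_rows(wide)
```

Values are kept as text throughout so that identifiers such as GISJOIN are never converted to numbers during the reshape.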
Geographic Identifiers
The land cover files include GISJOIN and GEOID zone identifiers, and the climate files include the GISJOIN zone identifier.
- GISJOIN
- Standard identifier used in NHGIS data tables and boundary files
- Always begins with a "G" prefix*
- The NHGIS state and county codes in GISJOIN identifiers are based on FIPS codes with one digit added to differentiate historical areas
- For current states and counties, the NHGIS code matches the FIPS code with a "0" appended
- GEOID
- Standard identifier used in recent Census Bureau source files
- May begin with a leading zero, which software applications commonly drop when reading the data*
For specifications of how these identifiers are constructed for different geographic levels and years, see the Zone Identifiers section in the README file.
*Leading zeros & storing identifiers as text vs. numbers: By default, many software applications read and store numeric codes as numbers, which drops any leading zeros. E.g., the state FIPS code for Colorado officially consists of two digits, "08", but applications will commonly read and store this as the number 8. The purpose of the "G" prefix in GISJOIN identifiers is to ensure applications store identifiers as text strings.
Preserving leading zeros is important when concatenating codes to create a unique identifier. E.g., to uniquely identify Adams County, Colorado, the standard approach is to concatenate the Colorado code (FIPS "08" or NHGIS "080") with the Adams County code (FIPS "001" or NHGIS "0010"), yielding "08001" (the GEOID) or "G0800010" (the GISJOIN), which is unique among all U.S. counties. If the codes were stored as numbers (e.g., 8 for Colorado and 1 for Adams County), proper concatenation would require additional processing to re-insert leading zeros. Preserving leading zeros is also helpful when parsing concatenated identifiers to extract a single geographic level's code (e.g., to obtain the state code "08" from "08001").
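The concatenation described above can be sketched as follows. The function names are hypothetical, and the GISJOIN helper covers only the case stated earlier for current states and counties, where the NHGIS code is the FIPS code with a "0" appended. Historical areas may use other final digits.

```python
def county_geoid(state_fips, county_fips):
    """Build a county GEOID: 2-digit state code + 3-digit county code."""
    return f"{int(state_fips):02d}{int(county_fips):03d}"


def county_gisjoin(state_fips, county_fips):
    """Build a county GISJOIN for a current state and county.

    Assumes the NHGIS code is the zero-padded FIPS code plus a trailing "0".
    """
    return f"G{int(state_fips):02d}0{int(county_fips):03d}0"


# Adams County, Colorado, starting from codes that lost their leading zeros.
geoid = county_geoid(8, 1)      # "08001"
gisjoin = county_gisjoin(8, 1)  # "G0800010"
state_code = geoid[:2]          # "08", recoverable only because zeros were kept
```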
How identifiers are stored is important when joining two data tables. Applications are typically not able to directly match numeric identifiers in one table with text identifiers in another table.
We recommend using GISJOIN identifiers not only for joining across various NHGIS data types (crosswalks, tables, boundary files) but also to help ensure that applications properly store the identifiers as text without dropping leading zeros.
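The join problem is easy to reproduce. In the sketch below, one table stores county GEOIDs as text and another has read the same identifier as a number; the direct match fails until the numeric value is converted back to zero-padded text. The variable and function names are hypothetical.

```python
# Table keyed by GEOID stored as text, leading zero intact.
county_names = {"08001": "Adams County, Colorado"}

# The same county as read by an application that stored the code as a number.
numeric_ids = [8001]

# Direct matching fails: the integer 8001 is not the string "08001".
direct_matches = [value for value in numeric_ids if value in county_names]


def as_county_geoid(value):
    """Convert a numeric county code back to 5-character zero-padded text."""
    return str(value).zfill(5)


repaired_matches = [
    as_county_geoid(value)
    for value in numeric_ids
    if as_county_geoid(value) in county_names
]
```

Because a GISJOIN starts with "G", it cannot be read as a number in the first place, so this repair step is unnecessary when joining on GISJOIN.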
Download
Land cover summaries:
Climate summaries (available only in time varies by row layout):
- Precipitation:
- Maximum temperature:
- Mean temperature:
Citation and Use
Use of the NHGIS environmental data summaries is subject to the same conditions as for all NHGIS data. See NHGIS Citation and Use.
Previous Versions
We previously released land cover summaries of the 2001, 2006, and 2011 NLCD for 2000, 2010, and 2015-vintage counties and census tracts. Those land cover summaries are available below:
- Time varies by column:
- Time varies by row:
- Time varies by file: