Total Population

2020 projection of the US population grouped by Zip3, gender, and age range, using prior years' census data.

Data Acquisition

ZIP source data was acquired from: https://www.kaggle.com/census/us-population-by-zip-code/data. This dataset contains 1,622,832 rows, each listing the population, age range, gender, and zip5. We discarded the rows listing the total population by zip5 and gender, and summed the total population of each zip5. Census source data from 2011 through 2017 was gathered from: https://factfinder.census.gov/
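The row filtering described above can be sketched as follows. This is a minimal illustration, not the original script: the column names (zipcode, gender, minimum_age, maximum_age, population) and the convention that total rows leave gender or age blank are assumptions, and the sample values are made up.

```python
import pandas as pd

# Hypothetical sample; column names and the "blank means total" convention
# are assumptions for illustration, not confirmed by the source dataset.
raw = pd.DataFrame({
    "zipcode": ["61747", "61747", "61747", "61747"],
    "gender": [None, "female", "male", "male"],
    "minimum_age": [None, 5.0, 5.0, None],
    "maximum_age": [None, 9.0, 9.0, None],
    "population": [100, 20, 30, 50],
})


def drop_aggregate_rows(df):
    """Keep only rows that specify both a gender and an age range."""
    # Only minimum_age is checked so that an open-ended top age bracket
    # (no maximum age) would still count as a detailed row.
    detailed = df.dropna(subset=["gender", "minimum_age"])
    return detailed.reset_index(drop=True)


detailed = drop_aggregate_rows(raw)
```

Reading the ZIP column as text rather than as a number is the safer choice here, since numeric parsing would drop leading zeros.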

Extraction, Transformations & Projections

The process was performed using Python with the pandas and scikit-learn libraries. The source data was extracted from each file and transformed to obtain a table with the following columns: Zip3, Gender, Age range, Population 2010, Population 2011, …, Population 2017.
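One way such a table could be assembled with pandas is shown below. It is a sketch under stated assumptions: the long-format input layout, the column names (zip5, age_range, year, population), the helper name to_zip3_table, and the sample numbers are all hypothetical, and Zip3 is taken to be the first three characters of the zip5.

```python
import pandas as pd

# Hypothetical long-format input: one row per zip5, gender, age range, year.
long_df = pd.DataFrame({
    "zip5": ["10001", "10002", "10001", "10002", "02139", "02139"],
    "gender": ["female"] * 6,
    "age_range": ["5-9"] * 6,
    "year": [2010, 2010, 2011, 2011, 2010, 2011],
    "population": [100, 150, 110, 160, 80, 85],
})


def to_zip3_table(df):
    """Sum zip5 populations into Zip3 groups and spread years into columns."""
    out = df.copy()
    # Assumption: Zip3 is the first three characters of the zip5 string,
    # so ZIP codes are kept as text to preserve leading zeros.
    out["zip3"] = out["zip5"].str[:3]
    wide = out.pivot_table(
        index=["zip3", "gender", "age_range"],
        columns="year",
        values="population",
        aggfunc="sum",
    )
    # Rename the year columns to match the "Population YYYY" layout.
    wide.columns = [f"Population {year}" for year in wide.columns]
    return wide.reset_index()


zip3_table = to_zip3_table(long_df)
```

In this sample, zip5 codes 10001 and 10002 collapse into the single Zip3 group 100, with their populations added together for each year.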

The populations of each Zip5 (by gender and age range) were summed to obtain the population at the Zip3 level. Linear regression with rectification was then applied to the 2010-2017 populations of each row to project its population for 2020.
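The per-row projection might look like the sketch below. It assumes that "rectification" means clamping negative predictions to zero, since a population cannot be negative; the source does not spell this out. The function name project_population and the sample histories are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Years covered by the table columns: 2010 through 2017.
YEARS = np.arange(2010, 2018).reshape(-1, 1)


def project_population(history, target_year=2020):
    """Fit a straight line to one row's yearly populations and extrapolate."""
    model = LinearRegression().fit(YEARS, np.asarray(history, dtype=float))
    predicted = model.predict(np.array([[target_year]]))[0]
    # Assumed meaning of "rectification": never return a negative population.
    return max(0.0, float(predicted))


growing = project_population([100, 110, 120, 130, 140, 150, 160, 170])
shrinking = project_population([35, 30, 25, 20, 15, 10, 5, 0])
```

The first history grows by 10 per year, so the fitted line reaches about 200 in 2020. The second falls by 5 per year, which would extrapolate below zero and is therefore clamped to 0.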