CurrentPopulationSurvey.jl allows users to easily download & parse U.S. Census Bureau CPS microdata files for the 2007 - present time period (earlier years are coming in future releases). This package supports the Tables.jl interface so you can easily convert to a tabular structure of your preference (e.g. DataFrame
).
- About the CPS: https://www.census.gov/programs-surveys/cps/about.html
- Files and data dictionaries: https://www.census.gov/data/datasets/time-series/demo/cps/cps-basic.html
I recommend that you familiarize yourself with the variables in the data dictionaries before calling cpsdata
so that you can decide on a subset of the total available variables for parsing. One year's worth of data is roughly 5GB - 7GB so narrowing this down (by selecting only the variables that you need) will improve efficiency when working with the data.
This package exports a single function cpsdata
:
cpsdata(year::Int, month::Int[, vars::Vector{String}])
Download/parse CPS microdata files for a given year & month, optionally retaining only the variables specified. There are hundreds of variables so specifying only those that you need will significantly increase efficiency when working with the data.
year::Int
: the year for which you want to obtain CPS data.month::Int
: the month for which you want to obtain CPS data.vars::Vector{String}
: an optional argument specifying the variables in the microdata file that you would like to keep.
data1901 = cpsdata(2019, 1, ["HRINTSTA", "PWORWGT"])
If you want to work with the data as a DataFrame:
using DataFrames
data1901 = DataFrame(cpsdata(2019, 1, ["HRINTSTA", "PWORWGT"]))