National statistical offices are actively integrating data science
methods into many aspects of producing official statistics, including
data collection, editing and imputation, and survey estimation. As
technological and statistical advances provide new data sources and
modeling techniques, procedures must adapt to accommodate them. This
short course will provide participants with examples of how data science
methods are used in producing official statistics and the challenges
faced. It will also introduce participants to a model-assisted approach
for incorporating machine learning into survey estimation. The machine
learning models will include generalized linear models, regularized
(elastic net) regression, and regression trees. The course will also
include demonstrations of how to fit these estimators using the
statistical software R. R Markdown files with the relevant code will be
provided so participants can actively follow along with the
demonstrations. Prior R experience is encouraged but not required.
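
As a rough sketch of what "model-assisted" means here, one common form is the generalized difference estimator of a population total. The notation is an assumption for illustration and is not taken from the course materials: U is the finite population, s the sample, π_i the inclusion probability of unit i, x_i the auxiliary data known for every unit in U, and m̂ a prediction model fitted to the sample (for example a generalized linear model, an elastic net regression, or a regression tree).

```latex
\hat{t}_{y} \;=\; \sum_{i \in U} \hat{m}(x_i) \;+\; \sum_{i \in s} \frac{y_i - \hat{m}(x_i)}{\pi_i}
```

The first sum adds up model predictions over the whole population, and the second is a design-weighted correction based on the sample residuals. That correction is why the estimator can remain approximately design-unbiased even when the model is misspecified.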
R/RStudio Instructions

During the workshop, we will cover examples of how to fit
model-assisted survey estimators in the statistical software R. You can
optionally follow along by running these examples in R as we cover them.
If you’d like to do this, we recommend either installing R
(install page: https://cran.r-project.org/) and RStudio
(install page: https://posit.co/download/rstudio-desktop/) locally on
your computer or setting up a free account on https://posit.cloud/, a
cloud-based RStudio server. Then you will want to install the following
R packages: mase, recipes, survey, tidyverse.
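
One way to install the packages from the R console is sketched below. It assumes a working internet connection and the default CRAN mirror. The variable names `pkgs` and `to_install` are illustrative only.

```r
# Packages named in the instructions above.
pkgs <- c("mase", "recipes", "survey", "tidyverse")

# Install only the packages that are not already present.
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) install.packages(to_install)

# Load each package to confirm the installation worked.
invisible(lapply(pkgs, library, character.only = TRUE))
```

If every package loads without an error, the setup should be ready for the demonstrations.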
Model-Assisted Survey Estimation with Modern Prediction Techniques by Jay Breidt and Jean Opsomer
Model Assisted Survey Sampling by Carl-Erik Särndal, Bengt Swensson, and Jan Wretman