
Introduction

This document presents the theory needed to understand the methodology behind correcting missing-not-at-random non-response in survey samples. It covers methods such as:

  1. Generalized calibration

  2. Generalized calibration with more calibration than response-model variables

  3. Generalized method of moments (also known as GMM)

  4. Empirical likelihood estimation

  5. Non-parametric methods

  6. Exponential tilting

  7. Latent approach

The material in this paper explains the correction techniques and should be treated as a supplementary resource to our R package, MNAR.

Interpretation of Missing Not at Random

In statistical surveys, missing data is the most important member of the family of non-random errors. Its presence affects the estimation of the unknown global values (totals) of a population by biasing estimators and reducing their precision. The reason for this lies in the differences between respondents—participants who answered every question—and those participants who did not answer every question (item non-response) or did not answer at all (unit non-response), called non-respondents.

There are many methods for dealing with non-response. The main idea behind their construction allows us to split them into two groups: weighting methods and imputation methods. In most cases, one is able to determine which of these two groups of methods should be used.

Imputation methods are used when dealing with item non-response: questions with missing answers are corrected by, e.g., filling in the missing values. Weighting methods are used when dealing with unit non-response: a set of auxiliary variables is used to correct the sampling weights so that known population totals are reproduced. The choice of auxiliary variables matters and is strictly tied to the estimation process, so it is the biggest problem in this category of methods. One might combine both kinds of methods to eliminate the negative impact of non-response.

Investigation of the described methodology starts with an explanation of the assumptions, settings, and notation behind sampling, responding, and estimation. Starting with the basic notation, let $U$ denote the population of size $N$ and let $s$, $s \subseteq U$, be a probability sample of size $n$, $n \leq N$. According to the sampling design, let $\pi_k$ denote the first-order inclusion probability of the $k$-th element of population $U$ in sample $s$. Under the given sampling design, $d_k = \frac{1}{\pi_k}$ denotes the initial weight of the $k$-th element. Our main goal, when dealing with survey data, is to estimate the total of population $U$, written as
\begin{equation}\label{eq:Pop. total}
Y = \sum_{k=1}^{N}{y_k},
\end{equation}
where $y_k$ is the value of the target variable $Y$ for the $k$-th element, $k = 1, \ldots, N$. The natural and usual choice here is the Horvitz-Thompson estimator of the form
\begin{equation}\label{eq:HT estimator}
\hat{Y}_{\text{HT}} = \sum_{k \in s}{d_k y_k}.
\end{equation}
By design, $\hat{Y}_{\text{HT}}$ is an unbiased estimator. If there are non-respondents in the sample, the summation is carried out over the subset $r$ of respondents, $r \subseteq s$.
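
As a quick illustration of \eqref{eq:HT estimator}, the following minimal sketch (base R, with made-up inclusion probabilities and target values) computes a Horvitz-Thompson estimate of a total from the design weights; with non-response one would restrict the sum to the respondent subset $r$.

```r
# Minimal sketch (base R, toy data): Horvitz-Thompson estimation of a population total.
set.seed(1)
n    <- 100
pi_k <- runif(n, 0.05, 0.5)       # first-order inclusion probabilities of the sampled units
d_k  <- 1 / pi_k                  # initial design weights d_k = 1 / pi_k
y_k  <- rnorm(n, mean = 50)       # values of the target variable observed in the sample

Y_hat_HT <- sum(d_k * y_k)        # Horvitz-Thompson estimator of the total Y
Y_hat_HT

# With unit non-response, the same sum runs only over respondents, e.g.:
resp <- rbinom(n, 1, 0.8) == 1    # illustrative response indicator
sum(d_k[resp] * y_k[resp])        # typically underestimates Y, motivating the re-weighting below
```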

Weighting methods

Usually, if non-response occurs, the summation in \eqref{eq:HT estimator} yields values that underestimate the population total from \eqref{eq:Pop. total}. Thus, the initial weights $d_k$ need to be corrected under the given sampling design; in other words, we have to perform a calibration of the described $d_k$'s.

In general we have the following settings:

  • Information on the target variable $Y$ is only available for respondents.
  • Information on auxiliary variables $\boldsymbol{x}$ is available under the following settings:
    • Unit-level data is available for respondents and non-respondents.
    • Unit-level data is available only for respondents, but we have population totals for the reference population.

Case when dimensions of calibration and response-model variables coincide

Let $\boldsymbol{x} = (x_1, x_2, \ldots, x_p)^{\text{T}}$ denote the benchmark vector of chosen auxiliary variables and let $\boldsymbol{x}_{k} = (x_{1_k}, x_{2_k}, \ldots, x_{p_k})^{\text{T}}$ be the vector of auxiliary variables for the $k$-th element of the sample $s$. The settings state that $\boldsymbol{X}$, the vector of the auxiliary variables' global (population) values, is known, i.e.
\begin{equation}\label{eq:Aux. Total}
\boldsymbol{X} = \left(\sum_{k=1}^{N}{x_{1_k}}, \sum_{k=1}^{N}{x_{2_k}}, \ldots, \sum_{k=1}^{N}{x_{p_k}}\right)^{\text{T}} = \sum_{k \in U}{\boldsymbol{x}_k}.
\end{equation}
If any of the auxiliary totals is not known, one might estimate it by substituting $x_{i_k}$ for the $y_k$'s in \eqref{eq:HT estimator}, i.e.
\begin{equation*}
\hat{X}^{i}_{\text{HT}} = \sum_{k \in s}{d_k x_{i_k}}, \quad i = 1, \ldots, p.
\end{equation*}
However, using $\boldsymbol{x}_k$ instead of $y_k$ does not guarantee that the known totals $\boldsymbol{X}$ are reproduced when non-response occurs; one needs slightly different weights than the $d_k$'s. Those weights, denoted $w_k$, are obtained as solutions to an optimization problem of the form
\begin{equation}\label{eq: optimization w_k}
\operatorname{argmin}_{w_k}{\sum_{k\in r}G_k\left(w_k,d_k\right)},
\end{equation}
where $G_k$ is a strictly convex, differentiable distance function with $G_k(d_k,d_k) = 0$ and, written as a function of the ratio $w_k/d_k$, $G_k(1) = G'_k(1) = 0$. In addition, the following condition has to be satisfied:
\begin{equation}\label{eq: calib eq}
\sum_{k\in r}{w_k\boldsymbol{x}_k} = \sum_{k\in U}{\boldsymbol{x}_k}.
\end{equation}
Equation \eqref{eq: calib eq} is also called the calibration equation. Using the method of Lagrange multipliers, it is shown in Deville and Särndal (1992) that the calibration weights can be written as
\begin{equation}\label{eq: w_k minimizers form}
w_k = d_k F_k (\boldsymbol{\lambda}^{\text{T}}\boldsymbol{z}_k),
\end{equation}
where $\boldsymbol{z}_k$ is a vector of instrumental variables whose dimension coincides with that of $\boldsymbol{x}_k$. Later in this paper, we will consider the situation where $\boldsymbol{x}_k$ has a higher dimension than $\boldsymbol{z}_k$. $F_k$ is the inverse of $G_k'(w_k, d_k)$, the derivative of $G_k$ with respect to $w_k$:
\begin{equation}\label{eq: partial derivative of G_k}
G_k'(w_k, d_k) = \frac{\partial{G_k(w_k, d_k)}}{\partial w_k}.
\end{equation}
There are various ways to choose the function $G_k$, but a common choice is
\begin{equation}\label{eq: G_k example}
G_k(w_k,d_k) = \frac{\left(w_k - d_k\right)^2}{2d_k}.
\end{equation}
For this choice, the solution $w_k$ of the problem stated in \eqref{eq: optimization w_k} is expressed by Estevao and Särndal (n.d.) as
\begin{equation}\label{eq:G_k optimizers}
w_k = d_k(1 + \boldsymbol{z}_k^{\text{T}}\boldsymbol{\lambda}),
\end{equation}
where $\boldsymbol{\lambda}$ is given by
\begin{equation}\label{eq: g vector}
\boldsymbol{\lambda} = \left(\sum_{k \in r}{d_k\boldsymbol{x}_k\boldsymbol{z}_k^{\text{T}}}\right)^{-1} \left(\boldsymbol{X} - \sum_{k\in r}{d_k \boldsymbol{x}_k}\right).
\end{equation}
Using the obtained $w_k$, known as the linear weights, a new, so-called "calibration-weighted" estimator of the target variable total from \eqref{eq:Pop. total} is of the form
\begin{equation}\label{eq: calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r}{w_k y_k},
\end{equation}
which can be rewritten as
\begin{equation}\label{eq: rendered calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r }{d_k y_k} + \left(\boldsymbol{X} - \sum_{k\in r}{d_k \boldsymbol{x}_k}\right)^{\text{T}}\boldsymbol{b},
\end{equation}
where
\begin{equation*}
\boldsymbol{b} = \left(\sum_{k \in r}{d_k\boldsymbol{z}_k\boldsymbol{x}_k^{\text{T}}}\right)^{-1} \sum_{k \in r}{d_k\boldsymbol{z}_k y_k}.
\end{equation*}
Notice that $\hat{Y}_{\text{cal}}$ is no longer unbiased by design; however, it may be consistent, as described in Isaki and Fuller (1982).
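
To make the linear case concrete, here is a minimal sketch in base R that computes $\boldsymbol{\lambda}$ and the linear weights $w_k = d_k(1 + \boldsymbol{z}_k^{\text{T}}\boldsymbol{\lambda})$ directly from the formulas above on simulated respondent data; all data and totals are made up for illustration, and the instrument-free case $\boldsymbol{z}_k = \boldsymbol{x}_k$ is used.

```r
# Minimal sketch (base R, simulated data): linear generalized calibration weights.
set.seed(2)
n_r <- 200                                        # number of respondents
x_r <- cbind(1, rnorm(n_r), rbinom(n_r, 1, 0.4))  # calibration variables x_k (with intercept)
z_r <- x_r                                        # instrumental variables; here z_k = x_k
d_r <- runif(n_r, 2, 10)                          # initial design weights of respondents
y_r <- 10 + 2 * x_r[, 2] + rnorm(n_r)             # target variable, observed for respondents only
X_pop <- c(1500, 40, 620)                         # known population totals of x (illustrative)

# lambda = (sum_r d_k x_k z_k^T)^{-1} (X - sum_r d_k x_k)
lambda <- solve(crossprod(x_r * d_r, z_r), X_pop - colSums(d_r * x_r))
w_r    <- d_r * (1 + drop(z_r %*% lambda))        # calibrated (linear) weights

colSums(w_r * x_r)                                # reproduces X_pop exactly (calibration equation)
sum(w_r * y_r)                                    # calibration-weighted estimate of the total of y
```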

How does one formulate the prediction model in this case? Let us define two indicator random variables:
\begin{equation*}
I_k = 1\{k \in s\}, \qquad R_k = 1\{k \in r\}.
\end{equation*}
Kott and Chang (2010) proposed the following double-protection set of equations:
\begin{equation}\label{eq: double-security-pred.framework}
\left\{\begin{array}{ll}
y_k &= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{x}} + \epsilon_k,\\
\boldsymbol{z}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}}\boldsymbol{\Gamma} + \boldsymbol{\eta}_k^{\text{T}},
\end{array}\right.
\end{equation}
where $\boldsymbol{\Gamma}$ is usually (though not necessarily) of full rank, $\boldsymbol{\beta}_{\boldsymbol{x}}$ is a coefficient vector, and
\begin{equation}
E\left(\epsilon_k \,|\, \boldsymbol{x}_k, I_k, R_k\right) = 0, \qquad E\left(\boldsymbol{\eta}_k \,|\, \boldsymbol{x}_k, I_k, R_k\right) = \boldsymbol{0}.
\end{equation}
Under the proposal from \eqref{eq: double-security-pred.framework} the following property holds:
\begin{equation}\label{eq: property of 2-sec prediction}
\left(y_k - \boldsymbol{z}_k^{\text{T}}\boldsymbol{\beta}_{\boldsymbol{z}}\right)\big|\boldsymbol{x}_k = \left(\epsilon_k - \boldsymbol{\eta}_k^{\text{T}}\boldsymbol{\Gamma}^{-1}\boldsymbol{\beta}_{\boldsymbol{x}}\right)\big|\boldsymbol{x}_k,
\end{equation}
where
\begin{equation*}
\boldsymbol{\beta}_{\boldsymbol{z}} = \boldsymbol{\Gamma}^{-1}\boldsymbol{\beta}_{\boldsymbol{x}}.
\end{equation*}
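
The property \eqref{eq: property of 2-sec prediction} can be checked numerically. The short sketch below (base R, with arbitrary illustrative values of $\boldsymbol{\Gamma}$, $\boldsymbol{\beta}_{\boldsymbol{x}}$ and the errors) generates one unit according to the display above and verifies that $y_k - \boldsymbol{z}_k^{\text{T}}\boldsymbol{\beta}_{\boldsymbol{z}}$ equals $\epsilon_k - \boldsymbol{\eta}_k^{\text{T}}\boldsymbol{\Gamma}^{-1}\boldsymbol{\beta}_{\boldsymbol{x}}$.

```r
# Minimal sketch (base R): numerical check of the double-protection identity.
set.seed(3)
p <- 3
Gamma  <- matrix(c(1, 0.2, 0,  0.1, 1, 0.3,  0, 0.2, 1), p, p)  # full-rank matrix (illustrative)
beta_x <- c(2, -1, 0.5)
beta_z <- solve(Gamma, beta_x)                       # beta_z = Gamma^{-1} beta_x

x_k   <- rnorm(p)                                    # auxiliary variables for a single unit
eps_k <- rnorm(1)
eta_k <- rnorm(p)
y_k   <- drop(crossprod(x_k, beta_x)) + eps_k        # y_k  = x_k' beta_x + eps_k
z_k   <- drop(crossprod(Gamma, x_k)) + eta_k         # z_k' = x_k' Gamma + eta_k'

lhs <- y_k - drop(crossprod(z_k, beta_z))            # y_k - z_k' beta_z
rhs <- eps_k - drop(crossprod(eta_k, solve(Gamma, beta_x)))  # eps_k - eta_k' Gamma^{-1} beta_x
all.equal(lhs, rhs)                                  # TRUE (up to floating-point error)
```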

When there are more calibration than response-model variables

First, let us consider $\boldsymbol{b}_{\boldsymbol{z}}^{*}$, the asymptotic limit of
\begin{equation}\label{eq: b_z}
\boldsymbol{b}_{\boldsymbol{z}} = \left(\sum_{k \in S}{d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}}\boldsymbol{\lambda})\boldsymbol{x}_k\boldsymbol{z}_k^{\text{T}}}\right)^{-1} \times \sum_{k \in S}{d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}}\boldsymbol{\lambda})\boldsymbol{x}_k y_k},
\end{equation}
which, alongside $\boldsymbol{b}_{\boldsymbol{z}}$, is assumed to exist regardless of the outcome of the prediction. When the prediction model fails, $\boldsymbol{b}_{\boldsymbol{z}}$ may differ from $\boldsymbol{\beta}_{\boldsymbol{z}}$, but it still converges to $\boldsymbol{b}_{\boldsymbol{z}}^{*}$ as long as $\boldsymbol{\lambda}$ converges to a finite $\boldsymbol{\lambda}^{*}$. Chang and Kott (2008) considered this case and extended the weighting approach: the calibration equation from \eqref{eq: calib eq}, reformulated as
\begin{equation}\label{eq: reformulated calib. eq.}
\boldsymbol{s} = \frac{1}{N} \left[\sum_{k \in S}{d_k R_k F_k(\boldsymbol{z}_k^{\text{T}}\boldsymbol{\lambda})\boldsymbol{x}_k} - \sum_{k \in S}{d_k\boldsymbol{x}_k}\right] = \boldsymbol{0},
\end{equation}
is replaced by the problem of finding the $\boldsymbol{\lambda}$ that minimizes $\boldsymbol{s}^{\text{T}}\boldsymbol{Q}\boldsymbol{s}$ for some symmetric and positive-definite $\boldsymbol{Q}$. There are various ways to pick $\boldsymbol{Q}$, as well as to deal with $\boldsymbol{\Gamma}$ not being of full rank; a couple of examples can be found in Kott and Liao (2017). For example, one of the options is to use
\begin{equation*}
\boldsymbol{Q}^{-1} = \text{DIAG}\left[\left(N^{-1} \sum_{S} d_k \boldsymbol{x}_k\right) \left(N^{-1}\sum_{S} d_k \boldsymbol{x}_k^{\text{T}}\right)\right].
\end{equation*}
After finding $\boldsymbol{\lambda}$, the dimension of $\boldsymbol{x}_k^{\text{T}}$ is reduced in the following way:
\begin{align*}
\tilde{\boldsymbol{x}}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}} N^{-1} \boldsymbol{Q}\sum_{j \in S} d_j R_j F_j' \left( \boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \left( \sum_{j \in S} d_j R_j F_j' \left( \boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{x}_j^{\text{T}} \right)^{-1} \sum_{j \in S} d_j R_j F_j' \left( \boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \boldsymbol{B}_{\boldsymbol{z}}.
\end{align*}
Another approach to component reduction, proposed by Andridge and Little (2011), works without searching for $\boldsymbol{\lambda}$ and does not rely on picking a $\boldsymbol{Q}$ matrix; the idea relies on satisfying
\begin{equation}\label{eq: reformulated calib.eq}
\sum_{k \in S} w_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k R_k F_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k \tilde{\boldsymbol{x}}_k
\end{equation}
and setting
\begin{equation}\label{eq: setting A&L}
\tilde{\boldsymbol{x}}_k^{\text{T}} = \boldsymbol{x}_k^{\text{T}}\boldsymbol{A}^{\text{T}},
\end{equation}
where
\begin{equation*}
\boldsymbol{A}^{\text{T}} = \left(\sum_{S}{R_j\boldsymbol{x}_j\boldsymbol{x}_j^{\text{T}}}\right)^{-1} \sum_{S}{R_j\boldsymbol{x}_j\boldsymbol{z}_j^{\text{T}}}.
\end{equation*}
Again, the reduction of dimensions is what is needed, and with the $\tilde{\boldsymbol{x}}_k$ obtained in this way one is able to apply the generalized calibration weighting technique. So far, this method is implemented as part of the MNAR::gencal() function.
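
As an illustration of the second reduction, the following sketch (base R, simulated data; the variable names and dimensions are purely illustrative) forms $\boldsymbol{A}^{\text{T}}$ from respondent-level data and maps each $\boldsymbol{x}_k$ to the lower-dimensional $\tilde{\boldsymbol{x}}_k$, which can then be fed into the generalized calibration step from the previous section.

```r
# Minimal sketch (base R, simulated data): Andridge-Little-type component reduction.
set.seed(4)
n <- 300
p <- 4; q <- 2                                        # more calibration (p) than response-model (q) variables
x <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # calibration variables for the whole sample S
z <- x[, 1:q] + matrix(rnorm(n * q, sd = 0.3), n, q)  # response-model (instrumental) variables
R <- rbinom(n, 1, 0.7)                                # response indicators R_k

# A^T = (sum_S R_j x_j x_j^T)^{-1} sum_S R_j x_j z_j^T   (a p x q matrix)
A_t <- solve(crossprod(x * R, x), crossprod(x * R, z))

x_tilde <- x %*% A_t                                  # reduced calibration variables, one row per unit
dim(x_tilde)                                          # n x q: same dimension as z, ready for calibration
```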

References

Andridge, Rebecca R, and Roderick JA Little. 2011. “Proxy Pattern-Mixture Analysis for Survey Nonresponse.” Journal of Official Statistics 27 (2): 153.
Chang, T., and P. S. Kott. 2008. “Using Calibration Weighting to Adjust for Nonresponse Under a Plausible Model.” Biometrika 95 (3): 555–71. https://doi.org/10.1093/biomet/asn022.
Deville, Jean-Claude, and Carl-Erik Särndal. 1992. “Calibration Estimators in Survey Sampling.” Journal of the American Statistical Association 87 (418): 376–82.
Estevao, Victor, and Carl Särndal. n.d. "A Functional Form Approach to Calibration." Journal of Official Statistics 16 (4): 379–399. https://www.proquest.com/scholarly-journals/functional-form-approach-calibration/docview/1266846662/se-2.
Isaki, C. T., and W. A. Fuller. 1982. “Survey Design Under the Regression Super-Population Model.” Journal of the American Statistical Association 77 (377): 89–96.
Kott, Phillip S., and Ted Chang. 2010. “Using Calibration Weighting to Adjust for Nonignorable Unit Nonresponse.” Journal of the American Statistical Association 105 (491): 1265–75. https://doi.org/10.1198/jasa.2010.tm09016.
Kott, Phillip S., and Dan Liao. 2017. “Calibration Weighting for Nonresponse That Is Not Missing at Random: Allowing More Calibration Than Response-Model Variables.” Journal of Survey Statistics and Methodology 5 (2): 159–74. https://doi.org/10.1093/jssam/smx003.