
Weighting methods

Usually, when non-response occurs, the Horvitz-Thompson summation taken over the respondents alone underestimates the population total. Thus, the initial weights $d_k$ given by the sampling design need to be corrected; in other words, we have to perform a calibration of the described $d_k$'s.

In general we have the following settings:

  • Information on the target variable $Y$ is only available for respondents.
  • Information on auxiliary variables $X$ is available under the following settings:
    • Unit-level data is available for respondents and non-respondents.
    • Unit-level data is available only for respondents, but we have population totals for the reference population.

Case when dimensions of calibration and response-model variables coincide

Let $\boldsymbol{x} = (x_1, x_2, \ldots, x_p)^{\text{T}}$ denote the benchmark vector of chosen auxiliary variables and let $\boldsymbol{x}_k = (x_{1_k}, x_{2_k}, \ldots, x_{p_k})^{\text{T}}$ be the vector of auxiliary variables for the $k$-th element of the sample $s$. The settings state that $\boldsymbol{X}$, the vector of population totals of the auxiliary variables, is known, i.e.

\begin{equation}\label{eq:Aux. Total}
\boldsymbol{X} = \left(\sum_{k=1}^{N} x_{1_k}, \sum_{k=1}^{N} x_{2_k}, \ldots, \sum_{k=1}^{N} x_{p_k}\right)^{\text{T}} = \sum_{k \in U} \boldsymbol{x}_k.
\end{equation}

If the total of any auxiliary variable is not known, one might plug $x_{i_k}$ in place of $y_k$ into the Horvitz-Thompson estimator, i.e.

\begin{equation*}
\hat{X}^{i}_{\text{HT}} = \sum_{k \in s} d_k x_{i_k}, \quad i = 1, \ldots, p.
\end{equation*}

However, using $\boldsymbol{x}_k$ instead of $y_k$ does not always work well when estimating $\boldsymbol{X}$: one needs weights slightly different from the $d_k$'s. Those weights, denoted $w_k$, arise as the solutions to an optimization problem of the form

\begin{equation}\label{eq: optimization w_k}
\argmin_{w_k} \sum_{k \in r} G_k\left(w_k, d_k\right),
\end{equation}

where $G_k$ is a strictly convex, differentiable function for which $G_k(d_k, d_k) = 0$; equivalently, writing $G_k(w_k, d_k) = d_k \, G(w_k / d_k)$, the distance function satisfies $G(1) = G'(1) = 0$. In addition, the following condition has to be satisfied:

\begin{equation}\label{eq: calib eq}
\sum_{k \in r} w_k \boldsymbol{x}_k = \sum_{k \in U} \boldsymbol{x}_k.
\end{equation}

Equation \eqref{eq: calib eq} is called the calibration equation. Using the method of Lagrange multipliers, Deville and Särndal (1992) show that the calibration weights can be written as

\begin{equation}\label{eq: w_k minimizers form}
w_k = d_k F_k(\boldsymbol{\lambda}^{\text{T}} \boldsymbol{z}_k),
\end{equation}

where $\boldsymbol{z}_k$ is a vector of instrumental variables whose dimension coincides with that of $\boldsymbol{x}_k$. Later in this paper we will consider the situation where $\boldsymbol{x}_k$ has a higher dimension than $\boldsymbol{z}_k$. Here $F_k$ is the inverse of $G_k'(w_k, d_k)$, defined as

\begin{equation}\label{eq: partial derivative of G_k}
G_k'(w_k, d_k) = \frac{\partial G_k(w_k, d_k)}{\partial w_k}.
\end{equation}

There are various ways to choose the function $G_k$, but a common choice is

\begin{equation}\label{eq: G_k example}
G_k(w_k, d_k) = \frac{\left(w_k - d_k\right)^2}{2 d_k}.
\end{equation}

For this choice, the solution $w_k$ of the problem stated in \eqref{eq: optimization w_k} is expressed by Estevao and Särndal (2000) as

\begin{equation}\label{eq:G_k optimizers}
w_k = d_k\left(1 + \boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}\right),
\end{equation}

where $\boldsymbol{\lambda}$ is given by

\begin{equation}\label{eq: g vector}
\boldsymbol{\lambda} = \left(\sum_{k \in r} d_k \boldsymbol{x}_k \boldsymbol{z}_k^{\text{T}}\right)^{-1} \left(\boldsymbol{X} - \sum_{k \in r} d_k \boldsymbol{x}_k\right).
\end{equation}
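To make the linear case concrete, below is a minimal sketch in base R, assuming simple random sampling and $\boldsymbol{z}_k = \boldsymbol{x}_k$; all object names are illustrative and not part of any package API. It computes $\boldsymbol{\lambda}$ from \eqref{eq: g vector}, forms the weights \eqref{eq:G_k optimizers}, and verifies the calibration equation \eqref{eq: calib eq}.

```r
set.seed(1)
N <- 1000                                  # population size
n <- 200                                   # sample size
x_pop <- cbind(1, runif(N))                # auxiliary variables for U
X_tot <- colSums(x_pop)                    # known vector of totals X
s_idx <- sample(N, n)                      # simple random sample s
resp  <- rbinom(n, 1, 0.8) == 1            # respondents r (subset of s)
x_r   <- x_pop[s_idx, ][resp, , drop = FALSE]
d_r   <- rep(N / n, sum(resp))             # design weights d_k = N/n under SRS
z_r   <- x_r                               # instruments z_k = x_k here

# lambda = (sum_r d_k x_k z_k^T)^{-1} (X - sum_r d_k x_k)
lambda <- solve(crossprod(d_r * x_r, z_r),
                X_tot - colSums(d_r * x_r))
w_r <- d_r * (1 + drop(z_r %*% lambda))    # w_k = d_k (1 + z_k^T lambda)

# calibration equation: weighted respondent totals reproduce X exactly
all.equal(colSums(w_r * x_r), X_tot)       # TRUE
```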
Using the $w_k$ obtained this way, known as the linear weights, a new, so-called "calibration-weighted" estimator of the target variable total is

\begin{equation}\label{eq: calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r} w_k y_k,
\end{equation}

which can be rewritten as

\begin{equation}\label{eq: rendered calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r} d_k y_k + \left(\boldsymbol{X} - \sum_{k \in r} d_k \boldsymbol{x}_k\right)^{\text{T}} \boldsymbol{b},
\end{equation}

where

\begin{equation*}
\boldsymbol{b} = \left(\sum_{k \in r} d_k \boldsymbol{z}_k \boldsymbol{x}_k^{\text{T}}\right)^{-1} \sum_{k \in r} d_k \boldsymbol{z}_k y_k.
\end{equation*}

Notice that $\hat{Y}_{\text{cal}}$ is no longer unbiased by design. However, it might still be consistent, as described in Isaki and Fuller (1982).
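Continuing the sketch above with the same simulated objects, the two forms of $\hat{Y}_{\text{cal}}$ can be checked to agree numerically; the target variable $y$ below is generated purely for illustration.

```r
y_pop <- drop(x_pop %*% c(2, 5)) + rnorm(N)  # hypothetical target variable
y_r   <- y_pop[s_idx][resp]                  # y observed for respondents only

# direct form: sum_r w_k y_k
Y_cal_w <- sum(w_r * y_r)

# regression form: sum_r d_k y_k + (X - sum_r d_k x_k)^T b
b <- solve(crossprod(d_r * z_r, x_r),
           colSums(d_r * z_r * y_r))
Y_cal_b <- sum(d_r * y_r) + drop((X_tot - colSums(d_r * x_r)) %*% b)

all.equal(Y_cal_w, Y_cal_b)                  # TRUE: the two forms coincide
```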

How does one formulate the prediction model in this case? Let us denote two indicator random variables:

\begin{equation*}
I_j = \mathbb{1}\{j \in s\}, \qquad R_j = \mathbb{1}\{j \in r\}.
\end{equation*}

Kott and Chang (2010) proposed the double-protection justification, a set of model equations of the form

\begin{equation}\label{eq: double-security-pred.framework}
\left\{
\begin{array}{ll}
y_k &= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{x}} + \epsilon_k,\\
\boldsymbol{z}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\Gamma} + \boldsymbol{\eta}_k^{\text{T}},
\end{array}
\right.
\end{equation}

where $\boldsymbol{\Gamma}$ is usually (though not necessarily) of full rank, $\boldsymbol{\beta}_{\boldsymbol{x}}$ is a coefficient vector, and

\begin{equation}
E\left(\epsilon_k \mid \boldsymbol{x}_k, I_k, R_k\right) = 0, \qquad E\left(\boldsymbol{\eta}_k \mid \boldsymbol{x}_k, I_k, R_k\right) = 0.
\end{equation}

Under the model \eqref{eq: double-security-pred.framework} the following property holds:

\begin{equation}\label{eq: property of 2-sec prediction}
\left(y_k - \boldsymbol{z}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{z}}\right) \big| \boldsymbol{x}_k = \left(\epsilon_k - \boldsymbol{\eta}_k^{\text{T}} \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}\right) \big| \boldsymbol{x}_k,
\end{equation}

where $\boldsymbol{\beta}_{\boldsymbol{z}} = \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}$.
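This property follows by substituting the second model equation into the first:

\begin{equation*}
y_k - \boldsymbol{z}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{z}}
= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{x}} + \epsilon_k - \left(\boldsymbol{x}_k^{\text{T}} \boldsymbol{\Gamma} + \boldsymbol{\eta}_k^{\text{T}}\right) \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}
= \epsilon_k - \boldsymbol{\eta}_k^{\text{T}} \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}},
\end{equation*}

and conditioning both sides on $\boldsymbol{x}_k$ yields exactly \eqref{eq: property of 2-sec prediction}.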

When there are more calibration than response-model variables

First, let us consider $\boldsymbol{b}_{\boldsymbol{z}}^{*}$, the asymptotic limit of

\begin{equation}\label{eq: b_z}
\boldsymbol{b}_{\boldsymbol{z}} = \left(\sum_{k \in S} d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k \boldsymbol{z}_k^{\text{T}}\right)^{-1} \sum_{k \in S} d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k y_k,
\end{equation}

which, alongside $\boldsymbol{b}_{\boldsymbol{z}}$ itself, is said to exist regardless of the outcome of the prediction model. When the prediction model fails, $\boldsymbol{b}_{\boldsymbol{z}} - \boldsymbol{\beta}_{\boldsymbol{z}}$ still has a well-defined limit, as long as $\boldsymbol{\lambda}$ converges to a finite $\boldsymbol{\lambda}^{*}$. Chang and Kott (2008) considered this case and extended the weighting approach: the calibration equation \eqref{eq: calib eq}, reformulated as

\begin{equation}\label{eq: reformulated calib. eq.}
\boldsymbol{s} = \frac{1}{N} \left[\sum_{k \in S} d_k R_k F_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k - \sum_{k \in S} d_k \boldsymbol{x}_k\right] = \boldsymbol{0},
\end{equation}

is replaced by the problem of finding the $\boldsymbol{\lambda}$ that minimizes $\boldsymbol{s}^{\text{T}} \boldsymbol{Q} \boldsymbol{s}$ for some symmetric and positive-definite $\boldsymbol{Q}$. There are various ways to pick $\boldsymbol{Q}$, as well as to deal with $\boldsymbol{\Gamma}$ not being of full rank; a couple of examples can be found in Kott and Liao (2017). For instance, one option is to use

\begin{equation*}
\boldsymbol{Q}^{-1} = \text{DIAG}\left[\left(N^{-1} \sum_{S} d_k \boldsymbol{x}_k\right) \left(N^{-1} \sum_{S} d_k \boldsymbol{x}_k^{\top}\right)\right].
\end{equation*}

After finding $\boldsymbol{\lambda}$, the dimension of $\boldsymbol{x}_k^{\text{T}}$ is reduced as follows:

\begin{align*}
\tilde{\boldsymbol{x}}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}} \, N^{-1} \boldsymbol{Q} \sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \left(\sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{x}_j^{\text{T}}\right)^{-1} \sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \boldsymbol{B}_{\boldsymbol{z}}.
\end{align*}
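Below is a minimal numerical sketch of this idea, assuming the response-model function $F_k(u) = \exp(u)$ and simple random sampling; the object names are illustrative and not part of the MNAR API. With more calibration variables than response-model variables, $\boldsymbol{s}(\boldsymbol{\lambda}) = \boldsymbol{0}$ is overdetermined, so $\boldsymbol{\lambda}$ is found by numerically minimizing $\boldsymbol{s}^{\text{T}} \boldsymbol{Q} \boldsymbol{s}$.

```r
set.seed(2)
N <- 1000
x_pop <- cbind(1, runif(N), runif(N))     # p = 3 calibration variables x_k
z_pop <- x_pop[, 1:2]                     # q = 2 response-model variables z_k
S <- sample(N, 300)                       # sampled units
d <- rep(N / 300, 300)                    # design weights d_k under SRS
R <- rbinom(300, 1, plogis(drop(z_pop[S, ] %*% c(-0.5, 1.5))))  # response

F_fun <- function(u) exp(u)               # assumed response-model function F_k

# s(lambda): p calibration equations in q < p unknowns
s_fun <- function(lambda) {
  Fv <- F_fun(drop(z_pop[S, ] %*% lambda))
  (colSums(d * R * Fv * x_pop[S, ]) - colSums(d * x_pop[S, ])) / N
}

# Q^{-1} = DIAG[(N^{-1} sum_S d_k x_k)(N^{-1} sum_S d_k x_k^T)]
xbar <- colSums(d * x_pop[S, ]) / N
Q <- diag(1 / xbar^2)

# find lambda minimizing s^T Q s (Nelder-Mead)
obj <- function(lambda) drop(t(s_fun(lambda)) %*% Q %*% s_fun(lambda))
lambda_hat <- optim(c(0, 0), obj)$par
w <- d * R * F_fun(drop(z_pop[S, ] %*% lambda_hat))  # adjusted weights
```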
Another approach to component reduction, proposed by Andridge and Little (2011), works without searching for $\boldsymbol{\lambda}$ and does not even rely on picking a $\boldsymbol{Q}$ matrix. The idea relies on satisfying

\begin{equation}\label{eq: reformulated calib.eq}
\sum_{k \in S} w_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k R_k F_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k \tilde{\boldsymbol{x}}_k
\end{equation}

and setting

\begin{equation}\label{eq: setting A&L}
\tilde{\boldsymbol{x}}_k^{\text{T}} = \boldsymbol{x}_k^{\text{T}} \boldsymbol{A}^{\text{T}},
\end{equation}

where

\begin{equation*}
\boldsymbol{A}^{\text{T}} = \left(\sum_{S} R_j \boldsymbol{x}_j \boldsymbol{x}_j^{\text{T}}\right)^{-1} \sum_{S} R_j \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}.
\end{equation*}

Again the dimension is reduced, and with the $\tilde{\boldsymbol{x}}_k$ obtained this way one is able to perform the generalized calibration weighting technique, as sketched below. So far, this method is implemented as part of the `MNAR::gencal()` function.
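Since $\boldsymbol{A}^{\text{T}}$ is simply the coefficient matrix of a respondent-level least-squares regression of $\boldsymbol{z}$ on $\boldsymbol{x}$, it is easy to compute directly. A short sketch, continuing the simulated objects from the previous block (names again illustrative):

```r
# A^T = (sum_S R_j x_j x_j^T)^{-1} sum_S R_j x_j z_j^T, respondents only
x_S <- x_pop[S, ]
z_S <- z_pop[S, ]
A_t <- solve(crossprod(x_S[R == 1, ]),
             crossprod(x_S[R == 1, ], z_S[R == 1, ]))

# reduced calibration vectors: x~_k^T = x_k^T A^T
x_tilde <- x_S %*% A_t
dim(x_tilde)  # 300 x 2: calibration and response-model dimensions now match,
              # so the equal-dimension machinery applies to x_tilde
```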

References

Andridge, Rebecca R., and Roderick J. A. Little. 2011. "Proxy Pattern-Mixture Analysis for Survey Nonresponse." Journal of Official Statistics 27 (2): 153.
Chang, T., and P. S. Kott. 2008. "Using Calibration Weighting to Adjust for Nonresponse Under a Plausible Model." Biometrika 95 (3): 555–71. https://doi.org/10.1093/biomet/asn022.
Deville, Jean-Claude, and Carl-Erik Särndal. 1992. "Calibration Estimators in Survey Sampling." Journal of the American Statistical Association 87 (418): 376–82.
Estevao, Victor, and Carl-Erik Särndal. 2000. "A Functional Form Approach to Calibration." Journal of Official Statistics 16 (4): 379–99. https://www.proquest.com/scholarly-journals/functional-form-approach-calibration/docview/1266846662/se-2.
Isaki, C. T., and W. A. Fuller. 1982. "Survey Design Under the Regression Super-Population Model." Journal of the American Statistical Association 77 (377): 89–96.
Kott, Phillip S., and Ted Chang. 2010. "Using Calibration Weighting to Adjust for Nonignorable Unit Nonresponse." Journal of the American Statistical Association 105 (491): 1265–75. https://doi.org/10.1198/jasa.2010.tm09016.
Kott, Phillip S., and Dan Liao. 2017. "Calibration Weighting for Nonresponse That Is Not Missing at Random: Allowing More Calibration Than Response-Model Variables." Journal of Survey Statistics and Methodology 5 (2): 159–74. https://doi.org/10.1093/jssam/smx003.