
Weighting methods

Usually, when non-response occurs, the Horvitz-Thompson summation taken over the respondents alone underestimates the population total. Thus, the initial weights $d_k$ given by the sampling design need to be corrected; in other words, we have to perform a calibration of the described $d_k$'s.

In general we have the following settings:

  • Information on the target variable $Y$ is only available for respondents.
  • Information on auxiliary variables $X$ is available under the following settings:
    • Unit-level data is available for respondents and non-respondents.
    • Unit-level data is available only for respondents, but we have population totals for the reference population.

Case when dimensions of calibration and response-model variables coincide

Let $\boldsymbol{x} = (x_1, x_2, \ldots, x_p)^{\text{T}}$ denote the benchmark vector of chosen auxiliary variables and let $\boldsymbol{x}_k = (x_{1_k}, x_{2_k}, \ldots, x_{p_k})^{\text{T}}$ be the vector of auxiliary variables for the $k$-th element of the sample $s$. The settings state that $\boldsymbol{X}$, the vector of population totals of the auxiliary variables, is known, i.e.

\begin{equation}\label{eq:Aux. Total}
\boldsymbol{X} = \left(\sum_{k=1}^{N} x_{1_k}, \sum_{k=1}^{N} x_{2_k}, \ldots, \sum_{k=1}^{N} x_{p_k}\right)^{\text{T}} = \sum_{k \in U} \boldsymbol{x}_k.
\end{equation}

If the total of any auxiliary variable is not known, one might plug $x_{i_k}$ in place of $y_k$ into the Horvitz-Thompson estimator, i.e.

\begin{equation*}
\hat{X}^{i}_{\text{HT}} = \sum_{k \in s} d_k x_{i_k}, \quad i = 1, \ldots, p.
\end{equation*}

However, using $\boldsymbol{x}_k$ instead of $y_k$ does not always work well when estimating $\boldsymbol{X}$: one needs weights slightly different from the $d_k$'s. Those weights, denoted $w_k$, arise as the solutions to an optimization problem of the form

\begin{equation}\label{eq: optimization w_k}
\argmin_{w_k} \sum_{k \in r} G_k\left(w_k, d_k\right),
\end{equation}

where $G_k$ is a strictly convex, differentiable function for which $G_k(d_k, d_k) = 0$; equivalently, writing $G_k(w_k, d_k) = d_k \, G(w_k / d_k)$, the distance function satisfies $G(1) = G'(1) = 0$. In addition, the following condition has to be satisfied:

\begin{equation}\label{eq: calib eq}
\sum_{k \in r} w_k \boldsymbol{x}_k = \sum_{k \in U} \boldsymbol{x}_k.
\end{equation}

Equation \eqref{eq: calib eq} is called the calibration equation. Using the method of Lagrange multipliers, Deville and Särndal (1992) show that the calibration weights can be written as

\begin{equation}\label{eq: w_k minimizers form}
w_k = d_k F_k(\boldsymbol{\lambda}^{\text{T}} \boldsymbol{z}_k),
\end{equation}

where $\boldsymbol{z}_k$ is a vector of instrumental variables whose dimension coincides with that of $\boldsymbol{x}_k$. Later in this paper we will consider the situation where $\boldsymbol{x}_k$ has a higher dimension than $\boldsymbol{z}_k$. Here $F_k$ is the inverse of $G_k'(w_k, d_k)$, defined as

\begin{equation}\label{eq: partial derivative of G_k}
G_k'(w_k, d_k) = \frac{\partial G_k(w_k, d_k)}{\partial w_k}.
\end{equation}

There are various ways to choose the function $G_k$, but a common choice is

\begin{equation}\label{eq: G_k example}
G_k(w_k, d_k) = \frac{\left(w_k - d_k\right)^2}{2 d_k}.
\end{equation}

For this choice, the solution $w_k$ of the problem stated in \eqref{eq: optimization w_k} is expressed by Estevao and Särndal (2000) as

\begin{equation}\label{eq:G_k optimizers}
w_k = d_k\left(1 + \boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}\right),
\end{equation}

where $\boldsymbol{\lambda}$ is given by

\begin{equation}\label{eq: g vector}
\boldsymbol{\lambda} = \left(\sum_{k \in r} d_k \boldsymbol{x}_k \boldsymbol{z}_k^{\text{T}}\right)^{-1} \left(\boldsymbol{X} - \sum_{k \in r} d_k \boldsymbol{x}_k\right).
\end{equation}
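To make the linear case concrete, below is a minimal sketch in base R, assuming simple random sampling and $\boldsymbol{z}_k = \boldsymbol{x}_k$; all object names are illustrative and not part of any package API. It computes $\boldsymbol{\lambda}$ from \eqref{eq: g vector}, forms the weights \eqref{eq:G_k optimizers}, and verifies the calibration equation \eqref{eq: calib eq}.

```r
set.seed(1)
N <- 1000                                  # population size
n <- 200                                   # sample size
x_pop <- cbind(1, runif(N))                # auxiliary variables for U
X_tot <- colSums(x_pop)                    # known vector of totals X
s_idx <- sample(N, n)                      # simple random sample s
resp  <- rbinom(n, 1, 0.8) == 1            # respondents r (subset of s)
x_r   <- x_pop[s_idx, ][resp, , drop = FALSE]
d_r   <- rep(N / n, sum(resp))             # design weights d_k = N/n under SRS
z_r   <- x_r                               # instruments z_k = x_k here

# lambda = (sum_r d_k x_k z_k^T)^{-1} (X - sum_r d_k x_k)
lambda <- solve(crossprod(d_r * x_r, z_r),
                X_tot - colSums(d_r * x_r))
w_r <- d_r * (1 + drop(z_r %*% lambda))    # w_k = d_k (1 + z_k^T lambda)

# calibration equation: weighted respondent totals reproduce X exactly
all.equal(colSums(w_r * x_r), X_tot)       # TRUE
```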
Using the $w_k$ obtained this way, known as the linear weights, a new, so-called "calibration-weighted" estimator of the target variable total is

\begin{equation}\label{eq: calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r} w_k y_k,
\end{equation}

which can be rewritten as

\begin{equation}\label{eq: rendered calibration-weighted estimator}
\hat{Y}_{\text{cal}} = \sum_{k \in r} d_k y_k + \left(\boldsymbol{X} - \sum_{k \in r} d_k \boldsymbol{x}_k\right)^{\text{T}} \boldsymbol{b},
\end{equation}

where

\begin{equation*}
\boldsymbol{b} = \left(\sum_{k \in r} d_k \boldsymbol{z}_k \boldsymbol{x}_k^{\text{T}}\right)^{-1} \sum_{k \in r} d_k \boldsymbol{z}_k y_k.
\end{equation*}

Notice that $\hat{Y}_{\text{cal}}$ is no longer unbiased by design. However, it might still be consistent, as described in Isaki and Fuller (1982).
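Continuing the sketch above with the same simulated objects, the two forms of $\hat{Y}_{\text{cal}}$ can be checked to agree numerically; the target variable $y$ below is generated purely for illustration.

```r
y_pop <- drop(x_pop %*% c(2, 5)) + rnorm(N)  # hypothetical target variable
y_r   <- y_pop[s_idx][resp]                  # y observed for respondents only

# direct form: sum_r w_k y_k
Y_cal_w <- sum(w_r * y_r)

# regression form: sum_r d_k y_k + (X - sum_r d_k x_k)^T b
b <- solve(crossprod(d_r * z_r, x_r),
           colSums(d_r * z_r * y_r))
Y_cal_b <- sum(d_r * y_r) + drop((X_tot - colSums(d_r * x_r)) %*% b)

all.equal(Y_cal_w, Y_cal_b)                  # TRUE: the two forms coincide
```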

How does one formulate the prediction model in this case? Let us denote two indicator random variables:

\begin{equation*}
I_j = \mathbb{1}\{j \in s\}, \qquad R_j = \mathbb{1}\{j \in r\}.
\end{equation*}

Kott and Chang (2010) proposed the double-protection justification, a set of model equations of the form

\begin{equation}\label{eq: double-security-pred.framework}
\left\{
\begin{array}{ll}
y_k &= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{x}} + \epsilon_k,\\
\boldsymbol{z}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\Gamma} + \boldsymbol{\eta}_k^{\text{T}},
\end{array}
\right.
\end{equation}

where $\boldsymbol{\Gamma}$ is usually (though not necessarily) of full rank, $\boldsymbol{\beta}_{\boldsymbol{x}}$ is a coefficient vector, and

\begin{equation}
E\left(\epsilon_k \mid \boldsymbol{x}_k, I_k, R_k\right) = 0, \qquad E\left(\boldsymbol{\eta}_k \mid \boldsymbol{x}_k, I_k, R_k\right) = 0.
\end{equation}

Under the model \eqref{eq: double-security-pred.framework} the following property holds:

\begin{equation}\label{eq: property of 2-sec prediction}
\left(y_k - \boldsymbol{z}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{z}}\right) \big| \boldsymbol{x}_k = \left(\epsilon_k - \boldsymbol{\eta}_k^{\text{T}} \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}\right) \big| \boldsymbol{x}_k,
\end{equation}

where $\boldsymbol{\beta}_{\boldsymbol{z}} = \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}$.
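This property follows by substituting the second model equation into the first:

\begin{equation*}
y_k - \boldsymbol{z}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{z}}
= \boldsymbol{x}_k^{\text{T}} \boldsymbol{\beta}_{\boldsymbol{x}} + \epsilon_k - \left(\boldsymbol{x}_k^{\text{T}} \boldsymbol{\Gamma} + \boldsymbol{\eta}_k^{\text{T}}\right) \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}}
= \epsilon_k - \boldsymbol{\eta}_k^{\text{T}} \boldsymbol{\Gamma}^{-1} \boldsymbol{\beta}_{\boldsymbol{x}},
\end{equation*}

and conditioning both sides on $\boldsymbol{x}_k$ yields exactly \eqref{eq: property of 2-sec prediction}.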

When there are more calibration than response-model variables

First, let us consider $\boldsymbol{b}_{\boldsymbol{z}}^{*}$, the asymptotic limit of

\begin{equation}\label{eq: b_z}
\boldsymbol{b}_{\boldsymbol{z}} = \left(\sum_{k \in S} d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k \boldsymbol{z}_k^{\text{T}}\right)^{-1} \sum_{k \in S} d_k R_k F'_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k y_k,
\end{equation}

which, alongside $\boldsymbol{b}_{\boldsymbol{z}}$ itself, is said to exist regardless of the outcome of the prediction model. When the prediction model fails, $\boldsymbol{b}_{\boldsymbol{z}} - \boldsymbol{\beta}_{\boldsymbol{z}}$ still has a well-defined limit, as long as $\boldsymbol{\lambda}$ converges to a finite $\boldsymbol{\lambda}^{*}$. Chang and Kott (2008) considered this case and extended the weighting approach: the calibration equation \eqref{eq: calib eq}, reformulated as

\begin{equation}\label{eq: reformulated calib. eq.}
\boldsymbol{s} = \frac{1}{N} \left[\sum_{k \in S} d_k R_k F_k(\boldsymbol{z}_k^{\text{T}} \boldsymbol{\lambda}) \boldsymbol{x}_k - \sum_{k \in S} d_k \boldsymbol{x}_k\right] = \boldsymbol{0},
\end{equation}

is replaced by the problem of finding the $\boldsymbol{\lambda}$ that minimizes $\boldsymbol{s}^{\text{T}} \boldsymbol{Q} \boldsymbol{s}$ for some symmetric and positive-definite $\boldsymbol{Q}$. There are various ways to pick $\boldsymbol{Q}$, as well as to deal with $\boldsymbol{\Gamma}$ not being of full rank; a couple of examples can be found in Kott and Liao (2017). For instance, one option is to use

\begin{equation*}
\boldsymbol{Q}^{-1} = \text{DIAG}\left[\left(N^{-1} \sum_{S} d_k \boldsymbol{x}_k\right) \left(N^{-1} \sum_{S} d_k \boldsymbol{x}_k^{\top}\right)\right].
\end{equation*}

After finding $\boldsymbol{\lambda}$, the dimension of $\boldsymbol{x}_k^{\text{T}}$ is reduced as follows:

\begin{align*}
\tilde{\boldsymbol{x}}_k^{\text{T}} &= \boldsymbol{x}_k^{\text{T}} \, N^{-1} \boldsymbol{Q} \sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \left(\sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{x}_j^{\text{T}}\right)^{-1} \sum_{j \in S} d_j R_j F'_k\left(\boldsymbol{z}_j^{\text{T}} \boldsymbol{\lambda}\right) \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}\\
&= \boldsymbol{x}_k^{\text{T}} \boldsymbol{B}_{\boldsymbol{z}}.
\end{align*}
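Below is a minimal numerical sketch of this idea, assuming the response-model function $F_k(u) = \exp(u)$ and simple random sampling; the object names are illustrative and not part of the MNAR API. With more calibration variables than response-model variables, $\boldsymbol{s}(\boldsymbol{\lambda}) = \boldsymbol{0}$ is overdetermined, so $\boldsymbol{\lambda}$ is found by numerically minimizing $\boldsymbol{s}^{\text{T}} \boldsymbol{Q} \boldsymbol{s}$.

```r
set.seed(2)
N <- 1000
x_pop <- cbind(1, runif(N), runif(N))     # p = 3 calibration variables x_k
z_pop <- x_pop[, 1:2]                     # q = 2 response-model variables z_k
S <- sample(N, 300)                       # sampled units
d <- rep(N / 300, 300)                    # design weights d_k under SRS
R <- rbinom(300, 1, plogis(drop(z_pop[S, ] %*% c(-0.5, 1.5))))  # response

F_fun <- function(u) exp(u)               # assumed response-model function F_k

# s(lambda): p calibration equations in q < p unknowns
s_fun <- function(lambda) {
  Fv <- F_fun(drop(z_pop[S, ] %*% lambda))
  (colSums(d * R * Fv * x_pop[S, ]) - colSums(d * x_pop[S, ])) / N
}

# Q^{-1} = DIAG[(N^{-1} sum_S d_k x_k)(N^{-1} sum_S d_k x_k^T)]
xbar <- colSums(d * x_pop[S, ]) / N
Q <- diag(1 / xbar^2)

# find lambda minimizing s^T Q s (Nelder-Mead)
obj <- function(lambda) drop(t(s_fun(lambda)) %*% Q %*% s_fun(lambda))
lambda_hat <- optim(c(0, 0), obj)$par
w <- d * R * F_fun(drop(z_pop[S, ] %*% lambda_hat))  # adjusted weights
```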
Another approach to component reduction, proposed by Andridge and Little (2011), works without searching for $\boldsymbol{\lambda}$ and does not even rely on picking a $\boldsymbol{Q}$ matrix. The idea relies on satisfying

\begin{equation}\label{eq: reformulated calib.eq}
\sum_{k \in S} w_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k R_k F_k \tilde{\boldsymbol{x}}_k = \sum_{k \in S} d_k \tilde{\boldsymbol{x}}_k
\end{equation}

and setting

\begin{equation}\label{eq: setting A&L}
\tilde{\boldsymbol{x}}_k^{\text{T}} = \boldsymbol{x}_k^{\text{T}} \boldsymbol{A}^{\text{T}},
\end{equation}

where

\begin{equation*}
\boldsymbol{A}^{\text{T}} = \left(\sum_{S} R_j \boldsymbol{x}_j \boldsymbol{x}_j^{\text{T}}\right)^{-1} \sum_{S} R_j \boldsymbol{x}_j \boldsymbol{z}_j^{\text{T}}.
\end{equation*}

Again the dimension is reduced, and with the $\tilde{\boldsymbol{x}}_k$ obtained this way one is able to perform the generalized calibration weighting technique, as sketched below. So far, this method is implemented as part of the `MNAR::gencal()` function.
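Since $\boldsymbol{A}^{\text{T}}$ is simply the coefficient matrix of a respondent-level least-squares regression of $\boldsymbol{z}$ on $\boldsymbol{x}$, it is easy to compute directly. A short sketch, continuing the simulated objects from the previous block (names again illustrative):

```r
# A^T = (sum_S R_j x_j x_j^T)^{-1} sum_S R_j x_j z_j^T, respondents only
x_S <- x_pop[S, ]
z_S <- z_pop[S, ]
A_t <- solve(crossprod(x_S[R == 1, ]),
             crossprod(x_S[R == 1, ], z_S[R == 1, ]))

# reduced calibration vectors: x~_k^T = x_k^T A^T
x_tilde <- x_S %*% A_t
dim(x_tilde)  # 300 x 2: calibration and response-model dimensions now match,
              # so the equal-dimension machinery applies to x_tilde
```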

References

Andridge, Rebecca R., and Roderick J. A. Little. 2011. "Proxy Pattern-Mixture Analysis for Survey Nonresponse." Journal of Official Statistics 27 (2): 153.
Chang, T., and P. S. Kott. 2008. "Using Calibration Weighting to Adjust for Nonresponse Under a Plausible Model." Biometrika 95 (3): 555–71. https://doi.org/10.1093/biomet/asn022.
Deville, Jean-Claude, and Carl-Erik Särndal. 1992. "Calibration Estimators in Survey Sampling." Journal of the American Statistical Association 87 (418): 376–82.
Estevao, Victor, and Carl-Erik Särndal. 2000. "A Functional Form Approach to Calibration." Journal of Official Statistics 16 (4): 379–99. https://www.proquest.com/scholarly-journals/functional-form-approach-calibration/docview/1266846662/se-2.
Isaki, C. T., and W. A. Fuller. 1982. "Survey Design Under the Regression Super-Population Model." Journal of the American Statistical Association 77 (377): 89–96.
Kott, Phillip S., and Ted Chang. 2010. "Using Calibration Weighting to Adjust for Nonignorable Unit Nonresponse." Journal of the American Statistical Association 105 (491): 1265–75. https://doi.org/10.1198/jasa.2010.tm09016.
Kott, Phillip S., and Dan Liao. 2017. "Calibration Weighting for Nonresponse That Is Not Missing at Random: Allowing More Calibration Than Response-Model Variables." Journal of Survey Statistics and Methodology 5 (2): 159–74. https://doi.org/10.1093/jssam/smx003.