Usually, when nonresponse occurs, the design-weighted summation over the respondents underestimates the population total. Thus, the initial weights need to be corrected under the given sampling design - in other words, we have to adjust the described $d_k$'s.
Case when dimensions of calibration and response-model variables
coincide
Let $\mathbf{t}_{\mathbf{x}}$ denote the benchmark vector of totals of the chosen auxiliary variables and let $\mathbf{x}_k$ be the vector of auxiliary variables for the $k$-th element of the sample $s$. The setting assumes that $\mathbf{t}_{\mathbf{x}}$, the vector of population totals of the auxiliary variables, is known, i.e.
$$\mathbf{t}_{\mathbf{x}}=\sum_{k\in U}\mathbf{x}_k.$$
If any of the auxiliary totals is not known, one might use its design-weighted estimate in place of the unknown components of $\mathbf{t}_{\mathbf{x}}$, i.e.
$$\hat{\mathbf{t}}_{\mathbf{x}\pi}=\sum_{k\in s}d_k\mathbf{x}_k.$$
However, using $\hat{\mathbf{t}}_{\mathbf{x}\pi}$ instead of $\mathbf{t}_{\mathbf{x}}$ does not always work in the estimation process.
One needs weights slightly different from the $d_k$'s. Those weights, denoted as $w_k$'s, are obtained as solutions to an optimization problem of the form $$\begin{equation}\label{eq: optimization w_k}
\argmin_{w_k}{\sum_{k\in r}G_k\left(w_k,d_k\right)},
\end{equation}$$ where $G_k$ is a strictly convex, differentiable function for which $G_k\left(d_k,d_k\right)=0$ and $\left.\partial G_k\left(w_k,d_k\right)/\partial w_k\right|_{w_k=d_k}=0$. Also, there is an additional condition that has to be satisfied, namely:
$$\begin{equation}\label{eq: calibration equation}
\sum_{k\in r}w_k\mathbf{x}_k=\mathbf{t}_{\mathbf{x}}.
\end{equation}$$
Equation \eqref{eq: calibration equation} is also called the calibration equation. Using the method of Lagrange multipliers, it is shown in Deville and Särndal (1992) that the calibration weights can be written as:
$$w_k=d_kF_k\left(\mathbf{z}_k^{\top}\boldsymbol{\lambda}\right),$$
where $\mathbf{z}_k$ is a vector of instrumental variables whose dimension coincides with that of $\mathbf{x}_k$. Later in this paper, we will consider the situation where $\mathbf{x}_k$ has a higher dimension than $\mathbf{z}_k$. $F_k$ is the inverse of $g_k\left(\cdot,d_k\right)$, defined as:
$$g_k\left(w_k,d_k\right)=\frac{\partial G_k\left(w_k,d_k\right)}{\partial w_k}.$$
There are various ways to choose the function $G_k$, but a common choice is $G_k$ of the form:
$$G_k\left(w_k,d_k\right)=\frac{\left(w_k-d_k\right)^2}{2d_k}.$$
For such a choice, the solution $w_k$ of the problem stated in \eqref{eq: optimization w_k} is expressed by Estevao and Särndal (n.d.) as:
$$w_k=d_k\left(1+\mathbf{z}_k^{\top}\boldsymbol{\lambda}\right),$$
where $\boldsymbol{\lambda}$ is defined as follows:
$$\boldsymbol{\lambda}=\left(\sum_{k\in r}d_k\mathbf{x}_k\mathbf{z}_k^{\top}\right)^{-1}\left(\mathbf{t}_{\mathbf{x}}-\sum_{k\in r}d_k\mathbf{x}_k\right).$$
Using the obtained $w_k$'s, known as the linear weights, a new, so-called "calibration-weighted" estimator of the target variable total is of the form:
$$\hat{t}_{y,\mathrm{cal}}=\sum_{k\in r}w_ky_k,$$
which can be rewritten as:
$$\hat{t}_{y,\mathrm{cal}}=\sum_{k\in r}d_ky_k+\left(\mathbf{t}_{\mathbf{x}}-\sum_{k\in r}d_k\mathbf{x}_k\right)^{\top}\hat{\mathbf{B}},$$
where
$$\hat{\mathbf{B}}=\left(\sum_{k\in r}d_k\mathbf{z}_k\mathbf{x}_k^{\top}\right)^{-1}\sum_{k\in r}d_k\mathbf{z}_ky_k.$$
Notice that $\hat{t}_{y,\mathrm{cal}}$ is no longer unbiased by design. However, it might be consistent, as described in Isaki and Fuller (1982).
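To make the formulas above concrete, the following minimal R sketch (on simulated data; all object names and the simulated design are illustrative assumptions, not part of any package) computes $\boldsymbol{\lambda}$, the linear weights and the calibration-weighted total, and verifies that the calibration equation \eqref{eq: calibration equation} is met exactly.

```r
# Minimal sketch of the linear calibration weights described above.
# The data below are simulated for illustration; in practice d, x and y
# come from the survey and t_x from an external benchmark source.
set.seed(1)

N   <- 1000                                  # population size
X   <- cbind(1, runif(N), rbinom(N, 1, 0.4)) # auxiliary variables (with intercept)
y   <- 2 + 3 * X[, 2] + rnorm(N)             # target variable
t_x <- colSums(X)                            # known benchmark totals

s  <- sample(N, 200)                         # simple random sample
d  <- rep(N / 200, 200)                      # design weights d_k = 1 / pi_k
r  <- s[runif(200) < plogis(X[s, 2])]        # respondents (nonresponse depends on x)
dr <- rep(N / 200, length(r))

Xr <- X[r, ]                                 # calibration variables for respondents
Zr <- Xr                                     # instruments; here z_k = x_k (same dimension)

# lambda = (sum_r d_k x_k z_k')^{-1} (t_x - sum_r d_k x_k)
lambda <- solve(t(Xr * dr) %*% Zr, t_x - colSums(Xr * dr))

# linear weights w_k = d_k (1 + z_k' lambda) and the calibrated total
w <- dr * (1 + as.vector(Zr %*% lambda))
c(calibrated = sum(w * y[r]), naive = sum(dr * y[r]), true = sum(y))

# the calibration equation holds exactly: sum_r w_k x_k = t_x
all.equal(colSums(Xr * w), t_x)
```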
How does one formulate the prediction model in this case? Let's denote two indicator random variables:
$$I_k=\begin{cases}1 & \text{if } k\in s,\\ 0 & \text{otherwise,}\end{cases}\qquad R_k=\begin{cases}1 & \text{if element } k \text{ responds},\\ 0 & \text{otherwise.}\end{cases}$$
Kott and Chang (2010) proposed the double-protection justification set of equations:
$$\begin{equation}\label{eq: kott chang}
\sum_{k\in s}d_k\frac{R_k}{p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}\right)}\mathbf{x}_k=\sum_{k\in s}d_k\mathbf{x}_k,
\end{equation}$$
where the matrix of auxiliary variables $\mathbf{x}_k$ is usually (though not necessarily) of full rank, $\boldsymbol{\gamma}$ is a coefficients vector and $p\left(\cdot\right)$ is an assumed response-probability function, so the implied weights are $w_k=d_k/p\left(\mathbf{z}_k^{\top}\hat{\boldsymbol{\gamma}}\right)$ for $k\in r$. Under the proposal from \eqref{eq: kott chang} there is a property of the form:
$$E\left(\sum_{k\in r}w_ky_k-\sum_{k\in U}y_k\right)\approx 0,$$
where the approximation holds if either the response model $\Pr\left(R_k=1\mid k\in s\right)=p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}\right)$ or the linear prediction model $y_k=\mathbf{x}_k^{\top}\boldsymbol{\beta}+\varepsilon_k$, with a coefficients vector $\boldsymbol{\beta}$ and zero-mean errors $\varepsilon_k$, is correctly specified.
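As an illustration of how the set of equations above can be solved for $\boldsymbol{\gamma}$ when the dimensions coincide, here is a hedged R sketch; the logistic response function, the simulated data and the use of the nleqslv package are assumptions made for this example only and need not match the actual implementation.

```r
# Sketch: solve the calibration equations for gamma with a logistic response
# model p(u) = 1 / (1 + exp(-u)), assuming z_k = x_k (equal dimensions).
library(nleqslv)   # nonlinear system solver

set.seed(2)
n <- 500
d <- rep(2, n)                               # design weights of sampled elements
X <- cbind(1, rnorm(n))                      # calibration variables
Z <- X                                       # response-model variables (same dimension)
R <- rbinom(n, 1, plogis(0.5 + Z[, 2]))      # response indicators

# Calibration equations: sum_s d_k (R_k / p(z_k' gamma) - 1) x_k = 0
cal_eq <- function(gamma) {
  p <- plogis(as.vector(Z %*% gamma))
  colSums(d * (R / p - 1) * X)
}

fit   <- nleqslv(c(0, 0), cal_eq)
gamma <- fit$x
w     <- d / plogis(as.vector(Z %*% gamma))  # implied weights (used for respondents)

# Check: weighted respondent totals of x match the full-sample totals
all.equal(colSums((w * X)[R == 1, ]), colSums(d * X))
```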
When there are more calibration than response-model variables
First, let's consider $\boldsymbol{\gamma}^{*}$, the asymptotic limit of $\hat{\boldsymbol{\gamma}}$, which, along with the limiting weights $d_k/p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}^{*}\right)$, is assumed to exist regardless of whether the prediction model holds. When the prediction model fails, we still get a consistent estimator as long as the response model holds and $\hat{\boldsymbol{\gamma}}$ converges to a finite $\boldsymbol{\gamma}^{*}$.
Chang and Kott (2008) considered this case and extended the weighting approach by replacing the reformulated calibration equation from \eqref{eq: kott chang}:
$$\sum_{k\in s}d_k\left(\frac{R_k}{p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}\right)}-1\right)\mathbf{x}_k=\mathbf{0}$$
by finding $\hat{\boldsymbol{\gamma}}$ that minimizes
$$\left(\sum_{k\in s}d_k\left(\frac{R_k}{p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}\right)}-1\right)\mathbf{x}_k\right)^{\top}\mathbf{Q}\left(\sum_{k\in s}d_k\left(\frac{R_k}{p\left(\mathbf{z}_k^{\top}\boldsymbol{\gamma}\right)}-1\right)\mathbf{x}_k\right)$$
for some symmetric and positive definite matrix $\mathbf{Q}$. There are various ways to pick $\mathbf{Q}$, as well as to deal with it not being of full rank. A couple of examples can be found in Kott and Liao (2017). For example, one of the options is to take $\mathbf{Q}$ as the identity matrix. After finding $\hat{\boldsymbol{\gamma}}$, the dimension of $\mathbf{x}_k$ is reduced so that the calibration equation holds exactly for the reduced vector, whose dimension matches that of $\mathbf{z}_k$.
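The following hedged R sketch illustrates the Chang and Kott (2008) minimization under illustrative assumptions: a logistic response function, simulated data, three calibration variables versus two response-model variables, and $\mathbf{Q}$ taken as the identity matrix; $\hat{\boldsymbol{\gamma}}$ is found numerically with optim().

```r
# Sketch: more calibration variables (x, dim 3) than response-model
# variables (z, dim 2); gamma minimizes the Q-weighted quadratic form
# of the calibration residuals (here Q = identity).
set.seed(3)
n  <- 500
d  <- rep(2, n)                              # design weights
x2 <- rnorm(n); x3 <- runif(n)
X  <- cbind(1, x2, x3)                       # calibration variables (3)
Z  <- cbind(1, x2)                           # response-model variables (2)
R  <- rbinom(n, 1, plogis(0.3 + x2))         # response indicators

# Calibration residual vector: sum_s d_k (R_k / p(z_k' gamma) - 1) x_k
cal_resid <- function(gamma) {
  p <- plogis(as.vector(Z %*% gamma))
  colSums(d * (R / p - 1) * X)
}

# Objective: residual' Q residual with Q = identity
obj <- function(gamma) {
  e <- cal_resid(gamma)
  sum(e^2)
}

gamma_hat <- optim(c(0, 0), obj, method = "BFGS")$par
cal_resid(gamma_hat)   # close to zero but, in general, not exactly zero
```

With more calibration variables than components of $\boldsymbol{\gamma}$, the residual vector generally cannot be driven exactly to zero, which is why the subsequent dimension reduction of $\mathbf{x}_k$ is needed.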
Another approach to reducing the dimension of $\mathbf{x}_k$, proposed by Andridge and Little (2011), works without a numerical search for $\hat{\boldsymbol{\gamma}}$ and does not even rely on picking a $\mathbf{Q}$ matrix - the idea relies on the prediction model $y_k=\mathbf{x}_k^{\top}\boldsymbol{\beta}+\varepsilon_k$ being satisfied and on setting the scalar proxy
$$\hat{x}_k=\mathbf{x}_k^{\top}\hat{\boldsymbol{\beta}},$$
where $\hat{\boldsymbol{\beta}}$ is the (design-weighted) least-squares estimate of $\boldsymbol{\beta}$ computed from the respondents. Again, a reduction of dimensions is needed, and with the proxy obtained in this way one is able to perform the generalized calibration weighting technique. So far, this method is implemented as part of the MNAR::gencal() function.
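To illustrate the proxy idea, here is a hedged R snippet on simulated data; the respondents-only, design-weighted regression below is an assumption made for illustration and is not necessarily how MNAR::gencal() implements the reduction.

```r
# Sketch: reduce the calibration vector x_k to a scalar proxy
# xhat_k = x_k' beta_hat, with beta_hat fitted on the respondents.
set.seed(4)
n <- 500
d <- rep(2, n)                               # design weights
X <- cbind(1, rnorm(n), runif(n))            # calibration variables
y <- 1 + 2 * X[, 2] - X[, 3] + rnorm(n)      # target variable
R <- rbinom(n, 1, plogis(0.5 + X[, 2]))      # response indicators

# Design-weighted least squares of y on x among respondents
fit      <- lm(y ~ X[, 2] + X[, 3], subset = R == 1, weights = d)
beta_hat <- coef(fit)

# Scalar proxy replacing the multivariate x_k in the calibration step
x_proxy <- as.vector(X %*% beta_hat)
head(x_proxy)
```

With the scalar proxy $\hat{x}_k$ in place of $\mathbf{x}_k$, the generalized calibration step proceeds as in the earlier sketches, now with a single calibration variable.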