Title | Missing standard error because of stratum with single sampling unit

Author | Dan (Mia) Lv, StataCorp

By default, Stata's survey estimation commands report missing standard errors when they encounter a stratum with a singleton PSU. Here is an example:

    . use http://www.stata-press.com/data/r15/nhanes2b, clear

    . svyset psuid [pweight=finalwgt], strata(stratid)
      (output omitted)

    . svy: mean hdresult
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata = 31            Number of obs   =      8,720
    Number of PSUs   = 60            Population size = 98,725,345
                                     Design df       =         29

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
        hdresult |   49.67141          .             .           .
    --------------------------------------------------------------

When a stratum contains only one PSU, there is insufficient information to estimate that stratum's variance. As a result, the variance of an estimated parameter cannot be computed when the data come from a stratified clustered design with such a stratum.

There are two solutions. The first is to reassign each stratum with a singleton PSU to another appropriately chosen stratum. To use this method, we must first identify the strata with singleton PSUs.
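To see why a single PSU leaves the variance undefined, it may help to look at the general shape of the linearized variance estimator for a total in a stratified design. The expression below is a sketch in notation introduced here for illustration: $L$ is the number of strata, $n_h$ the number of sampled PSUs in stratum $h$, $f_h$ the sampling fraction (zero when no FPC is set), $y_{hi}$ the weighted total for PSU $i$ of stratum $h$, and $\bar{y}_h$ the mean of those PSU totals. See [SVY] variance estimation for the exact formulas Stata uses.

```latex
\widehat{V}(\widehat{Y})
  = \sum_{h=1}^{L} (1 - f_h)\,\frac{n_h}{n_h - 1}
    \sum_{i=1}^{n_h} \left( y_{hi} - \bar{y}_h \right)^2 ,
\qquad
\bar{y}_h = \frac{1}{n_h} \sum_{i=1}^{n_h} y_{hi}
```

When $n_h = 1$, the factor $n_h/(n_h - 1)$ divides by zero while the sum of squares is itself zero, so that stratum's contribution is undefined and the whole standard error is reported as missing.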

After setting our survey characteristics with **svyset**, we can use the **svydescribe**
command to identify the strata with singleton PSUs. Those strata will be marked
with an asterisk in the output. Let's look at the following dataset:

    clear
    input stratid psuid age hdresult finalwgt
    1 1 68 40 9687
    1 1 54 53 36028
    2 1 26 35 26896
    2 1 24 48 8213
    2 2 68 43 3316
    2 2 61 65 8475
    3 1 25 80 10900
    3 1 27 93 7619
    3 2 24 38 22584
    3 2 64 72 2875
    end
    svyset psuid [pweight=finalwgt], strata(stratid)
    save data1.dta

We run **svydescribe** and get the following output:

    . svydescribe

    Survey: Describing stage 1 sampling units

    Sampling weights: finalwgt
                 VCE: linearized
         Single unit: missing
            Strata 1: stratid
     Sampling unit 1: psuid
               FPC 1: <zero>

                                        Number of obs per unit
     Stratum   # units     # obs       Min      Mean       Max
    ----------------------------------------------------------
           1        1*        2         2       2.0         2
           2        2         4         2       2.0         2
           3        2         4         2       2.0         2
    ----------------------------------------------------------
           3        5        10         2       2.0         2

Here we can see that stratum 1 has a singleton PSU.

When we perform an estimation with survey data, the problem of a stratum with a singleton PSU can arise even if all strata in the dataset have multiple PSUs. This happens when some observations are dropped from the estimation sample because of missing values.

Let us look at the following survey data. In this example, when we try
to estimate the mean of variable **hdresult**, the standard errors
are missing, and a note on the output tells us that this is caused by a
stratum with a single PSU:

    clear
    input stratid psuid age hdresult finalwgt
    1 1 68 40 9687
    1 1 54 53 36028
    1 2 28 . 9356
    1 2 35 . 10265
    2 1 26 35 26896
    2 1 24 48 8213
    2 2 68 43 3316
    2 2 61 65 8475
    3 1 25 80 10900
    3 1 27 93 7619
    3 2 24 38 22584
    3 2 64 72 2875
    end
    svyset psuid [pweight=finalwgt], strata(stratid)

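    . svy: mean hdresult
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata = 3             Number of obs   =      10
    Number of PSUs   = 5             Population size = 136,593
                                     Design df       =       2

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
        hdresult |   51.04046          .             .           .
    --------------------------------------------------------------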

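Running **svydescribe** on this dataset reports the following units per stratum:

                                        Number of obs per unit
     Stratum   # units     # obs       Min      Mean       Max
    ----------------------------------------------------------
           1        2         4         2       2.0         2
           2        2         4         2       2.0         2
           3        2         4         2       2.0         2
    ----------------------------------------------------------
           3        6        12         2       2.0         2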

The command **svydescribe** does not detect any stratum with a singleton PSU
because by default **svydescribe** checks the entire dataset, including the
observations that the estimation command dropped. The appropriate approach here
is to add **if e(sample)** so that **svydescribe** is restricted to the
estimation sample used by **svy: mean hdresult**.

    . svydescribe if e(sample)

    Survey: Describing stage 1 sampling units

          pweight: finalwgt
              VCE: linearized
      Single unit: missing
         Strata 1: stratid
             SU 1: psuid
            FPC 1: <zero>

                                        Number of obs per unit
     Stratum   # units     # obs       Min      Mean       Max
    ----------------------------------------------------------
           1        1*        2         2       2.0         2
           2        2         4         2       2.0         2
           3        2         4         2       2.0         2
    ----------------------------------------------------------
           3        5        10         2       2.0         2
                                 2 = #Obs with missing values in the
                            ------     survey characteristics
                                12

An alternative way to use **svydescribe** in this scenario is to write:

    svydescribe hdresult

This command applies **svydescribe** to the subset of the data in which
the variable **hdresult** has no missing values.
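If you would rather flag the affected strata in the data than read the asterisks off the **svydescribe** table, one possible approach is to count distinct PSUs per stratum among the observations that will enter the estimation. The following is only a sketch under assumptions: the variable names **psutag** and **npsu** are made up for illustration, and the check considers only missing values of **hdresult**, not missing weights or other variables a model might use.

```stata
* Sketch: count distinct PSUs per stratum among observations with
* nonmissing hdresult (psutag and npsu are illustrative names).
egen psutag = tag(stratid psuid) if !missing(hdresult)

* Sum the tags within each stratum to get the number of usable PSUs.
egen npsu = total(psutag), by(stratid)

* List the strata left with a single usable PSU, if any.
tabulate stratid if npsu == 1
```

The variable **npsu** can then be used to decide which strata to combine with a neighboring stratum.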

After detecting the strata with singleton PSUs, we reassign each such
stratum to another properly chosen stratum. Let us return to the dataset
**data1.dta**, saved earlier. We already know that only stratum 1 has a
singleton PSU. Suppose that we want to reassign stratum 1 to stratum 2.
We first generate a new PSU identifier variable **psu** and a new stratum
identifier variable **strata**, so that we do not lose any information in
the original dataset. Then, we assign distinct values to **psu** for all the
sampling units in strata 1 and 2 so that we can differentiate each sampling
unit in the combined new stratum. After that, we change the value of
**strata**. Finally, we **svyset** our data again using the new variables
**psu** and **strata**.

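    use data1, clear
    * give every PSU in strata 1 and 2 a distinct identifier
    egen psu = group(stratid psuid) if inlist(stratid,1,2)
    * keep the original PSU identifier in the remaining strata
    replace psu = psuid if stratid>2
    * copy the stratum identifier, then fold stratum 1 into stratum 2
    generate strata=stratid
    replace strata=2 if strata==1
    * declare the survey design again with the new variables
    svyset psu [pweight=finalwgt], strata(strata)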

Now, let us check again if there are any strata with singleton PSUs:

    . svydescribe

    Survey: Describing stage 1 sampling units

          pweight: finalwgt
              VCE: linearized
      Single unit: missing
         Strata 1: strata
             SU 1: psu
            FPC 1: <zero>

                                        Number of obs per unit
     Stratum   # units     # obs       Min      Mean       Max
    ----------------------------------------------------------
           2        3         6         2       2.0         2
           3        2         4         2       2.0         2
    ----------------------------------------------------------
           2        5        10         2       2.0         2

All the strata have multiple PSUs now. We can go ahead and run our svy estimation commands.

The second solution for handling strata with singleton PSUs is
to specify the **singleunit()** option when we **svyset** the data. The default
specification is **singleunit(missing)**, which results in missing values
for the standard errors. There are three other settings. The
first, **singleunit(certainty)**, treats strata with singleton PSUs
as certainty units, so those strata contribute nothing to the standard
error. The second, **singleunit(scaled)**, is a scaled version of
**singleunit(certainty)**. The scaling factor comes from using the average
of the variances from the strata with multiple sampling units for each
stratum with a singleton PSU. The third, **singleunit(centered)**,
specifies that strata with singleton PSUs be centered at the
grand mean instead of the stratum mean.
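As a rough sketch of how the three non-default settings could be compared side by side, the loop below declares the design again on **data1.dta** (the small dataset saved earlier, in which stratum 1 has a single PSU) once for each setting and then estimates the same mean. The loop and the macro name **method** are illustrative and not part of the original FAQ. The point estimate should be the same across settings; only the standard errors are expected to differ.

```stata
* Sketch: compare the non-default singleunit() settings on data1.dta
use data1, clear

foreach method in certainty scaled centered {
    * declare the design again with the current singleunit() setting
    svyset psuid [pweight=finalwgt], strata(stratid) singleunit(`method')
    * estimate the mean; the standard error reflects the chosen setting
    svy: mean hdresult
}
```

Which setting is defensible depends on the design: for example, **singleunit(certainty)** is only reasonable if the singleton PSUs really were selected with certainty.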

Here is an example using **singleunit(certainty)**:

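    . use http://www.stata-press.com/data/r15/nhanes2b, clear

    . svyset psuid [pweight=finalwgt], singleunit(certainty) strata(stratid)

    Sampling weights: finalwgt
                 VCE: linearized
         Single unit: certainty
            Strata 1: stratid
     Sampling unit 1: psuid
               FPC 1: <zero>

    . svy: mean hdresult
      (output omitted)

    --------------------------------------------------------------
                 |             Linearized
                 |       Mean   std. err.     [95% conf. interval]
    -------------+------------------------------------------------
        hdresult |   49.67141   .3829811      48.88813     50.4547
    --------------------------------------------------------------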

For more details about the methodology Stata uses to estimate variances
for survey data, see the entry
[SVY] variance estimation.
You can decide how to specify **singleunit()** based on the assumptions of your analysis.