I wrote a lot about the decision to cancel the mandatory long-form census and replace it with the voluntary National Household Survey (NHS) in 2010-2011, but that was all prologue. The real implications of the new survey are only now being played out with the release of the numbers – I can’t bring myself to use the word “data” – collected by the NHS.
Here’s what Statistics Canada has to say about the quality of NHS numbers:
We have never previously conducted a survey on the scale of the voluntary National Household Survey, nor are we aware of any other country that has. The new methodology has been introduced relatively rapidly with limited testing. The effectiveness of our mitigation strategies to offset non-response bias and other quality limiting effects is largely unknown. For these reasons, it is difficult to anticipate the quality level of the final outcome.
The significance of any quality shortcomings depends, to some extent, on the intended use of the data. Given that, and our mitigation strategies, we are confident that the National Household Survey will produce usable and useful data that will meet the needs of many users. It will not, however, provide a level of quality that would have been achieved through a mandatory long-form census.
My reaction to the “usable and useful data that will meet the needs of many users” bit is “like who?” Because I’m far from convinced it will be useful in answering the sort of questions that census data have been used to answer in the past.
Most of the commentary I’ve seen makes the point that if you’re interested in certain broad features of a single variable, the NHS should provide usable information. This is quite likely to be the case, because Statistics Canada has many other sources of information that can be used as a check against the NHS: tax files can be used for income, CMHC data for housing, and so on. Where these sources fall down – and where the NHS can’t be used with any confidence – is at the micro level of neighbourhoods and other narrowly-defined criteria.
But the real problem goes beyond that. We’ve always had non-census sources of information for several of the dimensions in previous censuses. (A notable exception is immigration data. The only reliable data we have about how immigrants are faring come from the census. The NHS may be able to piece together numbers based on the 2006 census and available information on inflows and outflows since then, but this is at best a patch job.) But what made the census special was that it captured the correlations between all of these variables. We may have a good idea about, say, the distribution of income, educational attainment levels and immigration status from non-census sources, but only the census could be used to put them all together to make meaningful inferences about their statistical relationship.
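To make that point concrete, here is a minimal sketch in Python using entirely made-up numbers (nothing here comes from the NHS, the census, or tax files; the variable names and distribution parameters are my own assumptions). It builds two synthetic populations with exactly the same distribution of income and exactly the same distribution of years of schooling, but very different relationships between the two.

```python
import random
from statistics import mean, pstdev


def pearson(xs, ys):
    """Population Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Entirely synthetic values: illustrative assumptions, not real survey figures.
income = sorted(rng.lognormvariate(10.5, 0.6) for _ in range(5000))
schooling = sorted(rng.gauss(13, 3) for _ in range(5000))

# Population A: the highest earners also have the most schooling
# (both lists are sorted, so person i gets the i-th value of each).
schooling_a = list(schooling)

# Population B: the same schooling values, randomly reassigned across people.
schooling_b = list(schooling)
rng.shuffle(schooling_b)

# The one-variable distributions are identical by construction.
same_marginals = sorted(schooling_a) == sorted(schooling_b)

corr_a = pearson(income, schooling_a)
corr_b = pearson(income, schooling_b)

print(f"same marginals: {same_marginals}")
print(f"income-schooling correlation in A: {corr_a:.2f}")
print(f"income-schooling correlation in B: {corr_b:.2f}")
```

Someone who only had separate sources for income and for schooling could not tell population A from population B, because each variable looks the same on its own. Only a source that records both variables for the same households can distinguish them.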
The NHS numbers may provide useful answers for questions where the availability of census data wasn’t crucial. But they won’t be much use in addressing the really interesting questions that only census data could answer.