‘Exercise caution’

by Aaron Wherry

This year’s census data comes with an asterisk.

Statistics Canada released its final tranche of 2011 census data last week, focusing on languages, but it included a big warning cautioning data users against comparing key facts with those of past censuses. “Data users are advised to exercise caution when evaluating trends related to mother tongue and home language that compare 2011 census data to those of previous censuses,” the agency states bluntly in a box included in its census material.

Those are strong words for a statistical agency, since they raise profound questions about how the data can be used reliably to come to conclusions about language trends. Officials have undertaken a thorough investigation, with a report to be published shortly. “There are a lot of questions and responses that don’t seem to add up,” said Doug Norris, the chief demographer for Environics Analytics and formerly a census manager at the agency.


  1. And so it begins…
    If you’re going to do a census, do it right and make it mandatory so there’s no selection bias. Or, don’t do it at all and at least save money.
    The Conservatives did neither. They spent an extra $30 million on a voluntary survey whose data will be largely incomparable, as has been alluded to here. This is a narrative just about everyone has forgotten.

    • I can’t thumbs-up you enough.

      I’m betting, too, that by the time the next census rolls around, the l.f. census, er, questionnaire will be cancelled. Score one for dumb government.

  2. So the old long form census was a bunch of “garbage”, and gave erroneous results.

    Now that the language questions are being included in the main census form, which everyone has to fill out, we are finally getting accurate results.

    Academics and government have been using bad data for decades. Who knew?

    • “So the old long form census was a bunch of “garbage”, and gave erroneous results.”


      The answer one gives to a survey question does vary depending on the context of the question. This clip from “Yes, Minister” eloquently sums up how one can devise a survey to generate both a “yes” and “no” to the question “Do you support national service?”


      Back to the long-form census. The mother language question used to be asked along with questions on ethnicity and birthplace. Now that the question is on the short form, it is being asked alongside basic demographic info. One may speculate this may cause some people to report the language they currently speak rather than their mother tongue.

      As the linked article fully explains, StatsCan usually has time to test-run such question changes and make adjustments to ensure the data remain comparable to prior years.

      HOWEVER, this time the Harper government announced the change so suddenly, and with so little lead time, that StatsCan could not test-run the questionnaire to see how people would respond to the new question.

      (…. seriously, I am repeating almost verbatim the linked article, why wouldn’t you just read the freakin’ thing before posting…)

      Long story short: new NHS = “garbage”. Cost to taxpayers: $30 million.

      Sound fiscal managers my a**.

    • Basically you’re arguing that the entire field of probability mathematics is “garbage”.

      Congratulations, you’ve either just invalidated a couple of centuries’ worth of learning, or proven yourself an idiot.

      • So 99% of the people answering a question (the new way) is supposed to be less accurate than 20% of the people answering a question (the old way)?

        The simple hypothesis about what is wrong is that there were severe problems with how StatsCan chose their 20% sample on the old long form, i.e. that the old long form census data/methodology was flawed.

        There is no sampling error possible with the new method, since everyone answers the question, not only 1 in 5 households.

        StatsCan pretty much even admitted as much by saying they conditioned the answer on the long form with question order.

        To collect correct data on the old long form census, the questions should have been asked in a random order, i.e. the questionnaire should have been distributed with more than one order of questions to eliminate or reduce error introduced by the question ordering. They admit to shaping the order of questioning to get the right results.

        • So, avoiding responding to my post, eh?

          Random order is not a bad idea. And you know what, if StatsCan had been given time to test-run such an approach, they could have determined whether it produced more accurate data, and ensured the question wording and question order didn’t prime certain responses.

          But they weren’t given time – the Harper government announced the methodology change so late in the game that the census forms were about to start printing, i.e. there was no time to test-run the changes to ensure the questions were answered accurately.

          The government botched this big time. They produced a garbage data set because they did not give StatsCan enough advance notice to test-run it and ensure otherwise.
          And this was at a cost to taxpayers of an extra $30 million.

          Again, sound fiscal managers my a**.

          • The problem with the long form census is there was no random ordering of questions. StatsCan admits the ordering of questions on the old long form was conditioned.

            When the long form census was used, several versions of the same questions in different order should have been used to get accurate results, instead of tailoring the question order to get the results StatsCan researchers expected.

            They clearly admit to leading the respondent to give a particular answer in the old long form methodology.

            How much public policy research by government and academia was based on this obviously bad data?

          • Entirely agree, randomizing question order would improve the survey.

            Except who are you defending? The government didn’t do this! The 2011 census question order was not randomized either, so it is certainly NOT an improvement on 2006. Your statement “we are finally getting accurate results” is bunk – by the reasoning you just gave, 2011 is just as bad as 2006 and prior years.

            On an aside, thank you for explaining why proper randomization is crucial to getting a representative sample. I assume then you agree that the National Household Survey, with only voluntary responses, was a $30 million waste, as it is very much subject to selection bias.

            Of course, if you disagree with me and think that was $30 million well spent by the government, I encourage you to respond.

            I have a feeling you won’t. Prove me wrong.

        • First, you need to pull your head out of your arse after you’re finished pulling the numbers from there.

          Second, not less accurate. Less reliable. Even if we accept your numbers, if the entirety of the 1% who didn’t answer were, say, Native Americans, then that skews the results. What’s worse, because we don’t know *who* didn’t answer in what proportions, we have to assume the results are skewed, and therefore pretty much worthless.

          As to your point about the order of the question, I suggest you learn what “conditioned” means. And then once you’re done that, you might be able to understand that even if it meant what you seem to think it means, you’re still an idiot, because the new census was no different in this respect. So if it caused any error before, it’s also caused it now.

  3. I’m sure this was ‘not’ their intention.

