Yesterday Ottawa’s top cheerleader for open government data, Treasury Board president Tony Clement, declared some government data can’t be released digitally for fear it might be altered to “create havoc.”
You could say some were baffled.
— Colby Cosh (@colbycosh) December 22, 2014
This is an excerpt from the news report in which Clement attempts to explain why some federal agencies have been responding to freedom of information requests by releasing government statistics as paper printouts rather than in digital format:
There are “virtuous” uses of government data — for example, comparing the information with provincial or municipal figures, Clement said. But “in certain situations” the government must “make sure that the data is not corrupted in some way.”
“[There’s a fear that people will] manipulate the data and publish it and say, ‘This is what the government of Canada is saying’ when in fact it’s not the case. That’s the problem.”
When the Canadian Press reporter pressed Clement for an example, he was unable to think of any.
I followed up this morning with a tweet again asking Clement for examples. A spokeswoman, Stephanie Rea, got back to me a few minutes later with a response and one example of information being misused. From a decade ago.
“One example is from 2003/04, Elections Canada created a single search engine for donations — that covered everything — party, ridings, candidates — very handy
That search engine was used by the Gomery commission to identify donations — and Elections Canada said there were errors in how they were using the data, so they had to take the search engine down. Only a month or so ago did they finally bring it back
Obviously not malicious, but it shows how information can be inadvertently manipulated to cause problems.”
Rea then pointed to this CBC story from last year, in which the broadcaster built an interactive map about pipeline spills from National Energy Board data. Because the original Excel spreadsheet file had been sent to the reporter as two static PDF documents, the CBC story explained, “each row describing an incident was now spread across three pages located in two documents.” The CBC then used optical character recognition technology to rebuild the database into a spreadsheet format it could map.
Rea said she wasn’t suggesting the information the CBC ultimately put on the map was incorrect, only that it was an example of how digital information can be manipulated.
But here’s the thing: the only reason the CBC had to go to such lengths to manipulate the data was that it had been provided as PDFs in the first place. Had it come as an Excel file, none of that would have been necessary. To tumble a little further down the rabbit hole: because records released under Access to Information requests are redacted to conceal personal or security information, the original Excel files can’t be sent out. So government agencies release the data in ridiculously unhelpful PDFs that force the end user to manipulate the data into a usable format, and thus introduce the potential for errors.
This example also fails on the logic front. If the oil spill data had instead been released on paper, as Clement apparently prefers, the same optical character recognition software could have been used to build a digital database from it, only it would have created even more potential for errors.
Wouldn’t everyone be better served if the federal government developed a technological solution that would enable it to export accurate digital data stripped of redacted information? Of course, that would require Ottawa to take requests for public data seriously.
The more fundamental flaw in Clement’s argument, though, is that in a world of truly open government data, anyone who takes information and alters it in a way that distorts the truth, be they a journalist, academic researcher or anti-government activist, would be outed in a nanosecond and thoroughly discredited.
There’s most definitely a problem here, but it isn’t Clement’s hypothetical bogeyman of nefarious types out to misrepresent the government.
Now here’s a comprehensive list of all the ways publicly available government statistics have genuinely been used to “create havoc”: