Posts Tagged Census

Good King Censusless

An introduction for my many international followers:

King:  Canadian Prime Minister Stephen Harper
Page:  Canadian Industry Minister Tony Clement
Statistician:  Recently resigned head of Statistics Canada, Munir Sheikh

Special thanks to data quality expert Jim Harris, whose Dr. Seuss-style data quality limericks and songs served as a partial inspiration for this piece.  His blog can be found on my Blogroll (Obsessive Compulsive Data Quality).

Enjoy!

Good King Censusless looked out
On the cottage season.
With the sunshine round about
Warm and crisp and even.
Everyone was drinking beer
Feeling great elation.
How could he disrupt the cheer
Breaking cross the nation?

“Mr Clement, come by strife,
If you know so, say it.
How can I make foul the life
Of the summer respite?”
“Sire, a man I once knew long
Loathed the census taking
If you could remove this wrong
You’d be nation-making.”

“Make it so”, he said at once
With no consultation,
“Though I may be thought a dunce
Causing consternation.”
Statistician would not toast
His part in this madness.
He would rather quit his post
Causing him much sadness.

Harper bellowed “What a fool!
Get that man to focus!
He should know that math’s not cool,
Stats are hocus pocus.”
Statistician stood his ground
In the public’s favour.
He said he was honour-bound;
People saw him braver.

“Bring me hatchets, bring me fire,
We shall burn his cabin!
He’s earned my unholy ire!
He won’t know what happened!”
Page and Monarch, off they trode,
Off they trode together
Feeling stormy, yet instead
Of the sunny weather.

Statistician’s cabin burned
To the ground next morning.
Page and Monarch never learned,
Though this be a warning:
Cabin dwellers all be sure
Be you all accounted,
Those who cannot count the poor
Can’t themselves be counted.


Why Data Quality Matters

Why do we collect data?  What is it good for?  Do we even need it?  These are the questions that I see posed in the Canadian census debate.  As a data practitioner, I have seen my share of useless data, poor data, fudged data, and absolutely essential data.  Today’s corporations wade through masses of data to find nuggets of data gold.  Running a corporation today without data is like flying a modern aircraft without a functioning navigational system.  The same could be said of running a government.

Censuses have been conducted in all sophisticated societies in history, usually with the most up-to-date technology of the day.  The U.S. Census of 1890 employed the newly invented Hollerith tabulating machine.  Within decades tabulating machines were essential to major enterprises.  Following a merger in 1911, Hollerith’s company was renamed International Business Machines in 1924.

Census data collection has evolved since then, with some trail-blazing nations forgoing the census altogether.  But make no mistake: those nations have replaced mandatory long forms with a centralized registry of citizens, complete with national ID numbers.  I think this is a good and efficient system, but would libertarians ever agree to it?  Surely not if a 20% chance of filling out a form once every 5 years is too “invasive”.

Why doesn’t a voluntary form work?  Simply put, “responder bias” (statisticians usually call it self-selection or non-response bias): your sample population is self-selecting or otherwise skewed.  In one of the most famous cases in history, George Gallup correctly called the 1936 presidential re-election of Franklin D. Roosevelt when the best-known poll of the day, run by the Literary Digest, predicted a win for his opponent, Alf Landon.  The Digest mailed ballots to potential voters drawn from telephone directories and car registries.  But in those days, millions of voters had neither telephones nor cars, and only a fraction of those who received a ballot chose to return it.  How did Gallup do it?  He sent pollsters to talk to people in person.  And hence the Gallup poll became a mainstay of politics.
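To make the self-selection problem concrete, here is a minimal simulation sketch.  Everything in it is an assumption chosen for illustration, not census data: the income distribution, the 20% mail-out, and especially the response rates (80% for households above the median income, 40% for those below).  The function name `simulate` is hypothetical as well.

```python
import random
from statistics import mean


def simulate(n=50_000, seed=1):
    """Compare a mandatory and a voluntary form on an invented population."""
    rng = random.Random(seed)

    # Hypothetical population: right-skewed household incomes.
    incomes = [rng.lognormvariate(10.8, 0.6) for _ in range(n)]
    median_income = sorted(incomes)[n // 2]

    # Mandatory form: a random 20% of households, and all of them answer.
    mandatory = [x for x in incomes if rng.random() < 0.20]

    # Voluntary form: same 20% mail-out, but the chance of answering
    # depends on income (assumed rates, purely for illustration).
    voluntary = []
    for x in incomes:
        if rng.random() < 0.20:
            p_respond = 0.8 if x >= median_income else 0.4
            if rng.random() < p_respond:
                voluntary.append(x)

    return mean(incomes), mean(mandatory), mean(voluntary)


true_mean, mandatory_mean, voluntary_mean = simulate()
print(f"true mean income:      {true_mean:,.0f}")
print(f"mandatory sample mean: {mandatory_mean:,.0f}")
print(f"voluntary sample mean: {voluntary_mean:,.0f}")
```

Under these assumptions the mandatory sample tracks the true mean closely, while the voluntary sample overstates it because lower-income households are under-represented.  Mailing out more forms does not help: the extra responses are skewed in the same way.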


Any census must deal with the question of data quality.  Much has been made of the “Jedi Knight” entries under “religion”.  Companies deal with data quality by applying standards or business rules to a data set.  Certainly collecting data as close to its source as possible is a very good way to ensure quality data, as is automating data collection.  But what is proposed in Canada will weaken data quality, not strengthen it.  No superior alternative is being proposed.
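For readers who have not seen business rules applied to a data set, a rough sketch might look like the following.  The field names, the three rules and the sample records are all hypothetical; a real census or corporate system would have far more rules and a proper reference list of valid codes.

```python
# Hypothetical, truncated reference list for illustration only.
VALID_PROVINCES = {"ON", "QC", "BC", "AB"}

# Each business rule is a named check that a record either passes or fails.
RULES = {
    "age_in_range": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
    "province_known": lambda r: r.get("province") in VALID_PROVINCES,
    "household_size_positive": lambda r: isinstance(r.get("household_size"), int)
    and r["household_size"] >= 1,
}


def validate(record):
    """Return the names of the rules that the record fails."""
    return [name for name, rule in RULES.items() if not rule(record)]


# Invented sample records.
records = [
    {"id": 1, "age": 34, "province": "ON", "household_size": 3},
    {"id": 2, "age": 210, "province": "ON", "household_size": 2},
    {"id": 3, "age": 51, "province": None, "household_size": 0},
]

failures = {r["id"]: validate(r) for r in records}
for record_id, failed in failures.items():
    print(record_id, "OK" if not failed else "fails: " + ", ".join(failed))
```

Rules like these can flag an impossible age or a missing province, but they can only check the records that actually arrive.  They cannot restore the households that never returned a form.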

Is the long form census perfect?  Not at all.  Is it 100% correct?  No.  Is it labour-intensive and quickly outdated?  Yes.  Could we collect data in a better way?  Yes.  But it is better by far than a voluntary form because a voluntary form will degrade data quality.

And why does data quality matter at the end of the day?  Because bad management starts with bad data.  Sometimes bad data is systemic, such as that which led to the global financial crash of 2008.  Sometimes bad data is deliberate, such as that which led to the rise and eventual demise of Enron.  But hiding or fudging data is dangerous and damaging – it will be discovered eventually and your reputation will show it.  Whether you are trying to hide toxic assets, off-balance sheet debt, shoddy manufacturing, unsafe products, poor employee performance, or entire segments of your population, you will be found out by independent researchers, international governance organizations, concerned consumers, outraged citizens or inside whistleblowers.  And the day of telling will not be pretty.


A Senseless Change to the Census in Canada

There has been considerable controversy brewing here in Canada since the government announced this month that the 2011 mandatory long form census will be dropped, to be replaced by a voluntary one.  Opposition has been fierce and on many fronts, from statisticians, politicians, business leaders and social advocacy groups.  Yesterday the chief statistician with the government quit his job in protest.  All agree that if the form is returned on a voluntary basis, the results will be skewed and the data set will not be comparable with previous census data.

Incidentally, the U.S. government experimented with this idea in 2003 and quickly dropped it.  They found that the overall response rate dropped by one third, and the response among some demographic groups dropped to a mere 20%.  The Canadian government proposes to get around these shortcomings by sending out even more forms and presumably cleansing the degraded data, an approach the U.S. government rejected on cost considerations.

Putting aside political arguments, what if we were to look at this strictly from a data quality point of view?  Clearly a fundamental loosening of the rules around data collection will have profound consequences on the data collected.  Imagine if you instructed your sales staff that they could enter their sales data voluntarily.  Some may continue to enter data as before.  Some may stop entering data at all.  And still others may enter data if they get around to it, perhaps fudging or guessing.  Maybe they will deem the exercise “optional”, or perhaps “unimportant” or even “useless”.  Worse yet, maybe they will attempt to “game” the system in their favour (yes, I have seen this happen).  With missing, incomplete or false data, your data quality is wholly compromised.  Now who is your best salesperson?  Your worst salesperson?  Can’t tell anymore?  Data cleansing might be able to help, but it is labour-intensive, error-prone and very expensive.

Those of us who work with data professionally know that data integrity is determined by business rules.  When you change the rules you change the results.  And when you change the results you change the quality.  It is very common to see organizations attempt to improve data quality through stricter or more explicit business rules, but it is quite bizarre to see our government choose to do the opposite.
