Archive for July, 2010
An introduction for my many international followers:
King: Canadian Prime Minister Stephen Harper
Page: Canadian Industry Minister Tony Clement
Statistician: Recently resigned head of Statistics Canada, Munir Sheikh
Special thanks to data quality expert Jim Harris, whose Dr. Seuss-style data quality limericks and songs served as a partial inspiration for this piece. His blog can be found on my Blogroll (Obsessive Compulsive Data Quality).
Good King Censusless looked out
On the cottage season.
With the sunshine round about
Warm and crisp and even.
Everyone was drinking beer
Feeling great elation.
How could he disrupt the cheer
breaking cross the nation?
“Mr Clement, come by strife,
If you know so, say it.
How can I make foul the life
Of the summer respite?”
“Sire, a man I once knew long
Loathed the census taking
If you could remove this wrong
You’d be nation-making.”
“Make it so”, he said at once
With no consultation,
“Though I may be thought a dunce
Statistician would not toast
His part in this madness.
He would rather quit his post
Causing him much sadness.
Harper bellowed “What a fool!
Get that man to focus!
He should know that math’s not cool,
Stats are hocus pocus.”
Statistician stood his ground
In the public’s favour.
He said he was honour-bound;
People saw him braver.
“Bring me hatchets, bring me fire,
We shall burn his cabin!
He’s earned my unholy ire!
He won’t know what happened!”
Page and Monarch, off they trode,
Off they trode together
Feeling stormy, yet instead
Of the sunny weather.
Statistician’s cabin burned
To the ground next morning.
Page and Monarch never learned,
Though this be a warning:
Cabin dwellers all be sure
Be you all accounted,
Those who cannot count the poor
Can’t themselves be counted.
Why do we collect data? What is it good for? Do we even need it? These are the questions that I see posed in the Canadian census debate. As a data practitioner, I have seen my share of useless data, poor data, fudged data, and absolutely essential data. Today’s corporations wade through masses of data to find nuggets of data gold. Running a corporation today without data is like flying a modern aircraft without a functioning navigational system. The same could be said of running a government.
Censuses have been conducted in all sophisticated societies in history, usually with the most up-to-date technology of the day. The U.S. Census of 1890 employed the newly invented Hollerith tabulating machine. Within decades tabulating machines were essential to major enterprises. Following a merger in 1911, Hollerith’s company was renamed International Business Machines in 1924.
Census data collection has evolved since then, with some trail-blazing nations forgoing the census altogether. But make no mistake: in place of mandatory long forms, there is a centralized registry of citizens complete with national ID numbers. I think this is a good and efficient system but would libertarians ever agree to this? Surely not if a 20% chance of filling out a form once every 5 years is too “invasive”.
Why doesn’t a voluntary form work? Simply put, “responder bias”: your sample population is self-selecting or otherwise skewed. In one of the most famous cases of responder bias in history, George Gallup correctly called the 1936 presidential re-election of Franklin D. Roosevelt when everyone else got it wrong. Most other pollsters of the day sent mail-out ballots to potential voters based on phone numbers and car registries. But in those days, millions of voters had neither telephones nor cars! How did Gallup do it? He sent pollsters to talk to people in person. And hence the Gallup poll became a mainstay of politics.
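To see how self-selection skews a result, here is a small worked sketch. The three groups, their sizes, incomes and response rates are all invented for illustration; the only point is that when willingness to respond varies by group, the average among responders drifts away from the true average.

```python
# Hypothetical population: (group, people, average income, % who return a voluntary form).
# Every figure here is made up for illustration.
population = [
    ("low income", 400, 20_000, 20),
    ("middle income", 400, 50_000, 60),
    ("high income", 200, 120_000, 80),
]


def true_mean(groups):
    """Average income when everyone is counted (the mandatory case)."""
    people = sum(n for _, n, _, _ in groups)
    income = sum(n * inc for _, n, inc, _ in groups)
    return income / people


def voluntary_mean(groups):
    """Average income among those who chose to respond (the voluntary case)."""
    # Weight each group by people * response rate; the /100 cancels out.
    weight = sum(n * pct for _, n, _, pct in groups)
    income = sum(n * pct * inc for _, n, inc, pct in groups)
    return income / weight


print(round(true_mean(population)))       # 52000
print(round(voluntary_mean(population)))  # 68333
```

With these made-up numbers the voluntary figure overstates average income by roughly a third, simply because the low-income group answers less often. Sending out more forms does not repair this: the responders are still the same self-selected mix.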
Any census must deal with the question of data quality. Much has been made of the “Jedi Knight” entries under “religion”. Companies deal with data quality by applying standards or business rules to a data set. Certainly collecting data as close to its source as possible is a very good way to ensure quality data, as is automating data collection. But what is proposed in Canada will weaken data quality, not strengthen it. No superior alternative is being proposed.
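As a rough sketch of what "business rules against a data set" can look like in practice, the snippet below checks a single record against two rules. The field names, the age range and the code list are assumptions made up for this example, not anything Statistics Canada uses.

```python
# Hypothetical code list for the "religion" field; a real one would be far longer.
ALLOWED_RELIGIONS = {"anglican", "buddhist", "catholic", "hindu",
                     "jewish", "muslim", "sikh", "none"}


def check_record(record):
    """Return a list of rule violations for one record (empty list = clean)."""
    problems = []

    # Rule 1: age must be a plausible whole number.
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age must be a whole number between 0 and 120")

    # Rule 2: religion must appear on the code list (case-insensitive).
    religion = str(record.get("religion", "")).strip().lower()
    if religion not in ALLOWED_RELIGIONS:
        problems.append(f"religion {record.get('religion')!r} is not on the code list")

    return problems


print(check_record({"age": 34, "religion": "Catholic"}))     # []
print(check_record({"age": 34, "religion": "Jedi Knight"}))
```

Rules like these flag a bad entry so that it can be reviewed. They cannot conjure up the records that were never submitted, which is the real problem with a voluntary form.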
Is the long form census perfect? Not at all. Is it 100% correct? No. Is it labour-intensive and quickly outdated? Yes. Could we collect data in a better way? Yes. But it is better by far than a voluntary form because a voluntary form will degrade data quality.
And why does data quality matter at the end of the day? Because bad management starts with bad data. Sometimes bad data is systemic, such as that which led to the global financial crash of 2008. Sometimes bad data is deliberate, such as that which led to the rise and eventual demise of Enron. But hiding or fudging data is dangerous and damaging – it will be discovered eventually and your reputation will show it. Whether you are trying to hide toxic assets, off-balance sheet debt, shoddy manufacturing, unsafe products, poor employee performance, or entire segments of your population, you will be found out by independent researchers, international governance organizations, concerned consumers, outraged citizens or inside whistleblowers. And the day of telling will not be pretty.
Throughout the ongoing controversy in Canada over the end of the mandatory long form census, many have argued that Denmark (among other Scandinavian countries) no longer conducts a census. I asked fellow data professional and blogger Henrik Liliendahl Sørensen to explain how his country manages population data as a guest contributor to BIProfessional.com:
Census Options: The Scandinavian Model
The Scandinavian model, exemplified here by the Danish variant, does not require citizens to periodically fill out a census form. Census information is extracted automatically when needed from administrative registers.
When a new Danish citizen is born (typically at a hospital) the child is assigned a national identification number within minutes. The ID is linked to the mother’s ID and, if she is married, automatically to her husband’s ID as the father. Otherwise the father’s ID is obtained (if possible) within a short time. In case of immigration, procedures exist for assigning a national ID and collecting basic data. All information is kept in a centralized citizen registry.
The less romantic consequence of a marriage is that the two national IDs are linked in the citizen registry from that day forward. A divorce will result in a deactivation of the link.
All buildings and, where a building is not a single family house, all the apartments within it are recorded in a centralized registry. When a new house or apartment is established a lot of data is captured, and if the residence is changed the data is updated.
Your place of living is a relation between your national ID and the unique ID of the residence. The day you moved in is registered as the valid-from date, and the day you move on is registered as the valid-to date.
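The residence relation described above can be sketched as a small data structure. This is only an illustration: the class name, the field names, the sample IDs and the choice to treat the valid-to date as the first day the person no longer lives there are all assumptions, not details of the actual Danish registry.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Residency:
    national_id: str                  # made-up citizen identifier
    residence_id: str                 # made-up dwelling identifier
    valid_from: date                  # day the person moved in
    valid_to: Optional[date] = None   # day the move out was registered; None = still there

    def covers(self, day):
        """True if the person was registered at this residence on `day`."""
        # Assumption: valid_to itself is no longer covered.
        return self.valid_from <= day and (self.valid_to is None or day < self.valid_to)


def residents_on(records, residence_id, day):
    """National IDs registered at one residence on a given day, sorted."""
    return sorted(r.national_id for r in records
                  if r.residence_id == residence_id and r.covers(day))


records = [
    Residency("P1", "R1", date(2005, 3, 1), date(2009, 7, 1)),
    Residency("P1", "R2", date(2009, 7, 1)),
    Residency("P2", "R2", date(2008, 1, 15)),
]

print(residents_on(records, "R2", date(2010, 7, 1)))  # ['P1', 'P2']
```

Because every move is stored with its dates, a census-style count for any given day becomes a query against the registry rather than a form sent to each household.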
Practically all events in the life of a citizen involving a public sector body are logged with the national ID. This also includes healthcare and interaction with financial services and employer relations where mandatory reporting exists.
The technical opportunities for compiling census information based on these registrations are plenty. However every case must be approved by a body within the authorities and wherever possible data must be made anonymous in the actual processing.
In the previous entry in this series we took a brief look at the BI Bus API, a collection of classes (either in .NET or Java) or a legacy VB6 .dll that can be used to perform actions against Cognos 8 – tasks as varied as changing attributes of C8 content, running reports, changing security settings, etc. Essentially any task that can be performed through the Cognos 8 interface can be executed through calls to the correct part of the BI Bus API.
Another way of performing tasks is to interact with the Cognos dispatchers through calls to the Cognos gateway – the URL API. You can format and execute a call using a specially formatted URL passed over HTTP/HTTPS. Tasks that can be performed this way include starting Cognos components and running a report.
As an example, starting Report Studio can be accomplished by calling the following URL:
(In this example you would of course need to pass the correct dispatcher/gateway URLs, which are not likely to be “localhost” except on a demo machine.)
The &ui.object parameter can be included to open a specific report:
…Where PATH above is the path to the report. This can be found most easily by examining the properties of the report within Cognos Connection.
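If you build these URLs in code, the report path needs to be percent-encoded before it is appended, since it typically contains slashes, brackets and quotes. The helper below is a minimal sketch of that one step. The function name, the gateway address, the `tool=placeholder` parameter and the report path are all placeholders invented for this example; the other parameters a real call needs come from the API documentation.

```python
from urllib.parse import quote


def with_ui_object(base_url, report_path):
    """Append a percent-encoded ui.object parameter to an existing URL."""
    # Use "&" if the URL already has a query string, "?" otherwise.
    separator = "&" if "?" in base_url else "?"
    # safe="" makes quote() encode "/" as well as brackets, quotes and spaces.
    return f"{base_url}{separator}ui.object={quote(report_path, safe='')}"


# Placeholder gateway URL and illustrative report path, not real values.
url = with_ui_object(
    "http://gateway.example/cognos?tool=placeholder",
    "/content/report[@name='Q1 Sales']",
)
print(url)
```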
Using the URL API is useful to embed Cognos studios or content within a browser window or frame in a non-Cognos application. The syntax of the URL API is well documented within the API documentation. As with the BI Bus API the range of actions that can be performed is quite extensive, mirroring what can be called through the regular UI. The URL API can be thought of as a light-weight way to accomplish certain tasks or easily embed content within a web application other than the regular portal. Cognos suggests that for more complex tasks the BI Bus API be used.
There has been considerable controversy brewing here in Canada since the government announced this month that the 2011 mandatory long form census will be dropped, to be replaced by a voluntary one. Opposition has been fierce and on many fronts, from statisticians, politicians, business leaders and social advocacy groups. Yesterday the chief statistician with the government quit his job in protest. All agree that if returned on a voluntary basis, the results will be skewed and the data set will not be comparable with previous census data.
Incidentally, the U.S. government experimented with this idea in 2003 and quickly dropped it. They found that the overall response rate dropped by one third, and the response among some demographic groups dropped to a mere 20%. The Canadian government proposes to get around these shortcomings by sending out even more forms and presumably cleansing the degraded data, an approach the U.S. government rejected on cost considerations.
Putting aside political arguments, what if we were to look at this strictly from a data quality point of view? Clearly a fundamental loosening of the rules around data collection will have profound consequences on the data collected. Imagine if you instructed your sales staff that they could enter their sales data voluntarily. Some may continue to enter data as before. Some may stop entering data at all. And still others may enter data if they get around to it, perhaps fudging or guessing. Maybe they will deem the exercise “optional”, or perhaps “unimportant” or even “useless”. Worse yet, maybe they will attempt to “game” the system in their favour (yes, I have seen this happen). With missing, incomplete or false data, your data quality is wholly compromised. Now who is your best sales person? Your worst sales person? Can’t tell anymore? Data cleansing might be able to help, but it is labour-intensive, error-prone and very expensive.
Those of us who work with data professionally know that data integrity is determined by business rules. When you change the rules you change the results. And when you change the results you change the quality. It is very common to see organizations attempt to improve data quality through stricter or more explicit business rules, but it is quite bizarre to see our government choose to do the opposite.
The portion of the Cognos SDK probably of greatest interest to developers is the BI Bus API. This API (available in VB6, .NET and Java flavours) enables the developer to write code to perform virtually every task that can be performed through the normal UI. This means that the developer can use the API to do everything from automate administrative tasks to embed calls to Cognos 8 functionality in another application.
In .NET there are two main DLLs that must be referenced by the developer’s code: cognosdotnetassembly_2_0.dll and cognosdotnet_2_0.dll (previous versions, cognosdotnet.dll and cognosdotnetassembly.dll, are provided as well, but these are meant for use with the 1.1 version of the .NET framework – by now you are probably on version 3 or even 4 of the framework). In Java a number of the JAR files must be referenced (they are in the sdk\lib folder).
(The single VB6 dll is cdk.dll, although by now you are probably making use of .NET if you are in a Microsoft environment.)
Once these are added to your development environment you will have access to a variety of methods that enable your application to log into and manipulate the Cognos environment. Your application can traverse the Cognos Content Store to retrieve lists of reports (and other objects), get metadata about them, execute reports, delete or move them – virtually any Cognos operation, including operations on Users and Groups.
As an example, we will look at a class that is at the heart of the API – the Cognos “baseClass”, an abstract class that represents an object in the Content Store. This object can be a directory, a report, a report view, and many other object types. Once a reference to the object is obtained, the object can be executed, deleted, changed in some way – whatever the methods or properties available to the specific type of object.
For example, your code could retrieve a reference to all reports with a certain pattern in the name, and then you could delete all these reports, or alternatively move them to a particular folder.
So how do you get a reference to the object? First, you need to log in to the C8 Content Manager Service, for which there is a handy class available:
contentManagerService1 c8serv = new contentManagerService1();
c8serv.Url = c8url;
First, we create a new contentManagerService1, and provide the URL of the service. We build an XML-encoded string of user credentials (not shown). Finally, we call the logon method of the contentManagerService instance, passing it the credentials and an optional value for an array of paths to user roles.
Now that we’ve logged on, we can query the Content Store. Think of it as querying a database, but in this case you use a Cognos-specific query method called “query” that is a method of the contentManagerService1 created above. This will return an array of baseClass objects that can then be manipulated:
bc = c8serv.query(sPath, props, sortOptionArray, qOpts );
(We’ve left out a lot of the specifics in this example, but the parameters that are passed to the query method determine what is returned, based on the “path” given (sPath above) and the “properties” that are requested (props above))
Now we have an array of base class objects. The base class is an abstract class that represents a generic type of object in the C8 content store. The “path” above determines specifically what objects the content manager will return – for example, all objects in the content store, or only report objects, or all objects in a certain hierarchy, or whatever the developer wants – it is all determined by the “path” supplied to the query. The formatting of this parameter is complex, and it occupies a large section of the SDK documentation.
Once we have an array of base class objects, we can iterate through it, inspecting and/or setting properties, or calling whatever methods are available – moving objects, changing some aspect of them, etc.
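The shape of that loop is easier to see without the SDK in the way. The sketch below uses a plain in-memory stand-in, so nothing here is the Cognos API: `StoreObject`, its fields and `move_matching_reports` are hypothetical names invented for illustration. It only shows the pattern of walking the returned array, filtering by type and name pattern, and changing a property.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass
class StoreObject:
    """Stand-in for one object returned by a query (not the SDK's baseClass)."""
    name: str
    kind: str   # e.g. "report" or "folder"
    path: str   # made-up location string


def move_matching_reports(objects, pattern, target_folder):
    """Re-home every report whose name matches `pattern`; return the names moved."""
    moved = []
    for obj in objects:
        # Only reports are touched; folders and other types are skipped.
        if obj.kind == "report" and fnmatchcase(obj.name, pattern):
            obj.path = f"{target_folder}/{obj.name}"
            moved.append(obj.name)
    return moved


objects = [
    StoreObject("TEMP_sales", "report", "/old/TEMP_sales"),
    StoreObject("Annual", "report", "/old/Annual"),
    StoreObject("TEMP_stuff", "folder", "/old/TEMP_stuff"),
]
print(move_matching_reports(objects, "TEMP_*", "/archive"))  # ['TEMP_sales']
```

In the real SDK the move would be a call to the content manager service rather than an attribute assignment, but the loop around it would look much the same.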
These operations aren’t simple, and operations such as executing a report may require interacting with a number of different services (the content manager service is just one available service of several, including the report service, monitor service etc.). The good news is the SDK provides a consistent, strongly typed way of performing these kinds of tasks.
In the next entry in this series we will take a look at some of the URL commands that are available that enable the user to perform tasks on the server simply by passing it the appropriate parameterized URL.
The following is the latest installment of “Cognos Tips” with Peter Beck:
Most Cognos BI developers (i.e. folks who spend their days using the tools provided by Cognos to deliver BI) are aware of the Cognos 8 Software Development Kit. In practice not many make direct use of it, and probably for good reason. First, skills with the Cognos toolset don’t translate directly into the software development skills required to use the SDK – skills with Report Studio or Framework Manager are not the same as Java or .NET development skills. The SDK is also expensive, and is typically licensed only to large shops that have special needs and can justify the cost. The existing Cognos BI tools are pretty robust, so many shops think that extending them or creating new functionality using the SDK is not worth the effort.
But what is the SDK? Essentially it is a set of APIs that allow the developer to call the Cognos 8 service without having to go through the normal “front end” portal. For some functions these calls can be made through xml-encoded calls to URLs (the “URL API”). For others, a set of Microsoft .dll and Java .jar files are available that enable calls to the “BI Bus API” from either MS .NET languages (C#, VB .NET) or from Java.
(A .dll is also available to enable calls from “old” Visual Basic (i.e. VB 6) but if you are new to the SDK you should probably start with either .NET or Java – the VB 6 .dll will eventually be dropped, and both .NET and Java provide a much better programming “paradigm”.)
What can you do with the SDK? Anything you can do through the portal, with the advantage that you can do it programmatically. Cognos explicitly states in their documentation:
“Virtually everything you can do with the Cognos 8 graphical user interface, you can do using the appropriate API, XML file, or command line utility.”
This tells us that what most users think of as “Cognos 8”, i.e. the experience of using the tools through the web interface, is in fact a “front end” for the API.
(A separate SDK is available for Framework Manager, which is outside the scope of this series)
But by making the experience of the Cognos 8 interface programmable through the SDK, some interesting and useful extensions to Cognos 8 become possible.
For example, within the portal it is possible to select a user from your authentication source, take some aspect of that user’s profile, and apply it to another user. Doing this for many users could be onerous, but automating the process by writing a small utility that selects a source profile and applies it to a number of specified destination users could be quite useful in some environments.
You could create a utility to check for some condition in the data warehouse, and enable or disable reports based on that condition (for example, if the data warehouse builds are delayed, you don’t want users running reports that may not be complete).
You could embed Cognos reports within your own application – for example, include a Cognos-based graph within an operational screen in a transactional system.
Some firms, such as Motio, have used the SDK to create powerful custom applications for Cognos 8 users, essentially becoming experts in extending Cognos 8 within the enterprise, and providing additional applications to help Cognos 8 shops manage their environments.
In the next installment we’ll do a brief review of the BI Bus API portion of the SDK.