Archive for May, 2010
The Curious Case of Google and the Utah Jaywalker
Posted by Scott Andrews in Data Quality on May 31, 2010
A woman struck by a car crossing a highway is suing Google for bad directions. This case raises some interesting points between data providers and data consumers.
Firstly, while the data itself was technically correct, it was obviously faulty but taken on faith to be an optimal solution. How many times a day does this happen in corporate life? The numbers are computer-generated and appear in a web portal, so they must be correct and the best we have! This is not always so, and common sense should not be ignored. Just because you have a state of the art dashboard doesn’t mean your organization can be set to permanent auto-pilot. Anomalies need to be investigated. Data that doesn’t make sense needs to be questioned. Savvy business people often have a sixth-sense for bad information, even if they can’t immediately prove it. Be skeptical. Question your process and your numbers. Look for ways to continuously improve. Business intelligence is about supplementing your own intelligence, not replacing it.
Secondly, where does responsibility lie? I remember a law class I took years ago referred constantly to the proverbial “reasonable person”. Would a “reasonable person” take directions to walk into highway traffic literally? While Google clearly generated a bad result in this case, is it their fault entirely that someone blindly followed those directions? It reminds me of the defense of the war-crime-accused “I was just following orders”. Sometimes following orders isn’t good enough. Responsibility is shared between those who create data and those who consume it. This is not to let Google or bad BI off the hook but to put data consumers on their guard.
Thirdly, who was at the greatest risk in this situation? Obviously the woman following the Google directions. While it is true that Google led her to the highway, she herself chose to step into the busy traffic – a life-and-death decision for her! Think about this the next time you take a major decision based on third party data: What do you have to lose? What do they? Who is going to have the greatest incentive to ensure data quality? He or she who carries the greatest risk.
To conclude, if you know in your gut that something is wrong, don’t proceed! Computers are fallible and business intelligence is not always correct. BI is only as good as the assumptions used to build its underlying data models. Computer models are only a best-effort attempt to replicate reality in numbers, not a virtual alternate all-knowing reality in themselves. Good business intelligence comes as a result of computers and smart business people working together. Don’t check your brain at the door – you’ll need it.
Going Rogue: Facebook and Data Privacy
Posted by Scott Andrews in Information Technology on May 26, 2010
A number of years ago, when our first child was born, we placed an announcement in the newspaper like good dutiful new parents. What we got in return was a bunch of calls from financial planners pitching education savings plans. Needless to say we did not announce the births of any of our other children in this way. What might look like smart, targeted marketing to some can come across as cheap, offensive, underhanded and even a bit sleazy. And I ask myself this week “Is this the future of Facebook marketing?”
When Facebook ads first came out, I was astonished at how untargeted they were. Work-from-home, date sites and cheap loans were the order of the day. Eventually you could vote for or against these ads, but even this appeared to have little impact on what actually appeared. I quickly tuned out and scarcely notice Facebook ads anymore at all.
What is far more effective, in my opinion, is the targeted search results that come back on Google. I am actually looking for something, and Google helps the marketers find me. It is a win-win. It is not intrusive or invasive, at least not the way I see it.
Facebook users signed up to share with online friends. In contrast, Facebook wants to open the data up to reap a fortune in mining personal data for marketing purposes. The crux of the matter is that Facebook and its users have diametrically opposed views of what the intended use and audience of all this data is meant to be. This is a circle not easily squared. How Facebook navigates these treacherous waters will indeed decide its future.
Using Microsoft SQL Server BCP Switches in Cognos Data Manager
Posted by Scott Andrews in Cognos on May 20, 2010
Have you ever had Cognos Data Manager running a SQL Server Bulk Loader job across a network at a very slow speed? This can be remedied with a BCP command line switch. Network packet size can be set to any number of bytes between 4096 to 65535. The switch is indicated by “-a”. If no switch is applied the default value of 4096 bytes is applied. Changing the network packet size from 4096 to 32767 reduced the runtime of a slow running Data Manager job from 3.5 hours to 20 minutes.
In fact you can run any BCP switch to optimize your Bulk Loader process by adding the switch and its corresponding parameter value to the “Additional Command Line Options” on the “Module Properties” tab of the Table Delivery Properties in your Cognos Data Manager job (see below). Parameter settings and run times will vary from system to system.
When Good Data Goes Bad
Posted by Scott Andrews in Data Quality on May 18, 2010
How many times have you heard the mother of a wayward son say “He’s not bad. He’s just misunderstood.” So it goes with data as well. For how can you tell bad data? Did that sale really happen? Does this employee really exist? Well, 99% of the time, yes. In my experience bad data isn’t bad, it’s just misunderstood.
Nowhere is this clearer than when you consider dates. A single transaction can and often is associated with a plethora of dates. When was it ordered? When was it promised? When was it shipped? When did the customer take delivery? When was it posted to the general ledger? If any business users are not talking about the same date at any point in their analysis, there will be trouble. “Why don’t our March sales numbers match??” My first question is “What is the date that drives those numbers?”
A measure’s definition can come into play. When measuring sales, do you include returns or not? Are sales gross or net? When counting employees, do you count contractors and temps? How are measures defined precisely?
Likewise, a dimension’s definition can also muddy the water. How are your sales broken down? How do you define your list of employees at headquarters or in a particular region? How are dimensions defined precisely?
When confronting a data discrepancy, my approach is to find the differences, find out what the “correct” result should be and then apply the rule change across the data set. Employee X was showing up on report Y because they were defined by postal code, not by reporting manager. What is the correct rule? Your decision makers will have to make that call. The decision must then be incorporated into the data warehouse design and also be communicated out to the business analysts for future consistency in reporting. Communication is a key component to ensuring data quality across an enterprise.
Avoiding Schisms in the Church of BI
Posted by Scott Andrews in Business Intelligence on May 10, 2010
For some time, self-service BI environments have been hailed by both marketing and technicians as a break-through that will enable business users to find data for themselves without the need of IT intervention. Many of course are skeptical of these claims. As a long serving BI professional, I too see the pitfalls of these self-service environments. But done correctly, I do see potential for good things.
I like to think of self-service BI environments as akin to common language bible translations. Before bibles were available in the common languages of English, French or German, one had to seek the advice of a high priest for a religious interpretation of scripture. Once the bible was available to any literate person, people could read the text themselves and make their own interpretations. This of course led to an explosion of religious sects as many interpreted the scripture in different and sometimes contradictory ways. Violent disagreement and often war would result from some of these differences. The ensuing mayhem eventually led to freedom of religion being enshrined in most Western democracies.
In IT/BI, we also have a sort of priesthood with our own specialized languages (SQL, MDX) and we have historically held a position of power and trust on the subject of corporate data. But when data does become available to the masses, we have to ensure that it is interpreted as uniformly as possible lest there be hundreds different interpretations of a corporation’s performance.
How can this be accomplished?
- Common metadata definitions – The data set metadata definitions must be communicated to the users and understand by all. What is a customer? What is a sale? What is revenue? Without controlling metadata definitions, the data is vulnerable to misinterpretation.
- Business user data literacy – Business users must have a good understanding of the underlying data. This might involve an in-depth understanding of statistics, engineering or accounting depending on the data sets in question. The better they themselves can understand it, the quicker they will see anomalies and resolve problems. The ability of this group to understand data models is often instrumental as well.
- Use of Tools – Users must be well trained in their BI tools and understand when one tool is better to use than another. Some tools work exceedingly well for some tasks but not for others. Some effort must be made to control and monitor usage of tools – Report Studio should not be used to derive large scale data extracts for example; this is neither its strength nor its intended use. Usage will tell you what is working well and what is not; where users are getting along well and where they are having problems. Question the value of tools that are not being used, or ask users what they find lacking in that particular tool.
- Data Scope – Business users need to understand the scope of available data in the BI environment. Any shortcomings need to be part of an ongoing BI plan, and users need to know when this data will become available. Some industrious users may resort to extreme lengths to secure data sets when data is nonexistent, such as typing it into Excel spreadsheets themselves. I have seen elaborate and labour-intensive Excel or Access solutions created by business users that needed to be reworked into the BI environment.
What this really comes down to is communication. As Thomas Redman puts it, when there is data misinterpretation, you do not have a technological problem; you have a communications problem. Like a good coach, BI must ensure that everyone understands the game plan and is on the same page.
Who’s Driving This Cognos 8 Report?
Posted by Scott Andrews in Cognos on May 7, 2010
The behaviour of Cognos reports can vary depending on the database driver you employ. This became starkly clear me the other day when a network administrator changed a database driver from OLE DB to ODBC. A report that was previously working started to report an “Invalid Descriptor Index” error. The specific error looked like this:
UDA-SQL-01107 A general exception has occurred during the operation “prepare”. [Microsoft][ODBC SQL Server Driver] Invalid Descriptor Index {Microsoft][ODBC SQL Driver] Syntax error or access violation [Microsoft][ODBC SQL Server Driver] Prepared statement is not a cursor-specification RSV-SRV-002 Trace back…
I’m never a big fan of ODBC drivers and avoid them whenever possible. But if you must use ODBC drivers, there is a way around this particular error. Although it isn’t obvious, the report is failing because it cannot match parameters. It is trying to match an integer to a char or a date to a char, or it has completed a complex calculation and is not able to discern what the resulting data type is anymore (I have even seen this happen to the results of simple date functions that were longer considered ‘dates’). A driver like OLD DB can usually figure out that integer 1500 equals char ‘1500’ but this is not always so with ODBC. You may need to cast your calculation results to a specific data type to ensure that those data types match in your calculations, query filters and master/detail relationships. In my experience, problematic data types are integers, chars and dates – especially if there have been complex calculations involved.
This can be particularly tricky to troubleshoot when you have multiple queries in a report. Each query may run successfully on its own, but when they are related together you may start to see the problems. This is when it is important to verify master/detail relationships throughout the report.




