Archive for March, 2010
Early in my career, I took over as system administrator of a land inventory system for a woman who was going on maternity leave. There was an imaging component to this new system which was supposed to make paper files obsolete. However, there was much grumbling about this paperless feature, and many business users resisted it by continuing to use the physical paper files. To prevent this, the week before she left, the departing system administrator had all the paper files sent off site. So I walked into an unfolding disaster in my first week! The business users were furious and demanded their paper files back. What could I do?
I asked the business users why they didn’t like the imaging system. What was wrong with it? It turns out they had a lot to say. It didn’t allow searches. It didn’t allow bookmarking. It was not user-friendly. The list went on and on. The reason they wanted the paper files was that it was much easier to do their jobs with them. I took all of their complaints on record, and then ordered their paper files back. I did this because any information technology worth its salt should make business users’ jobs easier, not more difficult. Forcing users to use a suboptimal product when a better alternative is available (even if it is paper) is ill-advised at best and destructive at worst. It shows an attitude of arrogance and indifference toward your users. Summing up this attitude, a fellow IT consultant once told me, “Some people think that user is spelled with an L.”
This is not to say you shouldn’t ever introduce new technology to a resistant group of employees. But unless you can articulate the benefits, and your users can understand what those benefits are, no one is any further ahead. When evaluating a software product, be sure to seek the advice of those who will actually use it. Business users have a good idea of what they need to do their jobs. Listen to them. Invite the input and creativity of your business user community and you will have an information technology system that is part of your business’ success, not its failure.
I attended last night’s meeting of the Ottawa SQL.net Professional Association, where we received a presentation on Microsoft SQL Server 2008 R2 from Damir Bersinic. R2 is SQL Server’s forthcoming release, which should be available in May 2010.
When I got home, my wife asked me “Why didn’t they just call it SQL Server 2010?” Typically a version indicator like this marks a minor release, but it does sound a little strange. It reminds me of a story told by a fellow BI consultant a few years ago in which he was grilled by a U.S. customs agent for traveling to the United States for SQL Server 2005 training (“But it’s 2007!” the agent protested).
Despite the odd moniker, this release does have some interesting features particularly for security and Business Intelligence. Here are some of the highlights I noted:
Security: Building on the ability to encrypt columns in SQL Server 2008, it is possible with R2 to encrypt the database data (.mdf) and log files with what is referred to as “transparent data encryption”. Specifically, this is to prevent anyone from walking off your business site with a USB key loaded with a copy of your entire database.
BI: With an emphasis on self service features, the new BI enhancements allow business users to bring large scale data sets directly into Excel with a tool called PowerPivot for Excel. Using in-memory storage, this allows you to manipulate millions of rows in Excel very quickly and easily.
SQL Server 2008 R2 Parallel Data Warehouse Edition: A SQL Server edition specifically designed for extremely large scale data warehouses, holding tens or hundreds of terabytes of data.
Efficiency: SQL Server 2008 R2 further improves data compression (row and page), allowing more data to fit on each page. This reduces the number of reads and writes required, improving database performance.
Master Data Services: A SharePoint based application that manages business definitions and rules with respect to your data. This is intended to simplify the management of these definitions, especially as they change over time.
Dashboards: There are now customizable dashboards for both DBAs and report administrators to monitor performance, resource utilization and other environment statistics.
The word is out that SharePoint will be the standard platform for future Microsoft Business Intelligence. A SharePoint breakfast session will take place on April 22 in Ottawa, as well as in a number of other Canadian cities between April 19th and May 3rd.
It appears that in-memory storage/processing is becoming the new standard for Business Intelligence, as both IBM Cognos TM1 and Microsoft SQL Server PowerPivot for Excel are using it to great effect.
On too many projects, I have noticed that the Business Intelligence component has been treated as an afterthought in the overall information technology infrastructure. There seems to be a prevailing view in some organizations that source production systems are the meat and potatoes and BI reporting is maybe the dessert or the seasoning or, at best, the gravy. I have seen this attitude in the application of ERP system changes that were unplanned, untested, or unannounced to the BI group, resulting in broken ETL processes and failed data warehouse updates.
From an information technology perspective, BI needs to be considered one of the essential components of a healthy organization. Dashboards, metrics and reports are not a mere indulgence but rather integral to the organization’s success (indeed, would anyone board an aircraft that didn’t have a functioning dashboard?). I see BI not as the dessert but as the vegetables – necessary, not always pleasant, but something to make your organization healthy and strong. Make BI a part of your organization’s well balanced diet.
A number of years ago, I had a client with an in-house developed application for sales and inventory. They were a small business and had hired a programmer to develop this solution for them. However, after it was completed this programmer decided he had had enough of supporting it, and left them. They approached my employer desperately searching for a Delphi programmer, and soon I was on assignment. I’ll never forget one of the first change requests they had for me. It was so simple. They needed a search feature in one of their list boxes. It took me maybe 20 minutes to complete the change. And once they saw what I could do, they were ecstatic. I was suddenly seen as a miracle worker! I understand that the change requests made to the previous developer were met with derision and resistance, a not uncommon occurrence in the world of IT.
My lessons from this experience were a) listen to your business users – sometimes what they want or need isn’t that big a deal, and b) don’t say no automatically to everything. Sometimes there are very good technical reasons why something can’t be done. If so, explain them. But always saying no reminds me of a Dilbert joke translating techno-speak: “It is technically impossible.” Translation: “I don’t feel like doing it.”
Early in my data management career, dealing with dates was a royal headache. Every database system and flat file seemed to have its very own date format, which was typically completely incompatible with all other date formats. This often meant reading dates in as strings and converting them, an often ugly and painful process.
Fortunately, dates seem to be easier nowadays. Maybe it’s just because my data sources are generally limited to Oracle, Microsoft SQL Server or SAP systems. But even here dates can be problematic.
The first thing to note when reading Oracle dates into a SQL Server database is that the date ranges do not match. Oracle dates range from Jan 1, 4712 BC to Dec 31, 9999 AD. SQL Server datetime values range from Jan 1, 1753 to Dec 31, 9999. You might reply, “Who cares? I’m time-stamping general ledger entries, not archaeological digs.” But this is where data quality comes into play. Say your business user enters the year 99. Oracle may store this date as 1099. And when you read this date into SQL Server, the record will be automatically rejected because it is outside the valid SQL Server date range (yes, this has happened to me). In this scenario it may be readily apparent that the user meant to enter 1999, and data cleansing can be performed. But that may not always be the case. Is this a birth date? It could conceivably be 1899 or 1999. Make sure your data cleansing assumptions are valid before you proceed.
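To make the hazard concrete, here is a minimal sketch (in Python, purely for illustration) of the kind of pre-load check I mean. The cutoff is simply SQL Server’s datetime minimum, and the wording of the return values is my own invention.

from datetime import date

# SQL Server's datetime type only accepts dates from Jan 1, 1753 onward.
SQL_SERVER_MIN = date(1753, 1, 1)

def check_date_for_sql_server(d):
    # Classify a source-system date before loading it into SQL Server.
    if d >= SQL_SERVER_MIN:
        return "ok"
    # Anything earlier than 1753 would be rejected by SQL Server's datetime type;
    # years like 0099 or 1099 usually mean a two-digit year was entered at source.
    return "review: possible two-digit year entry"

print(check_date_for_sql_server(date(1099, 12, 31)))  # review: possible two-digit year entry
print(check_date_for_sql_server(date(1999, 12, 31)))  # ok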
Another interesting date format I often deal with is the SAP Julian Date, where the whole number counts days and the fraction counts the part of a day. Don’t confuse the SAP Julian Date with the classic Julian Date (JD), which has been used by astronomers for centuries. The classic Julian Date starts at Jan 1, 4713 BC at noon Greenwich Time (they chose noon because historically astronomers didn’t make observations during the day). SAP’s Julian Date is actually the Modified Julian Date (MJD), which was invented in the 1950s at the dawn of the computer age. Its fractional component starts and ends at midnight (not noon) and its whole number component is reduced significantly (MJD = JD - 2400000.5). Typically there is no auto-convert feature available for SAP Julian Dates. If you are dealing with SAP Julian Dates in Cognos, I highly recommend making a standard function to convert your dates for you. Handling SAP Julian dates is easy as long as you get this function right. There is help online for Julian date conversion, plus many Julian calendars and converter calculators – these are very helpful for debugging purposes.
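Here is a minimal sketch of such a conversion, in Python rather than in Cognos, assuming the source field really is an MJD-style day count as described above. The epoch of midnight on November 17, 1858 follows directly from MJD = JD - 2400000.5.

from datetime import datetime, timedelta

# MJD day 0 began at midnight on November 17, 1858.
MJD_EPOCH = datetime(1858, 11, 17)

def mjd_to_datetime(mjd):
    # Convert a Modified Julian Date (whole days plus fractional day) to a datetime.
    return MJD_EPOCH + timedelta(days=mjd)

def datetime_to_mjd(dt):
    # Convert a datetime back to a Modified Julian Date.
    return (dt - MJD_EPOCH).total_seconds() / 86400.0

# Example: MJD 55256.5 is noon on March 1, 2010.
print(mjd_to_datetime(55256.5))  # 2010-03-01 12:00:00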
Incidentally, Microsoft Excel also stores dates as a whole number plus a fractional component. These are neither JD nor MJD but rather serial dates, starting at 1 on Jan 1, 1900 (dates entered prior to this will not be recognized as dates). Standard Excel dates can be read into SQL Server automatically without any problem.
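For completeness, a similar sketch for Excel serial dates, again in Python. One quirk to be aware of: Excel’s 1900 date system counts a non-existent Feb 29, 1900, which is why the conventional base date for conversion is Dec 30, 1899 (correct for any date from Mar 1, 1900 onward).

from datetime import datetime, timedelta

# Base date chosen so that serial 61 maps to Mar 1, 1900, matching Excel's count.
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    # Convert an Excel serial date (whole days plus fractional time of day) to a datetime.
    return EXCEL_EPOCH + timedelta(days=serial)

# Example: serial 40238.25 is 6:00 AM on March 1, 2010.
print(excel_serial_to_datetime(40238.25))  # 2010-03-01 06:00:00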
I have not yet found an ideal solution for dealing with data quality problems, and find that the circumstances around the particular issue will decide how to approach it. My approaches fall into one of the following categories (a rough sketch of how these options play out in an ETL lookup follows the list):
a) Reject bad data
The data modeler in me says to reject bad data. It doesn’t fit, so don’t accept it. A control table has been established for a reason – if the code or entry doesn’t match anything in it, it should be rejected. Rejected records can be reported in a myriad of ways, but the hazard of this approach is that these rejects can be easily ignored – that is, until someone decides the numbers “don’t match”. Then there is much consternation that the ETL process is “dropping” records “arbitrarily”, which is not at all true of course. But this approach does force correction in your source system, and your data warehouse contains only business-defined legitimate data.
b) Load bad data anyway
The business analyst in me says to load the bad data anyway – the data is there, so they should see it. This is problematic for a Kimball-style dimensional table, for essentially some fudging must go on. Inferred dimensional records can be created or a broad and unholy category of “unknown” must be added. Both approaches can mask the problem, the latter much more so. Bad data can reside in the data warehouse for months or even years before it is addressed, hidden away under a miscellaneous category or placeholder surrogate keys.
c) Cleanse the data
This is only possible if specific business rules or fuzzy logic can be applied to the data in question, which I find is not usually the case. If it were easy to fix, they would fix it at the source themselves.
d) Flag the data
Flagging the data is another approach, but also has its pitfalls – the largest being that the business users will simply ignore the flag. Surrogate key placeholders will still be required in the dimensional table. This is really just a minor variation of choice b. But it can also be useful for identifying data that is not rejected but still outside the business process norms (such as a sale price below cost or a hired employee above or below a certain age).
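Here is the rough sketch promised above of how options a), b) and d) tend to play out in an ETL dimension lookup, written in Python purely for illustration. The lookup table, the -1 placeholder surrogate key and the flag value are my own assumptions, not any particular tool’s conventions.

# Hypothetical dimension lookup: natural key -> surrogate key.
customer_dim = {"C001": 101, "C002": 102}

UNKNOWN_KEY = -1  # placeholder surrogate key for unmatched codes

def resolve_customer(fact_row, strategy="flag"):
    # strategy: "reject" drops the row (option a), "load" maps it to the
    # unknown member (option b), "flag" loads it but marks it (option d).
    code = fact_row["customer_code"]
    surrogate = customer_dim.get(code)
    if surrogate is not None:
        return {**fact_row, "customer_key": surrogate, "quality_flag": None}
    if strategy == "reject":
        raise ValueError("Unknown customer code %r: row rejected" % code)
    flag = "UNMATCHED_CUSTOMER" if strategy == "flag" else None
    return {**fact_row, "customer_key": UNKNOWN_KEY, "quality_flag": flag}

# Example: a sale keyed to a code the dimension has never seen.
print(resolve_customer({"customer_code": "C999", "amount": 250.00}))
# {'customer_code': 'C999', 'amount': 250.0, 'customer_key': -1, 'quality_flag': 'UNMATCHED_CUSTOMER'}

Whichever option you choose, the decision of how to treat the bad data belongs to the business side, not to the ETL developer.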
Identifying the source of the problem is another issue – was the data entered incorrectly or was a business process not followed that led to data outside the limits of acceptance? The difference between the two is huge and can only be determined by the business side, often only through considerable time and investment on their part. The effort they spend on this will depend on how large a problem they perceive the data in question to be.
What is of paramount importance to me is to recognize and respect your business users’ ownership of the data. This is doubly important as an outside consultant. Your users know their data and can often tell a problem instinctually. Don’t tell them a report was built to spec if they complain that data is wrong. Even if it was built to spec, the specs could be wrong – and often are. Find the source of the problem. It can be anything from an incorrect data source or calculation to a faulty assumption or misunderstanding.
Sometimes feats of creativity are required to deal with specific problems. Once I had to deal with an inventory system that went offline on weekends for certain locations, and the only source of reliable inventory data turned out to be back-dated GL entries. We had to insert adjustments into our weekend inventory snapshots after the fact based on these accounting journal entries because the official inventory count was not correct. Only your business users will know the best way to handle situations like this. The heart of the issue is that the data warehouse needs to reflect the reality of the business, and if it doesn’t you have a data quality problem.
A colleague recently posted a simple question online: “What is Cognos Express?” I have heard the marketing hype and I have been to many recent IBM Cognos events and demos, and yet I could not answer this question to my own satisfaction. So I did some research and found an explanation.
To fully understand Cognos Express, first we need to understand TM1. In a nutshell, TM1 is a calculation engine. It is a server-resident, 64-bit, in-memory OLAP tool that is very powerful, especially for on-the-fly scenario planning. It is the next generation of OLAP analysis from IBM Cognos, intended to phase out PowerPlay cubes and perhaps even Cognos Planning.
Cognos Express is a simplified (and hence cheaper) version of Cognos that allows access to the behind-the-scenes TM1 engine (which is included with Express). It is being marketed as a Business Intelligence and Planning solution for mid-sized companies. Cognos Express includes three modules: Reporter (a web-based ad hoc query tool), Xcelerator (a Microsoft Excel interface) and Advisor (a tool that allows OLAP analysis and reporting). It has been described as one part Cognos and two parts TM1.
Hopefully this helps cut through the marketing glitz and gives us a better idea of what it is and how it works.
“Gobbledygook may indicate a failure to think clearly, a contempt for one’s clients, or more probably a mixture of both.”– Michael Shanks, former chair of the National Consumer Council of the U.K.
I recently read an article in Wired magazine that looked back on the 10-year anniversary of the dot com craze. One of the things it noted from this period was some of the “business gobbledygook” forged at the time – a list that includes “Data warehousing”, “ERP”, “ETL”, “OLAP”, “OLTP”. I was somewhat taken aback. I deal with these terms on a daily basis and they are not Greek to me – these are words and acronyms with very specific technical meanings. Dropping them from my vocabulary would mean not being able to communicate professionally.
I do exercise care when using these terms and others like them (I always tell people I am a computer consultant, not that I am in “business intelligence”, lest I be mistaken for a corporate spy). But it was a revelation to me that our lingo is incomprehensible even to other techies. Imagine! Yes, it is technical jargon. But in my opinion it’s only gobbledygook when used to deliberately bewilder or belittle others. So use with care!