Archive for category Business Intelligence

The Information Continuum

Recently an American citizen was turned away from a Canadian border crossing for having a 20 year old conviction related to a high school prank. An immigration lawyer commented that he was not surprised as “the databases are getting deeper and deeper”. The Globe and Mail complained that this case and other similar ones fail to accomplish the goal of keeping Canada safe from real criminals and terrorists. While this decision appears frivolous or even silly, I think it highlights the differences between data, information and business intelligence.

The Information Continuum

Data – A Fact or a piece of information, or a series thereof

Information – Knowledge discerned from data

Business Intelligence – Information Management pertaining to an organization’s policy or decision-making, particularly when tied to strategic or operational objectives

So when hazy, ill-formed or poorly constructed objectives are tied to simple data access, bad business intelligence can be the result.  Let’s consider two strategic objectives:

“Keep Canada safe from real criminals and terrorists”

(as suggested by the Globe and Mail editorial)

- This leads to questions like

  • What is a real criminal or terrorist?
  • How can they be identified?

- This might lead to

  • Information sharing with FBI and Interpol

- Data sets might include

  • FBI’s most wanted list
  • Interpol’s wanted list

In this case, the Information Continuum is  approaching the Business Intelligence level.

“Keep convicted criminals out of Canada”

(the apparent objective applied in this case)
  • This is a blanket statement that will catch everyone ever convicted of any crime (assuming the data is available), regardless of circumstances
  • It invites no further questions
  • You will have false positives (prior convictions with no future criminal intent) and false negatives (active criminals with no prior convictions)
  • In this case, the Information Continuum does not move beyond the data level

Simple data access will not make better business intelligence – if your people don’t have the right resources and direction, it could make it worse.  Your analysts should be able to distinguish between real information and noise, and they should always be striving towards your organization’s clear and thoughtful objectives.

6 Comments

SQL Server Analysis Services Cubes and Cognos PowerPlay

SQL Server Analysis Services is a popular OLAP product included with Microsoft SQL Server. Especially since SQL Server 2005 this product has been quite powerful and fairly easy to  develop with. SQL Server provides the Business Intelligence Development Studio (BIDS), a Visual Studio –like product to aid the development of Analysis Services cubes.

For browsing and reporting on a cube, however, choices have been more limited. Excel provides a good choice, especially since Excel 2007, which contains enhancements that make creating cross-tab reports easier than previous versions.

If your users are committed to Cognos PowerPlay, you can use this tool as well. Setting up a MS Analysis Services cube for browsing with PowerPlay is a little more involved than a regular Cognos cube, but is still quite easy to do.

First, you need to access a tool called PowerPlay Connect. This can be found in the Tools folder of your Cognos installation:

The executeable is ppconnct.exe.

This tool is used to create a binary “pointer file” with a .MDC extension. This file, once created, will behave like a PowerPlay OLAP Cube, but the underlying cube will actually be (in this case) a Microsoft Analysis Services cube.

Start PowerPlay Connect, select File… New to create a new MDC file. For the database type, select MS SSOS (ODBO):

You have a couple of choices next. If you know the server name for your instance SQL Server Analysis Services you can enter it in the next line, under Server:[Port]. In this case I can enter “localhost”, as I am serving the cube from my local machine.

Alternatively, I can select the … button beside Database, and I will be presented with the Chose a Remote Cube dialog box. In this case I then select Microsoft SQL Server OLAP Server at the bottom, and then select a connection I have already created previously using the tool. In this case the connection is called local. I’m then presented with a list of databases available on the connection “local”.

I can then open SSAS_Adventure_Works and the cube that exists in this particular database. A database might have many cubes available in it.

Alternatively I could create a new connection, by clicking on Connections… and then clicking Add. I enter the name I want to give the connection, and then the name of the server, and select Microsoft SQL Server OLAP Server and MSOLAP as the provider:

Since I selected the cube SSAS_Adventure_Works, we see this in the details of the connection string:

I can now click File… Save and save this as an .MDC file:

The file appears as a normal MDC cube, but is really just a pointer file to the SSAS database server:

Using PowerPlay, I can now open the MDC file as if it were an ordinary cube. I can navigate it generally in the same way I would navigate a Cognos cube, althought some things such as Measure Groups that are part of the Microsoft approach to OLAP do not behave exactly the same way. Meaures appear as a single list, much as they do in Cognos cubes.

PowerPlay Connect MDC files can be put on the network, or shared as any other file, and will work as long as the user has access to the underlying Microsoft database.

, , , ,

No Comments

Cognos Transformer and Category Uniqueness, Part 2

In Part 1 of this series we examined how Category Codes are generated. For many users the “uniqueness” of these codes is never a problem, but if they change errors can result in reports that use them.

For example, if the categories are deleted and re-generated, Transformer may not assign the same value to the Category Code. Here is the result after deleting the category California from our example in Part1, and regenerating it:

Instead of CA~8, we now have CA~9. This causes the category to disappear from our Reporter-mode report from Part 1:

You can imagine the impact on users depending on reports with large numbers of dimensions that suddenly disappear on them. As a developer, if you don’t understand the impact of category code generation you can spend a lot of time scratching your head about what is going on.

So what can we do? The surest method is to always ensure that the Category Code is unique in a dimension by assigning a completely predictable value to it that will never change, and that you can count on as being unique at least in the hierarchy, if not in the entire model. By taking control of this value we are ensuring that it is not assigned by Transformer. This is probably best done in the database, during the generation of the dimension at the database level (not the Transformer “dimension”.)

As a quick, alternative method you can ensure uniqueness by creating a calculated column in the dimension that you know will be unique for the level. In this case I have created a calculated column for All_States called State_Cat, and one for All_Countries called Country_Cat.

These are calculations based on concatenating Country_CD and Country (and State_CD and State). Since these calculations will be unique in our dimension we can use them as sources for the Catergory Code in each respective level (only All_States is shown, but the idea is the same for All_Countries):

Now when we generate categories, we see that the Category Code for California is CACalifornia:

Because we are controlling this value, and not Transformer, we can be sure that this value will not change, even if the category is removed and regenerated in the Transformer model. This will ensure that reports that use this Category Code will continue to work correctly.

, , ,

No Comments

Cognos Transformer and Category Uniqueness, Part 1

One tricky and poorly understood problem with the deployment of Cognos Transformer cubes relates to the concept of uniqueness within a dimension. Developers and users can proceed for a long time without having any problems with the uniqueness of categories, and then suddenly – usually as a result of deployment of an enhanced version of an existing cube – user reports may start to behave strangely. Selected categories may disappear from reports authored in PowerPlay, or drill-throughs in reports created in C8 may no longer function correctly. Managing these category codes (“Member Unique Name(s)”, or MUN(s)) requires a bit of forethought that may be skipped over in the development phase

These problems may be rooted in the fact that Transformer insists on a unique identfier across all categories in a dimension, regardless of the level of the dimension the categories exists at. This can cause some subtle and hard-to-diagnose problems.

Consider a trivial “sales” data set:

We have a denormalized “flat file” of data with a hierarchy of Country … State … City. If we create a set of dimensions in Transformer using this data set, we might create something like this:

At the level All_Countries, the level is defined as shown:

At the level All_States, the level is defined as shown:

Note that in both levels the Category Code is left undefined. This will be determined by Transformer, and Transformer will ensure that it will be unique for the dimension. It will base the code on the value in the Source for the level.

This uniqueness is key, because if we look back at our data, we note that we have the value CA for both a Country_CD (Canada) for level All_Countries and a State_CD (California) for level All_States. We have used Country_CD as the Source for each respective leve. By default, since we have not defined how to calculate the Category Code, Cognos will calculate it for us. It will do so based on the Source column for each level, using value CA in the case of the Canada category in All_Countries and the California category in All_States.

The potential problem arises because we must have unique values in the entire dimension for Category Code.  When Transformer finds that it has a value CA as the Source in All_Countries, and the same value as a Source for All_States, it must calculate a unique value for Category Code for All_States for California. It does so by putting a tilde (~) sign and a number after the Source value. We can see this in the Categories in Transformer, after the categories for the cube have been generated. Note the value CA~8 for Category “California”:

To reiterate: the Category Code value must be unique in the dimension.

The problem arises if we use this Category in a report in PowerPlay, for example. Consider a PowerPlay report authored in Reporter mode, that includes California:

The key point here is that the report is using the Category Code CA~8 “under the covers”, not the value California, to identify the value to be returned.

For many users this is not a problem. As long as the calculated Category Code “CA~8” never changes the user’s report will be fine.

But what happens, if as part of a development process, for example, the Category Code gets changed? This can cause some real headaches. If the Category Code is changed for any reason, the users PowerPlay Reporter-mode report will break. One result could be the disappearance of the category that is in the report:

In Part 2 well take a closer look at what can happen when Category Codes change, and propose some solutions.

, , ,

1 Comment

IBM Cognos C10 Dashboards

IBM Cognos 10 arrived with much fanfare last fall, and the IBM Cognos community is starting to absorb what has changed from C8. For the most part this version seems to be a fairly modest, “evolutionary” change, and I suspect that for the most part environments that have C8.4 running well will not be in a huge hurry to upgrade. From an architectural perspective this is in the same vein as the Cognos BI that we’ve seen since ReportNet, although the product has matured considerably since then. This is not the giant leap made from Series 7 to the ReportNet/C8 “enterprise” product.

That said C10 does have some interesting new features, once of which is Business Insight, or as they are more commonly referred to, “dashboards”.

Cognos 8 has a nice feature of being able to create custom pages with embedded report – and other content – that could be set up as tabs available to users on their default screen. These are useful for setting up dashboard-like content. For example, users in Finance can have a few of their most common reports available as tabs, and by selecting the tab can run the without having to explicitly navigate to the appropriate report and run it.

Extending the idea to the executive level, a few key reports can be assembled into a dashboard providing certain key metrics about the business. But of course tabs are useful in lower-level situations – for example, reports that monitor the health of a data warehouse can be “tabbed” so support staff can have instant access to metrics around, for example, warehouse rebuild times.

Tabs have always been a little awkward to set up, and to address some of their shortcomings IBM Cognos 10 introduces Business Insight, an easier way to create and share dashboard-style collections of reports (and other content).

Creating a new dashboard is simple – simply launch Business Insight from the Launch menu:

The dashboard creation screen is intuitive, with a Content tab to easily locate the content you want to place on the dashboard:

Content is placed on the dashboard with a drag-and-drop approach that allows for easy positioning of many reports on a single dashboard:

Each item on the dashboard is a “widget” that can be managed individually. This allows for some interesting features. For example, a list report can be quickly transformed into a chart-type report by selecting the widget, and the selecting “Change Display Type”

In this case we’ve selected a Column Chart, and the widget is quickly transformed to a new display type:

The ability to format the chart is quite limited, but this could be useful in situations where the customer wanted to be able to see things several different ways, if the underlying list report is set up correctly.

Another interesting feature is the ability to comment on any widget. Selecting Add Comment brings up a comment box. Users can leave a trail of comments on each widget.

Widget content can actually be selected from the components of each report – you don’t have to bring in the whole report:

In the case above, we can see that Percentage Sales By Year has both a Pie Chart and a List – either one can be used as the source for a widget.

Finally, if Business Insight Advanced is activated, the users can alter or create content of their own. By selecting a widget and selecting Do More from the upper right of the widget a user can open a query window and create or alter the content:

An interesting feature of this tool is the ability to integrate “External Data” – the ability to import external data and map it to existing report structures. This could allow Finance users, for example, to bring in budget or forecast data, which is often held off-line in spreadsheets.

Dashboard content can also include URLS, RSS feeds etc. although access to these must be granted through the Cognos Application Firewall. This can be useful to provide real-time feeds along with content that does not change as quickly.

Dashboards appear as any other object in Cognos Connection, so instead of opening a report the users simply open the dashboard. As well, the dashboards can be easily emailed to users as a link.

Dashboards in C10 represent a clear improvement from what was available in C8, providing IBM Cognos users with another powerful presentation choice, particularly in environments with a high demand for consolidated views of data.

No Comments

TDWI Ottawa Chapter Kick-Off

Readers in the Ottawa, Canada area should take note of the formation of a local chapter of The Data Warehousing Institute (TDWI.) The inaugural meeting was held on October 7 at the offices of Coradix Systems (www.coradix.com) and featured a presentation by Guy Michaud, CIO of the City of Ottawa on the city’s Open Data initiative.

More details on the activities of the chapter can be found at the chapter website at TDWI: www.tdwichapters.org/blogs/ottawa/list/home.aspx. The chapter is aiming for quarterly meetings featuring subjects related to BI and Data Warehousing.

A LinkedIn group has also been created at www.tdwiottawa.org

,

No Comments

Quickly Debugging BI Objects with XML Code

Debugging
Creative Commons License photo credit: rcourtie

Often I am asked to quickly diagnose problems in Cognos Report Studio reports or Microsoft SQL Server Integration Packages.  A trick that works equally well for both is to pull the XML code of the BI object.  This works especially well if you are trying to debug behaviour of certain variables or settings and are having trouble following the code within the development environment.  Object settings can be obscure or difficult to find in a GUI development tool but are often plain as day when you see them listed in XML code.

To pull XML code out of a Cognos Report Studio report, simply click “Copy Report to Clipboard” from the “Tools” menu.  Paste the results into a text editor.

To pull XML code out of a Microsoft SQL Server Integration Services package, simply open the .dtsx file with a text editor.

This works best for exploratory purposes and if you are looking for specific objects in the XML code (which can be easily searched).  I do not recommend editing XML code directly unless you are extremely careful and know exactly what you are doing – it is very easy to corrupt a file this way making it unreadable.  Once the problem is diagnosed and understood, changes can be made in the development tools.  But this method of code exploration has saved me much time in problem diagnosis.

, , ,

5 Comments

Show me what didn’t happen…

In the Sherlock Holmes story “Silver Blaze”, Holmes has the following famous exchange with a policeman:

Policeman: “Is there any other point to which you would wish to draw my attention?”
Holmes: “To the curious incident of the dog in the night-time.”
Policeman: “The dog did nothing in the night-time.”
Holmes: “That was the curious incident.”

A common requirement in reporting is a report that shows not only what happened, but what *didn’t* happen. As an example, consider sales of several different products in a number of territories to a number of customers. A specification for a report may require it to show entries for sales to customers in a particular territory, for *all* products – even for products for which there were no sales.

To end users this is the most normal request in the world – as a sales manager, I want to see a line on a report that shows “0” for the products I didn’t sell.

From a reporting perspective however this can sometimes be a little tricky, especially in a non-OLAP, relational reporting environment. As an extreme example, I was once asked to create a report of just this type, in an organization that in theory had over 250,000 SKUs. Obviously the customer didn’t want a report that included the 1000 products we normally sold and “zero” entries for 240,000 SKUs, but it took a lot of back and forth to figure out just what the customer did expect to see.

Even a seemingly simple example can have unexpected difficulties however. Consider an example where we have 3 customers, each in a different Province, and each capable of buying 3 different products:

In Framework Manager our dimensions are joined to our fact table using Outer Joins. That way we should return columns in out reports for customers or products that had no sales, right?

Here is a snippet of a Report Studio report for sales for these products, from a query called Sales:

It looks like Cust01, Smith Company, bought nothing, and no-one bought any Oranges.

We have sales in Alberta (AB) for CUST02, but all they bought are Prod03 – Pears. We’ve been directed to produce a report filtered on AB that shows ‘0’ entries for the products that weren’t sold.

Since we’ve used outer joins in our model, we should be able to filter the query Sales on AB:

… except that we get this result:

The filter is applied to the result set returned from the server, and the entries for Apples, which were sold in QC, and the entries for Oranges, for which there were no sales (and so the ProvCd is null) are filtered out.

One fairly simple way to get around this is to create a local Product Dimension, and Outer Join that on the result set with a local join. This will attach a list of all products to the result set regardless of what filter is applied to it.

First we create a Product query:

Then we join the Product query to our original, filtered result set, joining on the common ProdNo <-> ProdNo field (which is in the original result set query called Sales.) We use an Outer Join here to ensure that ProductDescription is returned regardless of whether or not there is a corresponding sale of the product in Fact_Sales:

Now we can bring ProdNo and ProductDescription from the query Products into the new result query Products_And_Sales (When we do so they are automatically appended with the number “1” to distinguish them from ProdNo and ProductDescription in Sales):

Now we get a result that shows one entry – with null sales – for products for which there were no sales in Alberta – the ProdNo and ProductDescription will come from our Products query:

It is now a simple matter to turn Sales_Value into a calculated column in the Products_And_Sales query, putting in a “0” value in the event it is null, for example (in a cross-tab this can be done automatically in the cross-tab settings.)

This is a very simple example that can get much more complex – it may be useful to create “join columns” filled with a “1” value, for example, and join on those. Pay careful attention to your Facts – it is possible that for some dimension there is no activity at all.

Finally, another approach, not shown here, is to create an *unfiltered* copy of the original result set with “null values” for all the “fact” columns, and use a Union-join to join it to the original result set.

,

No Comments

The Panic Factor

“Don’t Panic!” – The Hitchhikers Guide to the Galaxy

Panic
Creative Commons License photo credit: scott1723

I often find that when business users are faced with large, even colossal, discrepancies in data sets, their first response is sheer panic.  For me, the larger the discrepancy the calmer I am about it.  Why?  Because discovering source problems in data sets is almost always much easier the larger they are.  The truly bedevilling problems are the inconsistent, small, and seemingly random discrepancies that occur.

What do I find the number one cause of numbers being 5, 10 or even hundreds of times greater than they should be?  A time series snapshot that is summing all time periods.  This is remarkably easy to fix, and can happen easily in a data cube without proper current time period settings.  It is a common mistake for junior report developers, especially when they are expecting to see current period only.

Other common and simple problems include rounding errors or inconsistencies, improper unit of measure calculations, unexpected null values nixing a summarization total, or cube update failures.  Always check for obvious and easy solutions first, and work your way down to more complicated resolutions as necessary.  Note when a problem started and attempt to isolate what has changed since then.  Be methodical and don’t panic!

No Comments

The Dangers of Self-Service BI

Self-Service BI can be a wonderful thing.  Ideally, business users will have a wealth of corporate information at their fingertips and be able to produce meaningful reports quickly and easily themselves, freeing up time for BI developers to work on other meaningful BI initiatives such as scorecarding, data warehousing, data mining – just to mention a few.

The real danger, however, is that Self-Service BI can be an IT driven initiative to reduce workload on itself, often resulting in a product built entirely from an IT perspective with limited input from business since they are often “too busy” to talk about BI.  The end result can at best be confusing and at worst useless to the business user community.

In my experience, the best results in BI are achieved when developers and business users work closely and collaboratively on business and technical issues.  When the developer or data modeler can understand the business being modeled and the business user can understand the basic technicalities of the design, a true win-win can be achieved.  This is not to say that a developer must fully understand the complete business end-to-end, or that the business user must understand each line of code.  But when each can see the solution from the other’s perspective, an optimal solution is close at hand.

Overall, Self-Service BI can be successful if:

  • The scope is well defined
  • The business is well-defined and understood by developers and modelers
  • Business users are active participants in planning, designing and testing
  • The business user community is well trained in the BI environment’s reporting tools

This last point is particularly essential if business users are expected to create their own reports.

No Comments