Archive for June, 2010
I recently read a blog entry that tore a strip off the notion of the iterative process. The author argued that if data architects did their jobs properly, there would be no need for iterative development, since everything would be done right the first time. Using building architects as an analogy, he argued that no one would build a home without a sound architectural design. So why is the same thing not done with data systems, to provide better results and more successful projects?
It’s true that the notions of data architecture and building architecture seem very similar on the face of it – they even share the same name. But there are great differences as well. Continuing with the construction analogy, a typical business intelligence project would look like this:
- The client would like a house built. Not sure how many rooms. Not sure how many stories. Not sure about windows or exterior or wiring or plumbing. Not even sure about the site – it is not surveyed. But the client wants a house.
- The client wants the house to be infinitely flexible, to be able to house a single family or any number of families as the case may be.
- The client wants the house tomorrow.
Anyone in construction would balk at such a request – it is patently ridiculous. But these are the realities of virtually every BI project I have been on. And the only way to manage this sort of project is by the iterative process.
Now this is not to say you build a shoddy home and rebuild it over and over again (if this is what you are doing, you do not really understand the iterative process). What you need to do is start with a firm but flexible foundation. When problems are uncovered (say the home site is entirely bedrock), you raise the issue with your client and discuss alternatives. Once a foundation is in place, you can start building levels, seeking feedback from the client at various milestones to ensure the house is to their satisfaction. Properly executed, the iterative process builds on itself like a many-storey house, with each level added piecemeal and each prior step open to revision as fundamental changes are required (now we are building a home for 4 families – redo the fuse box!).
This is where the analogy breaks down for me. Business intelligence is more like baking without a recipe than like architecture – you need to taste it as you go to see how you are doing. Your client often doesn’t know what they want or what is even possible. It is an exploratory process – each step reveals more information, which calls for further changes or adjustments. In the end your client’s house will look completely different from what was originally envisioned, but it will do so much more and meet needs they didn’t even know they had.
It is for these reasons that the iterative approach is a fundamental part of the IBM Cognos Solution Implementation Methodology, and many other data warehouse methodologies as well.
I was recently at a meeting where a newly hired consultant stated that “Open Source BI” was the way of the future. This client of mine is a Cognos shop, having spent much time and money on Cognos 8 technologies over the years, and was taken aback by the suggestion. I was left pondering “what is open source BI anyway?” I’ll be the first to admit that I really don’t get it.
Traditionally an “open source” system is one that does not hold its source code as proprietary information and will allow others to see it and perhaps modify it, the idea being that the code is held in the collective and not by a single corporation. The intention of such a software licence is to allow the user freedom of choice in future upgrades and to prevent vendor lock-in. So when people start talking about “open source BI” what I picture first is endless reams of C++ or Java source code being thrown at the feet of the business users, code they understand a good deal less than the BI systems the code creates. Such an arrangement assumes that you have computer programmers either as part of a larger collective or in your employ to make necessary or desired code changes, a quantum leap for most enterprises. So is open source BI something else?
Is Open Source BI actually “Freeware/Shareware”?
I have seen many instances of Open Source BI being freely available for basic implementation, with premium apps and services available for a price. I understand that open source certainly does have ramifications for licensing, but it doesn’t seem to me to fit strictly into the freeware model.
Is Open Source BI actually “open platform”?
When Cognos first came out, one of its greatest claims to fame was that it was platform independent. But nowadays any modern BI system can read from any and all database systems with the right configuration, so I don’t see this as an explanation of open source BI either. Note, however, that the ability to read from any database environment and the ability to apply a single data model against any environment are not the same thing.
Is Open Source BI “commercially open sourced software” (COSS)?
As I understand it, COSS is a hybrid of free open source and proprietary software. In this case, some code is released as open source, and some generally enhanced functionality is available through a closed and limited licence. This sounds to me like the closest description to what I have heard described. If so, this licensing arrangement still has many of the pitfalls of traditional proprietary software licensing, if you are in fact using the advanced features.
Regardless of the licensing arrangement, I don’t see how changing from one vendor to another would ever be easy. Each reporting environment has its own idiosyncrasies, and sometimes one vendor tool will do the same job in a completely different way. Particularly in ETL this would be problematic. ETL is often intensely complex and is very specific to its source and target environments. I am a regular user of both Cognos Data Manager and Microsoft SQL Server Integration Services and I would be loath to attempt converting one to the other. You would need a good and compelling reason to justify the cost and effort in changing vendors, even with an open source licensing arrangement, in my opinion.
I am genuinely interested in knowing what benefits, limitations and drawbacks exist for “open source” BI. I know that a number of my regular readers and Twitter followers work directly in open source business intelligence so I’m hoping that you can help explain this to me. Comments and feedback are most certainly welcome!
The following is “Cognos Tip of the Week” with Peter Beck:
Microsoft SQL Server can fairly easily return a “table” from a function. The function is declared as returning a TABLE data type, the structure of which is given in the function declaration. You can then pass a parameter to the function, and return the results of the SELECT statement that makes up the body of the function based on that parameter. This can be a powerful way of returning a TABLE based on some user prompt, while hiding the underlying complexity of the query from the users and/or report developers. By creating the function once, in the database, and indexing the underlying tables appropriately, you can create complex logic that can be used throughout your reporting environment without exposing that complexity, or the underlying table objects.
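As a minimal sketch of the idea (table, column and function names here are illustrative only, not from any particular system), an inline table-valued function in SQL Server might look like this:

```sql
-- Hypothetical example: an inline table-valued function that returns
-- the orders for a given customer. The Orders table and its columns
-- are assumptions for illustration.
CREATE FUNCTION dbo.fn_CustomerOrders (@CustomerId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT o.OrderId,
           o.OrderDate,
           o.TotalAmount
    FROM   dbo.Orders AS o
    WHERE  o.CustomerId = @CustomerId
);
```

Report authors can then query it as if it were a table, passing the prompt value as the parameter: `SELECT * FROM dbo.fn_CustomerOrders(42);`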
Oracle can return a table from a function as well, but doing so requires a little bit of set-up on both the Oracle database side and on the Cognos Framework Manager side.
In Oracle there are essentially two approaches. The first is to create a Stored Procedure that returns an OUT parameter. This parameter must be declared as a REF CURSOR. A full treatment is beyond the scope of this article, but a simple example of how to do this is here.
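A minimal sketch of the approach (procedure, table and column names are hypothetical) would be:

```sql
-- Hypothetical sketch: a stored procedure that returns its result set
-- through a REF CURSOR OUT parameter.
CREATE OR REPLACE PROCEDURE get_customer_orders (
    p_customer_id IN  NUMBER,
    p_results     OUT SYS_REFCURSOR
)
AS
BEGIN
    OPEN p_results FOR
        SELECT order_id, order_date, total_amount
        FROM   orders
        WHERE  customer_id = p_customer_id;
END;
/
```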
Once created, the Stored Procedure can be brought into Framework Manager as a Query Subject, and it will behave as if it were a Table, but one that can be “parameterized”.
The other approach is to create a Function that returns a Type, a collection whose element type makes up a row of the “table” that will be returned. Once again, the exact details of how to do this are out of scope for this article, but a good discussion can be found here.
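Sketched out, with hypothetical type, table and column names, the pieces look like this: an object type for the row, a nested table type as the collection, and a function that fills and returns the collection.

```sql
-- Hypothetical sketch: a row type, a collection of that type, and a
-- function that returns the filled collection.
CREATE OR REPLACE TYPE order_row AS OBJECT (
    order_id     NUMBER,
    order_date   DATE,
    total_amount NUMBER
);
/
CREATE OR REPLACE TYPE order_table AS TABLE OF order_row;
/
CREATE OR REPLACE FUNCTION get_orders (p_customer_id NUMBER)
    RETURN order_table
AS
    v_result order_table;
BEGIN
    SELECT order_row(order_id, order_date, total_amount)
    BULK COLLECT INTO v_result
    FROM   orders
    WHERE  customer_id = p_customer_id;
    RETURN v_result;
END;
/
```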
Once created, this can also be used in Framework Manager as a parameterized object that will return a table. The catch is that for this to work, you must use Native SQL as the SQL type for the query subject.
Why? Because to make the function return a table, we must use the special Oracle TABLE keyword when calling the function.
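Assuming a collection-returning function named `get_orders` (a hypothetical name for illustration), the call is wrapped in TABLE() so Oracle treats the returned collection as a row source:

```sql
-- TABLE() turns the returned collection into something that can be
-- selected from like an ordinary table.
SELECT *
FROM   TABLE(get_orders(42));
```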
Now the function will behave like a table in Framework Manager.
So what is the best approach? A table-based function takes a little more work to set up, but it is probably more flexible in what it can return – creating a TYPE in Oracle that can be re-used might be of some use. More importantly, the function can be declared as PIPELINED, which streams rows to the caller as they are produced rather than building the entire collection in memory first. This can lead to a dramatic increase in performance compared to a non-PIPELINED function, particularly for large result sets.
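Continuing the same hypothetical names, a pipelined version of such a function might look like this:

```sql
-- Hypothetical sketch: a PIPELINED variant. Each row is handed to the
-- caller via PIPE ROW as soon as it is produced, so the whole
-- collection never has to be materialized in memory.
CREATE OR REPLACE FUNCTION get_orders_piped (p_customer_id NUMBER)
    RETURN order_table PIPELINED
AS
BEGIN
    FOR rec IN (SELECT order_id, order_date, total_amount
                FROM   orders
                WHERE  customer_id = p_customer_id)
    LOOP
        PIPE ROW (order_row(rec.order_id, rec.order_date, rec.total_amount));
    END LOOP;
    RETURN;
END;
/
```

It is queried the same way, via `SELECT * FROM TABLE(get_orders_piped(42));`.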
Step 1: Verify what you have been told
A user has brought a data problem to your attention. Be sure to verify that everything your user has told you is in fact correct. Your user may not fully understand the data set or may be operating under false assumptions. Make sure you can see the problem data for yourself and that you understand the problem as presented. This will save you hours of investigating non-existent problems that turn out to be mere data misinterpretation.
Step 2: Reduce the data set
Focus like a laser beam on the smallest possible set of data that exhibits the problem. Ideally this would be fewer than 10 rows, but anything under 100 is easily manageable. This will allow you to speak to your user in concrete terms, possibly about a single transaction, and will also allow you to run your ETL or OLAP cube builds quickly and repeatedly against the reduced data set.
If your data is in an OLAP cube, pulling a reduced data set can be simply a matter of navigating the hierarchies of a dimension for a small record set. Otherwise, your user may do this for you up front by providing a specific example. Ask for an example from your business user if necessary.
Step 3: Isolate the environment
Where is the problem? In the ETL code? In the OLAP cube delivery? In the report? In a report’s individual query? This will take some investigative work. Be sure to identify the root cause, not merely a symptom. The quicker you can isolate the environment, the sooner you will find the source of the problem.
Step 4: Find and resolve
The simplest explanation is usually the first place to check. Look for recent changes to an environment. In a report, check prompts, filters and calculations. In ETL, check logic against business rules and ensure results are as expected. In all cases, check query results closely. Individual queries, calculations and other derivatives are a common source of problems.
Step 5: Follow Through
Ensure that you follow through with the user. You may have resolved an issue, but it might not necessarily have been the problem that the user had in mind. Do not assume your user will magically know the problem is fixed. Confirm with the user that the issue is in fact resolved.
If multiple issues exist, this cycle may need to be repeated until all issues are addressed.
UNION, INTERSECT and EXCEPT are set operations that combine the results of two queries. A popular operation used in Report Studio is the UNION. This will take the result of one set – the results of, for example, a SELECT statement that yields 5 columns – and “merge” it with the result of a second SELECT statement that has the same number of columns of compatible types. If each query returns 10 rows, the UNION ALL of the two will return 20 rows, while a plain UNION also removes any duplicates. (The other set operations are EXCEPT, which returns rows that appear in the first source query but not in the second, and INTERSECT, which returns rows that appear in both source queries.)
Set operations are useful when you need to present several different results within the context of a single query – you can create 3 different queries, and then UNION, INTERSECT or EXCEPT them together and present them in a single List object. As long as they have the same number of columns, with compatible types, this should work fine.
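As a sketch in plain SQL (table and column names are illustrative only), the three set operations look like this:

```sql
-- Both queries must yield the same number of columns with
-- compatible types.

-- All rows from both queries (plain UNION would remove duplicates):
SELECT product_id, sales_amount FROM sales_2009
UNION ALL
SELECT product_id, sales_amount FROM sales_2010;

-- Products that appear in both years:
SELECT product_id FROM sales_2009
INTERSECT
SELECT product_id FROM sales_2010;

-- Products that appear in 2009 but not in 2010:
SELECT product_id FROM sales_2009
EXCEPT
SELECT product_id FROM sales_2010;
```

One wrinkle worth knowing: Oracle’s native SQL dialect spells EXCEPT as MINUS.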
In the example below, Query5 is the result of an INTERSECT between Query3 and Query4 – it will select the rows that are common to both.
Cognos 8.3 doesn’t always behave as expected, however, and a Report Studio report that uses UNION operations may produce an unexpected error:
ORA-32035: unreferenced query name defined in WITH clause
This can be maddening because the report may work correctly when rendered in HTML, but generate the error when the user requests a PDF.
The source of the error is an improperly formed query generated by Cognos, and it can be avoided by preventing the combined query from using the WITH clause. The WITH clause allows the database server to evaluate sub-queries more efficiently, but in this case Cognos may not form the clause correctly, so it should be disabled. The setting is found under the Properties of the query, as “Use SQL With Clause”.
Setting this to “No” should allow your report to run correctly.
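For context, Oracle’s WITH clause (also called subquery factoring) names a sub-query so it can be referenced in the main statement; ORA-32035 is raised when a name defined in the clause is never actually referenced. A minimal illustration, with hypothetical table names:

```sql
-- A named sub-query that IS referenced in the main SELECT - this is fine:
WITH product_totals AS (
    SELECT product_id, SUM(sales_amount) AS total
    FROM   sales
    GROUP  BY product_id
)
SELECT * FROM product_totals;

-- If product_totals were defined here but never referenced in the
-- main statement, Oracle would raise ORA-32035.
```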