This article looks at a reporting requirement where we need to restrict data to the latest data, for example, to only show the latest account balance or the latest action item for work streams in a project.
Our solution is to implement a series of predefined conditions one for each dimension that we need to filter for latest data by.
Consider the following table that is displaying a list of accounts, the account type and the account balance for consecutive months.
|Account Number||Account Type||Balance Date||Account Balance|
The business have asked that in our BusinessObjects universe we provide a mechanism to display latest account balance by either account or by account type.
In SQL you would write code similar to,
SELECT account_number, account_balance FROM fact_accounts WHERE balance_date IN (SELECT max(balance_date) as MaxDate FROM fact_accounts fa WHERE fa.account_number = account_number GROUP BY fa.account_number)
The above will display each account and the balance for the latest date â€“ 1 March 2009. Similarly the following query will display account balance but this time by account type,
SELECT account_number, account_balance FROM fact_accounts WHERE account_date IN (SELECT max(account_date) as MaxDate FROM fact_accounts fa WHERE fa.account_number = account_number GROUP BY fa.account_number)
This technique where we have a subquery within the where clause that joins back to the main query is known as a correlated subquery and is supported by the majority of leading database vendors including Oracle, SQL Server, DB2 and My SQL.
More information can be found at wikipedia or from IBM.
Our challenge is now how best to implement this in a universe.
The solution is to use predefined queries and the key part is to know that we need to create a predefined condition for each dimension that we need the latest account balance for. That is in our example above we need a predefined condition that filters for latest account balance by account number and another predefined condition that filters for latest account balance by account type.
It would be ideal if we could just create one predefined query called say ‘Latest Data’ that the users can add to their report query and this then will automatically filter for latest data. However this is not possible because we still need to know in what context do the users mean ‘Latest Data’ for, that is, do they want latest data by account type or by account number.
In the example above we had two contexts (and I don’t mean universe context here) for latest data: namely latest account balance by account number and latest account balance by account type. As such we need to create the following two predefined conditions in our Universe.
|Name||Latest Balance by Account Number|
|Description||Will filter the results to display latest balance for each account|
Balance_Date IN ( SELECT MAX(fab.Balance_Date) AS MaxDate FROM FACT_ACCOUNT_BAL AS fab WHERE (fab.Account_Number = Account_Number) GROUP BY fab.Account_Number )
|Name||Latest Balance by Account Type|
|Description||Will filter the results to display latest balance for each account type|
Balance_Date IN ( SELECT MAX(fab.Balance_Date) AS MaxDate FROM FACT_ACCOUNT_BAL AS fab WHERE (fab.Account_Type = Account_Type) GROUP BY fab.Account_Type )
Then we can then use either of these in our queries.
The screenshot below illustrates a query in Web Intelligence using this predefined condition.
The following is the SQL generated by the above query,
SELECT FACT_ACCOUNT_BAL.Account_Number, FACT_ACCOUNT_BAL.Balance_Date, FACT_ACCOUNT_BAL.Balance FROM FACT_ACCOUNT_BAL WHERE ( Balance_Date IN ( SELECT MAX(fab.Balance_Date) AS MaxDate FROM FACT_ACCOUNT_BAL AS fab WHERE (fab.Account_Number = Account_Number) GROUP BY fab.Account_Number) )
And the screenshot below is of a simple table displaying the results of this query.
Our requirement was to add a mechanism to a BusinessObjects universe that allows users to easily filter a query so that it only displays the latest data. We saw that in order to fulfil this requirement we needed to understand the context of the meaning of “latest data” and we can then create predefined conditions that filter for latest data by each required context.
In the example above Account Balance is a special kind of measure know as “semi-additive”. These are measures that don’t aggregate uniformly across all dimensions. In this case we are logically able to add account balances across different accounts or account types but we can’t add account balances over time.
BusinessObjects has made some progress in the latest release (XI3.1) where we can delegate that a measure is aggregated using a database query rather than by the report. What would be ideal however would be a mechanism to say how a measure is aggregated depending on which dimension (object or maybe class) is used. For example if our dimension is account type then the aggregation is “sum” however if it is by time then it is “max” based on dimension value or maybe “none”.
Semi-additive measures can be difficult to handle and hopefully above highlights one method of handling at least time base semi-additive measures.
3 thoughts on “Filtering for Latest Data in a Universe”
You’re perfectly right when you say that aggregation should depend on the dimension. I’m encountering this problem regularly.
The notion of “analysis axis” which contains dimensions is not there, although hierarchies do exist.
Hopefully there will be some progress in XI4, but I’m not holding my breath.
Hi, thanks for your comment! I’m also eagerly awaiting next release and would love to see much more data analysis features added to Web Intelligence!