Thursday, June 18, 2015

Automatically Generated Focused Aggregations for Essbase



Dmitry Kryuk <hookmax@gmail.com>
Creation Date:  May 20, 2015
Last updated: May 20, 2015
Version: 1.0


Abstract
On The Efficiency
One Dimension in Rows: Focused Agg Efficiency
One Dimension in Rows: Native Agg Efficiency
Multiple Dimensions in Rows: Focused Agg Efficiency
Aggregations Based on Transaction Log
Calculating for every cell (bad idea)
Calculating for shared ancestors
Do we need to reinvent the wheel (@ILANCESTORS)?
Compare to the native AGG
Reinventing The Wheel
Members Classification
Constructing the Aggregation
Performance Stats


Abstract

In this series of posts we will consider different options for automating the generation of focused aggregations for Essbase and Hyperion Planning.
It is well known that Hyperion Planning provides the ability to design focused aggregations and run them on save. This creates a real-time view of aggregated data. It also solves many issues related to scheduled calculations, like long maintenance windows and the need to wait hours until the next calculation to see updated aggregated data.


But in reality the transition from scheduled calculations to focused business rules usually hits many roadblocks. You need to use Smartview and Planning, and redesign your calculations. Most importantly, you need to change your business processes. What if an organization is already using Essbase extensively and has thousands of Excel templates? Now you need to convince users to switch from the Excel add-in to Smartview, convert their templates to predefined Planning forms, and limit their ability to report and submit data. This can be an uphill battle with many casualties and tiny chances of winning.


Let's imagine for a moment that you are lucky enough to be on a new project which implements Planning, and you design your focused Business Rules by the book.


It is trivial to develop focused aggregations for 3 or 4 dimensions, but aggregations become more complex as the number of dimensions grows. True, it is the same algorithm regardless of the number of dimensions. But if you write your focused calculation for, say, 6 dimensions (although that is usually an unrealistic number), the script itself becomes large, and you can easily make an error going through all the aggregation combinations, especially when you have multiple dimensions in rows/columns.
Workarounds like using user variables for rows/columns require additional maintenance and development effort, and require the users to define those variables.
Another aspect is testing. Considering that requirements and form layouts change multiple times throughout the life cycle of a standard project, redesigning and testing focused aggregations can consume a significant portion of the budget.


Also, do focused Business Rules really give us the best possible performance? Can we optimize aggregations even more? When we have one or more dimensions in the rows, it is unlikely that all rows are changed every time the form is submitted. Sometimes forms have hundreds or thousands of rows simply because they inherited their design from Excel add-in templates. If only 1% of the rows actually changed, and we aggregate for 100% of the rows, the aggregation is not really focused or efficient, is it?


So we have 3 issues on our hands:
  • Users are not willing to use Planning forms or to switch from the Excel add-in to Smartview.
  • The implementation budget for Planning and focused BRs is much higher (due to development costs, but also because of licensing fees and infrastructure footprint).
  • Even if we do implement focused BRs, they do not guarantee optimal performance if users keep the structure of the old Excel templates.


So, can we run efficient focused calculations, let users work the way they are used to, and do this for a fraction of the budget of the alternatives? This is what we try to explore in this post. First we need to dive into the technical details. If, however, you want to go directly to the description of the method that constructs focused aggregations automatically, you can jump to the Reinventing The Wheel section.


On The Efficiency

One Dimension in Rows: Focused Agg Efficiency

An example of a suboptimal approach to focused aggregation is the use of a user variable to specify a member whose descendants are displayed in the form. The calculation then aggregates the descendants and ancestors of that member.


Let's consider the outline below, and use member Dim1_11 as the value of the user variable.
[Image: sample outline of Dim1]
The descendants of Dim1_11, members Dim1_111 and Dim1_112, are displayed in the form as rows. For simplicity, assume Dim2 has a similar hierarchy structure, and its level 0 members are displayed in a page drop-down, with member Dim2_112 selected. Something like the form below:
Also assume we have data in 4 level 0 combinations, and that we have already aggregated the data once. Now we update the Dim1_111->Dim2_112 combination.


The matrix below shows the updated cell; the numbers show the order of calculation.
[Image: calculation-order matrix]


The following calculation would be used to aggregate the web form. Obviously, the specific member names would be substituted with run-time prompts and variables.


/*---Script BR001---*/
SET MSG DETAIL;
SET UPDATECALC OFF;
SET CLEARUPDATESTATUS OFF;
SET EMPTYMEMBERSETS ON;
SET CACHE DEFAULT;

/*-----Part1---------*/
FIX("DIM2_112")
@IDESCENDANTS("DIM1_11");
@ANCESTORS("DIM1_11");
ENDFIX
/*-----Part2---------*/
FIX(@IDESCENDANTS("DIM1_11"), @ANCESTORS("DIM1_11"))
@ANCESTORS("DIM2_112");
ENDFIX


If only one member was updated on the form (say Dim1_111), we would need to calculate only the 15 cells that were impacted by the change:
(3 ancestors of Dim1_111 + the Dim1_111 member itself) x (3 ancestors of Dim2_112 + the Dim2_112 member itself) - the changed level-0 combination.


What happened in practice? Below is the output from the calculation.
The output from the first FIX:
Output from Script BR001 - Part1
Calculating [ Dim1(Dim1_111,Dim1_112,Dim1_11,Dim1_1,Dim1)] with fixed members [Dim2(Dim2_112)].
Executing Block - [Dim1_11], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_112], [Working], [BUDGET], [FY14].

Total Block Created: [0.0000e+00] Blocks
Sparse Calculations: [3.0000e+00] Writes and [8.0000e+00] Reads
Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads
Sparse Calculations: [3.9000e+01] Cells
Dense Calculations: [0.0000e+00] Cells.


One thing to notice is the number of reads. While executing a block, Essbase reads all of its child blocks plus the currently calculated block. So to calculate
Dim1_11, Essbase reads blocks Dim1_111, Dim1_112 and Dim1_11 itself.
Dim1_1: reads blocks Dim1_11, Dim1_1.
Dim1: reads blocks Dim1_11, Dim1_12, Dim1.
Hence we have the total of 8 reads appearing in the output.
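Stated generally (this just restates the rule above, counting only child blocks that actually exist), the number of sparse reads is

$R=\sum_{b\in B}\left(|\mathrm{children}(b)|+1\right)$

where $B$ is the set of upper-level blocks being executed. Here $R=3+2+3=8$.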


The output from the second FIX:
Output from Script BR001 - Part2
Calculating [ Dim2(Dim2_11,Dim2_1,Dim2)] with fixed members [Dim1(Dim1_111, Dim1_112, Dim1_11, Dim1_1, Dim1)].
Executing Block - [Dim1_111], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2], [Working], [BUDGET], [FY14].

Total Block Created: [0.0000e+00] Blocks
Sparse Calculations: [1.5000e+01] Writes and [3.5000e+01] Reads
Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads
Sparse Calculations: [1.5600e+02] Cells
Dense Calculations: [0.0000e+00] Cells.


Total Block Created = 0, since the database was preaggregated: the changed cell already had a value before it was changed.


But instead of 15 blocks we calculated 18. This is because we have 2 rows on the form, and our calculation is designed to calculate all rows. Hence the efficiency of our calculation is 83%. Our metric for efficiency is simply the optimal number of blocks that need calculation divided by the number of blocks actually calculated. The best efficiency is 100%, and as calculations get worse it asymptotically converges to 0.
Losing 17% of efficiency may not seem like a big deal, but we will see later that the lost efficiency grows fast.
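Written as a formula, with $C_{min}$ denoting the optimal number of blocks that need calculation and $C_{calc}$ the number actually calculated:

$E=\frac{C_{min}}{C_{calc}}=\frac{15}{18}\approx 83\%$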


The generic formula for the lower limit of the number of cells to be calculated:


$$C_{min}=\prod_{d=1}^{D}\left(k_d+\left|\bigcup_{j=1}^{k_d}A(m_j^d)\right|\right)-\prod_{d=1}^{D}k_d$$

Whereas:
$k_d$ is the number of updated level-0 members of the $d$-th dimension
$m_j^d$ is the updated $j$-th member of the $d$-th dimension
$|A(m_j^d)|$ is the number of all ancestors of $m_j^d$
$\bigcup_{j=1}^{k_d}A(m_j^d)$ is the union of all ancestor sets of the members $m_j^d$


In our example:
$m_1^1=\text{Dim1\_111}$, $m_1^2=\text{Dim2\_112}$
$\bigcup_{j=1}^{k_1}A(m_j^1)=A(m_1^1)=\{\text{Dim1\_11},\text{Dim1\_1},\text{Dim1}\}$
$\bigcup_{j=1}^{k_2}A(m_j^2)=A(m_1^2)=\{\text{Dim2\_11},\text{Dim2\_1},\text{Dim2}\}$
$\left|\bigcup_{j}A(m_j^1)\right|=\left|\bigcup_{j}A(m_j^2)\right|=3$
$k_1=k_2=1$
$C_{min}=(1+3)(1+3)-1=15$


One Dimension in Rows: Native Agg Efficiency

Let's compare the BR001 result to a calc script that uses the native AGG function:


/*---Script BR001b---*/
SET MSG DETAIL;
SET UPDATECALC OFF;
SET CLEARUPDATESTATUS OFF;
SET EMPTYMEMBERSETS ON;
SET CACHE DEFAULT;
Agg("Dim1", "Dim2");


And the output:


Output from Script BR001b
Multiple bitmap mode calculator cache memory usage has a limit of [16666] bitmaps..
Aggregating [ Dim1(All members) Dim2(All members)].

Executing Block - [Dim1_11], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2], [Working], [BUDGET], [FY14].

Total Block Created: [0.0000e+00] Blocks
Sparse Calculations: [2.1000e+01] Writes and [4.9000e+01] Reads
Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads
Sparse Calculations: [2.7300e+02] Cells
Dense Calculations: [0.0000e+00] Cells.


[Image: calculation-order matrix for the native AGG]


Nothing here is terribly surprising: the complete aggregation across two dimensions wrote and read more blocks than the focused aggregation. Now consider this fact: our database contained data in 4 level 0 cells (marked in grey). So by running the focused aggregation we calculated 50% of the existing data set, while the native AGG calculated the complete data set.


What happens if we clear the database and input data into a single cell, the one that is the focus of our focused aggregation (Dim1_111->Dim2_112)? Let's rerun both the BR001 and BR001b calculations.


Read/Writes for BR001:
Sparse Calculations: [3.0000e+00] Writes and [8.0000e+00] Reads
Sparse Calculations: [1.2000e+01] Writes and [2.8000e+01] Reads


Read/Writes for BR001b:
Sparse Calculations: [1.5000e+01] Writes and [3.0000e+01] Reads


The order of calculated cells is identical for both:


[Figure 4: calculation-order matrix]


Well, in this case the complete agg is more efficient than the focused agg, since it had to read only 30 blocks instead of 36. But we are not comparing apples to apples here, and the reason is this line in the log:


Calculator Cache With Multiple Bitmaps For: [Currency].


In the output of our focused aggregation we notice this:


Calculator Cache: [Disabled].


Why was the calculator cache disabled for the focused aggregation? Because by default it is enabled only when at least one full sparse dimension is calculated. To override that default we used the SET CACHE ALL; statement. To get consistent results we either need SET CACHE ALL for focused calculations, or SET CACHE OFF for the native AGG. If we disable the calc cache for the native AGG we get the same numbers as the focused aggregation:


Sparse Calculations: [1.5000e+01] Writes and [3.6000e+01] Reads


In the examples that follow we disable the calculator cache for consistency.


Multiple Dimensions in Rows: Focused Agg Efficiency



What happens if we put 2 dimensions in the rows? Let's say we have 4 combinations of level 0 members in the rows:




If only one combination gets updated, say Dim1_111->Dim2_112, we still need to calculate only 15 cells. Although the layout of the form has changed, nothing changed from the calculation requirements perspective. Those are the same cells as in the previous example.


But since our aggregation needs to take care of the 2 dimensions brought into the rows, our script is based on the descendants and ancestors of Dim1_11 and Dim2_11. This is how it looks:


/*---Script BR002---*/
FIX(@RELATIVE("DIM2_11",0))
@IDESCENDANTS("DIM1_11");
@ANCESTORS("DIM1_11");
ENDFIX

FIX(@IDESCENDANTS("DIM1_11"), @ANCESTORS("DIM1_11"))
@IDESCENDANTS("DIM2_11");
@ANCESTORS("DIM2_11");
ENDFIX


And this is the output:


Output from Script BR002
Calculating [ Dim1(Dim1_111,Dim1_112,Dim1_11,Dim1_1,Dim1)] with fixed members [Dim2(Dim2_111, Dim2_112)].
Executing Block - [Dim1_11], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_111], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_112], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_112], [Working], [BUDGET], [FY14].

Sparse Calculations: [6.0000e+00] Writes and [1.6000e+01] Reads

Calculating [ Dim2(Dim2_111,Dim2_112,Dim2_11,Dim2_1,Dim2)] with fixed members [Dim1(Dim1_111, Dim1_112, Dim1_11, Dim1_1, Dim1)].
Executing Block - [Dim1_111], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_11], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2_1], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_111], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_112], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_11], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1_1], [Dim2], [Working], [BUDGET], [FY14].
Executing Block - [Dim1], [Dim2], [Working], [BUDGET], [FY14].

Sparse Calculations: [1.5000e+01] Writes and [3.5000e+01] Reads


Now we aggregate 21 cells including level 0 blocks, so our efficiency degrades to 71%. This is exactly the same result as when we used the native Agg across 2 dimensions, and not surprisingly, since we do aggregate both dimensions entirely. With both dimensions in the rows we cannot benefit from focused aggregation.


To summarize, we saw the following factors affecting the efficiency of focused aggregations:


  1. The number of rows/columns updated by the user: the lower the percentage of rows/columns being updated, the higher the percentage of blocks being needlessly aggregated, and hence the lower the efficiency. We saw this in the first example.
  2. The number of dimensions we put in rows/columns: the more dimensions we put there, the less efficient the focused aggregation becomes. We saw this in the example with multiple dimensions in the rows.
  3. The larger the percentage of the existing data being updated, the more efficient the native AGG becomes.


Aggregations Based on Transaction Log



In the previous section the assumption was that focused aggregations are driven by the web form layout. Let's now assume we don't need any front-end application to define the structure of our focused calculation. Instead, the users continue to use their Excel add-in, and there are no Planning forms. But we enable transaction logging and generate focused aggregations based on the logged transactions.


Calculating for every cell (bad idea)

Assume we can identify the specific cell that has changed. To create a focused aggregation for that cell in a one-dimensional database, we would simply calculate all ancestors of the given member. Then we would get to the next cell and repeat… But wait: those cells may have common ancestors. In the best case only 1 ancestor, the top dimension node; in the worst case, if 2 members are siblings, all their ancestors are the same, and we would calculate them redundantly.


Consider also that changed level-0 cells usually come in batches with similar dimensionality, whether from Planning forms, Smartview, or other sources. If a batch contains 1000 cells, running an individual focused calculation for each one results in super-inefficient overall performance due to redundant calculations of upper-level blocks.


To illustrate this point, let us modify our data entry and see how it works. Below is the slightly modified outline:
and the data form:


What does the calculation order look like if we just run a focused aggregation for every changed cell using the @ANCESTORS function?


/*---Script BR004b---*/
//ESS_LOCALE English_UnitedStates.Latin1@Binary
SET MSG DETAIL;
SET UPDATECALC OFF;
SET CLEARUPDATESTATUS OFF;
SET EMPTYMEMBERSETS ON;
SET CACHE DEFAULT;

FIX("Working","BUDGET","LOC","FY14")
/*-----Part1---------*/
FIX("Dim2_111","Dim2_122")
@ANCESTORS("DIM1_111");
@ANCESTORS("DIM1_121");
ENDFIX
/*-----Part2---------*/
FIX(@IANCESTORS("DIM1_111"),@IANCESTORS("DIM1_121"))
@ANCESTORS("Dim2_111");
@ANCESTORS("Dim2_122");
ENDFIX
ENDFIX


Now we have the expected results:


But an undesirable calculation order:


Some cells have 2 order numbers, meaning they are calculated twice.


16 cells are calculated twice, out of the lowest possible 32. Hence our efficiency is only 32/48 = 66%.
The output:


Total Block Created: [8.0000e+00] Blocks
Sparse Calculations: [1.2000e+01] Writes and [3.6000e+01] Reads
Total Block Created: [2.4000e+01] Blocks
Sparse Calculations: [3.6000e+01] Writes and [1.0800e+02] Reads


So what's the problem, you ask. Just remove the redundancy by composing a focused aggregation for the complete batch (the changed set)!


Ok, this is indeed what we are going to try.


Calculating for shared ancestors

The idea is to construct a list of ancestors, similar to the @IALLANCESTORS function, but for a set instead of a single member. In the diagram below, changed level-0 cells are represented by red circles; green pluses represent their ancestors that need calculation.


But wait a second, isn't there a function, @ILANCESTORS, that does exactly that? From the technical reference:
@ILANCESTORS
Returns the members of the specified member list and either all ancestors of the members or all ancestors up to the specified generation or level.
You can use the @ILANCESTORS function as a parameter of another function, where the function requires a list of members. The result does not contain duplicate members.
Example
@ILANCESTORS(@LIST("100-10","200-20"))


Do we need to reinvent the wheel (@ILANCESTORS)?

Let's try the following script:


/*---Script BR004a---*/

SET MSG DETAIL;
SET UPDATECALC OFF;
SET CLEARUPDATESTATUS OFF;
SET EMPTYMEMBERSETS ON;
SET CACHE DEFAULT;


FIX("Working","BUDGET","LOC","FY14")

/*-----Part1---------*/
FIX("Dim2_111","Dim2_122")
@ILANCESTORS(@LIST("DIM1_111","Dim1_121"));
ENDFIX
/*-----Part2---------*/
FIX(@ILANCESTORS(@LIST("DIM1_111","Dim1_121")))
@ILANCESTORS(@LIST("Dim2_111","Dim2_122"));
ENDFIX

ENDFIX


And the results:


Hmm… not exactly what we expected. The reason is evident from the block execution order. The numbers below show the order of execution:


You can see from the order of execution that the @ILANCESTORS function generates the ancestors of the first member in the list (Dim1_11, Dim1_1, Dim1), and then the ancestors of the second member (Dim1_12, Dim1_1, Dim1). But remember the statement from the tech ref: “The result does not contain duplicate members”? That's why Dim1_1 and Dim1 are removed from the ancestor set of the second member, and we end up with Dim1_11, Dim1_1, Dim1, Dim1_12. Which is clearly not what we want. The second part of the calculation, for Dim2, works the same way: the calculation order is Dim2_11, Dim2_1, Dim2, Dim2_12.
The peculiar fact is that the order of the @ILANCESTORS-generated ancestors inside the FIX is different: Dim1_111, Dim1_11, Dim1_121, Dim1_12, Dim1_1, Dim1.


But in terms of efficiency we have written 32 blocks and read 96 blocks, so we get 100% efficiency (if we take only writes into consideration). Here's the output:


Total Block Created: [8.0000e+00] Blocks
Sparse Calculations: [8.0000e+00] Writes and [2.4000e+01] Reads
Total Block Created: [2.4000e+01] Blocks
Sparse Calculations: [2.4000e+01] Writes and [7.2000e+01] Reads


Compare to the native AGG

Just for the record, let's see how it looks if we use a regular Agg. Since the 4 changed cells are all the data we have in the database, we would expect 100% efficiency from the native AGG as well.


/*---Script BR004c---*/
//ESS_LOCALE English_UnitedStates.Latin1@Binary
SET MSG DETAIL;
SET UPDATECALC OFF;
SET CLEARUPDATESTATUS OFF;
SET EMPTYMEMBERSETS ON;
SET CACHE DEFAULT;


FIX("Working","BUDGET","LOC","FY14")
Agg("Dim1","Dim2");
ENDFIX



The results are as expected, and so are the read/write numbers:


Total Block Created: [3.2000e+01] Blocks
Sparse Calculations: [3.2000e+01] Writes and [9.6000e+01] Reads
Dense Calculations: [0.0000e+00] Writes and [0.0000e+00] Reads


These are exactly the same numbers as when we used the @ILANCESTORS function.


Reinventing The Wheel

We saw that we cannot use the @ILANCESTORS function, even though in terms of efficiency it is as good as it gets. So how about we create our own function that returns sets of ancestors properly ordered?


To accomplish this we will use the Essbase Java API and parse the transaction log. When you enable transaction logging, Essbase registers all changes to the data. So even if a user has a huge Excel add-in template with 600,000 cells, of which only a few hundred have changed, the transaction log will contain only the changed cells.
In the screenshot below you can see that the log contains a calculation script and then data submitted from a Smartview ad-hoc analysis.


Members Classification

From this chunk of the log we need to get 2 things:
  • Dimensions and members that need to be aggregated, to be used in the construction of the focused aggregation.
  • Dimensions that do not need to be aggregated, to be used in the POV section of the calc script.


This information does not show up in the log directly. But we can distinguish a data point from a member name.
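As a minimal sketch of that distinction in Java (illustrative only; verify the exact log format against your Essbase version), a token that parses as a number is treated as a data value, and anything else as a member name:

public class LogTokens {
    // Tokens that parse as numbers are data values; the rest are member names.
    static boolean isDataPoint(String token) {
        try { Double.parseDouble(token); return true; }
        catch (NumberFormatException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(isDataPoint("139.25"));   // true  -> data value
        System.out.println(isDataPoint("Dim1_111")); // false -> member name
    }
}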


And we can see where one transaction starts and where it ends.
The rest we can do by connecting to the Essbase database and getting additional information about each member.
We can find out:
  • To which dimension the member belongs
  • Who its parent is
  • Its ancestors up to the top node
  • The generation of each ancestor
  • The storage type of each of them.


This is enough to:
  • Categorize the members that show up in transactions by dimension
  • Get the distinct set of their ancestors
  • Order them by generation
  • Based on the storage property of each ancestor, classify whether the dimension serves as POV or should participate in the focused aggregation (see the sketch after this list).
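Here is a minimal sketch of that last classification step in Java. The Storage enum and method names are illustrative, not part of the Essbase Java API; in practice the storage type of each ancestor would be fetched through the API:

import java.util.*;

public class DimensionClassifier {
    enum Storage { STORED, DYNAMIC_CALC, LABEL_ONLY }

    // A dimension needs aggregation only if at least one ancestor of its
    // changed members is a stored member; if every ancestor is dynamic calc
    // or label only, the dimension can simply be fixed in the POV.
    static boolean needsAggregation(Collection<Storage> ancestorStorage) {
        for (Storage s : ancestorStorage)
            if (s == Storage.STORED) return true;
        return false;
    }

    public static void main(String[] args) {
        // Year: its only ancestor is a label-only root -> POV dimension.
        System.out.println(needsAggregation(
                Arrays.asList(Storage.LABEL_ONLY)));                 // false
        // PCC: stored upper-level members exist -> must be aggregated.
        System.out.println(needsAggregation(
                Arrays.asList(Storage.LABEL_ONLY, Storage.STORED))); // true
    }
}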


Below are the results of the member/dimension classification:


Version: {Final=-2.0}
Scenario: {Actual=-2.0}
View: {YTD=-2.0, Periodic=-2.0}
Year: {FY14=-2.0, FY13=-2.0, FY15=-2.0}
Intercompany Partner: {[ICP None]=-3.0, [ICP Top]=-2.0, Intercompany Partner=-1.0}
HFM Value: {[Parent]=-3.0, USD=-3.0, [Proportion]=-2.0, HFM Value=-1.0}
Entity: {E_401=-7.0, IrelandCombined=-6.0, SFDC_EMEA=-5.0, SFDC_APAC=-5.0, EMEA=-4.0, APAC=-4.0, International=-3.0, SFDC=-2.0, Entity=-1.0}
PCC: {P_411=-8.0, Benelux_EMEA_PCC=-7.0, P_428=-7.0, P_613=-7.0, North_EMEA=-6.0, UK_I_EMEA=-6.0, ANZ=-6.0, EMEA_PCC=-5.0, APAC_PCC=-5.0, International_PCC=-4.0, SFDC_PCC=-3.0, AllPCCs=-2.0, PCC=-1.0}
BusinessUnit: {B_2205=-7.0}
CostCenter: {C_3020=-8.0, ConsultInt=-7.0, Consult=-6.0, ProfSrv=-5.0, ProfSuptSrv=-4.0, COSCostCenter=-3.0, AllCostCenters=-2.0, CostCenter=-1.0}
Account: {600001=-12.0, 600004=-12.0, 600424=-12.0, 602010=-12.0, 602011=-12.0, 602001=-12.0, 602023=-12.0, 602040=-12.0, 601020=-12.0, 601022=-12.0, 600006=-12.0, 600426=-12.0, 601021=-12.0, 601024=-12.0, 600420=-12.0, 601000=-12.0, 600430=-11.0, 600440=-11.0, 603000=-11.0, 642000=-11.0, 642003=-11.0, 642004=-11.0, 650001=-10.0, 650002=-10.0, 650005=-10.0, 650006=-10.0, 651000=-10.0, 651002=-10.0, ConsFuncNonGAAP=-10.0, ConsFunc=-10.0, 641001=-10.0, 650000=-10.0, 650003=-10.0, 650004=-10.0, 832001=-10.0, 640001=-10.0, 810001=-9.0, 960106=-9.0, 811000=-9.0, 830002=-9.0, 812000=-9.0, 812003=-9.0, 820000=-8.0, 820002=-8.0, 910002=-7.0, 910007=-7.0, 910008=-7.0, 690000=-6.0, 690020=-6.0, PERFTE=-5.0, MRATE=-5.0, ValidateNonGAAP=-4.0, ACCRUALRATES=-4.0}
dimProperties: {Version=, Scenario=, View=, Year=, Intercompany Partner=Has Parents: Intercompany Partner;, HFM Value=Has Parents: HFM Value;, Entity=Has Parents: Entity;, PCC=Has Parents: PCC;, BusinessUnit=, CostCenter=Has Parents: CostCenter;, Account=Has Parents: ACCRUALRATES;}
================================
AGGREGATED DIMENSIONS: {1=Intercompany Partner, 2=HFM Value, 3=Entity, 4=PCC, 5=CostCenter, 6=Account}
POV: {1=Version, 2=Scenario, 3=View, 4=Year, 5=BusinessUnit}


These results are based on a different database with real data volume and with large transactions going through ad-hoc reporting.
You can see that there are 6 dimensions that require aggregation. The rest are POV dimensions: all ancestors of the members from the POV dimensions are dynamic calc or label only members. Let's take a look at one of the aggregated dimensions, PCC for example: its members are ordered by generation. On the other hand, a dimension like Year has all its members at a single generation (2), meaning it does not need to be aggregated.
While parsing members we should also take shared members into consideration, but we will not get into that here.
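Here is a minimal sketch in Java of the ordered-ancestors function, essentially what the @GETCHANGEDSET placeholder in the next section stands for. The Member class is illustrative; in practice the parent names and generation numbers would come from the Essbase Java API, and shared members would need extra handling. The key difference from @ILANCESTORS is that the combined set is sorted by generation, deepest first, so every ancestor is calculated only after all of its changed children:

import java.util.*;

class Member {
    final String name, parent;  // parent is null for the dimension root
    final int generation;       // root = 1, its children = 2, ...
    Member(String name, String parent, int generation) {
        this.name = name; this.parent = parent; this.generation = generation;
    }
}

public class ChangedSet {
    // Returns the changed level-0 members of one dimension plus the distinct
    // set of their ancestors, ordered by generation descending (bottom-up).
    static List<String> getChangedSet(List<String> changed,
                                      Map<String, Member> outline) {
        Set<String> result = new LinkedHashSet<>(changed);
        for (String name : changed) {
            Member m = outline.get(name);
            while (m != null && m.parent != null) { // walk up to the root
                result.add(m.parent);               // the set drops duplicates
                m = outline.get(m.parent);
            }
        }
        List<String> ordered = new ArrayList<>(result);
        ordered.sort((a, b) -> outline.get(b).generation
                             - outline.get(a).generation);
        return ordered;
    }

    public static void main(String[] args) {
        Map<String, Member> outline = new HashMap<>();
        outline.put("Dim1",     new Member("Dim1",     null,      1));
        outline.put("Dim1_1",   new Member("Dim1_1",   "Dim1",    2));
        outline.put("Dim1_11",  new Member("Dim1_11",  "Dim1_1",  3));
        outline.put("Dim1_12",  new Member("Dim1_12",  "Dim1_1",  3));
        outline.put("Dim1_111", new Member("Dim1_111", "Dim1_11", 4));
        outline.put("Dim1_121", new Member("Dim1_121", "Dim1_12", 4));
        // Prints [Dim1_111, Dim1_121, Dim1_11, Dim1_12, Dim1_1, Dim1]:
        // a level-by-level, bottom-up order, unlike @ILANCESTORS.
        System.out.println(getChangedSet(
                Arrays.asList("Dim1_111", "Dim1_121"), outline));
    }
}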


Constructing the Aggregation

Once we have the member classifications, we can start constructing the calculation. But how can that be done automatically? The algorithm is pretty simple: generate the focused aggregation for 3 dimensions, then extend the pattern to any number of dimensions.


Let's write pseudocode for 3 dimensions. @GETCHANGEDSET is a placeholder for a function that returns all changed members of one dimension, properly ordered.


FIX (@GETCHANGEDSET(Dim1))
FIX (@GETCHANGEDSET(Dim2))
@GETCHANGEDSET(Dim3);
ENDFIX
FIX (@GETCHANGEDSET(Dim3))
@GETCHANGEDSET(Dim2);
ENDFIX
ENDFIX
FIX (@GETCHANGEDSET(Dim2))
FIX (@GETCHANGEDSET(Dim3))
@GETCHANGEDSET(Dim1);
ENDFIX
ENDFIX


You can see that we put each dimension forward for aggregation in turn, while the rest become the fixed dimensions.


FIX    FIX    AGG
Dim1   Dim2   Dim3
Dim1   Dim3   Dim2
Dim3   Dim2   Dim1


For 4 dimensions the aggregation plan would look like this:
FIX    FIX    FIX    AGG
Dim1   Dim2   Dim3   Dim4
Dim1   Dim2   Dim4   Dim3
Dim1   Dim4   Dim3   Dim2
Dim4   Dim3   Dim2   Dim1


And for N dimensions:
FIX    FIX     ...    AGG
Dim1   Dim2    ...    DimN
Dim1   Dim2    ...    DimN-1
Dim1   Dim2    ...    DimN-2
...
DimN   DimN-1  ...    Dim1


Now let's transform the code into a recursive form. {REDUCED FORM FOR 2 DIMENSIONS} stands for the reduction of our problem from 3 dimensions to 2 dimensions.
FIX (@GETCHANGEDSET(Dim1))
{REDUCED FORM FOR 2 DIMENSIONS}
ENDFIX
FIX (@GETCHANGEDSET(Dim2),@GETCHANGEDSET(Dim3))
@GETCHANGEDSET(Dim1);
ENDFIX


Once we have reduced the problem, we can solve it for any number of dimensions. For N dimensions it looks like this:


FIX (@GETCHANGEDSET(Dim1))
{REDUCED FORM FOR N-1 DIMENSIONS}
ENDFIX
FIX (@GETCHANGEDSET(DimN),@GETCHANGEDSET(DimN-1)...@GETCHANGEDSET(Dim2))
@GETCHANGEDSET(Dim1);
ENDFIX
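To make the recursion concrete, here is a minimal Java sketch that emits this skeleton for any number of dimensions. Printing @GETCHANGEDSET(...) placeholders is illustrative; in the real generator each placeholder would be expanded into the ordered member list computed from the transaction log:

import java.util.*;

public class AggScriptBuilder {
    // FIX on the first dimension's changed set, recurse on the remaining
    // dimensions, then aggregate the first dimension within a FIX on all
    // the others.
    static String buildAgg(List<String> dims) {
        if (dims.size() == 1) {
            return set(dims.get(0)) + ";\n";  // base case: a single dimension
        }
        String head = dims.get(0);
        List<String> rest = dims.subList(1, dims.size());
        StringBuilder sb = new StringBuilder();
        sb.append("FIX (").append(set(head)).append(")\n");
        sb.append(buildAgg(rest));            // reduced form for N-1 dimensions
        sb.append("ENDFIX\n");
        sb.append("FIX (");
        for (int i = 0; i < rest.size(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(set(rest.get(i)));
        }
        sb.append(")\n").append(set(head)).append(";\nENDFIX\n");
        return sb.toString();
    }

    static String set(String dim) {
        return "@GETCHANGEDSET(" + dim + ")"; // placeholder for the member list
    }

    public static void main(String[] args) {
        // Reproduces the 3-dimension pseudocode above.
        System.out.print(buildAgg(Arrays.asList("Dim1", "Dim2", "Dim3")));
    }
}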


Let's come back to our transaction and see what the generated focused aggregation looks like.




Notice that this calculation is:
  • Generated automatically for the specific data set that has changed, so it doesn't matter how many dimensions we have in the rows or columns.
  • Free of extra members that haven't changed, so in terms of written blocks it is more efficient than focused aggregations in BRs based on RTPs provided from the form.
  • Maintenance-free: you don't need to write a single line of code to create focused aggregations for all existing forms, or to change them when users want to change the forms.


Performance Stats



In our previous example there were 11 dimensions in the database, and the transaction contained 3912 changed cells: 326 rows with 3586 members across all dimensions. It took 25 sec to parse and classify all the members and to create the focused aggregation, and 14 sec to execute the aggregation.


In other examples of large transactions, parsing and calculation took around 9 ms per changed cell. In a more reasonable example of a template that contained 52 rows and 12 columns, the whole process took 5.5 sec.


For a small transaction of 1 row and 12 columns the parsing overhead was 0.4 sec, and the total time was 0.7 sec. Compare this to the 40 minutes that a complete aggregation of the database takes. Obviously the numbers also depend on the database structure and on how the transaction data is distributed across the dimensions.


Rows    Members   Cells    Parsing (sec)   Calc (sec)   Total (sec)   Time Per Cell (ms)
36446   400950    437352   318             64           382           0.873438
326     3586      3912     25              14           39            9.969325
101     1111      1212     5               5            10            8.250825
52      566       624      3               2.5          5.5           8.814103
1       11        12       0.4             0.29         0.69          57.5


In the next post we will discuss how to optimize the calculations even further, considering that multiple users submit their data concurrently: if they run calculations independently, some of the calculations of the upper-level blocks will be repeated.


We will also consider integration options: which mechanism will monitor the transaction log for new transactions and execute the generated scripts.
