Problem
There are several articles mentioning performance problems with IF formula. But until you hit an issue you would not believe how big such an impact could be. I would like to spell out several examples from my performance tuning exercise from last week. I’ll explain only the first example in more detail. Take the others more as an inspiration.
IF method Use Cases
Recently I made some SSAS Tabular based solution and implemented some DAX Measures which would have quite complicated logic for underwriting reporting. Measures for calculating with averages, ratios, medians, development factors (product of values) etc. As the customer wanted to make the solution robust, there was a lot of logic implemented into calculations which would assure a certain behavior of the measure in special situations. For instance, for one measure for current reporting year the value should be always 100%, then you would go from this anchor into the past and subtract some value from the previous year from this number. I learned a lot during the developing of this solution about what the context means for DAX and how to manipulate it. Nevertheless, I needed to make some change to this solution recently which complicated the logic of the measures even more. And then suddenly the solution stopped performing. The root of most problems was using IF method on various places. I’ll show several examples and a workaround to avoid IF method and replace it with something else.
Measure 1 – Empty value handling
For calculating Development Factor values in years, we need to avoid empty values. If there was an empty value, we would replace it with 1.
Original measure (6.6 sec in master query):
Development Factor := IF ( ISBLANK ( [Development Factor Base] ) && MAX ( Date[Year Count] ) > 1, 1, [Development Factor Base] ) |
New Measure (2.5 sec in master query):
Development Factor := [Development Factor Base] + IF ( ISBLANK ( [Development Factor Base] ) && MAX ( Date[Year Count] ) > 1, 1, BLANK () ) |
In this case, we can rewrite the measure because Blank Value handling during math calculations – comprehensive description here. So why is the second formula performing better? If I would simplify my outcome of studying exec plan:
- In the first case the engine considers [Development Factor Base] which is used twice in IF as two separate evaluations independent of each other. (I’m using 2014 engine so IF method uses Strict Valuation.) The reason for this is the quite innocent-looking second part of condition “MAX ( Date[Year Count] ) > 1“.
Compiler is smart and on the background actually enhances a filter of [Development Factor Base] measure with Date[Year Count] Filter. - In second syntax, engine considers [Development Factor Base], although used twice as well, as the same. And therefore, it can use a form of caching.
It might be more clear if you see two queries going into Storage Engine:
SE Query 1 (coming from IF condition)
SELECT Date[Year Count] , SUM ( Facts[AMOUNT] )
FROM Facts LEFT OUTER JOIN Date ON Facts[Year Count Key]=Date[Year Count Key] WHERE Date[Year Count] IN ( 4, 5, 6, 7, 8, 9, 10, 11, 12, 13..[14 total values, not all displayed] ) ; |
SE Query 2 (coming from IF result)
SELECT Date[Year Count] , SUM ( Facts[AMOUNT] )
FROM Facts LEFT OUTER JOIN Date ON Facts[Year Count Key]=Date[Year Count Key]; |
For second version of the measure – second SE Query is issued twice.
Measure 2 – Avoiding IF as such
In this measure, I needed to replace empty values of my [Lost Ratio Base] calculation by 0. But I needed to do it only when other measure had a value. If there is empty measure [Premiums], [Lost Ratio] should stay empty as well.
First version of the measure (12 sec in master query):
Lost Ratio := IF ( ISBLANK ( [Lost Ratio Base] ) && [Premiums] > 0, 0, [Lost Ratio Base] ) |
Second version – no if at all (1.5 sec in master query):
Lost Ratio := [Premiums] + [Lost Ratio Base] – [Premiums] |
The best approach is not having IF at all so if you can avoid it please avoid it. Again, logic of new query uses Blank Value handling during math calculations – comprehensive description here.
Measure 3 – Median Helper
As I worked on 2014 SSAS, DAX did not yet support Median method, so I had to implement one of the workaround approaches which can be found around. I used one from Gerhard Brueckl.
As a helper for median, there is the following measure (7.5 sec in master query):
MedTopCcount := IF ( MOD ( [MedNonEmptyCnt], 2 ) = 0, ( [MedNonEmptyCnt] / 2 ) + 1, ( [MedNonEmptyCnt] + 1 ) / 2 ) |
New version of measure (0.35 sec in master query):
MedTopCount := INT ( [MedNonEmptyCnt] / 2 ) + 1 |
This approach just uses different math to get the same result. Only reason here is to avoid IF, result of the measures is the same and you will get much better performance.
Summary
You can see result of my tuning visually on following chart.
I did not propose definitive solution of performance problem around IF function. However, if you think carefully maybe you would find some way to write your DAX expression differently. As a payoff you can gain huge performance improvement. The point here is to rethink your query not just from DAX perspective but from logic which you use in the query as such.
DAX Code formatted with http://www.daxformatter.com.
4 Comments. Leave new
solid post – very creative (and clean) solutions to an annoying problem
Thanks Bill!
Great article, it was a painfull issue that i try to solve and you did help me a lot.
Great thanks. The other common method I use a lot is to replace the statement:
IF(ISBLANK(Value),BLANK(),Expression)
with
VAR val = VALUE
RETURN Expression* DIVIDE(Val,Val)
to get a profit of built-in DIVIDE error handling. Often VALUE is anyway used in expression which makes is even more efficient to bring as variance if it needs to be read anyway.