For me it’s a bit surprising how little information can be found regarding the “Process Add” option which I consider as the best choice for incremental data load to SSAS Tabular. I have described using this feature in this article some years ago. There is a link to other useful articles regarding various implementation of incremental processing with “Process Add”. After some years of usage our model grew and in this blog I would like to take a closer look into the behavior of this processing command regarding memory consumption.
Processing Behavior
Let’s start with a bit of explanation regarding Full processing. As we know when we start SSAS full processing the following things happen:
1 – Before processing. There is existing SSAS model. Which serves Queries.
2 – Processing Start. Once we start new full process, a shadow copy of the model is created and loaded with new data. As this can take hours or even days, the old model still serves all queries.
3 – Processing Finished. When the new model is loaded with data, we have in memory two models: the Old one and the New one.
4 – Model Switch. During a Switch phase:
- New Queries are put on hold to queue
- Already running queries are finished or killed after timeout period.
- New model is then available for queries.
- Old model is removed from memory.
To recap: for the full process with existing model side by side, you need memory of double size of the existing model plus some memory for uncompressed data while data load is in progress.
Now how exactly does “Process Add” work regarding memory? What happens when you process one table and not the whole model?
As explained by Marco Russo here: “Technically, when a ProcessAdd runs, Analysis Services internally creates a new partition, processes the whole new partition, and then merges it to the target partition (the one on which ProcessAdd command has been executed). You can use this same approach by using separate operations, but ProcessAdd can be more optimized for this specific activity.”
However, what does that mean from the perspective of memory consumption? What happens during the ProcessAdd? Does SSAS shadow a copy of the Model? The Table? Or just Partition?
ProcessAdd test
In order to investigate and prove the behavior I designed a simple SSAS Tabular model with several tables. One of them (sales) the fact table had the following three different partitions:
- Partition 1 – 10mil records
- Partition 2 – 90mil records
- Partition 3 – 1200 records
Processing command would just add 100 new rows:
{ "refresh": { "type": "add", "objects": [ { "database": "AW_P_Test", "table": "Internet Sales", "partition": "Internet Sales 1" } ], "overrides": [ { "partitions": [ { "originalObject": { "database": "AW_P_Test", "table": "Internet Sales", "partition": "Internet Sales 1" }, "source": { "query":"SELECT TOP 100 s.ProductKey, s.CustomerKey, ... s.DueDate, s.ShipDate FROM dbo.vFactInternetSales s" } } ] } ] } } |
Memory behavior during process add is similar for all partitions and looks like this:
The data processing itself is done in roughly 15sec rest of the time SSAS is rebuilding structures like hierarchies and relationships.
The difference is in size of memory consumption during the processing of different partitions. Based on perf mon data we can see the following:
Memory min GB | Memory max GB | Delta GB | Duration (min) | |
Part 1 (Middle size) |
2.75 |
3.27 |
0.52 |
01:54 |
Part 2 (Big One) |
2.76 |
4.14 |
1.38 |
01:54 |
Part 3 (Small one) |
2.73 |
3.13 |
0.41 |
01:43 |
As we can see, there is a substantial difference in memory consumption in case we perform ProcessAdd on a big partition or on a small one. This means in the background there is a shadow copy of the partition. However, memory consumption for the small one is definitely higher than we would expect most likely due to rebuilding of the structures for the whole table.
Summary
To lower memory consumption while doing ProcessAdd I would propose keeping one smaller partition open for increments and monitor if size of this partition is not growing too much. This way you can reduce memory footprint instead of having all data in one partition.
In the next blog I will compare ProcessAdd vs Partition Merge regarding duration and memory consumption. Stay tuned!
2 Comments. Leave new
Good Day
I have read your post on the incremental loading, is there a step by step of how to do it in SSAS
Hello,
yes for example here https://www.sqlbi.com/articles/incremental-processing-in-tabular-using-process-add/
Roman