“A good app should not collect a lot of data.”
We had this discussion with a customer in response to a specific request. After spending hours trying to find the best solution for them, we decided to go against the grain. Still, we had to find the right balance between uploading all the data from the start and having users request smaller batches of data a few times per hour.
This exercise gave us a good insight into how to handle massive data sets. We’re happy to share our solution with you.
The starting point
Let’s start with an alternative approach, conceived by my colleague Hans Santens. He wrote an interesting article about parsing large data sets with Power Automate that is definitely worth reading. He explains how you can use Power Automate to collect large sets (2K+) and put them into a Power App.
I took the same route during development, but then chose a different approach to continue. In the end, the Power Automate solution became slower, not only because of the amount of data, but also with each additional column added to the query.
We encountered some difficult issues that we wanted to work around:
- Extra overhead for sending data to and receiving data from Power Automate, on top of the Power Automate execution time itself. Every second counts 😊
- It slowed down with each additional record and column, because the returned JSON had to be typecast per column (in a loop).
- Many lines of code were needed to typecast all the fields.
- It was difficult to work with more complex data types.
- Additional maintenance of the Power Automate flow was required.
- We still needed to create separate collections that matched the fields in SharePoint for patching purposes.
That’s not to say that my colleague’s approach isn’t worth considering. It’s definitely a correct solution for collecting larger data sets.
The approach I want to focus on here works in Power Apps only and collects 30K+ records into a single collection. Network speed affects how long the collection takes. I advise you to restart the application during a pause, just to be on the safe side.
How it works
For my solution, I used a loop that gets a maximum of 2K records per iteration (2K is the highest value you can set for the data row limit, i.e. the most a single query can return).
At first, I tried to filter the list based on the ID of each item. You need a ‘between’ or a larger/smaller equivalent, but ‘>=’ and ‘<=’ give a delegation warning. So I continued my search, not willing to take the risk of missing data.
I decided to add my own integer value column per record. All records with an ID below 2000 got 1, the next group of 2K records got 2, and so on. That’s the key to solving this problem, and at the same time the main downside of this solution.
With this integer value, you can use ‘=’, which won’t give you a delegation warning. However, you need to calculate the value when you create the record through an additional patch/edit.
You have two options here:
- Look for the last ID, assuming users aren’t adding records at the same time, and set the next ID.
- Create the record and ask for the newly added ID. Add the integer value after creation (extra step to update the record).
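For illustration, option 1 could be sketched like this (a sketch only, with placeholder names; it assumes SharePoint assigns the next sequential ID and that no one else is creating records at the same moment):

// Get the highest existing ID (assumes no concurrent writers)
Set( varLastID, First( Sort( <SP List>, ID, SortOrder.Descending ) ).ID );
// Calculate the integer value for the record that will get ID varLastID + 1
Set( varNextDG, Int( (varLastID + 1) / 2000 ) + 1 );
// Create the record with its integer value in a single patch
Patch( <SP List>, Defaults( <SP List> ), { Title: txtWPNumber.Text, dg: varNextDG } )

Option 2 avoids the concurrency assumption at the cost of a second patch, which is why I use it below.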
Let’s get started
There are two main topics to focus on:
- Ensuring that each record is stored with its correct integer value.
- Retrieving the data using the integer value through loops.
In Power Apps, a single call to a data source returns at most 2K records (the maximum data row limit). By using ‘Collect’ after ‘ClearCollect’, you can easily append more rows to the same collection.
For example:
- 2K customers in London
- ClearCollect( colCustomers, Filter( Customers, City = "London" ) )
- 3K customers in Paris
- Collect( colCustomers, Filter( Customers, City = "Paris" ) )
- Total = 4K rows in colCustomers (each call returns at most 2K, so 1K of the Paris customers are missing)
Now that we know this, we need to find a way to repeat this as many times as we need.
Get the data
To get the data, we need to know the highest integer value in the list and build a loop from 1 to that number. I have named my column ‘dg’ (DataGrouping).
You’ll notice that I sort on ID (descending) and ask for the first ‘dg’ value, which is the largest one based on the sort. Using ‘Max’ for ‘dg’ also gave a delegation warning.
Set( varLastDG, First( Sort( <SP List>, ID, SortOrder.Descending ) ).dg )
Now build the loop (ForAll). Use ‘Sequence’ to build a list of numbers from 1 to varLastDG.
ForAll( Sequence(varLastDG), Collect( <collection name>, Filter( <SP list>, dg = Value, Department = "R&D", Active = true ) ) )
The extra filters on ‘Department’ and ‘Active’ can reduce the number of records returned and speed up the whole process. Not every loop iteration will return exactly 2K records; some may return none at all.
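Putting the two steps together, the whole retrieval can be sketched as follows (colData is a placeholder collection name; the extra filters are left out here):

// Start from an empty collection so a re-run doesn't duplicate rows
Clear( colData );
// Find the highest 'dg' value via the record with the highest ID
Set( varLastDG, First( Sort( <SP List>, ID, SortOrder.Descending ) ).dg );
// Append each group of max 2K records to the collection
ForAll( Sequence( varLastDG ), Collect( colData, Filter( <SP List>, dg = Value ) ) )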
Add your collection to a grid to visualize the result. That’s all you need to collect your data!
Now, you need to make sure that each record gets the correct integer value, and that no value is assigned to more than 2K records.
Patch your integer value
For this solution, I’ll patch a new record the standard way. However, this way you can’t be 100% sure in advance which ID the record will get. Therefore, the newly created record is stored in a variable. You can read the new ID from this variable and then patch the record (update) with the calculated value.
Use a Patch command to add or update a record. Store the result of the patch action in a variable and read its ID. For an update, you don’t need to patch the integer value: the record got its integer value when it was created, and the number never changes.
Patch a new record
// Create the record and store it in a variable
Set( varSavedRecord, Patch( <SP List>, Defaults( <SP List> ), { Title: txtWPNumber.Text, ....... } ) );
// Get the ID of the record used in patch
Set( varPatchedID, varSavedRecord.ID );
// Calculate your integer value
Set( varNewDG, Int(varPatchedID / 2000) + 1 );
// Patch the integer value into the record (only required for new records; an updated record got its value when it was created)
If( varEditState = "New", Patch( <SP List>, LookUp( <SP List>, ID = varPatchedID ), { dg: varNewDG } ) );
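As a quick sanity check of the grouping formula Int(ID / 2000) + 1, here are a few sample IDs and the ‘dg’ value they produce:

// ID 1    -> Int(0.0005) + 1 = 1
// ID 1999 -> Int(0.9995) + 1 = 1
// ID 2000 -> Int(1)      + 1 = 2
// ID 3999 -> Int(1.9995) + 1 = 2
// ID 4000 -> Int(2)      + 1 = 3

Each ‘dg’ value therefore covers at most 2K IDs, which keeps every group within the 2K limit of a single call.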
Having this integer value in column ‘dg’ gives us a simple approach to collecting the data.
- We will look for the largest ‘dg’ and build a loop from 1 to that number.
- We will retrieve all records where the ‘dg’ value equals the loop value (‘=’ is delegable).
Conclusion
I mentioned the downside of having to patch twice, but I hope the benefits of this approach are clear to you as well:
- No extra tool (Power Automate) to maintain.
- Just a few lines of code.
- No typecasting required.
- The collection has the same schema as the SharePoint list.
Now you have two options to choose from: use Power Automate and get all the data in one call, or use the Power Apps approach with a loop.
Have fun exploring! And if you need support, my colleagues and I are happy to help 😊.