From single node technology to massively parallel processing: how QPR developed a process mining application powered by Snowflake

This is the third blog in our powered by Snowflake blog series. In this blog post, we set out to explore a brief history of data lakes and the Snowflake Data Cloud to understand how the technology works and how a customer can connect their process mining account to Snowflake.

Are data warehouses, data lakes, and process mining everyday terms for you? Feel free to skip straight to the section on Introducing the Data Cloud.

Let us take a trip down memory lane to the 1980s, a time before data clouds, data lakes, and even data warehouses. In the late 1980s, the concept of data warehousing arose from the need for an architectural model that would enable data to flow from many different operational systems to decision support environments. The goal was clear: to break down existing data silos and thereby improve efficiency while lowering costs.

However, as time went by, data sets grew bigger and semi-structured data played an ever larger role in data projects. Data warehouses, which required an individual schema for each data source, could handle neither this form of data nor its volume, and so could not meet the needs of large enterprises. These enterprises consequently found themselves in a siloed data environment once again. Fast forward some twenty years and data lakes emerged. Data lakes are repositories built to store massive amounts of raw data in its native form, all in one location.

Over ten years have passed since, with countless failed on-premise data lake projects along the way. That said, the need for a scalable solution for storing and managing data has not disappeared; quite the opposite, in fact. The realization that well-managed data can provide endless business opportunities continues to push the data analytics market, with spending on big data and business analytics (BDA) solutions seeing a 10.1% increase from 2020 to 2021 alone.

At around the same time as data lakes saw the light of day, we at QPR also began to grasp the extent to which clients’ existing data could provide possibilities for understanding, monitoring, and improving business processes. Our experts in business process modeling often recalled how troublesome it had been for the customers to create their own accurate process models in an efficient and automated manner.

QPR ProcessAnalyzer (PA), QPR’s process mining solution, was introduced to the market as a way for enterprises to gain objective insight into their processes with pinpoint accuracy and reach their full operational potential.

Process mining is a technique to discover, analyze, and monitor processes. When employees or software robots interact with IT systems – such as SAP, Salesforce, or Oracle – the activities leave a trace of data behind, referred to as an event log. Process mining software takes the data that already exists in these information systems and uses it to visualize the real-life execution of the business processes together with other insights drawn from the event logs.

Instead of spending countless hours arguing over processes and potential cost savings while struggling with fragmented or unclear reporting and little visibility into where efficiency could be improved, process mining users get access to automatically generated, dynamic flowcharts of their processes, along with their performance and compliance.
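To make the idea of an event log a little more concrete, here is a minimal Python sketch of how a process flowchart can be derived from one. It is only an illustration under assumptions: the event log rows, the activity names, and the `directly_follows` function are hypothetical, and this is not how QPR ProcessAnalyzer is implemented. The sketch counts how often one activity is directly followed by another within the same case, which is the raw material for a process flowchart.

```python
from collections import Counter, defaultdict

# Hypothetical event log: one row per event as (case id, activity, timestamp).
EVENT_LOG = [
    ("order-1", "Create order", "2022-05-02T09:00"),
    ("order-1", "Approve order", "2022-05-02T11:30"),
    ("order-1", "Ship order", "2022-05-03T08:15"),
    ("order-2", "Create order", "2022-05-02T10:00"),
    ("order-2", "Ship order", "2022-05-04T14:00"),
    ("order-3", "Create order", "2022-05-03T09:45"),
    ("order-3", "Approve order", "2022-05-03T10:05"),
    ("order-3", "Ship order", "2022-05-05T16:20"),
]


def directly_follows(event_log):
    """Count how often one activity is directly followed by another in a case."""
    # Group the events by case so that each case forms its own trace.
    traces = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        traces[case_id].append((timestamp, activity))

    counts = Counter()
    for events in traces.values():
        events.sort()  # ISO 8601 timestamps sort chronologically as strings
        activities = [activity for _, activity in events]
        # Pair each activity with the one that comes right after it.
        for source, target in zip(activities, activities[1:]):
            counts[(source, target)] += 1
    return counts


if __name__ == "__main__":
    for (source, target), count in sorted(directly_follows(EVENT_LOG).items()):
        print(f"{source} -> {target}: {count}")
```

Each counted pair corresponds to an arrow in the flowchart, and the count tells how many cases took that path. In this made-up log, one order skips the approval step, which is exactly the kind of deviation process mining is meant to surface.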

Introducing the Data Cloud

You may have been left wondering why many on-premise data lake projects failed. To this question, there are as many answers as there are projects. However, one highly influential factor was the core technology upon which most traditional data lakes are built: the Apache Hadoop ecosystem. Hadoop was essential for data lakes of the time, as it includes necessary data platform components such as HDFS, which allows data to be stored in its native state. However, the architectural design required heavy system management and custom coding for data transformation and integration. As such, without a fleet of Java marines, many traditional data lakes simply became pools for ETL offload. The rate of failure grew so large that failed data lakes were even given their own name: data swamps.

Meanwhile, in the background, cloud environments were developing at an exponential pace. The cloud provided new opportunities for data lake architecture, with near-unlimited capacity and scalability for the storage and computing of data. While some data lake providers decided to copy their data lake solution onto the cloud, Snowflake took a different approach and decided to build a data solution specifically for the cloud environment with a brand-new SQL query engine and innovative architecture. The Snowflake Data Cloud, a cloud data warehouse-as-a-service (DWaaS), was born.

Dreams of process mining with unlimited scalability

Before going into more detail, let’s jump back a few steps again and try to understand how this all relates to process mining. Back when process mining was gaining a foothold as a methodology, solutions such as Snowflake were only an idea. When process mining software was developed, it was built on the available, industry-standard database technology. The limitation of this technology is that each individual query runs on a single node. In practice, this means that one query is run by one computer. You can run parallel queries on parallel computers and ramp up the power of those computers, but at the end of the day, each query will always be limited to one computer and its capacity.

A few years ago, these limitations sparked a thought in our product development team. They had heard rumors of an emerging technology where, rather than each node running individual queries, nodes belong to clusters in which each node stores a portion of the entire data set locally. A thought appeared: would it be possible to use MPP (massively parallel processing) technology to run heavy-capacity process mining queries? The idea was crazy to say the least, as no one on the market had successfully used this technology for process mining before. Despite this, product development decided to embark on a journey of extensive testing to try it out.
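The difference between single-node and MPP execution can be sketched in a few lines of Python. This is a simplified simulation under assumptions, not Snowflake's or QPR's actual implementation: the "nodes" are plain threads on one machine, and the names `partition`, `count_on_node`, and `mpp_activity_counts` are made up for this illustration. Each node holds only its own slice of the rows, computes a partial result, and a coordinator merges the partial results into the final answer.

```python
import zlib
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical event rows as (case id, activity).
ROWS = [
    (f"case-{i}", activity)
    for i in range(1000)
    for activity in ("Create order", "Approve order", "Ship order")
]


def partition(rows, node_count):
    """Spread rows over nodes by hashing the case id, so each node stores a portion."""
    partitions = [[] for _ in range(node_count)]
    for case_id, activity in rows:
        node = zlib.crc32(case_id.encode("utf-8")) % node_count
        partitions[node].append((case_id, activity))
    return partitions


def count_on_node(local_rows):
    """Work done by one node: count activities in its local slice only."""
    return Counter(activity for _, activity in local_rows)


def mpp_activity_counts(rows, node_count=4):
    """Run the same query on every node in parallel, then merge the partial results."""
    partitions = partition(rows, node_count)
    # Threads stand in for separate machines in this simulation.
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(count_on_node, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the counts together
    return total


if __name__ == "__main__":
    print(mpp_activity_counts(ROWS))
```

The merged result is the same as what a single node would produce by scanning every row. The point of the design is that no individual node ever has to scan more than its own portion of the data.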

Unique architectural structure

In May 2022 we announced our partnership with Snowflake as the first and only process mining software to run natively on Snowflake. In the process of writing this blog, I called up Olli Vihervuori, Product Manager at QPR, to get some insights into how and why the Snowflake Data Cloud and QPR ProcessAnalyzer are a match made in the clouds.

“The short answer to why we chose Snowflake over other solutions and providers was in its simplicity and performance. This performance is enabled by the unique architecture of Snowflake. Additionally, other factors made our choice easy, such as the ability to write SQL data as well as the ease of use: you can create and start using Snowflake in a couple of minutes. Furthermore, Snowflake is cloud-based and cloud-based only, and the future is in the cloud,” Vihervuori explains.

Snowflake’s architecture combines the best of shared-disk (SD) and shared-nothing (SN) architectures. If you want to read more about the architecture, I recommend having a look at Snowflake’s pages. In summary, Snowflake has taken the idea behind SD’s centralized data storage and combined it with SN’s MPP capabilities. Because the solution was built for the cloud rather than adapted to it, these qualities are further boosted by the perks of the cloud environment: data storage can be scaled out almost without limit, and queries can be run in multiple, independent compute clusters. For the end-user, this means processing more data at a faster pace.

The unique architecture consists of three key layers: 1) database storage, 2) query processing, and 3) cloud services. All three layers are deployed and managed on a selected cloud platform. Thus, if you use QPR ProcessAnalyzer as a connected application, your Snowflake account can be hosted on AWS, GCP, and/or Azure. If your data is already safely stored with one or more of these cloud providers, great: all you have to do is link your Snowflake account. If you are using QPR ProcessAnalyzer as a managed application, your Snowflake queries will be run in QPR’s cloud environment, hosted on AWS Ireland. Connected or managed, what’s the difference?

Connected and managed applications

When the decision was made to start developing QPR ProcessAnalyzer Powered by Snowflake, R&D’s bucket of water was not topped up with another glass; rather, they were handed a whole new bucket. You see, QPR ProcessAnalyzer Powered by Snowflake is not a feature or a module; it is a whole new product. This means that existing customers must make the choice to switch from one software product to another to run queries on Snowflake.

To the end-user, it makes little visible difference whether they log into QPR ProcessAnalyzer as a connected or a managed application. Both have the same user interface, and the familiar features are available. There are, however, slight differences, particularly with regard to data governance.

Connected application

In a connected application model, the PA customer is also a Snowflake customer and needs a Snowflake account. The customer then chooses to enable PA as a connected application, which essentially allows PA to log in to the customer’s Snowflake account. Unlike with other process mining software and vendors, there is no need to copy your data into a separate process mining platform. Furthermore, a Snowflake customer will be able, if they so choose, to connect multiple applications to their account. This eliminates further transfer, modification, and copying of data as the entire environment can be queried with familiar SQL tools. Thus, the customer is guaranteed a single source of truth.
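As a rough illustration of what querying event data in place with familiar SQL can look like, the sketch below summarizes each case directly in the database, without copying the data elsewhere first. Several assumptions apply: Python's built-in `sqlite3` module is used as a local stand-in so the example runs anywhere, and the `events` table, its columns, and the sample rows are hypothetical. It does not show Snowflake's connector or the queries QPR ProcessAnalyzer actually runs.

```python
import sqlite3

# Hypothetical event data as (case id, activity, event time).
EVENTS = [
    ("order-1", "Create order", "2022-05-02T09:00"),
    ("order-1", "Approve order", "2022-05-02T11:30"),
    ("order-1", "Ship order", "2022-05-03T08:15"),
    ("order-2", "Create order", "2022-05-02T10:00"),
    ("order-2", "Ship order", "2022-05-04T14:00"),
]

# Plain SQL: one summary row per case, computed where the data lives.
CASE_SUMMARY_SQL = """
    SELECT case_id,
           COUNT(*)        AS event_count,
           MIN(event_time) AS started_at,
           MAX(event_time) AS ended_at
    FROM events
    GROUP BY case_id
    ORDER BY case_id
"""


def build_demo_database():
    """Create an in-memory database that stands in for the customer's data platform."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE events (case_id TEXT, activity TEXT, event_time TEXT)"
    )
    connection.executemany("INSERT INTO events VALUES (?, ?, ?)", EVENTS)
    return connection


def summarize_cases(connection):
    """Run the summary query inside the database and return the result rows."""
    return connection.execute(CASE_SUMMARY_SQL).fetchall()


if __name__ == "__main__":
    for row in summarize_cases(build_demo_database()):
        print(row)
```

Only the small result set leaves the database. The event rows themselves stay where they are, which is the idea behind keeping a single source of truth.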

Also, don’t worry about security just yet – here is where things get even more exciting. Another perk of Snowflake is the unique ability to share data with customers and business partners in a secure and selective manner. A customer with a connected application model is in charge of their own data. While QPR maintains the application code, the customer manages the data in their own data platform. Hence, PA will only be able to access the necessary set of information to perform the selected actions in accordance with the customer’s data governance policy. Process mining on Snowflake is the best and easiest way to ensure you're complying with data privacy, security, industry, and government regulations.

Managed application

The managed application model, on the other hand, does not require the customer to also be a Snowflake customer. In this model, the data and its governance are to an extent handled by QPR, as would be the case with the regular version of QPR ProcessAnalyzer. To run queries on Snowflake, the customer chooses to load the data onto Snowflake when loading data onto PA. This loads the selected data onto QPR’s multi-tenant Snowflake environment, hosted on AWS Ireland. When a query is run, the data is processed in Snowflake, after which the results are shown almost instantly on the customer’s PA dashboard. This enables the customer to gain all the benefits of efficient scalability provided by Snowflake. Even the largest companies with complex business processes and billions of data rows can now analyze their processes in the blink of an eye.

I know this all might sound a little too good to be true. That’s why we are eager to show you more in action. Watch our webinar ‘Process Mining Powered by Snowflake’ to learn more and see a real demo of process mining powered by Snowflake, or book a demo to chat with us directly. You can also read more about QPR ProcessAnalyzer powered by Snowflake on our comprehensive page.

Thank you to Snowflake Inc. for e-books on the topic that served as a base for this blog. Retrieved from

Written by

Melina Weckman

Marketing Specialist
