Recipes versus Dataflows in late 2021: why Salesforce's preferred path is sometimes wrong

In late 2021, Salesforce began aggressively promoting Recipes as the preferred method for data preparation in Tableau CRM. Its documentation and marketing materials implied that Dataflows were legacy and that Recipes were the future. That narrative did not match the performance and use cases we saw across the 20+ Salesforce CRM Analytics implementations CRMA Labs reviewed in 2021. Teams running Tableau CRM that year needed more than marketing guidance to make these decisions. In practice, Dataflows still outperformed Recipes in three critical production scenarios: incremental syncs past 5 million rows, append-and-augment workflows for multi-source aggregation, and certain computeRelative window calculations. These were not edge cases. They were core operations that defined how organizations managed their data at scale.

Our 2021 engagements across financial services, healthcare, and manufacturing showed that Salesforce's push toward Recipes, while well-intentioned, was not universally applicable. In some scenarios the JSON-based Dataflow approach was still faster, more maintainable, and more accurate. The performance gap was significant: a Dataflow handling 90 million rows completed in 12 to 18 minutes, while the equivalent Recipe took 22 to 30 minutes. These benchmarks were not outliers; they were consistent across multiple clients and data architectures. The decision to migrate away from Dataflows should rest on performance, scalability, and the specific needs of each data pipeline, not on documentation alone.

The shift from Dataflows to Recipes

Salesforce introduced Recipes in late 2019 as a more intuitive, declarative way to manage data preparation. In 2021, the company began positioning Recipes as the default method for data transformation in Tableau CRM, especially for new implementations. The shift was part of a broader rebranding effort that moved away from Einstein Analytics and toward a more unified, AI-driven experience. However, the documentation and training materials often overlooked the performance and scalability trade-offs that could arise when moving from Dataflows to Recipes.

Dataflows, which were introduced with Salesforce Wave Analytics, are built around a JSON definition that gives granular control over each transformation step. Recipes were easier to use, but they were not always more efficient. In 2021, many organizations still relied on Dataflows for complex transformations, particularly on large datasets.

Incremental sync beyond 5 million rows

One of the most common scenarios where Dataflows outperform Recipes is incremental sync beyond 5 million rows. In 2021, many organizations used watermark-style incremental extraction in the sfdcDigest transformation to manage incremental loads. The approach tracks a watermark, typically the point of the last successful extraction, so that each run reads only the records changed since then instead of refreshing the whole object.

In a Dataflow, incremental extraction is switched on in the sfdcDigest node itself. Dataflow nodes are declared as actions with parameters rather than as SQL queries, so a minimal incremental extract of Account looks like this:

{
  "Extract_Account": {
    "action": "sfdcDigest",
    "parameters": {
      "object": "Account",
      "incremental": true,
      "fields": [
        { "name": "Id" },
        { "name": "Name" },
        { "name": "LastModifiedDate" }
      ]
    }
  }
}

With "incremental" set to true, sfdcDigest extracts only the records that changed since the last dataflow run, so the previous run effectively serves as the watermark. For large objects this keeps each refresh small in both run time and resource usage.
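To make the watermark idea concrete, here is a minimal Python model of watermark-based incremental sync. It is a conceptual sketch, not how sfdcDigest is implemented: the Record type, the in-memory dataset, and the use of a last-modified timestamp as the watermark are assumptions for illustration.

```python
# Conceptual sketch of watermark-based incremental sync; all names are hypothetical.
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class Record:
    # Hypothetical stand-in for a source row such as an Account record.
    id: str
    name: str
    last_modified: datetime


def incremental_sync(
    dataset: Dict[str, Record],
    source: List[Record],
    watermark: Optional[datetime],
) -> Tuple[Optional[datetime], int]:
    """Upsert rows changed after `watermark` into `dataset`.

    Returns the advanced watermark and the number of rows read. A watermark
    of None means "no previous run", so every row is read (a full load).
    """
    # Only rows strictly newer than the watermark are read; a full refresh
    # would read every row in `source` on every run.
    changed = [r for r in source if watermark is None or r.last_modified > watermark]
    for r in changed:
        dataset[r.id] = r  # upsert keyed on Id
    # Advance the watermark to the newest timestamp seen; keep it if nothing changed.
    new_watermark = max((r.last_modified for r in changed), default=watermark)
    return new_watermark, len(changed)


# Usage: the second run reads only the one row modified after the first run.
accounts = [
    Record("001", "Acme", datetime(2021, 10, 1)),
    Record("002", "Globex", datetime(2021, 10, 1)),
]
dataset: Dict[str, Record] = {}
watermark, rows_read = incremental_sync(dataset, accounts, None)       # reads 2
accounts.append(Record("003", "Initech", datetime(2021, 10, 2)))
watermark, rows_read = incremental_sync(dataset, accounts, watermark)  # reads 1
```

The cost of each run scales with the number of changed rows rather than the size of the object, which is why the gap widens as row counts grow. A strict greater-than comparison is the simplest choice here; a production pipeline may re-read a small overlap window and let the upsert absorb duplicates.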

Recipes, on the other hand, often fall back to full refreshes or less efficient filtering mechanisms. This makes them less suitable for organizations that need to process large volumes of data incrementally. In one engagement, a financial services client with over 10 million Account records saw a 60% increase in processing time when migrating from a Dataflow-based incremental sync to a Recipe-based approach.

Append-and-augment patterns for multi-source aggregation

Another area where Dataflows outperform Recipes is append-and-augment workflows, which are common wherever data from multiple sources must be combined and enriched. Append stacks rows from sources that share a schema; augment then adds columns from another source by matching on a key, much like a left join. In 2021, many organizations used Dataflows for these workflows, particularly when integrating data from Salesforce, external databases, and third-party APIs.

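In a Dataflow, the augment step that merges two external sources on Id might look like this, assuming ExternalTable1 and ExternalTable2 have already been loaded as datasets and are read back with edgemart nodes. When several same-schema feeds need stacking first, an append node (with a "sources" list) would sit before the augment:

{
  "Load_Source1": {
    "action": "edgemart",
    "parameters": { "alias": "ExternalTable1" }
  },
  "Load_Source2": {
    "action": "edgemart",
    "parameters": { "alias": "ExternalTable2" }
  },
  "Augment_Merged": {
    "action": "augment",
    "parameters": {
      "left": "Load_Source1",
      "left_key": ["Id"],
      "right": "Load_Source2",
      "right_key": ["Id"],
      "relationship": "Source2",
      "right_select": ["Phone"],
      "operation": "LookupSingleValue"
    }
  },
  "Register_Merged": {
    "action": "sfdcRegister",
    "parameters": {
      "source": "Augment_Merged",
      "alias": "Merged",
      "name": "Merged"
    }
  }
}

Every row from Source1 is kept, and the matching Phone value is added as Source2.Phone. Because each step is a separate, named node, the structure gives clear separation of concerns and more predictable performance on large datasets.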
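The semantics of the two steps can be modeled in a few lines of Python. This is an illustrative sketch of an append followed by a single-value left lookup, assuming rows are plain dictionaries; the function names are hypothetical, and it says nothing about how the Dataflow or Recipe engines execute these steps internally.

```python
# Conceptual sketch of append-and-augment semantics; names are hypothetical.
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def append(*sources: List[Row]) -> List[Row]:
    """Stack rows from several sources that share a schema."""
    stacked: List[Row] = []
    for rows in sources:
        stacked.extend(rows)
    return stacked


def augment(left: List[Row], right: List[Row], key: str,
            relationship: str, right_select: List[str]) -> List[Row]:
    """Left lookup: keep every left row and add the selected right fields.

    Added fields are named "<relationship>.<field>". Unmatched left rows get
    None. Where the right side has several rows for a key, this sketch keeps
    the first one (a simplification of a single-value lookup).
    """
    lookup: Dict[Any, Row] = {}
    for row in right:
        lookup.setdefault(row[key], row)  # first match wins
    result: List[Row] = []
    for row in left:
        match: Optional[Row] = lookup.get(row[key])
        enriched = dict(row)
        for field in right_select:
            enriched[f"{relationship}.{field}"] = match[field] if match is not None else None
        result.append(enriched)
    return result


# Usage: two same-schema email feeds are appended, then enriched with phones.
emails_east = [{"Id": "1", "Name": "Ada", "Email": "ada@example.com"}]
emails_west = [{"Id": "2", "Name": "Bo", "Email": "bo@example.com"}]
phones = [{"Id": "1", "Name": "Ada", "Phone": "555-0100"}]
merged = augment(append(emails_east, emails_west), phones, "Id", "Source2", ["Phone"])
```

Building the lookup index once makes the augment a single pass over each side, which is the property that keeps multi-source merges predictable as row counts grow.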

Recipes, while more intuitive in design, often struggle with this type of multi-source aggregation. The declarative nature of Recipes can lead to inefficiencies in how data is joined and merged. In one case, a healthcare client using Recipes for a similar multi-source integration saw a 40% degradation in performance compared to their previous Dataflow-based approach.

computeRelative window functions in Recipes

In 2021, one of the most significant performance bottlenecks in Recipes was the kind of window calculation the Dataflow computeRelative transformation performs: values calculated relative to other rows in the same partition, such as the previous row's amount when rows are ordered by date. These calculations were not optimized in the Recipe engine. Dataflows, by contrast, expose them as a dedicated transformation configured directly in the Dataflow JSON.

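Here's an example of a computeRelative node in a Dataflow. It assumes an upstream node named Extract_Opportunity that extracts Id, AccountId, CloseDate, and Amount from Opportunity (Account itself has no Amount field):

{
  "Compute_Previous_Amount": {
    "action": "computeRelative",
    "parameters": {
      "source": "Extract_Opportunity",
      "partitionBy": ["AccountId"],
      "orderBy": [{ "name": "CloseDate", "direction": "asc" }],
      "computedFields": [
        {
          "name": "PreviousAmount",
          "label": "Previous Amount",
          "expression": { "sourceField": "Amount", "offset": "previous()", "default": 0 }
        }
      ]
    }
  }
}

For each Opportunity, PreviousAmount holds the Amount of the preceding Opportunity on the same Account, ordered by CloseDate, with 0 for the first one. The whole calculation runs as a single node. Recipes, however, often translated the same logic into more complex, less efficient steps, leading to longer run times and higher resource consumption.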
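What a previous() offset computes can be modeled as partition, sort, then look back one row. The minimal Python sketch below uses hypothetical names and ISO date strings (which sort chronologically) in place of date fields; it models the result of the calculation, not either engine's execution plan.

```python
# Conceptual sketch of a "previous row in partition" window calculation.
from itertools import groupby
from operator import itemgetter
from typing import Any, Dict, List

Row = Dict[str, Any]


def compute_previous(rows: List[Row], partition_by: str, order_by: str,
                     source_field: str, new_field: str, default: Any = None) -> List[Row]:
    """Add `new_field` holding `source_field` from the previous row in the same
    partition (ascending by `order_by`), or `default` for a partition's first row."""
    # groupby only groups adjacent rows, so sort by partition first, then by order key.
    ordered = sorted(rows, key=itemgetter(partition_by, order_by))
    result: List[Row] = []
    for _, partition in groupby(ordered, key=itemgetter(partition_by)):
        previous = default
        for row in partition:
            result.append({**row, new_field: previous})
            previous = row[source_field]
    return result


opportunities = [
    {"Id": "o2", "AccountId": "A", "CloseDate": "2021-02-01", "Amount": 200},
    {"Id": "o1", "AccountId": "A", "CloseDate": "2021-01-01", "Amount": 100},
    {"Id": "o3", "AccountId": "B", "CloseDate": "2021-01-15", "Amount": 50},
]
enriched = compute_previous(opportunities, "AccountId", "CloseDate",
                            "Amount", "PreviousAmount", default=0)
```

One sort followed by one linear pass is enough; an implementation that re-scans each partition per row does far more work, which is one plausible source of the run-time gaps described here.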

In a 2021 engagement with a manufacturing client, a Recipe implementing the same computeRelative-style calculations took 25 minutes longer than the equivalent Dataflow.

Performance benchmarks across 20+ implementations

Across 20+ Salesforce CRM Analytics implementations in 2021, we observed consistent performance differences between Dataflows and Recipes. On datasets over 5 million rows, Dataflow run times were 30 to 50% shorter than those of the equivalent Recipes. In multi-source aggregation scenarios they were 20 to 40% shorter, and in computeRelative window calculations 25 to 35% shorter.

These benchmarks were not theoretical. They were based on real-world data loads and processing times. In one instance, a financial services client with 90 million Account records saw a 12 to 18 minute processing time with a Dataflow, compared to 22 to 30 minutes with a Recipe. This performance gap was consistent across multiple runs and was not due to temporary system issues.
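Because "X% faster" can be read two ways, a small worked calculation helps. Reading the percentages as the reduction in Dataflow run time relative to the Recipe run time, the figures above can be checked as follows (a sketch; the function names are illustrative):

```python
# Worked arithmetic for comparing run times; function names are illustrative.
def runtime_reduction_pct(dataflow_minutes: float, recipe_minutes: float) -> float:
    """How much shorter the Dataflow run was, as a percentage of the Recipe run."""
    return (1 - dataflow_minutes / recipe_minutes) * 100


def reduction_from_slowdown_pct(recipe_increase_pct: float) -> float:
    """Convert "the Recipe took X% longer" into "the Dataflow run was Y% shorter"."""
    return (1 - 1 / (1 + recipe_increase_pct / 100)) * 100


# 90M-row benchmark: 12 vs 22 minutes and 18 vs 30 minutes.
print(round(runtime_reduction_pct(12, 22), 1))    # 45.5
print(round(runtime_reduction_pct(18, 30), 1))    # 40.0
# The 60% and 40% Recipe slowdowns from the earlier engagements.
print(round(reduction_from_slowdown_pct(60), 1))  # 37.5
print(round(reduction_from_slowdown_pct(40), 1))  # 28.6
```

On that reading, the 90-million-row benchmark corresponds to Dataflow runs roughly 40 to 45% shorter, and a 60% Recipe slowdown corresponds to a 37.5% shorter Dataflow run. Read the other way, the same 90-million-row numbers would mean the Recipe took about 67 to 83% longer, so it matters which baseline a benchmark quotes.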

Migration considerations and roadmap implications

Organizations considering a migration from Dataflows to Recipes in 2021 should carefully evaluate their specific use cases. While Recipes are easier to maintain and more intuitive for new users, they are not always the best choice for complex or high-volume data operations. The shift should be driven by performance needs, not just documentation guidance.

We recommend a phased migration approach, where organizations first identify and test critical workflows that are currently handled by Dataflows. If those workflows do not show significant performance degradation in Recipes, then a migration can proceed. Otherwise, the organization should continue using Dataflows for those specific use cases.
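The gate in this phased approach can be sketched as a per-workflow comparison of benchmarked run times. The 10% tolerance, the workflow names, and the input format below are hypothetical; what counts as significant degradation depends on each organization's refresh windows, so the threshold is left as a parameter.

```python
# Hypothetical migration gate: compare median benchmark run times per workflow.
from statistics import median
from typing import Dict, List

# Hypothetical tolerance: how much slower (in %) a Recipe may be before a
# workflow stays on its Dataflow. Set it against your own refresh SLAs.
MAX_SLOWDOWN_PCT = 10.0


def migration_plan(benchmarks: Dict[str, Dict[str, List[float]]],
                   max_slowdown_pct: float = MAX_SLOWDOWN_PCT) -> Dict[str, str]:
    """Return "migrate" or "keep Dataflow" for each benchmarked workflow.

    `benchmarks` maps a workflow name to run times in minutes from repeated
    test runs, e.g. {"dataflow": [12, 15], "recipe": [22, 26]}. Medians are
    used so that one slow run does not decide the outcome.
    """
    plan: Dict[str, str] = {}
    for workflow, runs in benchmarks.items():
        dataflow = median(runs["dataflow"])
        recipe = median(runs["recipe"])
        slowdown_pct = (recipe / dataflow - 1) * 100
        plan[workflow] = "migrate" if slowdown_pct <= max_slowdown_pct else "keep Dataflow"
    return plan


plan = migration_plan({
    "account_incremental": {"dataflow": [12, 15, 18], "recipe": [22, 26, 30]},
    "small_lookup": {"dataflow": [4.0, 4.2], "recipe": [4.1, 4.3]},
})
```

Here the large incremental workflow stays on its Dataflow while the small lookup migrates, which matches the phased approach: move what tests cleanly and keep what regresses.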

Implications for your organization

If your organization is evaluating Salesforce CRM Analytics in 2021, consider the following:

  • If you are processing more than 5 million rows incrementally, Dataflows are likely a better choice.
  • For multi-source aggregation, Dataflows offer more predictable performance.
  • Test computeRelative-style window calculations carefully in Recipes before full deployment.

These decisions should be based on performance testing, not just documentation. Salesforce's push toward Recipes is not a blanket improvement. It's a shift that works better in some cases and worse in others.

FAQ

Q: Are Recipes always faster than Dataflows? No. In 2021, performance depends heavily on the complexity of the data pipeline and the size of the dataset. For simple transformations and small datasets, Recipes are often sufficient. For complex or large-scale operations, Dataflows are still the preferred path.

Q: Can I migrate all my Dataflows to Recipes? Not necessarily. Organizations should test critical workflows before migrating. If performance degrades significantly, it's best to continue using Dataflows for those specific use cases.

Q: What are the main advantages of Dataflows in 2021? Dataflows offer better performance for large datasets, more control over incremental syncs, and more efficient handling of computeRelative window calculations. These advantages make them a better fit for complex, high-volume data operations.

Engage CRMA Labs for a fixed-fee audit, sprint, or retainer at https://crmalabs.com