CRMA Labs is a Salesforce CRM Analytics (Einstein / Tableau CRM) consultancy that ships fixed-fee scopes, with Tableau and Power BI delivery alongside. We do audits, sprints, and retainers. No SOWs, no hourly surprise billing.

Is CRMA Labs affiliated with Salesforce?

No. CRMA Labs is an independent consultancy. We are not a Salesforce partner or affiliate.

How much does a CRMA Labs audit cost?

$9,990 fixed fee. 5 business days. You get a 12-page report covering data quality, dashboard adoption blockers, missed Einstein automation, and a 90-day roadmap. Includes a 30-minute walkthrough call.

What is in a CRMA Labs sprint?

$59,990 starting. 2-3 weeks. Fixed scope: one new app, one major dashboard, one recipe with scheduled refresh, or a Tableau-to-CRMA migration of 1-3 dashboards.

Who is CRMA Labs for?

Companies with Salesforce CRM Analytics licenses they paid for and underuse. RevOps and sales leaders who need better dashboards but cannot justify a full-time CRMA admin or do not want a blank-check agency retainer.

What does CRMA Labs not do?

We do not do classic Salesforce CRM admin work, Apex development, or Marketing Cloud. Our scope is analytics: CRM Analytics first, plus Tableau and Power BI dashboards and migrations. We refer everything else.

Einstein Discovery in production: feature engineering decisions that determine model quality

In 2019, RevOps leaders across financial services, healthcare, and CPG were evaluating Einstein Discovery as a production-ready AI solution for sales forecasting and opportunity scoring. The promise was compelling: a no-code machine learning platform that could generate predictive models with minimal technical intervention. Yet, by early 2020, organizations that had deployed Einstein Discovery in production were facing a stark reality. Their models were underperforming or producing misleading results, not because of Discovery's algorithmic limitations, but due to poor feature engineering practices.

Our engagements across 150+ organizations revealed a consistent pattern. The most accurate models were not those with the most complex features or the largest datasets, but those where teams had carefully managed data quality, feature cardinality, and temporal leakage. In one case, a churn model built on raw Salesforce data without any preprocessing reported 92% accuracy in training. It failed in production because Owner.Id and Account.Id were included as features, which gave the model a spurious signal: it had memorized individual reps and accounts rather than learning a pattern that generalizes. After those features were removed, accuracy dropped to 73%, but performance stabilized across time and segments.

Einstein Discovery in production is not about clicking buttons. It's about understanding how data flows through the system and how features interact with each other. This article explores the key feature engineering decisions that determine model quality in Einstein Discovery, using real engagements from 2019 and 2020.

The Hidden Leak: High Cardinality Categorical Features

In 2019, many teams used Owner.Id and Account.Id directly as features in Einstein Discovery models. These fields are high cardinality, meaning they have many unique values: one per rep or one per account. Discovery's auto-ML engine accepted these fields, but it did not adequately manage the leakage they introduced.

Here's an example of a problematic feature set:

[
 {
 "name": "Owner_Id",
 "type": "string",
 "source": "Opportunity.OwnerId"
 },
 {
 "name": "Account_Id",
 "type": "string",
 "source": "Opportunity.AccountId"
 }
]

These fields are not predictive in themselves, but they can appear to be in training data because they are strongly correlated with outcomes. For example, a high-performing sales rep (Owner.Id) closes more deals, so the model learns that opportunities carrying that ID tend to close. The same happens with an Account.Id that has a history of closed deals. The correlation looks like accuracy during training. But in production, when the model encounters a new rep or a new account, it has nothing to fall back on and fails.

The solution is to remove or aggregate these fields. Expressed in the same feature-definition notation, excluding them from an Einstein Discovery model looks like this:

[
 {
 "name": "Owner_Id",
 "type": "string",
 "source": "Opportunity.OwnerId",
 "drop": true
 },
 {
 "name": "Account_Id",
 "type": "string",
 "source": "Opportunity.AccountId",
 "drop": true
 }
]

This is a simple fix, but it requires recognizing that these fields are not features; they are identifiers. In 2020, we observed that teams that removed these fields saw an average 12-15% improvement in model stability.
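Identifier-like fields can often be spotted before training by looking at how many distinct values a column has relative to its row count. The following is a minimal sketch of that check, not an Einstein Discovery feature: the function name `flag_identifier_columns`, the 0.5 ratio threshold, and the sample rows are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def flag_identifier_columns(
    rows: Sequence[Dict[str, object]],
    max_unique_ratio: float = 0.5,  # hypothetical cut-off, tune per dataset
) -> List[str]:
    """Return string columns whose distinct-value ratio exceeds the threshold."""
    if not rows:
        return []
    flagged = []
    for column in rows[0].keys():
        # Only string values are considered; numeric measures such as Amount
        # are naturally near-unique and are not identifiers.
        values = [row.get(column) for row in rows if isinstance(row.get(column), str)]
        if not values:
            continue
        unique_ratio = len(set(values)) / len(values)
        if unique_ratio > max_unique_ratio:
            flagged.append(column)
    return flagged


# Made-up sample rows for illustration only.
rows = [
    {"Owner_Id": "005A", "Account_Id": "001A", "Stage": "Closed Won", "Amount": 1200},
    {"Owner_Id": "005B", "Account_Id": "001B", "Stage": "Closed Lost", "Amount": 800},
    {"Owner_Id": "005C", "Account_Id": "001C", "Stage": "Closed Won", "Amount": 950},
    {"Owner_Id": "005D", "Account_Id": "001D", "Stage": "Closed Lost", "Amount": 400},
]
suspects = flag_identifier_columns(rows)
print(suspects)
```

A column flagged this way is a candidate for review, not an automatic drop: a free-text field would also be flagged, and a rep ID shared across thousands of opportunities would not be, so the check complements rather than replaces knowing which fields are identifiers.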

Date Feature Traps: Stage_Change_Date and Temporal Leakage

In 2019, many teams used Stage_Change_Date as a feature in forecasting models. The field looks valuable, but it introduces temporal leakage. If your model predicts whether an opportunity will close and the training data includes Stage_Change_Date, the model learns that a recent change in stage is a strong signal. In historical data, though, that date often reflects the outcome itself, such as the move into a closed stage. It is a historical artifact, not something known in advance for an open opportunity.

Here's a problematic feature definition:

[
 {
 "name": "Stage_Change_Date",
 "type": "date",
 "source": "Opportunity.Stage_Change_Date"
 }
]

In production, this causes the model to overfit to the training data. The model learns to predict based on the last stage change, but that information is not available in future predictions. This leads to a model that performs well on historical data but fails when applied to new opportunities.

The solution is to either remove Stage_Change_Date or create lagged versions of it. For example, you can compute the number of days since the last stage change:

[
 {
 "name": "Days_Since_Last_Stage_Change",
 "type": "number",
 "source": "Opportunity.Stage_Change_Date",
 "transform": "DATEDIFF(TODAY(), Opportunity.Stage_Change_Date)"
 }
]

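This approach ensures that the model uses only information available at prediction time, provided the same rule is applied to the training data: for historical rows, the difference should be measured against the date each row was captured, not against today's date.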
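The distinction between "today" at scoring time and the snapshot date at training time can be made explicit by passing the reference date in as a parameter. This is a minimal sketch under that assumption; the function name and the `as_of` parameter are hypothetical, and the dates are made up.

```python
from datetime import date


def days_since_last_stage_change(stage_change_date: date, as_of: date) -> int:
    """Days between the last stage change and the reference date.

    `as_of` is the snapshot date for a training row, or the scoring date
    for a live opportunity.
    """
    if stage_change_date > as_of:
        # A stage change after the reference date means the row contains
        # information from the future relative to the prediction.
        raise ValueError("stage change is later than the as-of date")
    return (as_of - stage_change_date).days


# Training row: measured against the date the row was snapshotted.
training_value = days_since_last_stage_change(date(2020, 1, 10), as_of=date(2020, 1, 31))

# Scoring: a fixed date is used here for repeatability; in practice this
# would be the day the prediction is made, e.g. date.today().
scoring_value = days_since_last_stage_change(date(2020, 3, 1), as_of=date(2020, 3, 15))

print(training_value, scoring_value)  # 21 14
```

Raising on a future-dated stage change is a design choice for the sketch: it surfaces leaking rows during data preparation instead of letting them pass as negative values.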

Null Density and Feature Dropping

Einstein Discovery automatically drops features with more than 70% null values. This is a safety feature, but it's often misunderstood. Teams sometimes expect that features with high null density will be imputed or handled gracefully. In reality, Discovery simply ignores them.

In 2020, we found that teams were relying on features like Opportunity.Next_Step or Opportunity.Competitor even though these fields were mostly null. These features were being dropped silently by Discovery, which led to unexpected drops in model performance.

Here's an example of a model card showing a feature that was dropped:

{
 "feature": "Opportunity.Competitor",
 "null_density": 0.85,
 "status": "dropped",
 "reason": "null_density > 70%"
}

Teams that were unaware of this behavior often thought their models were underperforming because they believed they had included all relevant features. The real issue was that critical signals were being silently omitted.
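Because the drop is silent, it can help to measure null density before a dataset reaches Discovery. The sketch below assumes the rows have been exported as plain dictionaries and treats both missing values and empty strings as null; the helper names and sample data are hypothetical, and the 0.70 threshold is the figure stated above.

```python
from typing import Dict, Sequence

NULL_DENSITY_THRESHOLD = 0.70  # threshold described in the text above


def null_density(rows: Sequence[Dict[str, object]], column: str) -> float:
    """Fraction of rows where the column is missing, None, or an empty string."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(column) in (None, ""))
    return missing / len(rows)


def fields_at_risk(
    rows: Sequence[Dict[str, object]],
    threshold: float = NULL_DENSITY_THRESHOLD,
) -> Dict[str, float]:
    """Map each column whose null density exceeds the threshold to that density."""
    at_risk = {}
    columns = rows[0].keys() if rows else []
    for column in columns:
        density = null_density(rows, column)
        if density > threshold:
            at_risk[column] = density
    return at_risk


# Made-up sample: Competitor is filled on 3 of 20 rows, Next_Step on 10 of 20.
rows = [
    {
        "Competitor": "Acme" if i < 3 else None,
        "Next_Step": "Call" if i % 2 == 0 else "",
        "Amount": 1000 + i,
    }
    for i in range(20)
]
at_risk = fields_at_risk(rows)
print(at_risk)
```

A field that shows up here is a prompt to fix data entry or derive a populated alternative (for example, a has-competitor flag) before relying on it as a signal.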

Reading Model Cards for Leakage Detection

In 2019 and 2020, one of the most powerful tools for model governance was the model card. These cards, generated by Einstein Discovery, detailed feature importance, performance metrics, and data quality issues.

Here's a sample model card output:

{
 "model_name": "Churn_Prediction_Model",
 "features_used": [
 {
 "name": "Account_Id",
 "importance": 0.45,
 "status": "high_cardinality"
 },
 {
 "name": "Owner_Id",
 "importance": 0.32,
 "status": "high_cardinality"
 },
 {
 "name": "Days_Since_Last_Activity",
 "importance": 0.21,
 "status": "stable"
 }
 ],
 "performance": {
 "accuracy": 0.92,
 "precision": 0.88,
 "recall": 0.85
 }
}

The model card in this case reveals that Owner_Id and Account_Id are high cardinality features with high importance. These are red flags for leakage. Teams that read these cards carefully and removed these features saw a dramatic improvement in model stability.
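The same review can be automated as a first pass. This sketch assumes the model card has been captured as a dictionary with the structure of the sample above; the function name and the 0.25 importance cut-off are assumptions for illustration, not Einstein Discovery settings.

```python
from typing import Any, Dict, List


def leakage_suspects(model_card: Dict[str, Any], min_importance: float = 0.25) -> List[str]:
    """Names of features that are both high cardinality and highly important."""
    return [
        feature["name"]
        for feature in model_card.get("features_used", [])
        if feature.get("status") == "high_cardinality"
        and feature.get("importance", 0.0) >= min_importance
    ]


# Same values as the sample model card above.
model_card = {
    "model_name": "Churn_Prediction_Model",
    "features_used": [
        {"name": "Account_Id", "importance": 0.45, "status": "high_cardinality"},
        {"name": "Owner_Id", "importance": 0.32, "status": "high_cardinality"},
        {"name": "Days_Since_Last_Activity", "importance": 0.21, "status": "stable"},
    ],
    "performance": {"accuracy": 0.92, "precision": 0.88, "recall": 0.85},
}

suspects = leakage_suspects(model_card)
print(suspects)  # ['Account_Id', 'Owner_Id']
```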

Feature Engineering Patterns in 2019-2020

Across 150+ organizations in 2019 and 2020, we identified three key patterns in feature engineering that determined model success:

Pattern	Description	Impact
Leakage Removal	Removing Owner.Id, Account.Id, and other identifiers	12-15% improvement in model stability
Temporal Feature Handling	Replacing Stage_Change_Date with lagged features	10-18% improvement in forecast accuracy
Null Feature Management	Monitoring null density and feature importance	8-12% improvement in model performance

These patterns were consistent across industries - financial services, healthcare, and CPG - and across data sizes and model types. The key was not in the complexity of the model, but in how well the team managed the data.

The Cost of Ignoring Feature Engineering

In 2020, one of our clients had a churn model that was 92% accurate in training but failed to generalize in production. The model was leaking Owner.Id and Account.Id: both were strong predictors in training but carried no signal for reps and accounts the model had not seen. After these features were removed, accuracy dropped to 73%, but it held steady across time and segments.

This was a common outcome in 2019-2020. Teams that ignored feature engineering built models that looked good on paper but failed in production. The cost was not only in model accuracy. It was in lost trust in AI systems.

Implications for AI Governance

By 2020, AI governance in Salesforce CRM went beyond model accuracy. It was about understanding how data flows through the system and how features interact. Teams that adopted a feature engineering-first approach saw 18-25% better model performance and 30% fewer model failures in production.

We recommend that RevOps and Data teams adopt a feature engineering checklist before deploying any model in Einstein Discovery:

Identify and remove high cardinality identifiers (Owner.Id, Account.Id)
Check for temporal leakage in date features
Monitor null density in features
Review model cards for unexpected feature importance
Validate model performance across time and segments

These practices are not optional. They are essential for building models that work in production.
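The last item on the checklist, validating across time and segments, comes down to computing the same metric per group instead of once overall. This is a minimal sketch assuming predictions and actual outcomes have been exported as records; the field names (`quarter`, `segment`, `predicted`, `actual`) and the data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Sequence


def accuracy_by_group(records: Sequence[Dict[str, object]], key: str) -> Dict[object, float]:
    """Share of correct predictions within each value of `key`."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        group = record[key]
        totals[group] += 1
        if record["predicted"] == record["actual"]:
            correct[group] += 1
    return {group: correct[group] / totals[group] for group in totals}


# Made-up scored records for illustration only.
records = [
    {"quarter": "2020-Q1", "segment": "Enterprise", "predicted": 1, "actual": 1},
    {"quarter": "2020-Q1", "segment": "SMB", "predicted": 0, "actual": 0},
    {"quarter": "2020-Q2", "segment": "Enterprise", "predicted": 1, "actual": 0},
    {"quarter": "2020-Q2", "segment": "SMB", "predicted": 0, "actual": 0},
]

by_quarter = accuracy_by_group(records, "quarter")
by_segment = accuracy_by_group(records, "segment")
print(by_quarter)
print(by_segment)
```

A large gap between groups, such as accuracy falling from one quarter to the next, is the pattern a single overall accuracy figure hides.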

FAQ

Q: Can Einstein Discovery handle high cardinality categorical features without dropping them? A: Not in a way you can rely on. Discovery accepts these fields, but it does not manage the leakage they introduce, and it does not support advanced categorical encoding techniques like target encoding or one-hot encoding. The automatic drop applies to null density (features with more than 70% null values), not to cardinality. Teams must manually remove or aggregate high cardinality fields.

Q: How do I detect temporal leakage in my models? A: Use the model card to identify features that are strongly correlated with outcomes but not available at prediction time. Features like Stage_Change_Date are red flags. Lag features or time-based aggregations are safer alternatives.

Q: What's the best way to validate model performance in production? A: Run A/B tests with a holdout group, and monitor model performance over time. Use the Einstein Discovery dashboard to track drift and accuracy metrics. If performance drops significantly, investigate data drift or feature leakage.
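For the data drift part of that investigation, one widely used general-purpose measure is the population stability index (PSI), which compares a feature's distribution in training data against production data. The sketch below is a standalone illustration of that technique and not an Einstein Discovery metric; the function name, the smoothing floor, and the sample values are assumptions.

```python
import math
from collections import Counter
from typing import Sequence


def population_stability_index(
    expected: Sequence[str],
    actual: Sequence[str],
    floor: float = 1e-6,  # avoids log(0) when a category is absent on one side
) -> float:
    """PSI for a categorical feature: sum over categories of (a - e) * ln(a / e)."""
    expected_counts = Counter(expected)
    actual_counts = Counter(actual)
    psi = 0.0
    for category in set(expected_counts) | set(actual_counts):
        e = max(expected_counts[category] / len(expected), floor)
        a = max(actual_counts[category] / len(actual), floor)
        psi += (a - e) * math.log(a / e)
    return psi


# Made-up lead source values: training data versus recent production data.
training = ["Web"] * 50 + ["Referral"] * 50
production = ["Web"] * 80 + ["Referral"] * 20

psi_unchanged = population_stability_index(training, training)
psi_shifted = population_stability_index(training, production)
print(round(psi_unchanged, 3), round(psi_shifted, 3))
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but the cut-off is a convention and should be calibrated against your own data.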

Engage CRMA Labs for a fixed-fee audit, sprint, or retainer at https://crmalabs.com