HubSpot CRM Data Cleanup: Duplicates, Properties, and Data Hygiene

Jetstack Team 16 min read
Tags: hubspot, audit, data quality, crm cleanup, duplicates, data hygiene

Dirty data is the silent tax on every HubSpot portal. It does not announce itself. It does not trigger error messages. It simply degrades everything it touches — email deliverability, segmentation accuracy, workflow reliability, report trustworthiness, and team confidence in the CRM.

The cost is real. Duplicate contacts inflate your HubSpot subscription fees. Unused properties clutter every form, view, and import mapping. Inconsistent data formats break automations and produce misleading reports. And the longer you wait to address it, the worse it gets.

10-25% Typical Duplicate Rate (Uncleaned)
<3% Target Duplicate Rate
85%+ Target Critical Field Fill Rate
15-25 min Daily Rep Time Wasted on Dirty Data

This guide provides a practical, prioritized approach to HubSpot CRM data cleanup. We cover the three pillars of data hygiene — deduplication, property management, and standardization — along with a framework for maintaining clean data over time.

If you are conducting a broader review, this post is part of our ultimate HubSpot portal audit checklist.

Understanding the True Cost of Dirty CRM Data

Before diving into cleanup tactics, it helps to quantify what dirty data actually costs your organization. This is not an abstract problem — it has measurable financial and operational impact.

| Area | Symptoms | Impact | Fix Priority |
|---|---|---|---|
| Duplicate contacts | Inflated contact count, fragmented engagement history | Higher subscription costs, split attribution | High |
| Unused properties | Cluttered forms, confusing import mappings | Slower onboarding, mapping errors during imports | Medium |
| Inconsistent values | "US" vs. "USA" vs. "United States" in same field | Broken segmentation, inaccurate reports | High |
| Missing data | Critical fields below 50% fill rate | Incomplete segments, failed personalization | High |
| Stale records | Contacts with no activity in 12+ months | Reduced deliverability, inflated costs | Medium |
| Broken lifecycle stages | Customers still marked "Subscriber" | Inaccurate funnel reporting, wrong nurture paths | High |

Direct Financial Costs

HubSpot’s Marketing Hub pricing is based on the number of marketing contacts. Every duplicate that is set as a marketing contact counts toward your tier limit. Organizations with duplicate rates of 10-20% are potentially paying thousands of dollars annually for records that should not exist.

Beyond subscription costs, dirty data wastes the time of every person who interacts with the CRM:

Sales Impact

15-25 Minutes Per Rep Per Day

Sales reps spend time navigating duplicate records, filling in missing information, and verifying which record is correct before outreach.

Marketing Impact

Segments Built on Incomplete Data

Marketing teams build segments based on incomplete data, reducing campaign effectiveness and wasting ad spend on poorly targeted audiences.

Operations Impact

Workflow Failures From Inconsistency

Operations teams troubleshoot workflow failures caused by inconsistent property values — a workflow triggered by "Lifecycle Stage equals MQL" misses every "mql" or blank entry.

Leadership Impact

Decisions Based on Distorted Reports

Lead counts are inflated, conversion rates appear lower than reality, attribution is fragmented across duplicate records, and customer lifetime value is skewed.
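
To put the sales figure above in perspective, here is a rough back-of-the-envelope calculation. The team size of 10 reps and the 230 working days per year are illustrative assumptions, not benchmarks; substitute your own numbers.

```python
def annual_hours_lost(reps: int, minutes_per_day: float, working_days: int = 230) -> float:
    """Hours per year a sales team spends working around dirty data.

    `working_days` defaults to 230, an assumed figure for illustration.
    """
    return reps * minutes_per_day * working_days / 60


# Hypothetical team of 10 reps, using the 15-25 minute range quoted above.
low = annual_hours_lost(reps=10, minutes_per_day=15)
high = annual_hours_lost(reps=10, minutes_per_day=25)
print(f"{low:.0f} to {high:.0f} hours per year")  # prints "575 to 958 hours per year"
```

Multiplying the result by a loaded hourly cost gives a rough annual figure to weigh against the effort of a cleanup project.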

Phase 1: Duplicate Detection and Merging

Deduplication delivers the most immediate, visible impact. Start here.

HubSpot’s Built-In Duplicate Tool

HubSpot provides an AI-powered duplicate detection tool at Contacts > Actions > Manage Duplicates (and the equivalent for companies). The tool suggests pairs of records that likely represent the same person or organization.

  1. Review Suggestions in Batches
     Set aside 30-60 minutes per session rather than trying to process everything at once. Decision fatigue leads to merge errors.

  2. Choose the Primary Record Carefully
     When merging, one record becomes the primary (surviving) record. Choose the record with the most complete data, the most engagement history, and the most associated records (deals, tickets, etc.).

  3. Understand What Merges and What Doesn't
     HubSpot merges contact properties (keeping the most recently updated value), timeline activities, and associations. However, some form submission metadata and workflow enrollment history from the secondary record may be lost.
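
When you review many pairs, it can help to make the "which record survives" decision explicit. The sketch below ranks two records by the three criteria above. The `ContactSummary` type, its fields, and the equal weighting are all assumptions for illustration; HubSpot does not expose a score like this, and you would populate the counts from your own export.

```python
from dataclasses import dataclass


@dataclass
class ContactSummary:
    record_id: str
    filled_properties: int  # number of non-empty properties on the record
    engagements: int        # emails, calls, meetings, notes on the timeline
    associations: int       # linked deals, tickets, companies


def primary_score(contact: ContactSummary) -> int:
    # Equal weights are an assumption; tune them to what matters in your portal.
    return contact.filled_properties + contact.engagements + contact.associations


def pick_primary(a: ContactSummary, b: ContactSummary) -> ContactSummary:
    """Return the record that should survive the merge (ties go to `a`)."""
    return max((a, b), key=primary_score)


older = ContactSummary("101", filled_properties=12, engagements=40, associations=3)
newer = ContactSummary("202", filled_properties=20, engagements=5, associations=0)
print(pick_primary(older, newer).record_id)  # prints "101"
```

A score is only a starting point. A human reviewer should still confirm the pair really is the same person before merging.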

Beyond the Built-In Tool

HubSpot’s duplicate tool is good for obvious matches but misses many edge cases. Supplement it with these strategies:

  • Email domain matching: Export company records and group by domain — "Acme Corp", "Acme Corporation", and "ACME" with the same domain are duplicates
  • Phone normalization: (555) 123-4567, 555-123-4567, and +15551234567 are the same number — normalize before dedup
  • Cross-object dedup: Check for the same person existing as both a contact and a company (name imported into the company field)
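
The first two strategies can be sketched in a few lines of Python run against an export. This is a minimal illustration, assuming North American phone numbers and a company export loaded as dictionaries with hypothetical `name` and `domain` keys. For international data, a dedicated library such as `phonenumbers` is a safer choice than the regex below.

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def to_e164(raw: str, default_country_code: str = "1") -> Optional[str]:
    """Normalize a phone number to E.164, assuming a 10-digit national format."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits if digits else None
    if len(digits) == 10:
        return "+" + default_country_code + digits
    if len(digits) == 11 and digits.startswith(default_country_code):
        return "+" + digits
    return None  # unrecognized shape: leave for manual review


def group_by_domain(companies: List[dict]) -> Dict[str, List[str]]:
    """Return domains shared by more than one company record."""
    groups = defaultdict(list)
    for company in companies:
        domain = (company.get("domain") or "").strip().lower()
        if domain:
            groups[domain].append(company["name"])
    return {d: names for d, names in groups.items() if len(names) > 1}


print(to_e164("(555) 123-4567"))  # prints "+15551234567"
print(group_by_domain([
    {"name": "Acme Corp", "domain": "acme.com"},
    {"name": "Acme Corporation", "domain": "ACME.com"},
    {"name": "Globex", "domain": "globex.com"},
]))
```

Records that share a normalized phone number or a domain are candidates for review, not automatic merges. Shared switchboard numbers and parent/subsidiary domains are common false positives.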

Bulk Merge Strategy

For large-scale deduplication (hundreds or thousands of pairs), manual one-by-one merging is impractical.

Native Option

HubSpot Bulk Merge

Available on Professional and Enterprise tiers. Allows processing multiple duplicate pairs in sequence with HubSpot's built-in matching.

Advanced Option

Workflow-Based Dedup

Data Hub Professional/Enterprise enables custom-coded actions that can automate merge logic for complex matching scenarios.

Third-Party Option

Dedicated Dedup Tools

Tools like Dedupely or Insycle offer more sophisticated matching algorithms, fuzzy matching, and bulk processing capabilities.

Jetstack Option

Data Quality Toolkit

Our marketplace includes automated deduplication solutions that handle complex matching scenarios at scale.

Preventing Future Duplicates

Cleaning up existing duplicates means little if new ones keep appearing.

| Duplicate Source | Prevention Method | Priority |
|---|---|---|
| Form submissions | Configure forms to update existing contacts, not always create new ones | High |
| CSV imports | Always map the email field and use "Update existing contacts" option | High |
| Integration syncs | Review connected apps and verify dedup logic in sync settings | High |
| Manual creation | Train team members to search for existing contacts before creating new ones | Medium |

Phase 2: Property Audit and Cleanup

Properties are the schema of your CRM. Over time, they accumulate like digital clutter — created for one-off campaigns, by departed team members, or by integrations that are no longer active.

Exporting Your Property Inventory

Start with a complete export of all properties across all objects:

  • Go to Settings > Properties
  • Select each object type (Contacts, Companies, Deals, Tickets, Custom Objects)
  • Export the full list including: property name, internal name, type, group, creation date, and created by

Identifying Unused Properties

A property is “unused” if it meets one or more of these criteria:

| Unused Criteria | What It Means | Action |
|---|---|---|
| Zero fill rate | No records have a value; created but never populated | Delete after dependency check |
| Stale data | Values not updated in 6+ months, no longer accurate | Clear values or repopulate |
| No downstream usage | Not referenced in workflows, lists, reports, forms, or views | Deprecate, then delete |
| Duplicate purpose | Another property captures the same information more reliably | Consolidate into primary property |
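
Fill rate is the easiest of these criteria to compute yourself from an export. The sketch below assumes records loaded as dictionaries keyed by internal property name. The sample property names are hypothetical.

```python
from typing import Dict, Iterable, List


def is_filled(value) -> bool:
    """Treat None and blank strings as empty; 0 and False count as values."""
    return value is not None and str(value).strip() != ""


def fill_rates(records: List[dict], properties: Iterable[str]) -> Dict[str, float]:
    """Share of records (0.0 to 1.0) with a non-empty value for each property."""
    total = len(records)
    return {
        prop: (sum(is_filled(r.get(prop)) for r in records) / total if total else 0.0)
        for prop in properties
    }


records = [
    {"email": "a@example.com", "legacy_score": ""},
    {"email": "b@example.com", "legacy_score": None},
    {"email": "", "legacy_score": None},
    {"email": "c@example.com"},
]
rates = fill_rates(records, ["email", "legacy_score"])
zero_fill = [prop for prop, rate in rates.items() if rate == 0]
print(rates)      # {'email': 0.75, 'legacy_score': 0.0}
print(zero_fill)  # ['legacy_score']
```

A zero fill rate only satisfies the first criterion. The dependency check still has to happen before anything is deleted.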

Safe Property Deletion Process

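🚨
Property Deletion Is Hard to Undo

Deleting a property in HubSpot removes it and all of its data from your records. HubSpot keeps deleted properties in an archive for a limited period (90 days, according to HubSpot's documentation) before removing them permanently, so treat every deletion as final. Follow the safe deletion process below to avoid accidental data loss.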

  1. Document the Property
     Record its name, purpose, fill rate, and any known dependencies.

  2. Check Dependencies
     Search for the property name in workflows, lists, reports, forms, views, and calculated properties. HubSpot's "Used in" feature helps, but it does not catch every reference.

  3. Export the Data
     If the property has any populated values, export the data as a CSV backup before deletion.

  4. Clear and Wait 30 Days
     If unsure, clear values from all records (making it effectively unused) and wait 30 days. If nobody reports missing data, it is safe to delete.

  5. Delete in Small Batches
     Process deletions 10-20 properties at a time rather than mass-deleting hundreds at once. This makes rollback easier if something breaks.
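
If you track the audit in a spreadsheet or script, the gating and batching logic from the steps above can be sketched as follows. The candidate structure (`name`, `dependencies`, `backed_up`) is a hypothetical format for your own audit notes, not something HubSpot produces.

```python
from typing import List


def deletion_batches(candidates: List[dict], batch_size: int = 10) -> List[List[str]]:
    """Split deletable properties into small batches.

    A candidate is deletable only if it has no known dependencies and its data
    has been exported (or it never held any values).
    """
    safe = [
        c["name"]
        for c in candidates
        if not c["dependencies"] and c["backed_up"]
    ]
    return [safe[i:i + batch_size] for i in range(0, len(safe), batch_size)]


# 25 hypothetical candidates; two are held back by the checks.
candidates = [
    {"name": f"prop_{i}", "dependencies": [], "backed_up": True} for i in range(25)
]
candidates[3]["dependencies"] = ["workflow: Lead routing"]
candidates[7]["backed_up"] = False

batches = deletion_batches(candidates)
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Working through one batch at a time, with a pause between batches, keeps the blast radius small if a hidden dependency surfaces.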

Property Naming and Organization

While auditing, improve property organization:

  • Group related properties — cluster fields by purpose (e.g., "Marketing Attribution", "Billing Information", "Onboarding Data")
  • Standardize naming — lowercase with underscores for internal names, title case for display names
  • Add descriptions — every custom property should explain its purpose, expected values, and maintainer
  • Deprecation labels — prefix display names with "[DEPRECATED]" for properties being phased out
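
The naming and deprecation conventions above are simple enough to enforce with two small helpers, for example when proposing names for new properties. This is a sketch of one possible convention, and the function names are our own.

```python
import re

DEPRECATED_PREFIX = "[DEPRECATED] "


def internal_name(display_name: str) -> str:
    """Derive a lowercase, underscore-separated internal name from a display name."""
    slug = re.sub(r"[^a-z0-9]+", "_", display_name.lower())
    return slug.strip("_")


def mark_deprecated(display_name: str) -> str:
    """Prefix a display name with the deprecation label (no-op if already marked)."""
    if display_name.startswith(DEPRECATED_PREFIX):
        return display_name
    return DEPRECATED_PREFIX + display_name


print(internal_name("Billing Contact Email"))  # prints "billing_contact_email"
print(mark_deprecated("Old Lead Score"))       # prints "[DEPRECATED] Old Lead Score"
```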

Phase 3: Data Standardization

Even clean, deduplicated data with well-organized properties can undermine your CRM if values are inconsistent. Standardization ensures that the same concept is represented the same way everywhere.

Common Standardization Targets

| Field | Common Problem | Solution |
|---|---|---|
| Lifecycle stages | Contacts stuck in early stages; inconsistent status values | Audit progression rules; standardize values |
| Country/state | "United States", "US", "USA", "U.S.A." in same field | Use dropdown properties instead of free-text |
| Phone numbers | (555) 123-4567 vs. +15551234567 vs. 5551234567 | Standardize to E.164 international format |
| Company names | "Acme Corp" vs. "Acme Corporation" vs. "ACME Inc." | Establish canonical format; decide on suffixes |
| Job titles | Hundreds of variations for the same role | Create "Role Category" dropdown that maps titles |

Dirty Portal
  • "United States", "US", "USA", "america" in country field
  • Phone stored as (555)1234567, 555-123-4567, +1 555 123 4567
  • 50% of contacts missing lifecycle stage
  • Long-time customers still marked "Subscriber"
  • 300+ custom properties with <5% fill rates

Clean Portal
  • Country field uses standardized dropdown values
  • All phones in E.164 format: +15551234567
  • 95%+ lifecycle stage fill rate with validated progression
  • Lifecycle stages reflect actual customer journey
  • Lean property set: every field has a purpose and >50% fill rate

Automation-Assisted Standardization

For large databases, manual standardization is impractical. Use these HubSpot features:

  • Workflows with formatting actions — Data Hub "Format data" actions capitalize names, trim whitespace, convert values
  • Calculated properties — normalize values from source properties automatically
  • Import mapping — map values to standardized options during the import step
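
When the cleanup happens outside HubSpot, for example on an export before re-import, the same value mapping can be expressed as a lookup table. The sketch below uses the country variants from this section. The alias table is deliberately tiny and is an illustration, not a complete country list.

```python
from typing import Optional

# Keys are lowercased with periods removed; extend this for your own data.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "united states": "United States",
    "america": "United States",
}


def standardize_country(raw: str) -> Optional[str]:
    """Map a free-text country value to its canonical dropdown label.

    Returns None for values that are not in the alias table, so they can be
    routed to manual review instead of being silently guessed.
    """
    key = raw.strip().lower().replace(".", "")
    return COUNTRY_ALIASES.get(key)


for value in ["United States", "US", "USA", "U.S.A.", "america", "Atlantis"]:
    print(repr(value), "->", standardize_country(value))
```

Returning `None` for unknown values is a deliberate choice: a short review queue is cheaper than a wrong country on thousands of records.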

Phase 4: Import Cleanup and Governance

Imports are a major source of data quality issues. Every CSV import is an opportunity to introduce duplicates, inconsistent formatting, and incomplete records.

Pre-Import Checklist

  • Deduplication within the file — remove duplicates from the CSV itself before uploading
  • Column mapping — every column maps to the correct HubSpot property with the correct type
  • Value formatting — pre-format values to match standardization rules (E.164 phones, YYYY-MM-DD dates)
  • Required field completeness — every record has values for defined critical fields
  • Update vs. create behavior — decide whether to create new records, update existing, or both
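
Part of this checklist can be automated before a file ever reaches HubSpot. The sketch below covers two items, in-file deduplication by email and required field completeness, using only the standard library. The column names in `REQUIRED` are assumptions about your CSV headers. Adjust them to match your file.

```python
import csv
import io
from typing import List, Tuple

REQUIRED = ["email", "company", "lifecyclestage"]  # assumed CSV column names


def check_rows(rows: List[dict]) -> Tuple[List[dict], List[Tuple[int, str]]]:
    """Return (clean_rows, problems); problems are (file_line_number, reason)."""
    seen = set()
    clean, problems = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            problems.append((line_no, "missing: " + ", ".join(missing)))
            continue
        email = row["email"].strip().lower()
        if email in seen:
            problems.append((line_no, "duplicate email in file"))
            continue
        seen.add(email)
        clean.append({**row, "email": email})
    return clean, problems


SAMPLE = """email,company,lifecyclestage
ann@acme.com,Acme,lead
ANN@acme.com,Acme,lead
bob@globex.com,,customer
"""
clean, problems = check_rows(list(csv.DictReader(io.StringIO(SAMPLE))))
print(len(clean), problems)
# 1 [(3, 'duplicate email in file'), (4, 'missing: company')]
```

Rejected rows are reported with their line number so the file owner can fix the source rather than the import.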

Import Governance Policy

📋

Import Governance Rules

  • Require manager approval for imports exceeding 500 records.
  • Mandate the pre-import checklist for every import.
  • Log every import with its source, date, record count, and responsible person.
  • Run a post-import duplicate check within 24 hours.

Cleaning Up Historical Imports

If your portal has accumulated data quality issues from past imports, identify the imports that caused the most damage:

  1. Go to Contacts > Import and review the import history
  2. For each import, check the “created” vs. “updated” counts — high creation counts from external lists may indicate duplicate generation
  3. Use the “Original Source” property to segment contacts by import and assess data quality per batch

For organizations planning a migration, import cleanup is especially critical. Dirty data migrated to a new portal is still dirty data. Our guide on what you lose during HubSpot data migration covers this in detail.

Phase 5: Ongoing Maintenance Cadence

Data cleanup is not a one-time project. Without ongoing maintenance, your portal will return to its pre-cleanup state within 6-12 months.

| Frequency | Tasks | Time Required |
|---|---|---|
| Weekly | Process duplicate suggestions, remove hard bounces, spot-check recent contacts | 15-20 minutes |
| Monthly | Fill-rate report, review new properties, audit recent imports, check sync health | 1-2 hours |
| Quarterly | Full property audit, lifecycle stage review, standardization spot-check | Half day |
| Annual | Comprehensive dedup sweep, full property inventory, data completeness benchmark | 1-2 days |

Weekly Tasks (15-20 minutes)

  • Review and process HubSpot's duplicate suggestions
  • Check for bounced emails and remove hard bounces from active lists
  • Review recent form submissions for data quality issues
  • Spot-check 5-10 recently created contacts for completeness

Monthly Tasks (1-2 hours)

  • Run a fill-rate report on critical properties
  • Review newly created custom properties — were they necessary?
  • Audit recent imports for data quality issues
  • Check integration sync health for data consistency

Quarterly Tasks (Half day)

  • Full property audit: identify and remove unused properties
  • Lifecycle stage accuracy review
  • Data standardization spot-check across key fields
  • Review and update import governance policies

Annual Tasks (1-2 days)

  • Comprehensive deduplication sweep
  • Full property inventory with dependency mapping
  • Data completeness benchmark against prior year
  • Review and update your data quality KPIs

For a detailed framework on audit timing, see our guide on how often you should audit your HubSpot portal.

Measuring Data Quality Over Time

You cannot improve what you do not measure. Establish baseline metrics during your initial cleanup and track them over time.

Key Data Quality Metrics

<3% Target Duplicate Rate
85%+ Target Critical Field Fill Rate
<2% Target Email Bounce Rate

| Metric | Definition | Target |
|---|---|---|
| Duplicate rate | % of contact records that are duplicates | Below 3% |
| Critical field fill rate | % of contacts with email, company, lifecycle stage, lead source | Above 85% |
| Property utilization | % of custom properties actively populated and referenced | Above 70% |
| Bounce rate | % of email addresses that hard bounce | Below 2% |
| Lifecycle stage accuracy | % of contacts in correct stage (manual sampling) | Above 90% |
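
Once you have measured values, checking them against these targets is mechanical. The sketch below encodes the table's thresholds. The metric keys and the sample measurements are hypothetical.

```python
from typing import Dict

# (direction, threshold): "max" means the value must stay below the threshold,
# "min" means it must stay above it. Thresholds mirror the table above.
TARGETS = {
    "duplicate_rate": ("max", 0.03),
    "critical_fill_rate": ("min", 0.85),
    "property_utilization": ("min", 0.70),
    "hard_bounce_rate": ("max", 0.02),
    "lifecycle_accuracy": ("min", 0.90),
}


def scorecard(measured: Dict[str, float]) -> Dict[str, bool]:
    """Return True per metric when the measured value meets its target."""
    results = {}
    for metric, (direction, threshold) in TARGETS.items():
        value = measured[metric]
        results[metric] = value < threshold if direction == "max" else value > threshold
    return results


# Hypothetical measurements from an initial audit.
baseline = {
    "duplicate_rate": 0.12,
    "critical_fill_rate": 0.88,
    "property_utilization": 0.55,
    "hard_bounce_rate": 0.01,
    "lifecycle_accuracy": 0.93,
}
for metric, ok in scorecard(baseline).items():
    print(f"{metric}: {'on target' if ok else 'needs work'}")
```

Running the same check each month against fresh measurements gives a simple pass/fail trend alongside the dashboard.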

Building a Data Quality Dashboard

  1. Contacts Created Per Month
     Track creation volume with source breakdown to identify unexpected spikes from imports or integrations.

  2. Duplicate Merge Volume Per Month
     Monitor how many duplicates are being resolved; declining trends mean your prevention efforts are working.

  3. Critical Field Fill Rates
     Use calculated properties to generate fill rate percentages and track improvement over time.

  4. Bounced Contact Trends
     Track hard bounce accumulation; rising trends signal list hygiene problems or bad data sources.

  5. Lifecycle Stage Distribution
     Monitor stage distribution over time. Healthy portals show contacts progressing; unhealthy ones show contacts accumulating in early stages.

Review your data quality dashboard monthly. Improving trends confirm your hygiene cadence is working. Declining trends signal that new data sources or processes are introducing problems.

When to Bring in Professional Help

Some data quality issues are straightforward to address internally. Others require specialized expertise or tooling that goes beyond what HubSpot provides natively.

ℹ️
Consider Professional Assistance When

Your portal has more than 100,000 contacts requiring sophisticated matching algorithms, multiple integrations are creating conflicting data, you are preparing for a portal consolidation, or your team lacks the bandwidth to execute cleanup while maintaining day-to-day operations.

Jetstack’s data quality and audit solutions are designed for exactly these scenarios. We combine automated scanning with expert review to clean, standardize, and maintain your HubSpot data. Explore our implementation services for large-scale cleanup projects, or contact us to discuss your specific situation.

Frequently Asked Questions

How many duplicates is “normal” in a HubSpot portal?

Portals that have never been cleaned typically have a duplicate rate between 10% and 25%, depending on the number of data sources and import volume. After an initial cleanup, a well-maintained portal should keep its duplicate rate below 3%. Any rate above 10% is costing you meaningful money in inflated subscription tiers and wasted team time.

Will merging duplicates in HubSpot lose any data?

When you merge two contact records, HubSpot generally keeps the most recently updated value for each property, with a few exceptions (for example, the primary record's email address is retained). Timeline activities from both records are combined. However, some metadata, such as individual form submission details and certain workflow enrollment data from the secondary record, may not transfer completely. Always designate the more complete and recently active record as the primary.

How do I find unused properties in HubSpot?

Go to Settings > Properties, select the object type, and look for properties with “0 records” or very low record counts in the fill-rate indicators. For a more thorough analysis, export all properties and cross-reference against your workflows, lists, reports, and forms to identify properties that have data but are not used in any downstream process. HubSpot’s property settings show a “Used in” indicator, but it does not capture every reference.

Can I undo a property deletion in HubSpot?

Only for a limited time. HubSpot keeps deleted properties in an archive from which they can be restored for a limited period (90 days, according to HubSpot's documentation). After that, both the property definition and all associated data are gone for good. This is why we strongly recommend exporting property data before deletion and using the “clear and wait” approach for properties you are unsure about.

What is the best tool for HubSpot data cleanup?

HubSpot’s native tools handle basic deduplication and property management, and HubSpot's own data quality command center (available on some paid tiers) surfaces property, duplicate, and sync issues in one place. For more advanced needs, third-party tools like Insycle and Dedupely offer features like fuzzy matching, bulk standardization, and automated hygiene rules. Jetstack’s marketplace also includes purpose-built data quality toolkits for HubSpot that combine automated scanning with guided remediation.

Should I clean data before or after a HubSpot migration?

Before. Always before. Migrating dirty data to a new portal just transfers the problem — and can make it worse if the migration process creates additional duplicates. A pre-migration audit that includes thorough data cleanup typically saves 20-40% of post-migration remediation effort and cost.
