Dirty data is the silent tax on every HubSpot portal. It does not announce itself. It does not trigger error messages. It simply degrades everything it touches — email deliverability, segmentation accuracy, workflow reliability, report trustworthiness, and team confidence in the CRM.
The cost is real. Duplicate contacts inflate your HubSpot subscription fees. Unused properties clutter every form, view, and import mapping. Inconsistent data formats break automations and produce misleading reports. And the longer you wait to address it, the worse it gets.
This guide provides a practical, prioritized approach to HubSpot CRM data cleanup. We cover the three pillars of data hygiene — deduplication, property management, and standardization — along with a framework for maintaining clean data over time.
If you are conducting a broader review, this post is part of our ultimate HubSpot portal audit checklist.
Understanding the True Cost of Dirty CRM Data
Before diving into cleanup tactics, it helps to quantify what dirty data actually costs your organization. This is not an abstract problem — it has measurable financial and operational impact.
| Area | Symptoms | Impact | Fix Priority |
|---|---|---|---|
| Duplicate contacts | Inflated contact count, fragmented engagement history | Higher subscription costs, split attribution | High |
| Unused properties | Cluttered forms, confusing import mappings | Slower onboarding, mapping errors during imports | Medium |
| Inconsistent values | "US" vs. "USA" vs. "United States" in same field | Broken segmentation, inaccurate reports | High |
| Missing data | Critical fields below 50% fill rate | Incomplete segments, failed personalization | High |
| Stale records | Contacts with no activity in 12+ months | Reduced deliverability, inflated costs | Medium |
| Broken lifecycle stages | Customers still marked "Subscriber" | Inaccurate funnel reporting, wrong nurture paths | High |
Direct Financial Costs
Marketing Hub pricing is based on the number of marketing contacts in your portal. Every duplicate that is set as a marketing contact counts toward your tier limit. Organizations with duplicate rates of 10-20% may be paying thousands of dollars annually for records that should not exist.
Beyond subscription costs, dirty data wastes the time of every person who interacts with the CRM:
15-25 Minutes Per Rep Per Day
Sales reps spend time navigating duplicate records, filling in missing information, and verifying which record is correct before outreach.
Segments Built on Incomplete Data
Marketing teams build segments based on incomplete data, reducing campaign effectiveness and wasting ad spend on poorly targeted audiences.
Workflow Failures From Inconsistency
Operations teams troubleshoot workflow failures caused by inconsistent property values — a workflow triggered by "Lifecycle Stage equals MQL" misses every "mql" or blank entry.
Decisions Based on Distorted Reports
Lead counts are inflated, conversion rates appear lower than reality, attribution is fragmented across duplicate records, and customer lifetime value is skewed.
Phase 1: Duplicate Detection and Merging
Deduplication delivers the most immediate, visible impact. Start here.
HubSpot’s Built-In Duplicate Tool
HubSpot provides an AI-powered duplicate detection tool at Contacts > Actions > Manage Duplicates (and the equivalent for companies). The tool suggests pairs of records that likely represent the same person or organization.
Review Suggestions in Batches
Set aside 30-60 minutes per session rather than trying to process everything at once. Decision fatigue leads to merge errors.
Choose the Primary Record Carefully
When merging, one record becomes the primary (surviving) record. Choose the record with the most complete data, the most engagement history, and the most associated records (deals, tickets, etc.).
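The selection criteria above can be made explicit as a small comparison. This is a minimal Python sketch, not HubSpot's own logic: the `ContactRecord` shape, its field names, and the tie-breaking order (completeness first, then engagement, then associations) are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class ContactRecord:
    record_id: str
    properties: dict            # property name -> value (None or "" when empty)
    engagement_count: int = 0   # e.g. emails, calls, meetings, form submissions
    association_count: int = 0  # e.g. deals, tickets, companies


def completeness(record):
    """Share of the record's properties that hold a non-empty value."""
    if not record.properties:
        return 0.0
    filled = sum(1 for v in record.properties.values() if v not in (None, ""))
    return filled / len(record.properties)


def choose_primary(a, b):
    """Return (primary, secondary) for a duplicate pair."""
    def rank(r):
        # Tuples compare left to right, so completeness is checked first.
        return (completeness(r), r.engagement_count, r.association_count)
    return (a, b) if rank(a) >= rank(b) else (b, a)
```

A strict ordering is only one option. A weighted score may suit portals where engagement history matters more than field completeness.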
Understand What Merges and What Doesn't
HubSpot merges contact properties (keeping the most recently updated value), timeline activities, and associations. However, some form submission metadata and workflow enrollment history from the secondary record may be lost.
Beyond the Built-In Tool
HubSpot’s duplicate tool is good for obvious matches but misses many edge cases. Supplement it with these strategies:
- Email domain matching: export company records and group by domain; "Acme Corp", "Acme Corporation", and "ACME" with the same domain are duplicates
- Phone normalization: (555) 123-4567, 555-123-4567, and +15551234567 are the same number, so normalize before deduplicating
- Cross-object dedup: check for the same person existing as both a contact and a company (name imported into the company field)
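As a rough illustration of the domain-matching strategy, the sketch below groups exported company rows by a normalized domain. The `name` and `domain` keys are assumed column names for the export, and the normalization rules are deliberately simple.

```python
from collections import defaultdict


def normalize_domain(raw):
    """Lowercase a domain and strip scheme, 'www.', path, and whitespace."""
    d = (raw or "").strip().lower()
    for prefix in ("https://", "http://"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    if d.startswith("www."):
        d = d[4:]
    return d.split("/")[0]


def group_by_domain(companies):
    """Return {domain: [company names]} for domains shared by 2+ records."""
    groups = defaultdict(list)
    for company in companies:
        domain = normalize_domain(company.get("domain"))
        if domain:  # records without a domain cannot be matched this way
            groups[domain].append(company["name"])
    return {d: names for d, names in groups.items() if len(names) > 1}
```

Each returned group is a candidate set for human review, not an automatic merge list: subsidiaries and shared email providers can legitimately share a domain.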
Bulk Merge Strategy
For large-scale deduplication (hundreds or thousands of pairs), manual one-by-one merging is impractical.
HubSpot Bulk Merge
Available on Professional and Enterprise tiers. Allows processing multiple duplicate pairs in sequence with HubSpot's built-in matching.
Workflow-Based Dedup
Data Hub Professional/Enterprise enables custom-coded actions that can automate merge logic for complex matching scenarios.
Dedicated Dedup Tools
Tools like Dedupely or Insycle offer more sophisticated matching algorithms, fuzzy matching, and bulk processing capabilities.
Data Quality Toolkit
Our marketplace includes automated deduplication solutions that handle complex matching scenarios at scale.
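For teams that script merges themselves, the sketch below builds a single merge request for one reviewed pair. It assumes the CRM v3 contact merge endpoint and its `primaryObjectId` / `objectIdToMerge` body fields as documented by HubSpot at the time of writing; verify both against the current API reference before use. The helper name `build_merge_request` is hypothetical, and the request is only constructed here, never sent.

```python
import json
import urllib.request

# Assumed endpoint; confirm against HubSpot's current API reference.
MERGE_URL = "https://api.hubapi.com/crm/v3/objects/contacts/merge"


def build_merge_request(primary_id, duplicate_id, token):
    """Build (but do not send) a merge request for one duplicate pair."""
    if primary_id == duplicate_id:
        raise ValueError("a record cannot be merged into itself")
    body = json.dumps({
        "primaryObjectId": primary_id,     # the surviving record
        "objectIdToMerge": duplicate_id,   # the record folded into it
    })
    return urllib.request.Request(
        MERGE_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping construction separate from sending makes a dry run easy: log every pair that would be merged, review the log, and only then pass each request to `urllib.request.urlopen`.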
Preventing Future Duplicates
Cleaning up existing duplicates means little if new ones keep appearing.
| Duplicate Source | Prevention Method | Priority |
|---|---|---|
| Form submissions | Configure forms to update existing contacts, not always create new ones | High |
| CSV imports | Always map the email field and use "Update existing contacts" option | High |
| Integration syncs | Review connected apps and verify dedup logic in sync settings | High |
| Manual creation | Train team members to search for existing contacts before creating new ones | Medium |
Phase 2: Property Audit and Cleanup
Properties are the schema of your CRM. Over time, they accumulate like digital clutter — created for one-off campaigns, by departed team members, or by integrations that are no longer active.
Exporting Your Property Inventory
Start with a complete export of all properties across all objects:
- Go to Settings > Properties
- Select each object type (Contacts, Companies, Deals, Tickets, Custom Objects)
- Export the full list including: property name, internal name, type, group, creation date, and created by
Identifying Unused Properties
A property is “unused” if it meets one or more of these criteria:
| Unused Criteria | What It Means | Action |
|---|---|---|
| Zero fill rate | No records have a value — created but never populated | Delete after dependency check |
| Stale data | Values not updated in 6+ months, no longer accurate | Clear values or repopulate |
| No downstream usage | Not referenced in workflows, lists, reports, forms, or views | Deprecate, then delete |
| Duplicate purpose | Another property captures the same information more reliably | Consolidate into primary property |
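The zero-fill criterion in the table can be checked against a record export. The sketch below is one way to do it, assuming the export has been loaded as a list of dictionaries keyed by internal property name. The 5% low-fill threshold is an illustrative default, not a HubSpot setting.

```python
def fill_rates(records, property_names):
    """Return {property: fraction of records with a non-empty value}."""
    total = len(records)
    rates = {}
    for name in property_names:
        filled = 0
        for record in records:
            value = record.get(name)
            # Treat None and whitespace-only strings as empty; 0 counts as filled.
            if value is not None and str(value).strip() != "":
                filled += 1
        rates[name] = filled / total if total else 0.0
    return rates


def flag_candidates(rates, threshold=0.05):
    """Split properties into zero-fill and low-fill (below threshold) lists."""
    zero = sorted(p for p, r in rates.items() if r == 0)
    low = sorted(p for p, r in rates.items() if 0 < r < threshold)
    return zero, low
```

Fill rate covers only the first row of the table. A property that passes this check can still be unused downstream, so the dependency review remains a separate step.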
Safe Property Deletion Process
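Permanently deleting a property in HubSpot removes the property and all its data for good. HubSpot archives a property before permanent deletion, and an archived property can be restored only for a limited window (90 days per HubSpot's documentation at the time of writing), so do not treat that window as your backup. Follow the safe deletion process below to avoid accidental data loss.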
Document the Property
Record its name, purpose, fill rate, and any known dependencies.
Check Dependencies
Search for the property name in workflows, lists, reports, forms, views, and calculated properties. HubSpot's "Used in" feature helps, but it does not catch every reference.
Export the Data
If the property has any populated values, export the data as a CSV backup before deletion.
Clear and Wait 30 Days
If unsure, clear values from all records (making it effectively unused) and wait 30 days. If nobody reports missing data, it is safe to delete.
Delete in Small Batches
Process deletions 10-20 properties at a time rather than mass-deleting hundreds at once. This makes rollback easier if something breaks.
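The process above amounts to a gate followed by batching. A minimal sketch, assuming each candidate is tracked as a dictionary with hypothetical `documented`, `backed_up`, and `dependencies` keys maintained by whoever runs the audit:

```python
def ready_for_deletion(candidates):
    """Names of candidates that are documented, backed up, and dependency-free."""
    return [
        c["name"]
        for c in candidates
        if c.get("documented") and c.get("backed_up") and not c.get("dependencies")
    ]


def batches(names, size=10):
    """Split names into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [names[i:i + size] for i in range(0, len(names), size)]
```

Working through one batch at a time, with a pause to confirm nothing broke, keeps the scope of any rollback to a handful of properties.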
Property Naming and Organization
While auditing, improve property organization:
- Group related properties: cluster fields by purpose (e.g., "Marketing Attribution", "Billing Information", "Onboarding Data")
- Standardize naming: lowercase with underscores for internal names, title case for display names
- Add descriptions: every custom property should explain its purpose, expected values, and maintainer
- Deprecation labels: prefix display names with "[DEPRECATED]" for properties being phased out
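The naming and deprecation conventions above are easy to enforce with a few helpers. This sketch encodes the conventions from this list only. The rule that an internal name starts with a letter is an assumption about HubSpot's requirements, and the function names are hypothetical.

```python
import re

# Lowercase letters, digits, and underscores, starting with a letter (assumed rule).
INTERNAL_NAME = re.compile(r"[a-z][a-z0-9_]*")
DEPRECATED_PREFIX = "[DEPRECATED] "


def to_internal_name(display_name):
    """Convert a display name to lowercase_with_underscores."""
    return re.sub(r"[^a-z0-9]+", "_", display_name.lower()).strip("_")


def is_valid_internal_name(name):
    """True when the name follows the lowercase-with-underscores convention."""
    return INTERNAL_NAME.fullmatch(name) is not None


def mark_deprecated(display_name):
    """Prefix a display name with the deprecation label, at most once."""
    if display_name.startswith(DEPRECATED_PREFIX):
        return display_name
    return DEPRECATED_PREFIX + display_name
```

Running `is_valid_internal_name` over the exported property inventory gives a quick list of names that predate the convention.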
Phase 3: Data Standardization
Even clean, deduplicated data with well-organized properties can undermine your CRM if values are inconsistent. Standardization ensures that the same concept is represented the same way everywhere.
Common Standardization Targets
| Field | Common Problem | Solution |
|---|---|---|
| Lifecycle stages | Contacts stuck in early stages; inconsistent status values | Audit progression rules; standardize values |
| Country/state | "United States", "US", "USA", "U.S.A." in same field | Use dropdown properties instead of free-text |
| Phone numbers | (555) 123-4567 vs. +15551234567 vs. 5551234567 | Standardize to E.164 international format |
| Company names | "Acme Corp" vs. "Acme Corporation" vs. "ACME Inc." | Establish canonical format; decide on suffixes |
| Job titles | Hundreds of variations for the same role | Create "Role Category" dropdown that maps titles |
Before cleanup:
- "United States", "US", "USA", "america" in country field
- Phone stored as (555)1234567, 555-123-4567, +1 555 123 4567
- 50% of contacts missing lifecycle stage
- Long-time customers still marked "Subscriber"
- 300+ custom properties with <5% fill rates
After cleanup:
- Country field uses standardized dropdown values
- All phones in E.164 format: +15551234567
- 95%+ lifecycle stage fill rate with validated progression
- Lifecycle stages reflect actual customer journey
- Lean property set: every field has a purpose and a >50% fill rate
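Two of the standardization targets above, country values and phone numbers, lend themselves to simple normalization functions. The sketch below is deliberately narrow: the alias table covers only the variants mentioned in this guide, and the phone helper handles US/Canada numbers only. International data would need a dedicated library such as `phonenumbers`.

```python
import re

# Illustrative alias table; extend it with the variants found in your own data.
COUNTRY_ALIASES = {
    "us": "United States",
    "usa": "United States",
    "u.s.a.": "United States",
    "united states": "United States",
    "america": "United States",
}


def normalize_country(raw):
    """Map a known variant to its canonical label; None when unrecognized."""
    return COUNTRY_ALIASES.get((raw or "").strip().lower())


def to_e164_nanp(raw):
    """Normalize a US/Canada number to +1XXXXXXXXXX; None if it does not fit."""
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return None
    return "+1" + digits
```

Returning `None` for unrecognized input, rather than guessing, leaves those values for manual review instead of silently writing a wrong standard value.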
Automation-Assisted Standardization
For large databases, manual standardization is impractical. Use these HubSpot features:
- Workflows with formatting actions: Data Hub "Format data" actions capitalize names, trim whitespace, convert values
- Calculated properties: normalize values from source properties automatically
- Import mapping: map values to standardized options during the import step
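Where the built-in formatting actions are not enough, a custom code action can apply the same kind of cleanup. The sketch below follows the `main(event)` shape with `inputFields` and `outputFields` that HubSpot documents for Python custom code actions. Treat that shape, and the `firstname` / `lastname` input names, as assumptions to confirm in your own workflow editor.

```python
def clean_name(value):
    """Trim, collapse internal whitespace, and capitalize each word."""
    parts = (value or "").split()
    return " ".join(p[:1].upper() + p[1:].lower() for p in parts)


def main(event):
    # Inputs are whatever properties the workflow action is configured to pass in.
    fields = event.get("inputFields", {})
    return {
        "outputFields": {
            "firstname": clean_name(fields.get("firstname")),
            "lastname": clean_name(fields.get("lastname")),
        }
    }
```

Simple capitalization mishandles names such as "McDonald" or "O'Brien", so it is worth restricting a rule like this to records whose names are entirely upper- or lowercase.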
Phase 4: Import Cleanup and Governance
Imports are a major source of data quality issues. Every CSV import is an opportunity to introduce duplicates, inconsistent formatting, and incomplete records.
Pre-Import Checklist
- Deduplication within the file: remove duplicates from the CSV itself before uploading
- Column mapping: every column maps to the correct HubSpot property with the correct type
- Value formatting: pre-format values to match standardization rules (E.164 phones, YYYY-MM-DD dates)
- Required field completeness: every record has values for defined critical fields
- Update vs. create behavior: decide whether to create new records, update existing, or both
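Several checklist items can be automated before the file ever reaches HubSpot. The sketch below covers in-file deduplication, required fields, and date formatting, assuming the CSV has an `email` column. The function names and the choice to keep the first occurrence of a duplicate are illustrative.

```python
import csv
import io
import re
from datetime import datetime


def is_iso_date(value):
    """True for a real calendar date written as YYYY-MM-DD."""
    if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return False
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


def prepare_import(csv_text, required=("email",), date_fields=()):
    """Return (clean_rows, problems); problems are (data_row_number, reason)."""
    seen_emails = set()
    clean, problems = [], []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_no, row in enumerate(reader, start=1):
        values = {k: v.strip() if isinstance(v, str) else "" for k, v in row.items()}
        missing = [f for f in required if not values.get(f)]
        if missing:
            problems.append((row_no, "missing: " + ", ".join(missing)))
            continue
        email = values.get("email", "").lower()
        if email and email in seen_emails:
            problems.append((row_no, "duplicate email"))  # first occurrence wins
            continue
        bad = [f for f in date_fields if values.get(f) and not is_iso_date(values[f])]
        if bad:
            problems.append((row_no, "bad date: " + ", ".join(bad)))
            continue
        if email:
            seen_emails.add(email)
            values["email"] = email
        clean.append(values)
    return clean, problems
```

The `problems` list doubles as a record for the import log: it shows which rows were held back and why.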
Import Governance Policy
Import Governance Rules
Require manager approval for imports exceeding 500 records. Mandate the pre-import checklist for every import. Log every import with its source, date, record count, and responsible person. Run a post-import duplicate check within 24 hours.
Cleaning Up Historical Imports
If your portal has accumulated data quality issues from past imports, identify the imports that caused the most damage:
- Go to Contacts > Import and review the import history
- For each import, check the “created” vs. “updated” counts — high creation counts from external lists may indicate duplicate generation
- Use the “Original Source” property and its drill-down fields to segment contacts by import and assess data quality per batch (imported contacts typically appear under Offline Sources)
For organizations planning a migration, import cleanup is especially critical. Dirty data migrated to a new portal is still dirty data. Our guide on what you lose during HubSpot data migration covers this in detail.
Phase 5: Ongoing Maintenance Cadence
Data cleanup is not a one-time project. Without ongoing maintenance, your portal will return to its pre-cleanup state within 6-12 months.
| Frequency | Tasks | Time Required |
|---|---|---|
| Weekly | Process duplicate suggestions, remove hard bounces, spot-check recent contacts | 15-20 minutes |
| Monthly | Fill-rate report, review new properties, audit recent imports, check sync health | 1-2 hours |
| Quarterly | Full property audit, lifecycle stage review, standardization spot-check | Half day |
| Annual | Comprehensive dedup sweep, full property inventory, data completeness benchmark | 1-2 days |
Weekly Tasks (15-20 minutes)
- Review and process HubSpot's duplicate suggestions
- Check for bounced emails and remove hard bounces from active lists
- Review recent form submissions for data quality issues
- Spot-check 5-10 recently created contacts for completeness
Monthly Tasks (1-2 hours)
- Run a fill-rate report on critical properties
- Review newly created custom properties: were they necessary?
- Audit recent imports for data quality issues
- Check integration sync health for data consistency
Quarterly Tasks (Half day)
- Full property audit: identify and remove unused properties
- Lifecycle stage accuracy review
- Data standardization spot-check across key fields
- Review and update import governance policies
Annual Tasks (1-2 days)
- Comprehensive deduplication sweep
- Full property inventory with dependency mapping
- Data completeness benchmark against prior year
- Review and update your data quality KPIs
For a detailed framework on audit timing, see our guide on how often you should audit your HubSpot portal.
Measuring Data Quality Over Time
You cannot improve what you do not measure. Establish baseline metrics during your initial cleanup and track them over time.
Key Data Quality Metrics
| Metric | Definition | Target |
|---|---|---|
| Duplicate rate | % of contact records that are duplicates | Below 3% |
| Critical field fill rate | % of contacts with email, company, lifecycle stage, lead source | Above 85% |
| Property utilization | % of custom properties actively populated and referenced | Above 70% |
| Bounce rate | % of email addresses that hard bounce | Below 2% |
| Lifecycle stage accuracy | % of contacts in correct stage (manual sampling) | Above 90% |
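The targets in the table can be turned into a simple pass/fail check for each reporting period. The sketch below takes its thresholds from the table above. The metric keys and function names are hypothetical, and the rates are expressed as fractions rather than percentages.

```python
# ("max", x) means the value must stay below x; ("min", x) means above x.
TARGETS = {
    "duplicate_rate": ("max", 0.03),
    "critical_fill_rate": ("min", 0.85),
    "property_utilization": ("min", 0.70),
    "hard_bounce_rate": ("max", 0.02),
    "lifecycle_accuracy": ("min", 0.90),
}


def duplicate_rate(total_contacts, duplicate_contacts):
    """Fraction of contact records that are duplicates."""
    return duplicate_contacts / total_contacts if total_contacts else 0.0


def evaluate(metrics):
    """Return {metric: True if on target} for every metric that was supplied."""
    results = {}
    for name, (kind, limit) in TARGETS.items():
        if name not in metrics:
            continue  # skip metrics that were not measured this period
        value = metrics[name]
        results[name] = value < limit if kind == "max" else value > limit
    return results
```

Storing each period's `evaluate` output alongside the raw values makes it easy to see when a metric first slipped off target.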
Building a Data Quality Dashboard
Contacts Created Per Month
Track creation volume with source breakdown to identify unexpected spikes from imports or integrations.
Duplicate Merge Volume Per Month
Monitor how many duplicates are being resolved — declining trends mean your prevention efforts are working.
Critical Field Fill Rates
Use calculated properties to generate fill rate percentages and track improvement over time.
Bounced Contact Trends
Track hard bounce accumulation — rising trends signal list hygiene problems or bad data sources.
Lifecycle Stage Distribution
Monitor stage distribution over time. Healthy portals show contacts progressing; unhealthy ones show contacts accumulating in early stages.
Review your data quality dashboard monthly. Improving trends confirm your hygiene cadence is working. Worsening trends signal that new data sources or processes are introducing problems.
When to Bring in Professional Help
Some data quality issues are straightforward to address internally. Others require specialized expertise or tooling that goes beyond what HubSpot provides natively.
Consider bringing in outside help when your portal has more than 100,000 contacts requiring sophisticated matching algorithms, multiple integrations are creating conflicting data, you are preparing for a portal consolidation, or your team lacks the bandwidth to execute cleanup while maintaining day-to-day operations.
Jetstack’s data quality and audit solutions are designed for exactly these scenarios. We combine automated scanning with expert review to clean, standardize, and maintain your HubSpot data. Explore our implementation services for large-scale cleanup projects, or contact us to discuss your specific situation.
Frequently Asked Questions
How many duplicates is “normal” in a HubSpot portal?
Portals that have never been cleaned typically have a duplicate rate between 10% and 25%, depending on the number of data sources and import volume. After an initial cleanup, a well-maintained portal should keep its duplicate rate below 3-5%. Any rate above 10% is costing you meaningful money in inflated subscription tiers and wasted team time.
Will merging duplicates in HubSpot lose any data?
When merging two contact records, HubSpot generally keeps the most recently updated value for each property, with some exceptions (the email address of the primary record is retained, for example). Timeline activities from both records are combined. However, some metadata, like individual form submission details and certain workflow enrollment data from the secondary record, may not transfer completely. Always designate the more complete and recently active record as the primary.
How do I find unused properties in HubSpot?
Go to Settings > Properties, select the object type, and look for properties with “0 records” or very low record counts in the fill-rate indicators. For a more thorough analysis, export all properties and cross-reference against your workflows, lists, reports, and forms to identify properties that have data but are not used in any downstream process. HubSpot’s property settings show a “Used in” indicator, but it does not capture every reference.
Can I undo a property deletion in HubSpot?
Not once it has been permanently deleted. HubSpot archives a property first, and an archived property can be restored for a limited period (90 days per HubSpot's documentation at the time of writing). After permanent deletion, both the property definition and all associated data are gone. This is why we strongly recommend exporting property data before deletion and using the “clear and wait” approach for properties you are unsure about.
What is the best tool for HubSpot data cleanup?
HubSpot’s native tools, including the data quality command center, handle basic deduplication and property management. For more advanced needs, third-party tools like Insycle and Dedupely offer features like fuzzy matching, bulk standardization, and automated hygiene rules. Jetstack’s marketplace also includes purpose-built data quality toolkits for HubSpot that combine automated scanning with guided remediation.
Should I clean data before or after a HubSpot migration?
Before. Always before. Migrating dirty data to a new portal just transfers the problem — and can make it worse if the migration process creates additional duplicates. A pre-migration audit that includes thorough data cleanup typically saves 20-40% of post-migration remediation effort and cost.