HubSpot Operations Hub: How to deduplicate contacts and companies /
How to get rid of duplicate contacts and companies

26/07/2023

No CRM portal is 100% clean. Learn how to deduplicate contacts and companies with HubSpot Operations Hub. This technique leverages the power of the Custom Coded Action.

Keeping data unique in a CRM matters because duplicate contacts annoy customers and can lead to lost opportunities. For example, if a customer is contacted by several salespeople, they may become frustrated and less likely to do business with you. Duplicate contacts also make it harder to track customer interactions and to spot upsell or cross-sell opportunities.

What do you need to deduplicate?

To deduplicate, you need a unique key.
A unique key is a column, or a combination of columns, that uniquely identifies a record in a database. Records that share the same unique key value are duplicates, so the key lets you identify them and then merge or remove them. This improves the accuracy and reliability of your data.

For example, when you sign up for an app like Instagram or Twitter, you are asked for your phone number. The app can use the phone number as a unique key if it needs to deduplicate accounts.
Indeed, it’s easy to have several email addresses, or to use a different name or nickname, but few people have more than one phone number. So for an app like Instagram, a phone number is a pretty good unique key.

In the example below, we have two records with somewhat similar names, so they may be duplicates. But what unique key could we use here? As you can see, the only key-value pair the two records have in common is carPlate, so we can use this key to deduplicate.
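To make the idea of a unique key concrete, here is a minimal JavaScript sketch that groups records by a dedupe key and keeps only the groups containing more than one record. The function name findDuplicateGroups and the records are hypothetical illustrations, not the records from the example; only the carPlate value RM-123-IE comes from this article.

```javascript
// Group records by the value of a dedupe key and return only the groups
// that contain more than one record, i.e. the likely duplicates.
function findDuplicateGroups(records, key) {
  const groups = new Map();
  for (const record of records) {
    const value = record[key];
    // A record without a value for the key cannot be matched, so skip it.
    if (value === undefined || value === null || value === "") continue;
    if (!groups.has(value)) groups.set(value, []);
    groups.get(value).push(record);
  }
  return [...groups.values()].filter((group) => group.length > 1);
}

// Hypothetical records: names and emails differ, but two share a carPlate.
const records = [
  { id: "1", firstname: "Jon", email: "jon@example.com", carPlate: "RM-123-IE" },
  { id: "2", firstname: "John", email: "john.smith@example.org", carPlate: "RM-123-IE" },
  { id: "3", firstname: "Jane", email: "jane@example.com", carPlate: "" },
];

const duplicates = findDuplicateGroups(records, "carPlate");
console.log(duplicates.map((group) => group.map((record) => record.id)));
// prints [ [ '1', '2' ] ]
```

Grouping the same records on email would return no duplicates at all, which is why the choice of key matters.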

How to look for duplicates when new data is inserted into HubSpot?

To avoid duplicates, each time a new record arrives in our CRM, we can ask HubSpot for the list of all the records with a matching key-value pair.

In our example, each time we get a new contact, we can ask HubSpot for the list of contacts whose carPlate property equals RM-123-IE.
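As a sketch of what such a request could contain, assuming HubSpot's CRM v3 search endpoint (POST /crm/v3/objects/{objectType}/search) and a property whose internal name is carplate, the hypothetical helper buildSearchRequest below only builds the request; sending it and reading the response are left out.

```javascript
// Build (but do not send) a HubSpot CRM v3 search request that lists every
// record whose dedupe property equals the given value.
function buildSearchRequest(objectType, propertyName, value, extraProperties = []) {
  return {
    method: "POST",
    url: `https://api.hubapi.com/crm/v3/objects/${objectType}/search`,
    body: {
      // One filter group with one filter: propertyName EQ value.
      filterGroups: [{ filters: [{ propertyName, operator: "EQ", value }] }],
      // Properties to return with each match, e.g. those the merge criteria need.
      properties: [propertyName, ...extraProperties],
      limit: 10,
    },
  };
}

const request = buildSearchRequest("contacts", "carplate", "RM-123-IE", ["createdate"]);
console.log(request.url);
console.log(JSON.stringify(request.body, null, 2));
```

The body would be sent as JSON with an Authorization: Bearer header carrying the private app token. The limit of 10 is an arbitrary choice here: one page is enough to tell whether there are one, two or more matches.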

If HubSpot returns more than one record, it means we have a duplicate.

In HubSpot, you can use a workflow to run logic each time something happens in your CRM; in this example, each time we get a new contact with a carPlate value set.

The request looks like this:

As you can see in this diagram, the request returns these two records:

Once we have identified the two records, we can compare them and decide which one to keep.
This deduplication works when exactly 2 records are found. If we find more than 2, the logic is more complicated because we have to perform more than one merge; this is what we call an ambiguous merge. In this article, we will set this case aside.
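The rule above can be sketched as a small decision function. It assumes the search also returns the enrolled record itself (it matches its own key), so a single result means there is no duplicate; classifySearchResults is a hypothetical name.

```javascript
// Decide what to do with the records returned by the search. The enrolled
// record matches its own key, so one result means "no duplicate".
function classifySearchResults(results) {
  if (results.length <= 1) return "no-duplicate";
  if (results.length === 2) return "merge"; // exactly one duplicate: one merge
  return "ambiguous"; // three or more records: several merges, set aside here
}

console.log(classifySearchResults([{ id: "1" }])); // no-duplicate
console.log(classifySearchResults([{ id: "1" }, { id: "2" }])); // merge
console.log(classifySearchResults([{ id: "1" }, { id: "2" }, { id: "3" }])); // ambiguous
```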

Merging criteria

Now that we have found our duplicate records, we have to merge them. But first, we have to choose which record we keep and which one gets merged into it.
That choice is up to you, but here are the most common criteria.

Keep the record created first

With this criterion, the oldest record remains and the newer one is merged into it. It’s the most common merging criterion.

Keep the record created last

As you might expect, with this criterion the most recently created record remains and the older one is merged into it.

Keep the record most recently updated

Keep the record which had the most recent update

Keep the record with the most recent engagement

Keep the record with the most recent meeting, call, task…

Keep the record with the oldest engagement

Keep the record with the oldest meeting, call, task…
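All of these criteria boil down to comparing one timestamp between the two records. A minimal sketch, assuming each record has already been reduced to hypothetical numeric fields createdAt, updatedAt, firstEngagementAt and lastEngagementAt (in a real portal they would be read from HubSpot properties such as createdate; which properties the author's script reads is not shown here). "Oldest engagement" is interpreted as "holds the earliest engagement". The criterion strings match the values accepted by MERGE_CRITERIA_KEEP_THE_ONE in the options further down; pickRecordToKeep is a hypothetical name.

```javascript
// Each criterion: which timestamp to compare and whether the smaller ("min")
// or larger ("max") value wins. Field names are hypothetical.
const CRITERIA = {
  "created-first": { field: "createdAt", mode: "min" },
  "created-last": { field: "createdAt", mode: "max" },
  "most-recently-updated": { field: "updatedAt", mode: "max" },
  "most-recent-engagement": { field: "lastEngagementAt", mode: "max" },
  "oldest-engagement": { field: "firstEngagementAt", mode: "min" },
};

// Return { keep, merge } for two duplicate records. On a tie, `a` is kept.
function pickRecordToKeep(a, b, criterion) {
  const rule = CRITERIA[criterion];
  if (!rule) throw new Error(`Unknown merge criterion: ${criterion}`);
  // A missing timestamp loses against any real one: it counts as
  // +Infinity for "min" and -Infinity for "max".
  const missing = rule.mode === "min" ? Infinity : -Infinity;
  const va = a[rule.field] ?? missing;
  const vb = b[rule.field] ?? missing;
  const aWins = rule.mode === "min" ? va <= vb : va >= vb;
  return aWins ? { keep: a, merge: b } : { keep: b, merge: a };
}

const older = {
  id: "1",
  createdAt: Date.parse("2023-01-10"),
  firstEngagementAt: Date.parse("2023-02-01"),
  lastEngagementAt: Date.parse("2023-07-01"),
};
const newer = { id: "2", createdAt: Date.parse("2023-07-20") };

console.log(pickRecordToKeep(older, newer, "created-first").keep.id); // 1
console.log(pickRecordToKeep(older, newer, "created-last").keep.id); // 2
console.log(pickRecordToKeep(older, newer, "most-recent-engagement").keep.id); // 1
```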

How to set up a deduplication workflow in HubSpot?

Create a workflow based on the type of object you want to deduplicate.
In this example, I create a contact-based workflow.

Add a trigger

A HubSpot workflow trigger is an event or condition that causes a workflow to start. For example, you could create a workflow that is triggered when a contact fills out a form or, in our case, when filter criteria are met.
When a contact has a carPlate value set, the workflow starts and the contact is enrolled in it.

Add a Custom Coded Action block

A Custom Coded Action block in a HubSpot workflow is a block of JavaScript or Python code that you can add to a workflow to perform custom actions. It’s a powerful block that lets you build your own logic inside a workflow.

In our case, we are going to use a Custom Coded Action block to deduplicate our contacts based on the carPlate value.

Add your private app token

In the Secrets section, you can store API keys, passphrases and other values that you don’t want to display in your custom code.

To deduplicate, we need to call the HubSpot API, so you need a private app token with the corresponding scopes.

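The private app needs at least these scopes: read companies, write companies, read contacts and write contacts (crm.objects.companies.read, crm.objects.companies.write, crm.objects.contacts.read, crm.objects.contacts.write).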

You can give the secret any name you want; in my setup, I named it privateAppToken.
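Inside a Custom Coded Action, secrets are made available as environment variables, so a secret saved as privateAppToken can be read with process.env.privateAppToken. A minimal sketch of turning it into the headers a HubSpot API call needs; authHeaders is a hypothetical helper, and the SECRET_NAME constant mirrors the option described further down.

```javascript
const SECRET_NAME = "privateAppToken";

// Read the private app token from the action's secrets and build the HTTP
// headers used for HubSpot API calls. Fails fast if the secret is missing.
function authHeaders(secretName = SECRET_NAME) {
  const token = process.env[secretName];
  if (!token) {
    throw new Error(`Secret "${secretName}" is not set; add it in the Secrets section.`);
  }
  return {
    Authorization: `Bearer ${token}`,
    "Content-Type": "application/json",
  };
}
```

Failing fast with an explicit message makes a misnamed secret easier to diagnose than a 401 response coming back from the API.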

Paste the Custom Coded Action

First, remove all the existing code. Then, if you want to deduplicate contacts, use this code; if you want to deduplicate companies, use that one.
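Whichever version you use, the merge itself comes down to one API call per pair of duplicates. A sketch, assuming HubSpot's CRM v3 merge endpoint (POST /crm/v3/objects/{objectType}/merge), which takes the id of the record to keep (primaryObjectId) and the id of the record merged into it (objectIdToMerge); buildMergeRequest is a hypothetical helper that only builds the request.

```javascript
// Build (but do not send) a HubSpot CRM v3 merge request. The record with
// keepId survives; the record with mergeId is merged into it.
function buildMergeRequest(objectType, keepId, mergeId) {
  // Safety check added in this sketch: never merge a record into itself.
  if (String(keepId) === String(mergeId)) {
    throw new Error("Cannot merge a record into itself");
  }
  return {
    method: "POST",
    url: `https://api.hubapi.com/crm/v3/objects/${objectType}/merge`,
    body: { primaryObjectId: String(keepId), objectIdToMerge: String(mergeId) },
  };
}

console.log(buildMergeRequest("contacts", 101, 202));
console.log(buildMergeRequest("companies", 5, 9).url);
```

Only the objectType segment changes between the contacts and companies versions; the ids are sent as strings, hence the String() conversions.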

Edit the Options

At the top of the code, you can set the options you want.


/*
 *
 * Edit your Secret Name here
 */
const SECRET_NAME = "privateAppToken";

/*
 *
 * Choose the dedupe key
 */
const DEDUPE_KEY = "carplate";

/*
 * Choose the merging criteria you want in that list
 *
 * Possible values:
 *
 * most-recent-engagement
 * oldest-engagement
 * created-first
 * created-last
 * most-recently-updated
 */
const MERGE_CRITERIA_KEEP_THE_ONE = "created-first";

/*
 *
 * If the DRY_RUN is true, no merge will be performed
 */
const DRY_RUN = false;

SECRET_NAME:

If your secret name is not privateAppToken, edit this line:

const SECRET_NAME = "privateAppToken";

and replace the value with your own secret name, for example:

const SECRET_NAME = "myOwnPrivateAppName";

DEDUPE_KEY:

In my case, the dedupe key is carPlate (its internal property name is carplate), but yours will be different, so edit this line:

const DEDUPE_KEY = "carplate";

and replace the value with the internal name of your own key, for example:

const DEDUPE_KEY = "yourOwnKey";

To find the internal name of the key you have to use, you can follow this video:

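MERGE_CRITERIA_KEEP_THE_ONE:

Set the merging criterion you want to use, choosing from these possibilities:

created-first: keep the record created first
created-last: keep the record created last
most-recently-updated: keep the record most recently updated
most-recent-engagement: keep the record with the most recent engagement
oldest-engagement: keep the record with the oldest engagement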

DRY_RUN:

If DRY_RUN is set to true, nothing will be merged; this option is useful to test the logic.

Possible values:

const DRY_RUN = false; // duplicates are merged
const DRY_RUN = true;  // dry run, no merge is performed
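A minimal sketch of how such a flag typically gates the one call that changes data. mergeFn is a hypothetical stand-in for whatever function actually calls the merge API, and the returned object is illustrative, not the script's real output.

```javascript
const DRY_RUN = true;

// Run (or only describe) the merge. With dryRun = true, mergeFn is never
// called: the planned merge is returned so it can be inspected.
async function mergeUnlessDryRun(keepId, mergeId, mergeFn, dryRun = DRY_RUN) {
  const plan = { keepId, mergeId, dryRun };
  if (dryRun) return { ...plan, merged: false };
  await mergeFn(keepId, mergeId);
  return { ...plan, merged: true };
}

// Usage with a fake merge function that only records its calls.
const calls = [];
const fakeMerge = async (keepId, mergeId) => {
  calls.push([keepId, mergeId]);
};

mergeUnlessDryRun("101", "202", fakeMerge).then((result) => {
  console.log(result, calls.length); // calls.length stays 0 in a dry run
});
```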

Data outputs

In a Custom Coded Action, you can specify outputs, in other words the values the Custom Coded Action sends back to the workflow. This Custom Coded Action returns a couple of things.

As an example:

To set the outputs, go to the "Data outputs" section underneath the code and add the following:

It should look like this:

Turn the workflow on

To turn the workflow on, you can follow this video:

NB: For now, you can’t use the option "Yes, enroll existing companies which meet the trigger criteria", as it will trigger API rate limit errors. (I’m working on a fix.)