Removing duplicate objects from an Array (is hard)
Let’s say we have an array of objects such as the following:
const books = [
{
name: "My Sister the Serial Killer",
author: "Oyinkan Braithwaite"
},
{
name: "Educated",
author: "Tara Westover"
},
{
name: "My Sister the Serial Killer",
author: "Oyinkan Braithwaite"
}
];
The first and the last objects in the array are identical. So what if we want to remove such duplicate objects from the array? Surprisingly, this is quite a difficult problem to solve. To understand why, let’s look at how we could remove duplicates from an Array of flat items, such as strings.
Removing duplicate flat items from an Array (is easy) #
Let’s say we had an Array of strings such as the following:
const strings = [
"My Sister the Serial Killer",
"Educated",
"My Sister the Serial Killer"
];
If we wanted to remove any duplicates from this array, we could use the filter()
method along with the indexOf()
method to check if any given item is a duplicate.
const filteredStrings = strings.filter((item, index) => {
// Return to new array if the index of the current item is the same
// as the first occurence of the item
return strings.indexOf(item) === index;
});
Since strings.indexOf(item)
will always return the index of the first occurrence of the item
, we can tell if the current item within the filter loop is a duplicate. If it is, we don't return it to the new array created by the filter()
method
Objects don’t work the same way #
The reason this same method doesn’t work with objects is because any 2 objects that have identical properties and values are not actually considered identical.
const a = {
name: "My Sister the Serial Killer",
author: "Oyinkan Braithwaite"
};
const b = {
name: "My Sister the Serial Killer",
author: "Oyinkan Braithwaite"
};
a === b // false
This is because objects are compared based on reference rather than structure. The fact that the two objects have the same orperties and values is not considered when comparing the two. Therefore, the indexOf(object)
within an array of objects will always return the index of the exact object
passed, even if there exists another object with the exact same properties and values.
My solution #
Given this information, the only way to check if two objects have the same properties and values is to actually check the properties and values of each object. The solution I came up with involved doing this manual check, but with some enhacements to improve performance and reduce unnecessary nested loops.
In particular, I did 3 things -
- Only check each item in the array against every other item that comes after it in order to avoid comparing the same objects more than once
- Only check items that have not been found to be a duplicate of any other item
- Before checking if the values of each property are the same, check that both objects have the same keys
Here’s the final function:
function removeDuplicates(arr) {
const result = [];
const duplicatesIndices = [];
// Loop through each item in the original array
arr.forEach((current, index) => {
if (duplicatesIndices.includes(index)) return;
result.push(current);
// Loop through each other item on array after the current one
for (let comparisonIndex = index + 1; comparisonIndex < arr.length; comparisonIndex++) {
const comparison = arr[comparisonIndex];
const currentKeys = Object.keys(current);
const comparisonKeys = Object.keys(comparison);
// Check number of keys in objects
if (currentKeys.length !== comparisonKeys.length) continue;
// Check key names
const currentKeysString = currentKeys.sort().join("").toLowerCase();
const comparisonKeysString = comparisonKeys.sort().join("").toLowerCase();
if (currentKeysString !== comparisonKeysString) continue;
// Check values
let valuesEqual = true;
for (let i = 0; i < currentKeys.length; i++) {
const key = currentKeys[i];
if ( current[key] !== comparison[key] ) {
valuesEqual = false;
break;
}
}
if (valuesEqual) duplicatesIndices.push(comparisonIndex);
} // end for loop
}); // end arr.forEach()
return result;
}