Removing Duplicate Items from an Array


#1
**THE** place to discuss this tutorial: https://www.kirupa.com/html5/removing_duplicate_items_from_an_array.htm

#2

This is another way to do it with forward iteration through array items !

Array.prototype.removeDuplicates = function(){
	var input = this;
	var obj = new Object();

	for(var i=0;i<input.length;i++) {
	  	var item = input[i];

	  	if(obj[item] == true) {

	  		input.splice(i,1);
	  		i--;

	  	} else {

	  	   obj[item]= true;
	    }
    }
    return input;
}

#3

Thanks for sharing that! One of the things I’ve started thinking about is the slow-ish performance of splice. I wonder if another optimization would be to mark all the duplicate items with some default value like undefined or null in one pass.

Original array:

var foo = [1, 3, 4, 3, 5, 6, 7, 7, 8];

Marked array:

var foo = [1, 3, 4, undefined, 5, 6, 7, undefined, 8];

In the second pass, we create a new array made up all the the non-undefined/non-null items. This would result in us never having to call splice repeatedly on arrays with a lot of duplicate items. We would just run through the array twice: once to identify & mark duplicates, once to create a new array with the non-duplicate values.

Thoughts?

:slight_smile:


#4

Is this faster than splice?

var myArray = ["bar", "foo", "zorb", "bar", "baz", "baz", "fum", "baz"]

Array.prototype.removeDuplicates = function() {
    var hashObject = {}
    for (var i = 0; i < this.length; i++) {
        var currentItem = this[i]; 
        if (hashObject[currentItem]) {
            if(i != this.length - 1) this[i--] = this.pop()
            else this.pop()
        } else hashObject[currentItem] = true;
    }
    return this
}
 
myArray.removeDuplicates()
alert(myArray)

There’s also this. It’s similar, but not identical, behaviour.

var myArray = ["bar", "foo", "zorb", "bar", "baz", "fum", "baz"]

Array.prototype.removeDuplicates = function() {
    var output = []
    var hashObject = {}
    for (var i = 0; i < this.length; i++) {
        var currentItem = this[i] 
        if (hashObject[currentItem] == undefined) {
            output.push(currentItem)
            hashObject[currentItem] = 0
        }
    }
    return output
}
 
myArray = myArray.removeDuplicates()
alert(myArray)

#5

@snitin8994, @TheCanadian - I put all of the approaches into jsperf: https://jsperf.com/removing-duplicates/1

On Chrome, both of your approaches were equally fast TC. Mine and snitin’s approaches were 20% slower. On Safari, all four approaches are nearly identical. My original implementation is about 2% faster.


#6

Y U guys type so much?

myArray = [...new Set(myArray)]

#7

Haha! I was waiting for you to specify that one :stuck_out_tongue:

I mentioned it as one of the tricks here, but I didn’t have the heart to take down that article: https://www.kirupa.com/html5/array_tricks.htm#array_unique

EDIT: JSPerf shows this to be slower…going against everything I thought would be the case!


#8

Y I type so much? Because I don’t know JavaScript :wink:


#9

Cool website to know about. I tried it in Edge and you’re approach with splice was fastest followed closely by my first approach.

It’s gotta be frustrating developing for so many browsers with different behaviours. If only there was some plug-in that performed consistently across all browsers and devices. Oh well…


#10

It is frustrating! If you are a web developer, how do you choose the right approach? My initial thought process is:

  1. Does it even matter? For less than tens of thousands of items, the performance difference between these approaches is minimal. If the performance benefits of one approach over another isn’t noticeable, just go with the approach that is more readable/understandable by you or your dev team.

  2. Go where your users are. If your web site traffic skews heavily towards one particular browser, optimizing for that makes the most sense provided #1 still holds. If your code is unusable for one particular browser, then you may want to optimize for making it work first on all browsers.

Regarding your plug-in comment, we are mostly in a Chrome (desktop and Android) and Safari (iOS) world based on the latest marketshare numbers. It isn’t Flash-like consistency and ubiquity, but all of the browser rendering engines and JS engines are becoming more similar. We aren’t too far away from having consistent browser rendering, performance, and capabilities.

Cheers,
Kirupa :stuck_out_tongue: