Skip to content

Speeding up _.uniq method by using hash table lookups for primitive data... #2099

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Conversation

ch-ms
Copy link

@ch-ms ch-ms commented Mar 10, 2015

... types.

It is now works 200 times faster on my macbook air on an unsorted array with primitive data types.

I used these tests with 1000000 elements to measure perfomance:

function testSpeed(n){
var tarr = [];

for(var i = 0; i<=n; i++){
    tarr.push(Math.floor(Math.random()*10000));
}

var startTime = Date.now();
_.uniq(tarr);

console.log("Ended by:", Date.now()-startTime);

}

function testSpeedIter(n){
var tarr = [];

for(var i = 0; i<=n; i++){
    tarr.push(Math.floor(Math.random()*10000));
}

var iter = function(v, i, array){
    return v+"some";
}

var startTime = Date.now();

_.uniq(tarr, iter);

console.log("Ended by:", Date.now()-startTime);

}

@bnjmnt4n
Copy link
Contributor

You should use jsPerf for performance tests.

seen.push(computed);
} else {
var vType = typeof computed,
isPrimitive = _.contains(['number', 'string', 'boolean', 'undefined'], vType);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about !_.isObject(computed)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will catch null, but will fail for symbols.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akre54 I have implemented this, thanks for advice!
@michaelficarra I think I can patch isObject function to recognize symbols as primitives. Is it ok for a symbol to be a primitive?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ch-ms _.isObject already treats symbols as primitives. The problem is that unique symbols don't have a unique toString, so this method would want to either use the symbol directly (dangerous) or treat them as objects.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine to treat it as an object, no? indexOf should work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that unique symbols don't have a unique toString, so this method would want to either use the symbol directly (dangerous) or treat them as objects.

Placing a symbol in an object is pretty safe - you can treat them as primitives and not have to worry about shim implementations being caught due to the typeof checks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah actually after testing it out...

> a = Symbol()
Symbol()
> b = Symbol()
Symbol()
> obj = {}
{}
> obj[a] = 1
1
> obj[b] = 2
2
> obj[a]
1

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't saying it was unsafe because setting/getting them would not work. I was saying it was unsafe because well-known symbols have behavioural changes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide an example @michaelficarra

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@megawac See https://people.mozilla.org/~jorendorff/es6-draft.html#sec-properties-of-the-symbol-constructor. Each of the well-known symbols there changes how the object behaves in some way. It is not safe in a future with additional well-known symbols to modify these fields arbitrarily.

@megawac
Copy link
Collaborator

megawac commented Mar 11, 2015

How does this compare with using Set see #1868 (comment)

@@ -520,19 +520,34 @@
if (iteratee != null) iteratee = cb(iteratee, context);
var result = [];
var seen = [];
var seenPrimitive = {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth doing nativeCreate ? nativeCreate(null) : {}?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@megawac reason?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some browsers used to optimize for it. Doesn't seem necessary any more based on some benchmarks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'll actually be necessary if anything pollutes Object's prototype.

@ch-ms
Copy link
Author

ch-ms commented Mar 12, 2015

@D10 I made a test for integers http://jsperf.com/underscore-uniq-hash-table-lookup

@megawac
Copy link
Collaborator

megawac commented Mar 12, 2015

Whats the hit on an array of objects @ch-ms

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants