
Performance bottleneck on relation add #9


Open
assaad opened this issue Jul 14, 2016 · 7 comments

@assaad
Contributor

assaad commented Jul 14, 2016

After benchmarking, a big bottleneck currently is this code:

long[] incArray = new long[previous.length + 1];
System.arraycopy(previous, 0, incArray, 0, previous.length);
incArray[previous.length] = relatedNode.id();

This is in the add method of the AbstractNode class, where the long[] is grown by only one element at a time.
We should grow it by 30% each time instead: the performance then increases from 5,000 values/sec to 25,000,000 values/sec.
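
For illustration, a minimal sketch of such a grow-by-percentage policy (the field and method names here are hypothetical, not the actual AbstractNode API):

// Hypothetical sketch: grow the backing array by ~30% instead of one slot per add.
// The field and method names are illustrative, not the real AbstractNode members.
private long[] relations = new long[0];
private int relationCount = 0;

void addRelation(long relatedNodeId) {
    if (relationCount == relations.length) {
        // +1 handles the empty-array case
        int newCapacity = relations.length + (relations.length * 30) / 100 + 1;
        long[] grown = new long[newCapacity];
        System.arraycopy(relations, 0, grown, 0, relationCount);
        relations = grown;
    }
    relations[relationCount++] = relatedNodeId;
}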

@rouvoy

rouvoy commented Jul 14, 2016

Out of curiosity, wouldn't we get a better speedup by doing:

long[] incArray = new long[previous.length * 2];

instead of a static value? What does your benchmark say about that?

@assaad
Contributor Author

assaad commented Jul 14, 2016

It's a trade-off between memory and performance. The problem is that we can't afford to waste a lot of memory across all the relationships of all the nodes.

  • The most memory-conservative option is an increase of (105% + 1); according to my benchmark simulations, the speed is 13.9 million inserts/sec.
  • If we want better performance while still not wasting much memory, (130% + 1) is a good choice: 30.7 million inserts/sec.
  • With (x200% + 1), the speed is 38 million inserts/sec.
  • For comparison, the length+1 strategy only reaches around 17,000 inserts/sec!
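
For reference, a rough micro-benchmark sketch of this kind of comparison (this is not the original benchmark; the class and method names are made up):

// Rough micro-benchmark sketch: time repeated appends into a growing long[]
// for different growth factors. Not the original benchmark code.
public class GrowthFactorBench {

    static long timeAppends(double growthFactor, int inserts) {
        long[] array = new long[1];
        int size = 0;
        long start = System.nanoTime();
        for (int i = 0; i < inserts; i++) {
            if (size == array.length) {
                int newCapacity = (int) (array.length * growthFactor) + 1;
                long[] grown = new long[newCapacity];
                System.arraycopy(array, 0, grown, 0, size);
                array = grown;
            }
            array[size++] = i;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (double factor : new double[]{1.05, 1.30, 2.00}) {
            long nanos = timeAppends(factor, 1_000_000);
            System.out.printf("factor %.2f: %d ms%n", factor, nanos / 1_000_000);
        }
    }
}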

@assaad
Contributor Author

assaad commented Jul 14, 2016

If the maximum relationship size is less than 200, then x200% is better without much cost.
Maybe we should have an array growth strategy that depends on the current array size: something like x200% while array.length < 100, and a 30% increase afterwards so we don't waste too much memory (see the sketch below).
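
A minimal sketch of such a size-dependent policy (the threshold and percentages are the ones discussed above; the helper name is hypothetical):

// Sketch of a size-dependent growth policy: double small arrays, then grow by ~30%.
static int nextCapacity(int currentLength) {
    if (currentLength < 100) {
        // small arrays: doubling costs little absolute memory
        return currentLength * 2 + 1;
    }
    // larger arrays: grow by ~30% to limit wasted slots
    return currentLength + (currentLength * 30) / 100;
}
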
An open question: when we persist to hard disk, should we save the whole array (including the empty slots), or only the meaningful relationships?

@rouvoy

rouvoy commented Jul 14, 2016

Memory matters, and it definitely depends on the expected number of elements to be stored, but 200% tends to be a rather adaptive policy.
Still, when persisting you should only store the meaningful elements and not the full array; you are probably storing the index somewhere anyway, so you know the number of elements to be persisted.
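
A sketch of what that could look like, assuming a java.io.DataOutput sink and that the used-element count is tracked separately (both are assumptions, not the actual storage API):

// Persist only the used prefix of the relation array, prefixed by its element
// count, instead of the full over-allocated backing array. Hypothetical helper.
static void persistRelations(java.io.DataOutput out, long[] relations, int size)
        throws java.io.IOException {
    out.writeInt(size);               // number of meaningful elements
    for (int i = 0; i < size; i++) {
        out.writeLong(relations[i]);  // empty, pre-allocated slots are skipped
    }
}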

@thomashartmann
Contributor

thomashartmann commented Jul 14, 2016

We have to take care if we optimise the serialisation: if someone commits after every add operation, then our optimisation would be useless... The additional cost for persisting is, from my point of view, not a big deal (and can be handled by a compression algorithm anyway). It is more the in-memory cost which is problematic, not the additional disk space.

@assaad
Contributor Author

assaad commented Jul 14, 2016

The point is: this growth factor applies per relationship array, so the memory increase will be roughly:
factor * nodes * Avg(relationships/node) * Avg(timepoints/node) * Avg(worlds/node).
For example, let's say we have 1,000 nodes, each node has 4 types of multiple relationships, each node has 5,000 versions in time (timepoints), and for simplicity we consider 1 world.
Then we have 4 * 1,000 * 5,000 long[] of relationships, and the growth factor multiplies the memory needed by each of them.
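
A back-of-the-envelope sketch of that calculation (the average of 10 used slots per array is an assumption, and 8 bytes is the size of a Java long):

// Illustrative estimate only: extra memory from unused slots with a 200% policy.
public class RelationMemoryEstimate {
    public static void main(String[] args) {
        long arrays = 1_000L * 4 * 5_000;   // nodes * relation types * timepoints (1 world)
        double factor = 2.0;                // 200% growth policy
        long avgUsedSlots = 10;             // assumed average relations per array
        // worst case: roughly (factor - 1) * usedSlots unused slots per array
        long wastedBytes = (long) ((factor - 1) * avgUsedSlots * 8 * arrays);
        System.out.println(wastedBytes / (1024 * 1024) + " MB of unused long slots (worst case)");
    }
}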

@dukeboard
Contributor

Hi everyone,

Good discussion and good benchmarks to isolate the problem.
This is definitely true: the growth percentage of the array backing the relationships has a big impact on performance. A factor of 200% is usually well accepted, and all our hashmap structures follow this rule. The modification can be done with minor impact by allocating a first int/double/long in each relationship array to manage this. Both the Heap and OffHeap structures should then be modified. I'll try to propose something and report the commit number in this thread.
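
To illustrate the idea (this is only a sketch of the header-slot approach, not the actual Heap/OffHeap code; the method name is made up):

// Sketch: reserve slot 0 of each relation array for the number of used elements,
// so the array can be over-allocated and grown by 200% when full.
static long[] addWithHeaderSlot(long[] relations, long relatedNodeId) {
    if (relations == null) {
        relations = new long[1 + 4];           // slot 0 = size, 4 payload slots
    }
    int size = (int) relations[0];
    if (1 + size == relations.length) {
        long[] grown = new long[1 + size * 2]; // grow the payload by 200%
        System.arraycopy(relations, 0, grown, 0, relations.length);
        relations = grown;
    }
    relations[1 + size] = relatedNodeId;
    relations[0] = size + 1;
    return relations;
}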
