
Performance bottleneck on relation add #9


Open
assaad opened this issue Jul 14, 2016 · 7 comments

@assaad
Contributor

assaad commented Jul 14, 2016

After benchmarking, a big bottleneck currently is this code:

long[] incArray = new long[previous.length + 1];
System.arraycopy(previous, 0, incArray, 0, previous.length);
incArray[previous.length] = relatedNode.id();

This is in the add method of the AbstractNode class, where the long[] is grown by only one element at a time.
We should grow it by 30% each time instead: the performance then increases from 5,000 values/sec to 25,000,000 values/sec.
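
For illustration, a minimal sketch of such a grow-by-percentage policy (the field and method names here are hypothetical, not the actual AbstractNode API):

// Hypothetical sketch: grow the backing array by ~30% instead of one slot per add.
// The field and method names are illustrative, not the real AbstractNode members.
private long[] relations = new long[0];
private int relationCount = 0;

void addRelation(long relatedNodeId) {
    if (relationCount == relations.length) {
        // +1 handles the empty-array case
        int newCapacity = relations.length + (relations.length * 30) / 100 + 1;
        long[] grown = new long[newCapacity];
        System.arraycopy(relations, 0, grown, 0, relationCount);
        relations = grown;
    }
    relations[relationCount++] = relatedNodeId;
}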

@rouvoy

rouvoy commented Jul 14, 2016

Out of curiosity, wouldn't we get a better speedup by doing:

long[] incArray = new long[previous.length * 2];

instead of a static value? What does your benchmark say about that?

@assaad
Contributor Author

assaad commented Jul 14, 2016

It's a trade-off between memory and performance. The problem is that we can't afford to waste a lot of memory across all the relationships of all the nodes.

  • The most memory-conservative option is an increase of (105% + 1); according to my benchmark simulations, the speed is 13.9 million inserts/sec.
  • If we want better performance while still not wasting much memory, (130% + 1) is a good choice: 30.7 million inserts/sec.
  • With (x200% + 1), the speed is 38 million inserts/sec.
  • For comparison, the length+1 strategy only reaches around 17,000 inserts/sec!
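
For reference, a rough micro-benchmark sketch of this kind of comparison (this is not the original benchmark; the class and method names are made up):

// Rough micro-benchmark sketch: time repeated appends into a growing long[]
// for different growth factors. Not the original benchmark code.
public class GrowthFactorBench {

    static long timeAppends(double growthFactor, int inserts) {
        long[] array = new long[1];
        int size = 0;
        long start = System.nanoTime();
        for (int i = 0; i < inserts; i++) {
            if (size == array.length) {
                int newCapacity = (int) (array.length * growthFactor) + 1;
                long[] grown = new long[newCapacity];
                System.arraycopy(array, 0, grown, 0, size);
                array = grown;
            }
            array[size++] = i;
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        for (double factor : new double[]{1.05, 1.30, 2.00}) {
            long nanos = timeAppends(factor, 1_000_000);
            System.out.printf("factor %.2f: %d ms%n", factor, nanos / 1_000_000);
        }
    }
}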

@assaad
Contributor Author

assaad commented Jul 14, 2016

If the maximum relationship size is less than 200, then x200% is better without much cost.
Maybe we should have an array growth strategy that depends on the current array size: something like x200% while array.length < 100, and a 30% increase afterwards so we don't waste too much memory (see the sketch below).
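
A minimal sketch of such a size-dependent policy (the threshold and percentages are the ones discussed above; the helper name is hypothetical):

// Sketch of a size-dependent growth policy: double small arrays, then grow by ~30%.
static int nextCapacity(int currentLength) {
    if (currentLength < 100) {
        // small arrays: doubling costs little absolute memory
        return currentLength * 2 + 1;
    }
    // larger arrays: grow by ~30% to limit wasted slots
    return currentLength + (currentLength * 30) / 100;
}
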
An open question: when we persist to hard disk, should we save the whole array (including the empty slots), or only the meaningful relationships?

@rouvoy

rouvoy commented Jul 14, 2016

Memory matters, and it definitely depends on the expected number of elements to be stored, but 200% tends to be a rather adaptive policy.
Still, when persisting you should only store the meaningful elements and not the full array; you are probably storing the index somewhere anyway, so you know the number of elements to be persisted.
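
A sketch of what that could look like, assuming a java.io.DataOutput sink and that the used-element count is tracked separately (both are assumptions, not the actual storage API):

// Persist only the used prefix of the relation array, prefixed by its element
// count, instead of the full over-allocated backing array. Hypothetical helper.
static void persistRelations(java.io.DataOutput out, long[] relations, int size)
        throws java.io.IOException {
    out.writeInt(size);               // number of meaningful elements
    for (int i = 0; i < size; i++) {
        out.writeLong(relations[i]);  // empty, pre-allocated slots are skipped
    }
}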

@thomashartmann
Contributor

thomashartmann commented Jul 14, 2016

We have to take care if we optimise the serialisation: if someone commits after every add operation, then our optimisation would be useless... The additional cost for persisting is, from my point of view, not a big deal (and can be handled by a compression algorithm anyway). It is more the in-memory cost which is problematic, not the additional disk space.

@assaad
Contributor Author

assaad commented Jul 14, 2016

The point is: this growth factor applies per relationship array, so the memory increase will be roughly:
factor * nodes * Avg(relationships/node) * Avg(timepoints/node) * Avg(worlds/node).
For example, let's say we have 1,000 nodes, each node has 4 types of multiple relationships, each node has 5,000 versions in time (timepoints), and for simplicity we consider 1 world.
Then we have 4 * 1,000 * 5,000 long[] of relationships, and the growth factor multiplies the memory needed by each of them.
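
A back-of-the-envelope sketch of that calculation (the average of 10 used slots per array is an assumption, and 8 bytes is the size of a Java long):

// Illustrative estimate only: extra memory from unused slots with a 200% policy.
public class RelationMemoryEstimate {
    public static void main(String[] args) {
        long arrays = 1_000L * 4 * 5_000;   // nodes * relation types * timepoints (1 world)
        double factor = 2.0;                // 200% growth policy
        long avgUsedSlots = 10;             // assumed average relations per array
        // worst case: roughly (factor - 1) * usedSlots unused slots per array
        long wastedBytes = (long) ((factor - 1) * avgUsedSlots * 8 * arrays);
        System.out.println(wastedBytes / (1024 * 1024) + " MB of unused long slots (worst case)");
    }
}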

@dukeboard
Contributor

Hi everyone,

Good discussion and good benchmarks to isolate the problem.
This is definitely true: the growth percentage of the array backing the relationships has a big impact on performance. A factor of 200% is usually well accepted, and all our hashmap structures follow this rule. The modification can be done with minor impact by allocating a first int/double/long in each relationship array to manage this. Both the Heap and OffHeap structures should then be modified. I'll try to propose something and report the commit number in this thread.
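
To illustrate the idea (this is only a sketch of the header-slot approach, not the actual Heap/OffHeap code; the method name is made up):

// Sketch: reserve slot 0 of each relation array for the number of used elements,
// so the array can be over-allocated and grown by 200% when full.
static long[] addWithHeaderSlot(long[] relations, long relatedNodeId) {
    if (relations == null) {
        relations = new long[1 + 4];           // slot 0 = size, 4 payload slots
    }
    int size = (int) relations[0];
    if (1 + size == relations.length) {
        long[] grown = new long[1 + size * 2]; // grow the payload by 200%
        System.arraycopy(relations, 0, grown, 0, relations.length);
        relations = grown;
    }
    relations[1 + size] = relatedNodeId;
    relations[0] = size + 1;
    return relations;
}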
