
Accelerated partitioning #114


Merged

merged 13 commits on Nov 30, 2018
Conversation

JulienPeloton
Member

This is an aborted PR in which I tried to accelerate the partitioning by filtering the dataset prior to repartitioning:

```scala
// Load the data
val df = spark.read.format(...).load(...)

// Add a column with the future partition ID
val df_colid = df.prePartition(...)

// Keep only the data of interest
val df_colid_subset = df_colid.filter(cond)

// Repartition only the data of interest
val df_repart = df_colid_subset.repartitionByCol(...)
```
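For context, here is a minimal, self-contained sketch of the same pipeline written against plain Spark SQL. The toy dataset, the `gridId` UDF, the 2 x 2 x 2 grid, and the filter condition are all hypothetical stand-ins; `prePartition` and `repartitionByCol` are spark3D extensions whose exact signatures are not reproduced here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object PrePartitionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pre-partition-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Toy 3D dataset: (x, y, z) points in the unit cube.
    val rng = new scala.util.Random(42)
    val df = Seq.fill(1000)(
      (rng.nextDouble(), rng.nextDouble(), rng.nextDouble())
    ).toDF("x", "y", "z")

    // Hypothetical stand-in for prePartition: a UDF mapping each point
    // to the index of a coarse 2 x 2 x 2 grid cell.
    val gridId = udf { (x: Double, y: Double, z: Double) =>
      val i = if (x < 0.5) 0 else 1
      val j = if (y < 0.5) 0 else 1
      val k = if (z < 0.5) 0 else 1
      i * 4 + j * 2 + k
    }
    val dfColId = df.withColumn("partition_id", gridId($"x", $"y", $"z"))

    // Keep only the data of interest *before* shuffling anything.
    val dfColIdSubset = dfColId.filter($"x" > 0.5)

    // Repartition only the surviving rows, keyed by the precomputed column.
    val dfRepart = dfColIdSubset.repartition(8, $"partition_id")

    println(s"Partitions: ${dfRepart.rdd.getNumPartitions}")
    spark.stop()
  }
}
```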

But it turned out to be very slow, because of the UDF in prePartition.
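For the record, the same partition-id column can usually be computed with built-in Column expressions instead of a UDF; Catalyst can then optimize and code-generate the projection, avoiding the per-row (de)serialization that makes the UDF path slow. A sketch reusing the hypothetical grid from the example above:

```scala
// Same hypothetical 2 x 2 x 2 grid id as gridId above, but built from
// native expressions (floor, arithmetic) that Catalyst can optimize.
// Assumes df and the imports from the previous sketch.
val dfColIdFast = df.withColumn(
  "partition_id",
  (floor($"x" * 2) * 4 + floor($"y" * 2) * 2 + floor($"z" * 2)).cast("int")
)
```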

So I decided to postpone this development, but I am still merging the other minor changes (docs + getNeighborNodes), which are useful.

TBC.
