docs/setup/databricks.md (+28 -12)

@@ -1,4 +1,4 @@
-Please pay attention to the Spark version postfix and Scala version postfix on our [Maven Coordinate page](../maven-coordinates). Databricks Spark and Apache Spark's compatibility can be found here: https://docs.databricks.com/en/release-notes/runtime/index.html
+Please pay attention to the Spark version postfix and the Scala version postfix on our [Maven Coordinates page](maven-coordinates.md). The compatibility between Databricks Spark and Apache Spark is documented [here](https://docs.databricks.com/en/release-notes/runtime/index.html).

## Community edition (free-tier)
@@ -8,18 +8,18 @@ You just need to install the Sedona jars and Sedona Python on Databricks using D
1) From the Libraries tab install from Maven Coordinates

Of course, you can also do the steps above manually.

@@ ... @@
+
### Create an init script

!!!warning
-    Starting from December 2023, Databricks has disabled all DBFS based init script (/dbfs/XXX/<script-name>.sh). So you will have to store the init script from a workspace level (`/Users/<user-name>/<script-name>.sh`) or Unity Catalog volume (`/Volumes/<catalog>/<schema>/<volume>/<path-to-script>/<script-name>.sh`). Please see https://docs.databricks.com/en/init-scripts/cluster-scoped.html#configure-a-cluster-scoped-init-script-using-the-ui
+    Starting in December 2023, Databricks disabled all DBFS-based init scripts (`/dbfs/XXX/<script-name>.sh`), so you have to store the init script at the workspace level (`/Workspace/Users/<user-name>/<script-name>.sh`) or in a Unity Catalog volume (`/Volumes/<catalog>/<schema>/<volume>/<path-to-script>/<script-name>.sh`). See [Databricks init scripts](https://docs.databricks.com/en/init-scripts/cluster-scoped.html#configure-a-cluster-scoped-init-script-using-the-ui) for more information.
+
+!!!note
+    If you are creating a Shared cluster, you won't be able to use init scripts and jars stored under `Workspace`. Please store them in `Volumes` instead; the overall process is the same.

Create an init script in `Workspace` that loads the Sedona jars into the cluster's default jar directory. You can create that from any notebook by running:

@@ ... @@
# On cluster startup, this script will copy the Sedona jars to the cluster's default jar directory.
-# In order to activate Sedona functions, remember to add to your spark configuration the Sedona extensions: "spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions"

Of course, you can also do the steps above manually.
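For illustration, writing such a script from a Databricks notebook could look like the sketch below. The `<user-name>` placeholder and the `jars` folder are assumptions to adapt to your workspace, and depending on your Databricks Runtime, workspace files may need a `file:/Workspace/...` prefix:

```python
# Sketch only: <user-name> and the jar folder are placeholders for your own paths.
init_script = """#!/bin/bash
# Copy the Sedona jars (uploaded to this folder beforehand) into the default jar directory.
cp /Workspace/Users/<user-name>/jars/*.jar /databricks/jars/
"""

# dbutils is predefined in Databricks notebooks; the trailing True overwrites an existing file.
dbutils.fs.put("/Workspace/Users/<user-name>/sedona-init.sh", init_script, True)
```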
+
### Set up cluster config

From your cluster configuration (`Cluster` -> `Edit` -> `Configuration` -> `Advanced options` -> `Spark`), activate the Sedona functions and the kryo serializer by adding the following to the Spark Config:
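As a reference, the Spark Config textbox typically contains entries like the following sketch. The extension classes are the ones named elsewhere in this page; the registrator class is an assumption that depends on your Sedona version and on whether you use SedonaViz:

```
spark.sql.extensions org.apache.sedona.viz.sql.SedonaVizExtensions,org.apache.sedona.sql.SedonaSqlExtensions
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.apache.sedona.core.serde.SedonaKryoRegistrator
```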
@@ -120,3 +126,13 @@ pydeck==0.8.0

!!!tips
    You need to install the Sedona libraries via init script because the libraries installed via UI are installed after the cluster has already started, and therefore the classes specified by the configs `spark.sql.extensions`, `spark.serializer`, and `spark.kryo.registrator` are not available at startup time.
+
+### Verify installation
+
+After you have started the cluster, you can verify that Sedona is correctly installed by running the following code in a notebook:
+
+```python
+spark.sql("SELECT ST_Point(1, 1)").show()
+```
+
+Note that you don't need to run `SedonaRegistrator.registerAll(spark)` or `SedonaContext.create(spark)` in the advanced edition, because `org.apache.sedona.sql.SedonaSqlExtensions` in the cluster config takes care of that.

docs/setup/emr.md (+10 -0)

@@ -52,3 +52,13 @@ When you create an EMR cluster, in the software configuration, add the following

!!!note
    If you use Sedona 1.3.1-incubating, please use the `sedona-python-adapter-3.0_2.12` jar in the content above, instead of `sedona-spark-shaded-3.0_2.12`.
+
+## Verify installation
+
+After the cluster is created, you can verify the installation by running the following code in a Jupyter notebook:
+
+```python
+spark.sql("SELECT ST_Point(0, 0)").show()
+```
+
+Note that you don't need to run `SedonaRegistrator.registerAll(spark)` or `SedonaContext.create(spark)`, because `org.apache.sedona.sql.SedonaSqlExtensions` in the config takes care of that.
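Relatedly, if you create the cluster programmatically rather than through the console, the software configuration mentioned above could be passed along these lines. This is a sketch using boto3; the jar paths, versions, and property values are assumptions to adapt to your cluster:

```python
import boto3

# Hypothetical spark-defaults classification enabling Sedona on EMR.
sedona_conf = [
    {
        "Classification": "spark-defaults",
        "Properties": {
            "spark.yarn.dist.jars": "/jars/sedona-spark-shaded-3.0_2.12-1.5.0.jar,/jars/geotools-wrapper-1.5.0-28.2.jar",
            "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
            "spark.kryo.registrator": "org.apache.sedona.core.serde.SedonaKryoRegistrator",
            "spark.sql.extensions": "org.apache.sedona.sql.SedonaSqlExtensions",
        },
    }
]

emr = boto3.client("emr")
# Pass sedona_conf as the Configurations argument of emr.run_job_flow(...) when creating the cluster.
```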

docs/tutorial/benchmark.md (+0 -1)

@@ -3,5 +3,4 @@
We welcome people to use Sedona for benchmarking purposes. To achieve the best performance or to enjoy all features of Sedona:

* Please always use the latest version, or state the version used in your benchmark, so that we can trace issues back to it.
-* Please consider using Sedona core instead of Sedona SQL. Due to the limitation of SparkSQL (for instance, not support clustered index), we are not able to expose all features to SparkSQL.
* Please enable the Sedona kryo serializer to reduce the memory footprint (see the sketch below).
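For example, enabling the kryo serializer at session creation could look like this minimal sketch; the registrator class name follows Sedona's documented configuration and may differ across versions:

```python
from pyspark.sql import SparkSession

# Enable kryo and register Sedona's spatial types to reduce the memory footprint.
spark = (
    SparkSession.builder
    .appName("sedona-benchmark")
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.kryo.registrator", "org.apache.sedona.core.serde.SedonaKryoRegistrator")
    .getOrCreate()
)
```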

docs/tutorial/sql.md (+2 -2)

@@ -43,7 +43,7 @@ Detailed SedonaSQL APIs are available here: [SedonaSQL API](../api/sql/Overview.

## Create Sedona config

-Use the following code to create your Sedona config at the beginning. If you already have a SparkSession (usually named `spark`) created by Wherobots/AWS EMR/Databricks, please skip this step and can use `spark` directly.
+Use the following code to create your Sedona config at the beginning. If you already have a SparkSession (usually named `spark`) created by AWS EMR/Databricks/Microsoft Fabric, please ==skip this step==.

==Sedona >= 1.4.1==

@@ -147,7 +147,7 @@ The following method has been deprecated since Sedona 1.4.1. Please use the meth

## Initiate SedonaContext

-Add the following line after creating Sedona config. If you already have a SparkSession (usually named `spark`) created by Wherobots/AWS EMR/Databricks, please call `SedonaContext.create(spark)` instead.
+Add the following line after creating the Sedona config. If you already have a SparkSession (usually named `spark`) created by AWS EMR/Databricks/Microsoft Fabric, please call `sedona = SedonaContext.create(spark)` instead. ==Databricks== is more involved; see the [Databricks setup guide](../setup/databricks.md), but generally you don't need to create a SedonaContext there.
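For a self-managed SparkSession, the Sedona >= 1.4.1 flow sketched end to end looks roughly like this; it assumes the Sedona Python package and jars are installed, and `master("local[*]")` is an illustrative choice:

```python
from sedona.spark import SedonaContext

# Build a Sedona-enabled Spark config, then create the context on top of it.
config = SedonaContext.builder().master("local[*]").getOrCreate()
sedona = SedonaContext.create(config)

# Quick sanity check that the SQL functions are registered.
sedona.sql("SELECT ST_Point(1.0, 2.0) AS geom").show()
```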