1
1
GeoPySpark
2
- ***********
2
+ **********
3
+
3
4
.. image :: https://travis-ci.org/locationtech-labs/geopyspark.svg?branch=master
4
5
:target: https://travis-ci.org/locationtech-labs/geopyspark
5
6
6
7
.. image :: https://readthedocs.org/projects/geopyspark/badge/?version=latest
7
8
:target: https://geopyspark.readthedocs.io/en/latest/?badge=latest
8
9
10
+ .. image :: https://badges.gitter.im/locationtech-labs/geopyspark.png
11
+ :target: https://gitter.im/geotrellis/geotrellis
9
12
10
13
GeoPySpark is a Python bindings library for `GeoTrellis <http://geotrellis.io >`_, a Scala
11
14
library for working with geospatial data in a distributed environment.
12
15
By using `PySpark <http://spark.apache.org/docs/latest/api/python/pyspark.html >`_, GeoPySpark is
13
- able to provide na interface into the GeoTrellis framework.
16
+ able to provide an interface into the GeoTrellis framework.
17
+
18
+ Links
19
+ -----
20
+
21
+ * `Documentation <https://geopyspark.readthedocs.io >`_
22
+ * `Gitter <https://gitter.im/geotrellis/geotrellis >`_
14
23
15
24
A Quick Example
16
- ----------------
25
+ ---------------
17
26
18
27
Here is a quick example of GeoPySpark. In the following code, we take NLCD data
19
28
of the state of Pennsylvania from 2011, and do a masking operation on it with
@@ -65,27 +74,10 @@ for you:
65
74
layer_name = ' north-west-philly' ,
66
75
tiled_raster_layer = pyramid)
67
76
77
+ For additional examples, check out the `Jupyter notebook demos <./notebook-demos >`_.
68
78
69
- Contact and Support
70
- --------------------
71
-
72
- If you need help, have questions, or like to talk to the developers (let us
73
- know what you're working on!) you contact us at:
74
-
75
- * `Gitter <https://gitter.im/geotrellis/geotrellis >`_
76
- * `Mailing list <https://locationtech.org/mailman/listinfo/geotrellis-user >`_
77
-
78
- As you may have noticed from the above links, those are links to the GeoTrellis
79
- gitter channel and mailing list. This is because this project is currently an
80
- offshoot of GeoTrellis, and we will be using their mailing list and gitter
81
- channel as a means of contact. However, we will form our own if there is a need
82
- for it.
83
-
84
- Setup
85
- ------
86
-
87
- GeoPySpark Requirements
88
- ^^^^^^^^^^^^^^^^^^^^^^^^
79
+ Requirements
80
+ ------------
89
81
90
82
============ ============
91
83
Requirement Version
@@ -96,9 +88,9 @@ Python 3.3 - 3.6
96
88
Spark >=2.1.1
97
89
============ ============
98
90
99
- Java 8 and Scala 2.11 are needed for GeoPySpark to work; as they are required by
91
+ Java 8 and Scala 2.11 are needed for GeoPySpark to work, as they are required by
100
92
GeoTrellis. In addition, Spark needs to be installed and configured with the
101
- environment variable, ``SPARK_HOME `` set.
93
+ environment variable ``SPARK_HOME `` set.
102
94
103
95
You can test to see if Spark is installed properly by running the following in
104
96
the terminal:
@@ -109,60 +101,46 @@ the terminal:
109
101
/usr/local/bin/spark
110
102
111
103
If the return is a path leading to your Spark folder, then it means that Spark
112
- has been configured correctly.
104
+ has been configured correctly. If ``SPARK_HOME`` is unset or empty, you'll need to set it
105
+ in your shell profile after noting where Spark is installed on your system. For example,
106
+ a macOS installation of Spark 2.3.0 via Homebrew would set ``SPARK_HOME`` as follows:
107
+
108
+ .. code :: bash
113
109
114
- How to Install
115
- ^^^^^^^^^^^^^^^
110
+ # In ~/.bash_profile
111
+ export SPARK_HOME=/usr/local/Cellar/apache-spark/2.3.0/libexec/
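
The same check can be done from Python before importing GeoPySpark. The following is a minimal sketch, assuming only that ``SPARK_HOME`` should name an existing directory; the helper name ``check_spark_home`` is illustrative and is not part of GeoPySpark.

```python
import os


def check_spark_home(environ=None):
    """Return the directory named by SPARK_HOME, or raise if it is unusable.

    `environ` defaults to the real process environment; passing a plain
    dict makes the check easy to try out without touching your shell.
    """
    environ = os.environ if environ is None else environ
    spark_home = environ.get("SPARK_HOME", "").strip()
    if not spark_home:
        raise RuntimeError("SPARK_HOME is unset or empty")
    if not os.path.isdir(spark_home):
        raise RuntimeError(
            "SPARK_HOME points to a missing directory: {}".format(spark_home)
        )
    return spark_home
```

Calling ``check_spark_home()`` with no arguments inspects the current environment and raises a ``RuntimeError`` describing what is wrong, if anything.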
116
112
117
- Before installing, check the above table to make sure that the
113
+ Installation
114
+ ------------
115
+
116
+ Before installing, check the above `Requirements `_ table to make sure that the
118
117
requirements are met.
119
118
120
119
Installing From Pip
121
- ~~~~~~~~~~~~~~~~~~~~
120
+ ~~~~~~~~~~~~~~~~~~~
122
121
123
122
To install via ``pip``, open the terminal and run the following:
124
123
125
124
.. code :: console
126
125
127
126
pip install geopyspark
128
- geopyspark install-jar -p [path/to/install/jar]
129
-
130
- Where the first command installs the python code from PyPi and the second
131
- downloads the backend, jar file. If no path is given when downloading the jar,
132
- then it will be downloaded to wherever GeoPySpark was installed at.
133
-
134
- What's With That Weird Pip Install?
135
- ====================================
136
-
137
- "What's with that weird pip install?", you may be asking yourself. The reason
138
- for its unusualness is due to how GeoPySpark functions. Because this library
139
- is a python binding for a Scala project, we need to be able to access the
140
- Scala backend. To do this, we plug into PySpark which acts as a bridge between
141
- Python and Scala. However, in order to achieve this the Scala code needs to be
142
- assembled into a jar file. This poses a problem due to its size (117.7 MB at
143
- v0.1.0-RC!). To get around the size constraints of PyPi, we thus utilized this
144
- method of distribution where the jar must be downloaded in a separate command
145
- when using ``pip install ``.
127
+ geopyspark install-jar
146
128
147
- Note:
148
- Installing from source or for development does not require the separate
149
- download of the jar.
129
+ The first command installs the Python code and the ``geopyspark`` command
130
+ from PyPI. The second downloads the backend jar file, which is too large
131
+ to be included in the pip package, and installs it to the GeoPySpark
132
+ installation directory. For more information about the ``geopyspark ``
133
+ command, see the `GeoPySpark CLI `_ section.
150
134
151
135
Installing From Source
152
- ~~~~~~~~~~~~~~~~~~~~~~~
136
+ ~~~~~~~~~~~~~~~~~~~~~~
153
137
154
138
If you would rather install from source, clone the GeoPySpark repo and enter it.
155
139
156
140
.. code :: console
157
141
158
142
git clone https://github.com/locationtech-labs/geopyspark.git
159
143
cd geopyspark
160
-
161
- Installing For Users
162
- =====================
163
-
164
- .. code :: console
165
-
166
144
make install
167
145
168
146
This will assemble the backend ``jar`` that contains the Scala code,
@@ -172,8 +150,68 @@ Note:
172
150
If you have altered the global behavior of ``sbt `` this install may
173
151
not work the way it was intended.
174
152
175
- Installing For Developers
176
- ===========================
153
+ Uninstalling
154
+ ~~~~~~~~~~~~
155
+
156
+ To uninstall GeoPySpark, run the following in the terminal:
157
+
158
+ .. code :: console
159
+
160
+ pip uninstall geopyspark
161
+ rm .local/bin/geopyspark
162
+
163
+ Contact and Support
164
+ -------------------
165
+
166
+ If you need help, have questions, or would like to talk to the developers (let us
167
+ know what you're working on!) you can contact us at:
168
+
169
+ * `Gitter <https://gitter.im/geotrellis/geotrellis >`_
170
+ * `Mailing list <https://locationtech.org/mailman/listinfo/geotrellis-user >`_
171
+
172
+ As you may have noticed from the above links, those are links to the GeoTrellis
173
+ gitter channel and mailing list. This is because this project is currently an
174
+ offshoot of GeoTrellis, and we will be using their mailing list and gitter
175
+ channel as a means of contact. However, we will form our own if there is a need
176
+ for it.
177
+
178
+ GeoPySpark CLI
179
+ --------------
180
+
181
+ When GeoPySpark is installed, it comes with a script which can be accessed
182
+ from anywhere on your computer. This script is used to facilitate management
183
+ of the GeoPySpark jar file that must be installed in order for GeoPySpark to
184
+ work correctly. Here are the available commands:
185
+
186
+ .. code :: console
187
+
188
+ geopyspark -h, --help // returns the help string and exits
189
+ geopyspark install-jar // downloads the jar file to the default location (the GeoPySpark install directory)
190
+ geopyspark install-jar -p, --path [download/path] // downloads the jar file to the specified location
191
+ geopyspark jar-path // returns the relative path of the jar file
192
+ geopyspark jar-path -a, --absolute // returns the absolute path of the jar file
193
+
194
+ ``geopyspark install-jar `` is only needed when installing GeoPySpark through
195
+ ``pip``, and it **must** be run before using GeoPySpark. If no path is given,
196
+ then the jar will be installed wherever GeoPySpark was installed.
197
+
198
+ The two ``jar-path`` commands are for getting the location of the jar file.
199
+ These can be used regardless of installation method. However, if installed
200
+ through ``pip ``, then the jar must be downloaded first or these commands
201
+ will not work.
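
The jar location can also be looked up from a Python script by shelling out to the CLI. This is only a sketch: it assumes that ``geopyspark jar-path --absolute`` prints the path on standard output and nothing else, which the text above does not spell out. The function name ``geopyspark_jar_path`` and its ``run`` parameter are hypothetical and exist only for this example.

```python
import subprocess


def geopyspark_jar_path(absolute=True, run=subprocess.check_output):
    """Ask the `geopyspark` CLI where the backend jar lives.

    `run` is injectable so the function can be exercised without the CLI
    installed; by default it really executes the command and raises
    subprocess.CalledProcessError if the command exits with an error.
    """
    cmd = ["geopyspark", "jar-path"]
    if absolute:
        cmd.append("--absolute")
    output = run(cmd)
    # Assumption: the command writes just the path, followed by a newline.
    return output.decode("utf-8").strip()
```

If GeoPySpark was installed through ``pip`` and ``geopyspark install-jar`` has not been run yet, expect this call to fail, as noted above.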
202
+
203
+ Developing GeoPySpark
204
+ ---------------------
205
+
206
+ Contributing
207
+ ~~~~~~~~~~~~
208
+
209
+ Feedback and contributions to GeoPySpark are always welcome.
210
+ A CLA is required for contribution; see `Contributing <docs/contributing.rst>`_ for more
211
+ information.
212
+
213
+ Installing for Developers
214
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
177
215
178
216
.. code :: console
179
217
@@ -185,41 +223,54 @@ sub-package. The second command will install GeoPySpark in "editable" mode.
185
223
Meaning any changes to the source files will also appear in your system
186
224
installation.
187
225
188
- Installing to a Virtual Environment
189
- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
226
+ Within a virtualenv
227
+ ===================
190
228
191
- A third option is to install GeoPySpark in a virtual environment. To get things
192
- started, enter the environment and run the following:
229
+ It's possible that you may run into issues when performing the ``pip install -e . ``
230
+ described above with a Python virtualenv active. If you're having trouble with
231
+ Python finding installed libraries within the virtualenv, try adding the virtualenv
232
+ site-packages directory to your PYTHONPATH:
193
233
194
234
.. code :: console
195
235
196
- git clone https://github.com/locationtech-labs/geopyspark.git
197
- cd geopyspark
236
+ workon <your-geopyspark-virtualenv-name>
198
237
export PYTHONPATH=$VIRTUAL_ENV/lib/<your python version>/site-packages
199
238
200
239
Replace ``<your python version>`` with whatever Python version
201
- ``virtualenvwrapper `` is set to. Installation in a virtual environment can be
202
- a bit weird with GeoPySpark. This is why you need to export the
203
- ``PYTHONPATH `` before installing to ensure that it performs correctly.
240
+ ``virtualenvwrapper `` is set to. Once you've set PYTHONPATH, re-install
241
+ GeoPySpark using the instructions in "Installing for Developers" above.
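
If you are unsure which directory to export, the interpreter inside the virtualenv can report it. The sketch below uses only the standard library; the helper names ``site_packages_dir`` and ``is_on_sys_path`` are illustrative and are not part of GeoPySpark.

```python
import os
import sys
import sysconfig


def site_packages_dir():
    """Return the site-packages directory of the running interpreter."""
    return sysconfig.get_paths()["purelib"]


def is_on_sys_path(directory):
    """Report whether `directory` is already one of the import search paths."""
    target = os.path.normcase(os.path.realpath(directory))
    return any(
        os.path.normcase(os.path.realpath(entry)) == target
        for entry in sys.path
        if entry  # skip the empty string that stands for the current directory
    )


if __name__ == "__main__":
    path = site_packages_dir()
    print("site-packages:", path)
    print("already on sys.path:", is_on_sys_path(path))
```

Run it with the virtualenv active; the printed directory is the value to use for ``PYTHONPATH`` in the command above.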
242
+
243
+ Running GeoPySpark Tests
244
+ ~~~~~~~~~~~~~~~~~~~~~~~~
204
245
205
- Installing For Users
206
- =====================
246
+ GeoPySpark uses the `pytest <https://docs.pytest.org/en/latest/ >`_ testing
247
+ framework to run its unit tests. If you wish to run GeoPySpark's unit tests,
248
+ then you must first clone this repository to your machine. Once complete,
249
+ go to the root of the library and run the following command:
207
250
208
251
.. code :: console
209
252
210
- make virtual-install
253
+ pytest
211
254
212
- Installing For Developers
213
- ===========================
255
+ This will then run all of the tests present in the GeoPySpark library.
214
256
215
- .. code :: console
257
+ **Note**: The unit tests require additional dependencies in order to pass fully.
258
+ `pyproj <https://pypi.python.org/pypi/pyproj?>`_, `colortools <https://pypi.python.org/pypi/colortools/0.1.2>`_,
259
+ and `matplotlib <https://pypi.python.org/pypi/matplotlib/2.0.2>`_ (only for Python >= 3.4) are needed to
260
+ ensure that all of the tests pass.
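
Before running ``pytest``, it can help to confirm that these optional packages are importable. The following is a small sketch, assuming that each package's import name matches its PyPI name (``pyproj``, ``colortools``, ``matplotlib``); the helper ``missing_modules`` is hypothetical and not part of GeoPySpark.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found by the import system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: import names are the same as the PyPI package names.
    missing = missing_modules(["pyproj", "colortools", "matplotlib"])
    if missing:
        print("Missing test dependencies:", ", ".join(missing))
    else:
        print("All optional test dependencies are importable.")
```

Anything reported as missing can then be installed with ``pip`` before running the test suite.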
216
261
217
- make build
218
- pip install -e .
262
+ Make Targets
263
+ ============
219
264
265
+ - **install ** - install GeoPySpark python package locally
266
+ - **wheel ** - build python GeoPySpark wheel for distribution
267
+ - **pyspark ** - start pyspark shell with project jars
268
+ - **build ** - builds the backend jar and moves it to the jars sub-package
269
+ - **clean ** - remove the wheel, the backend jar file, and clean the
270
+ geotrellis-backend directory
220
271
221
272
Developing GeoPySpark With GeoNotebook
222
- ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
273
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
223
274
224
275
**Note**: Before beginning this section, it should be noted that python-mapnik,
225
276
a dependency for GeoNotebook, has been found to be difficult to install. If
@@ -278,7 +329,7 @@ GeoNotebook/GeoTrellis integration in currently in active development and not pa
278
329
The latest development is on a ``feature/geotrellis `` branch at ``<https://github.com/geotrellis/geonotebook> ``.
279
330
280
331
Side Note For Developers
281
- ~~~~~~~~~~~~~~~~~~~~~~~~~
332
+ ========================
282
333
283
334
An optional (but recommended!) step for developers is to place these
284
335
two lines of code at the top of your notebooks.
@@ -296,72 +347,3 @@ read `here <http://ipython.readthedocs.io/en/stable/config/extensions/autoreload
296
347
Using ``pip install -e `` in conjunction with ``autoreload `` should cover any
297
348
changes made, though, and will make the development experience much less
298
349
painful.
299
-
300
- GeoPySpark Script
301
- -----------------
302
-
303
- When GeoPySpark is installed, it comes with a script which can be accessed
304
- from anywhere on you computer. These are the commands that can be ran via the
305
- script:
306
-
307
- .. code :: console
308
-
309
- geopyspark install-jar -p, --path [download/path] //downloads the jar file
310
- geopyspark jar-path //returns the relative path of the jar file
311
- geopyspark jar-path -a, --absolute //returns the absolute path of the jar file
312
-
313
- The first command is only needed when installing GeoPySpark through ``pip ``;
314
- and it **must ** be ran before using GeoPySpark. If no path is selected, then
315
- the jar will be installed wherever GeoPySpark was installed.
316
-
317
- The second and third commands are for getting the location of the jar file.
318
- These can be used regardless of installation method. However, if installed
319
- through ``pip ``, then the jar must be downloaded first or these commands
320
- will not work.
321
-
322
-
323
- Running GeoPySpark Tests
324
- -------------------------
325
-
326
- GeoPySpark uses the `pytest <https://docs.pytest.org/en/latest/ >`_ testing
327
- framework to run its unittests. If you wish to run GeoPySpark's unittests,
328
- then you must first clone this repository to your machine. Once complete,
329
- go to the root of the library and run the following command:
330
-
331
- .. code :: console
332
-
333
- pytest
334
-
335
- This will then run all of the tests present in the GeoPySpark library.
336
-
337
- **Note **: The unittests require additional dependencies in order to pass fully.
338
- `pyrproj <https://pypi.python.org/pypi/pyproj? >`_, `colortools <https://pypi.python.org/pypi/colortools/0.1.2 >`_,
339
- and `matplotlib <https://pypi.python.org/pypi/matplotlib/2.0.2 >`_ (only for >=Python3.4) are needed to
340
- ensure that all of the tests pass.
341
-
342
- Make Targets
343
- ^^^^^^^^^^^^
344
-
345
- - **install ** - install GeoPySpark python package locally
346
- - **wheel ** - build python GeoPySpark wheel for distribution
347
- - **pyspark ** - start pyspark shell with project jars
348
- - **build ** - builds the backend jar and moves it to the jars sub-package
349
- - **clean ** - remove the wheel, the backend jar file, and clean the
350
- geotrellis-backend directory
351
-
352
- Uninstalling
353
- ------------
354
-
355
- To uninstall GeoPySpark, run the following in the terminal:
356
-
357
- .. code :: console
358
-
359
- pip uninstall geopyspark
360
- rm .local/bin/geopyspark
361
-
362
- Contributing
363
- ------------
364
-
365
- Any kind of feedback and contributions to GeoPySpark is always welcomed.
366
- A CLA is required for contribution, see `Contributing <docs/contributing.rst >`_ for more
367
- information.