Commit bc63e03

Merge pull request #657 from CloudNiner/feature/awf/documentation-getting-started-improvements

README documentation cleanup

2 parents cc2e491 + ed349ba

3 files changed: +137 -206 lines
README.rst (130 additions & 148 deletions)
@@ -1,19 +1,28 @@
 GeoPySpark
-***********
+**********
+
 .. image:: https://travis-ci.org/locationtech-labs/geopyspark.svg?branch=master
    :target: https://travis-ci.org/locationtech-labs/geopyspark

 .. image:: https://readthedocs.org/projects/geopyspark/badge/?version=latest
    :target: https://geopyspark.readthedocs.io/en/latest/?badge=latest

+.. image:: https://badges.gitter.im/locationtech-labs/geopyspark.png
+   :target: https://gitter.im/geotrellis/geotrellis

 GeoPySpark is a Python bindings library for `GeoTrellis <http://geotrellis.io>`_, a Scala
 library for working with geospatial data in a distributed environment.
 By using `PySpark <http://spark.apache.org/docs/latest/api/python/pyspark.html>`_, GeoPySpark is
-able to provide na interface into the GeoTrellis framework.
+able to provide an interface into the GeoTrellis framework.
+
+Links
+-----
+
+* `Documentation <https://geopyspark.readthedocs.io>`_
+* `Gitter <https://gitter.im/geotrellis/geotrellis>`_

 A Quick Example
-----------------
+---------------

 Here is a quick example of GeoPySpark. In the following code, we take NLCD data
 of the state of Pennsylvania from 2011, and do a masking operation on it with
@@ -65,27 +74,10 @@ for you:
     layer_name='north-west-philly',
     tiled_raster_layer=pyramid)

+For additional examples, check out the `Jupyter notebook demos <./notebook-demos>`_.

-Contact and Support
---------------------
-
-If you need help, have questions, or like to talk to the developers (let us
-know what you're working on!) you contact us at:
-
-* `Gitter <https://gitter.im/geotrellis/geotrellis>`_
-* `Mailing list <https://locationtech.org/mailman/listinfo/geotrellis-user>`_
-
-As you may have noticed from the above links, those are links to the GeoTrellis
-gitter channel and mailing list. This is because this project is currently an
-offshoot of GeoTrellis, and we will be using their mailing list and gitter
-channel as a means of contact. However, we will form our own if there is a need
-for it.
-
-Setup
-------
-
-GeoPySpark Requirements
-^^^^^^^^^^^^^^^^^^^^^^^^
+Requirements
+------------

 ============ ============
 Requirement  Version
@@ -96,9 +88,9 @@ Python 3.3 - 3.6
 Spark        >=2.1.1
 ============ ============

-Java 8 and Scala 2.11 are needed for GeoPySpark to work; as they are required by
+Java 8 and Scala 2.11 are needed for GeoPySpark to work, as they are required by
 GeoTrellis. In addition, Spark needs to be installed and configured with the
-environment variable, ``SPARK_HOME`` set.
+environment variable ``SPARK_HOME`` set.

 You can test to see if Spark is installed properly by running the following in
 the terminal:
@@ -109,60 +101,46 @@ the terminal:
    /usr/local/bin/spark

 If the return is a path leading to your Spark folder, then it means that Spark
-has been configured correctly.
+has been configured correctly. If ``SPARK_HOME`` is unset or empty, you'll need to add it
+to your ``PATH`` after noting where Spark is installed on your system. For example,
+a MacOS installation of Spark 2.3.0 via HomeBrew would set ``SPARK_HOME`` as follows:
+
+.. code:: bash

-How to Install
-^^^^^^^^^^^^^^^
+   # In ~/.bash_profile
+   export SPARK_HOME=/usr/local/Cellar/apache-spark/2.3.0/libexec/

-Before installing, check the above table to make sure that the
+Installation
+------------
+
+Before installing, check the above `Requirements`_ table to make sure that the
 requirements are met.

 Installing From Pip
-~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~

 To install via ``pip`` open the terminal and run the following:

 .. code:: console

    pip install geopyspark
-   geopyspark install-jar -p [path/to/install/jar]
-
-Where the first command installs the python code from PyPi and the second
-downloads the backend, jar file. If no path is given when downloading the jar,
-then it will be downloaded to wherever GeoPySpark was installed at.
-
-What's With That Weird Pip Install?
-====================================
-
-"What's with that weird pip install?", you may be asking yourself. The reason
-for its unusualness is due to how GeoPySpark functions. Because this library
-is a python binding for a Scala project, we need to be able to access the
-Scala backend. To do this, we plug into PySpark which acts as a bridge between
-Python and Scala. However, in order to achieve this the Scala code needs to be
-assembled into a jar file. This poses a problem due to its size (117.7 MB at
-v0.1.0-RC!). To get around the size constraints of PyPi, we thus utilized this
-method of distribution where the jar must be downloaded in a separate command
-when using ``pip install``.
+   geopyspark install-jar

-Note:
-  Installing from source or for development does not require the separate
-  download of the jar.
+The first command installs the python code and the `geopyspark` command
+from PyPi. The second downloads the backend jar file, which is too large
+to be included in the pip package, and installs it to the GeoPySpark
+installation directory. For more information about the ``geopyspark``
+command, see the `GeoPySpark CLI`_ section.

 Installing From Source
-~~~~~~~~~~~~~~~~~~~~~~~
+~~~~~~~~~~~~~~~~~~~~~~

 If you would rather install from source, clone the GeoPySpark repo and enter it.

 .. code:: console

    git clone https://github.com/locationtech-labs/geopyspark.git
    cd geopyspark
-
-Installing For Users
-=====================
-
-.. code:: console
-
    make install

 This will assemble the backend-end ``jar`` that contains the Scala code,
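Putting the new pip-based instructions from this hunk together, a complete setup looks roughly like the following shell session. This is an illustrative sketch only, not text from the commit; all of the commands appear in the diff above, and ``jar-path --absolute`` is simply used here to confirm where ``install-jar`` placed the backend jar.

.. code:: console

   # confirm Spark is configured before installing
   echo $SPARK_HOME

   # install the Python package, then fetch the backend jar it needs
   pip install geopyspark
   geopyspark install-jar

   # verify where the jar was installed
   geopyspark jar-path --absolute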
@@ -172,8 +150,68 @@ Note:
   If you have altered the global behavior of ``sbt`` this install may
   not work the way it was intended.

-Installing For Developers
-===========================
+Uninstalling
+~~~~~~~~~~~~
+
+To uninstall GeoPySpark, run the following in the terminal:
+
+.. code:: console
+
+   pip uninstall geopyspark
+   rm .local/bin/geopyspark
+
+Contact and Support
+-------------------
+
+If you need help, have questions, or would like to talk to the developers (let us
+know what you're working on!) you can contact us at:
+
+* `Gitter <https://gitter.im/geotrellis/geotrellis>`_
+* `Mailing list <https://locationtech.org/mailman/listinfo/geotrellis-user>`_
+
+As you may have noticed from the above links, those are links to the GeoTrellis
+gitter channel and mailing list. This is because this project is currently an
+offshoot of GeoTrellis, and we will be using their mailing list and gitter
+channel as a means of contact. However, we will form our own if there is a need
+for it.
+
+GeoPySpark CLI
+--------------
+
+When GeoPySpark is installed, it comes with a script which can be accessed
+from anywhere on your computer. This script is used to facilitate management
+of the GeoPySpark jar file that must be installed in order for GeoPySpark to
+work correctly. Here are the available commands:
+
+.. code:: console
+
+   geopyspark -h, --help                              // return help string and exit
+   geopyspark install-jar                             // downloads the jar file to the default location (the geopyspark install dir)
+   geopyspark install-jar -p, --path [download/path]  // downloads the jar file to the location specified
+   geopyspark jar-path                                // returns the relative path of the jar file
+   geopyspark jar-path -a, --absolute                 // returns the absolute path of the jar file
+
+``geopyspark install-jar`` is only needed when installing GeoPySpark through
+``pip``, and it **must** be run before using GeoPySpark. If no path is selected,
+then the jar will be installed wherever GeoPySpark was installed.
+
+The second and third commands are for getting the location of the jar file.
+These can be used regardless of installation method. However, if installed
+through ``pip``, then the jar must be downloaded first or these commands
+will not work.
+
+Developing GeoPySpark
+---------------------
+
+Contributing
+~~~~~~~~~~~~
+
+Feedback and contributions to GeoPySpark are always welcomed.
+A CLA is required for contribution, see `Contributing <docs/contributing.rst>`_ for more
+information.
+
+Installing for Developers
+~~~~~~~~~~~~~~~~~~~~~~~~~

 .. code:: console

@@ -185,41 +223,54 @@ sub-package. The second command will install GeoPySpark in "editable" mode.
 Meaning any changes to the source files will also appear in your system
 installation.

-Installing to a Virtual Environment
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Within a virtualenv
+===================

-A third option is to install GeoPySpark in a virtual environment. To get things
-started, enter the environment and run the following:
+It's possible that you may run into issues when performing the ``pip install -e .``
+described above with a Python virtualenv active. If you're having trouble with
+Python finding installed libraries within the virtualenv, try adding the virtualenv
+site-packages directory to your PYTHONPATH:

 .. code:: console

-   git clone https://github.com/locationtech-labs/geopyspark.git
-   cd geopyspark
+   workon <your-geopyspark-virtualenv-name>
    export PYTHONPATH=$VIRTUAL_ENV/lib/<your python version>/site-packages

 Replace ``<your python version`` with whatever Python version
-``virtualenvwrapper`` is set to. Installation in a virtual environment can be
-a bit weird with GeoPySpark. This is why you need to export the
-``PYTHONPATH`` before installing to ensure that it performs correctly.
+``virtualenvwrapper`` is set to. Once you've set PYTHONPATH, re-install
+GeoPySpark using the instructions in "Installing for Developers" above.
+
+Running GeoPySpark Tests
+~~~~~~~~~~~~~~~~~~~~~~~~

-Installing For Users
-=====================
+GeoPySpark uses the `pytest <https://docs.pytest.org/en/latest/>`_ testing
+framework to run its unittests. If you wish to run GeoPySpark's unittests,
+then you must first clone this repository to your machine. Once complete,
+go to the root of the library and run the following command:

 .. code:: console

-   make virtual-install
+   pytest

-Installing For Developers
-===========================
+This will then run all of the tests present in the GeoPySpark library.

-.. code:: console
+**Note**: The unittests require additional dependencies in order to pass fully.
+`pyproj <https://pypi.python.org/pypi/pyproj?>`_, `colortools <https://pypi.python.org/pypi/colortools/0.1.2>`_,
+and `matplotlib <https://pypi.python.org/pypi/matplotlib/2.0.2>`_ (only for >=Python3.4) are needed to
+ensure that all of the tests pass.

-   make build
-   pip install -e .
+Make Targets
+============

+- **install** - install GeoPySpark python package locally
+- **wheel** - build python GeoPySpark wheel for distribution
+- **pyspark** - start pyspark shell with project jars
+- **build** - builds the backend jar and moves it to the jars sub-package
+- **clean** - remove the wheel, the backend jar file, and clean the
+  geotrellis-backend directory

 Developing GeoPySpark With GeoNotebook
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 **Note**: Before begining this section, it should be noted that python-mapnik,
 a dependency for GeoNotebook, has been found to be difficult to install. If
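For the new "Running GeoPySpark Tests" section added in the hunk above, a typical local test run might look like the sketch below. This is illustrative only and not part of the commit; the extra packages are the optional test dependencies the hunk itself lists.

.. code:: console

   # optional dependencies needed for the full test suite to pass
   pip install pyproj colortools matplotlib

   # run the unittests from the repository root
   pytest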
@@ -278,7 +329,7 @@ GeoNotebook/GeoTrellis integration in currently in active development and not pa
 The latest development is on a ``feature/geotrellis`` branch at ``<https://github.com/geotrellis/geonotebook>``.

 Side Note For Developers
-~~~~~~~~~~~~~~~~~~~~~~~~~
+========================

 An optional (but recommended!) step for developers is to place these
 two lines of code at the top of your notebooks.
@@ -296,72 +347,3 @@ read `here <http://ipython.readthedocs.io/en/stable/config/extensions/autoreload
 Using ``pip install -e`` in conjunction with ``autoreload`` should cover any
 changes made, though, and will make the development experience much less
 painful.
-
-GeoPySpark Script
------------------
-
-When GeoPySpark is installed, it comes with a script which can be accessed
-from anywhere on you computer. These are the commands that can be ran via the
-script:
-
-.. code:: console
-
-   geopyspark install-jar -p, --path [download/path] //downloads the jar file
-   geopyspark jar-path //returns the relative path of the jar file
-   geopyspark jar-path -a, --absolute //returns the absolute path of the jar file
-
-The first command is only needed when installing GeoPySpark through ``pip``;
-and it **must** be ran before using GeoPySpark. If no path is selected, then
-the jar will be installed wherever GeoPySpark was installed.
-
-The second and third commands are for getting the location of the jar file.
-These can be used regardless of installation method. However, if installed
-through ``pip``, then the jar must be downloaded first or these commands
-will not work.
-
-
-Running GeoPySpark Tests
--------------------------
-
-GeoPySpark uses the `pytest <https://docs.pytest.org/en/latest/>`_ testing
-framework to run its unittests. If you wish to run GeoPySpark's unittests,
-then you must first clone this repository to your machine. Once complete,
-go to the root of the library and run the following command:
-
-.. code:: console
-
-   pytest
-
-This will then run all of the tests present in the GeoPySpark library.
-
-**Note**: The unittests require additional dependencies in order to pass fully.
-`pyrproj <https://pypi.python.org/pypi/pyproj?>`_, `colortools <https://pypi.python.org/pypi/colortools/0.1.2>`_,
-and `matplotlib <https://pypi.python.org/pypi/matplotlib/2.0.2>`_ (only for >=Python3.4) are needed to
-ensure that all of the tests pass.
-
-Make Targets
-^^^^^^^^^^^^
-
-- **install** - install GeoPySpark python package locally
-- **wheel** - build python GeoPySpark wheel for distribution
-- **pyspark** - start pyspark shell with project jars
-- **build** - builds the backend jar and moves it to the jars sub-package
-- **clean** - remove the wheel, the backend jar file, and clean the
-  geotrellis-backend directory
-
-Uninstalling
-------------
-
-To uninstall GeoPySpark, run the following in the terminal:
-
-.. code:: console
-
-   pip uninstall geopyspark
-   rm .local/bin/geopyspark
-
-Contributing
-------------
-
-Any kind of feedback and contributions to GeoPySpark is always welcomed.
-A CLA is required for contribution, see `Contributing <docs/contributing.rst>`_ for more
-information.
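Taken together, the reorganized "Installing for Developers" and "Make Targets" sections describe a build-from-source loop roughly like the one below. This is a sketch assembled from commands that appear in the diff, not text from the commit itself.

.. code:: console

   git clone https://github.com/locationtech-labs/geopyspark.git
   cd geopyspark

   # assemble the backend jar and move it into the jars sub-package
   make build

   # install GeoPySpark in "editable" mode so source edits are picked up
   pip install -e .

   # optionally, open a pyspark shell with the project jars available
   make pyspark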

docs/contributing.rst (7 additions & 4 deletions)

@@ -17,10 +17,13 @@ features or other patches this page gives you more info on how to do it.
 Building GeoPySpark
 -------------------

-1. Install and setup Hadoop (the master branch is currently built with 2.0.1).
-2. Check out `this <https://github.com/locationtech-labs/geopyspark>`__. repository.
-3. Pick the branch corresponding to the version you are targeting
-4. Run ``make install`` to build GeoPySpark.
+Ensure you have the
+`project dependencies <https://github.com/locationtech-labs/geopyspark/blob/master/README.rst#requirements>`_
+installed on your machine.
+
+Then follow the
+`Installing for Developers <https://github.com/locationtech-labs/geopyspark/blob/master/README.rst#installing-for-developers>`_
+instructions in the project README.

 Style Guide
 -----------
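The revised contributing instructions delegate to the README's Requirements and "Installing for Developers" sections. A pre-flight check along those lines could look like the following; this is a hypothetical sketch (these version-check commands are not part of the commit), included only to show what "ensure you have the project dependencies" amounts to in practice.

.. code:: console

   java -version        # expect Java 8
   scala -version       # expect Scala 2.11
   echo $SPARK_HOME     # should point at a Spark >=2.1.1 installation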
