Commit 4211763

Merge pull request #290 from JohnSnowLabs/172-release-candidate
Release candidate 1.7.2
2 parents d3b9efa + b457d5a commit 4211763

File tree

6 files changed: +53 −31 lines changed

CHANGELOG

Lines changed: 22 additions & 0 deletions

@@ -1,3 +1,25 @@
+========
+1.7.2
+========
+---------------
+Overview
+---------------
+Quick release with another hotfix, due to a newly found bug when deserializing word embeddings on a distributed filesystem. Also introduces changes in the application.conf reader in order
+to allow run-time changes, and renames functions in the EmbeddingsHelper API.
+
+---------------
+Bugfixes
+---------------
+* Fixed embeddings deserialization from distributed filesystems (caused by the Windows path fix)
+* Fixed application.conf not picking up changes at runtime
+* Added missing remote_locs argument in Python pretrained() functions
+* Fixed wrong build version introduced in 1.7.1 so the proper pretrained models version is detected
+
+---------------
+Developer API
+---------------
+* Renamed EmbeddingsHelper functions for more convenience
+
 ========
 1.7.1
 ========
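The application.conf fix above boils down to reading configuration on access instead of caching it once at startup. A minimal Python sketch of the two patterns (illustrative only — this uses configparser, not Spark-NLP's actual Typesafe Config reader, and the class names are hypothetical):

```python
import configparser

class CachedConfig:
    """Reads the file once at construction; later edits to the
    file on disk are invisible (the pre-1.7.2 behavior being fixed)."""
    def __init__(self, path):
        self._parser = configparser.ConfigParser()
        self._parser.read(path)

    def get(self, section, key):
        return self._parser.get(section, key)

class LiveConfig:
    """Re-reads the file on every access, so run-time changes
    to the configuration take effect immediately."""
    def __init__(self, path):
        self._path = path

    def get(self, section, key):
        parser = configparser.ConfigParser()
        parser.read(self._path)
        return parser.get(section, key)
```

With this pattern, editing the config file after startup changes what `LiveConfig.get` returns, while an already-constructed `CachedConfig` keeps serving the stale value.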

README.md

Lines changed: 15 additions & 15 deletions

@@ -14,18 +14,18 @@ Questions? Feedback? Request access sending an email to [email protected]

 This library has been uploaded to the spark-packages repository https://spark-packages.org/package/JohnSnowLabs/spark-nlp .

-To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.7.1` to your spark command
+To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.7.2` to your spark command

 ```sh
-spark-shell --packages JohnSnowLabs:spark-nlp:1.7.1
+spark-shell --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ```sh
-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ```sh
-spark-submit --packages JohnSnowLabs:spark-nlp:1.7.1
+spark-submit --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ## Jupyter Notebook

@@ -35,23 +35,23 @@ export SPARK_HOME=/path/to/your/spark/folder
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ## Apache Zeppelin
 This way will work for both Scala and Python
 ```
-export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.1"
+export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.2"
 ```
 Alternatively, add the following Maven Coordinates to the interpreter's library list
 ```
-com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.1
+com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.2
 ```

 ## Python without explicit Spark installation
 If you installed pyspark through pip, you can now install sparknlp through pip
 ```
-pip install spark-nlp==1.7.1
+pip install spark-nlp==1.7.2
 ```
 Then you'll have to create a SparkSession manually, for example:
 ```

@@ -84,11 +84,11 @@ sparknlp {

 ## Pre-compiled Spark-NLP and Spark-NLP-OCR
 You may download the fat-jar from here:
-[Spark-NLP 1.7.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.1.jar)
+[Spark-NLP 1.7.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.2.jar)
 or the non-fat from here
-[Spark-NLP 1.7.1 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.1/spark-nlp_2.11-1.7.1.jar)
+[Spark-NLP 1.7.2 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.2/spark-nlp_2.11-1.7.2.jar)
 Spark-NLP-OCR Module (Requires native Tesseract 4.x+ for image-based OCR. Does not require Spark-NLP to work, but highly suggested)
-[Spark-NLP-OCR 1.7.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.1.jar)
+[Spark-NLP-OCR 1.7.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.2.jar)

 ## Maven central

@@ -100,19 +100,19 @@ Our package is deployed to maven central. In order to add this package as a dependency
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.11</artifactId>
-    <version>1.7.1</version>
+    <version>1.7.2</version>
 </dependency>
 ```

 #### SBT
 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.7.1"
+libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.7.2"
 ```

 If you are using `scala 2.11`

 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.7.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.7.2"
 ```

 ## Using the jar manually

@@ -133,7 +133,7 @@ The preferred way to use the library when running spark programs is using the `--packages`

 If you have trouble using pretrained() models in your environment, here is a list of various models (only valid for the latest versions).
 If a model is older than the current version, it still works for current versions.
-### Updated for 1.7.1
+### Updated for 1.7.2
 ### Pipelines
 * [Basic Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_basic_en_1.6.1_2_1533856444797.zip)
 * [Advanced Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_advanced_en_1.7.0_2_1539460910585.zip)
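As the README hunks above show, the same version string appears in shell commands, Maven coordinates, sbt lines, and pip pins, so a release bump touches many files at once. A small hypothetical Python helper illustrating how those coordinates derive from a single version constant (function names are illustrative, not part of Spark-NLP):

```python
SPARK_NLP_VERSION = "1.7.2"
SCALA_BINARY_VERSION = "2.11"

def spark_packages_arg(version=SPARK_NLP_VERSION):
    """The --packages value passed to spark-shell/pyspark/spark-submit."""
    return f"--packages JohnSnowLabs:spark-nlp:{version}"

def maven_coordinate(version=SPARK_NLP_VERSION, scala=SCALA_BINARY_VERSION):
    """The Maven coordinate used by Zeppelin/Databricks interpreter lists."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala}:{version}"

def pip_requirement(version=SPARK_NLP_VERSION):
    """The pip pin for installing the Python package."""
    return f"spark-nlp=={version}"
```

Generating all three from one constant is one way to avoid the version drift that this commit manually corrects across six files.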

build.sbt

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ name := "spark-nlp"

 organization := "com.johnsnowlabs.nlp"

-version := "1.7.1"
+version := "1.7.2"

 scalaVersion in ThisBuild := scalaVer

@@ -138,7 +138,7 @@ assemblyMergeStrategy in assembly := {
 lazy val ocr = (project in file("ocr"))
   .settings(
     name := "spark-nlp-ocr",
-    version := "1.7.1",
+    version := "1.7.2",
     libraryDependencies ++= ocrDependencies ++
       analyticsDependencies ++
       testDependencies,

docs/index.html

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ <h2 class="title">High Performance NLP with Apache Spark </h2>
 </p>
 <a class="btn btn-info btn-cta" style="float: center;margin-top: 10px;" href="mailto:[email protected]?subject=SparkNLP%20Slack%20access" target="_blank"> Questions? Join our Slack</a>
 <b/><p/><p/>
-<p><span class="label label-warning">2018 Oct 19th - Update!</span> 1.7.1 Released! Word embeddings decoupled from annotators and better Windows support</p>
+<p><span class="label label-warning">2018 Oct 19th - Update!</span> 1.7.2 Released! Word embeddings decoupled from annotators and better Windows support</p>
 </div>
 <div id="cards-wrapper" class="cards-wrapper row">
 <div class="item item-green col-md-4 col-sm-6 col-xs-6">

docs/quickstart.html

Lines changed: 12 additions & 12 deletions

@@ -95,35 +95,35 @@ <h2 class="section-title">Requirements & Setup</h2>
 To start using the library, execute any of the following lines
 depending on your desired use case:
 </p>
-<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:1.7.1
-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
-spark-submit --packages JohnSnowLabs:spark-nlp:1.7.1
+<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:1.7.2
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
+spark-submit --packages JohnSnowLabs:spark-nlp:1.7.2
 </code></pre>
 <div><b>NOTE: </b>Spark's --packages option has been reported to work improperly, particularly in Python, on physical clusters.
 Using --jars is advised. For Python, add Spark-NLP through pip.</div>
 <p/>
 <h3><b>Databricks cloud cluster</b> & <b>Apache Zeppelin</b></h3>
-<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.1</code></pre>
+<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.2</code></pre>
 <p>
 For Python in <b>Apache Zeppelin</b> you may need to set <i><b>SPARK_SUBMIT_OPTIONS</b></i> with the --packages instruction shown above, like this
 </p>
-<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.1"</code></pre>
+<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.2"</code></pre>
 <h3><b>Python Jupyter Notebook with PySpark</b></h3>
 <pre><code class="language-javascript">export SPARK_HOME=/path/to/your/spark/folder
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1</code></pre>
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2</code></pre>
 <h3><b>Python without explicit Spark Installation</b></h3>
 <p>Use pip to install (after you pip installed pyspark)</p>
-<pre><code class="language-javascript">pip install spark-nlp==1.7.1</code></pre>
+<pre><code class="language-javascript">pip install spark-nlp==1.7.2</code></pre>
 <p>This way, you will have to start the SparkSession in your Python program manually; here is an example</p>
 <pre><code class="python">spark = SparkSession.builder \
     .appName("ner")\
     .master("local[*]")\
     .config("spark.driver.memory","4G")\
     .config("spark.driver.maxResultSize", "2G") \
-    .config("spark.driver.extraClassPath", "lib/spark-nlp-assembly-1.7.1.jar")\
+    .config("spark.driver.extraClassPath", "lib/spark-nlp-assembly-1.7.2.jar")\
     .config("spark.kryoserializer.buffer.max", "500m")\
     .getOrCreate()</code></pre>
 <h3>S3 based standalone cluster (No Hadoop)</h3>

@@ -145,11 +145,11 @@ <h3>S3 based standalone cluster (No Hadoop)</h3>
 <h3>Pre-Compiled Spark-NLP for download</h3>
 <p>
 The pre-compiled Spark-NLP assembly fat-jar for use in standalone projects may be downloaded
-<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.1.jar">here</a>
+<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.2.jar">here</a>
 The non-fat-jar may be downloaded
-<a href="http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.1/spark-nlp_2.11-1.7.1.jar">here</a>
+<a href="http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.2/spark-nlp_2.11-1.7.2.jar">here</a>
 then run spark-shell or spark-submit with the appropriate <b>--jars
-/path/to/spark-nlp_2.11-1.7.1.jar</b> to use the library in spark.
+/path/to/spark-nlp_2.11-1.7.2.jar</b> to use the library in spark.
 </p>
 <p>
 For further alternatives and documentation check out our README page in <a href="https://github.com/JohnSnowLabs/spark-nlp">GitHub</a>.

@@ -435,7 +435,7 @@ <h2 class="section-title">Utilizing Spark-NLP OCR PDF Converter</h2>
 <h3 class="block-title">Installing Spark-NLP OCRHelper</h3>
 <p>
 First, either build from source or download the following standalone jar module (works from both Spark-NLP Python and Scala):
-<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.1.jar">Spark-NLP-OCR</a>
+<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.2.jar">Spark-NLP-OCR</a>
 Then add it to your Spark environment (with --jars, or the spark.driver.extraClassPath and spark.executor.extraClassPath configuration).
 Second, if your PDFs don't have a text layer (this depends on how the PDFs were created), the library will use Tesseract 4.0 in the background.
 Tesseract relies on native libraries, so you'll have to have them installed on your system.

src/main/scala/com/johnsnowlabs/util/Build.scala

Lines changed: 1 addition & 1 deletion

@@ -11,6 +11,6 @@ object Build {
     if (version != null && version.nonEmpty)
       version
     else
-      "1.7.0"
+      "1.7.2"
   }
 }
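This Build.scala change is the "wrong build version" bugfix from the changelog: 1.7.1 shipped with a stale hardcoded fallback of "1.7.0", so whenever no version could be read from the build metadata, pretrained() resolved models against the wrong version. A sketch of the same fallback logic in Python (hypothetical, mirroring the Scala above):

```python
def resolve_build_version(manifest_version, fallback="1.7.2"):
    """Prefer the version found in build metadata; otherwise fall back
    to a hardcoded release version. The fallback must be kept in sync
    with each release -- the bug fixed here was exactly a stale fallback."""
    if manifest_version:  # rejects both None and the empty string
        return manifest_version
    return fallback
```

With the stale fallback, `resolve_build_version(None)` would have returned "1.7.0" even in a 1.7.1 build, which is why model version detection broke.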
