Commit 4211763

Merge pull request #290 from JohnSnowLabs/172-release-candidate
Release candidate 1.7.2
2 parents d3b9efa + b457d5a commit 4211763

File tree

6 files changed: +53 −31 lines changed

CHANGELOG

Lines changed: 22 additions & 0 deletions

@@ -1,3 +1,25 @@
+========
+1.7.2
+========
+---------------
+Overview
+---------------
+Quick release with another hotfix, due to a newly found bug when deserializing word embeddings on a distributed filesystem. Also introduces changes in the application.conf reader in order
+to allow run-time changes, and renames functions in the EmbeddingsHelper API.
+
+---------------
+Bugfixes
+---------------
+* Fixed embeddings deserialization from distributed filesystems (caused by the Windows path fix)
+* Fixed application.conf not picking up changes at runtime
+* Added missing remote_locs argument in Python pretrained() functions
+* Fixed wrong build version introduced in 1.7.1 so the proper pretrained models version is detected
+
+---------------
+Developer API
+---------------
+* Renamed EmbeddingsHelper functions for more convenience
+
 ========
 1.7.1
 ========
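The application.conf fix above boils down to reading configuration on access instead of caching it once at startup. A minimal Python sketch of the two patterns (illustrative only — this uses configparser, not Spark-NLP's actual Typesafe Config reader, and the class names are hypothetical):

```python
import configparser

class CachedConfig:
    """Reads the file once at construction; later edits to the
    file on disk are invisible (the pre-1.7.2 behavior being fixed)."""
    def __init__(self, path):
        self._parser = configparser.ConfigParser()
        self._parser.read(path)

    def get(self, section, key):
        return self._parser.get(section, key)

class LiveConfig:
    """Re-reads the file on every access, so run-time changes
    to the configuration take effect immediately."""
    def __init__(self, path):
        self._path = path

    def get(self, section, key):
        parser = configparser.ConfigParser()
        parser.read(self._path)
        return parser.get(section, key)
```

With this pattern, editing the config file after startup changes what `LiveConfig.get` returns, while an already-constructed `CachedConfig` keeps serving the stale value.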

README.md

Lines changed: 15 additions & 15 deletions

@@ -14,18 +14,18 @@ Questions? Feedback? Request access sending an email to [email protected]

 This library has been uploaded to the spark-packages repository https://spark-packages.org/package/JohnSnowLabs/spark-nlp .

-To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.7.1` to your spark command
+To use the most recent version just add `--packages JohnSnowLabs:spark-nlp:1.7.2` to your spark command

 ```sh
-spark-shell --packages JohnSnowLabs:spark-nlp:1.7.1
+spark-shell --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ```sh
-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ```sh
-spark-submit --packages JohnSnowLabs:spark-nlp:1.7.1
+spark-submit --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ## Jupyter Notebook

@@ -35,23 +35,23 @@ export SPARK_HOME=/path/to/your/spark/folder
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
 ```

 ## Apache Zeppelin
 This way will work for both Scala and Python
 ```
-export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.1"
+export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.2"
 ```
 Alternatively, add the following Maven Coordinates to the interpreter's library list
 ```
-com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.1
+com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.2
 ```

 ## Python without explicit Spark installation
 If you installed pyspark through pip, you can now install sparknlp through pip
 ```
-pip install spark-nlp==1.7.1
+pip install spark-nlp==1.7.2
 ```
 Then you'll have to create a SparkSession manually, for example:
 ```

@@ -84,11 +84,11 @@ sparknlp {

 ## Pre-compiled Spark-NLP and Spark-NLP-OCR
 You may download the fat-jar from here:
-[Spark-NLP 1.7.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.1.jar)
+[Spark-NLP 1.7.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.2.jar)
 or the non-fat from here
-[Spark-NLP 1.7.1 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.1/spark-nlp_2.11-1.7.1.jar)
+[Spark-NLP 1.7.2 PKG JAR](http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.2/spark-nlp_2.11-1.7.2.jar)
 Spark-NLP-OCR Module (Requires native Tesseract 4.x+ for image-based OCR. Does not require Spark-NLP to work, but highly suggested)
-[Spark-NLP-OCR 1.7.1 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.1.jar)
+[Spark-NLP-OCR 1.7.2 FAT-JAR](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.2.jar)

 ## Maven central

@@ -100,19 +100,19 @@ Our package is deployed to maven central. In order to add this package as a dependency
 <dependency>
     <groupId>com.johnsnowlabs.nlp</groupId>
     <artifactId>spark-nlp_2.11</artifactId>
-    <version>1.7.1</version>
+    <version>1.7.2</version>
 </dependency>
 ```

 #### SBT
 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.7.1"
+libraryDependencies += "com.johnsnowlabs.nlp" % "spark-nlp_2.11" % "1.7.2"
 ```

 If you are using `scala 2.11`

 ```sbtshell
-libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.7.1"
+libraryDependencies += "com.johnsnowlabs.nlp" %% "spark-nlp" % "1.7.2"
 ```

 ## Using the jar manually

@@ -133,7 +133,7 @@ The preferred way to use the library when running spark programs is using the `--packages`

 If you have trouble using pretrained() models in your environment, here is a list of various models (only valid for the latest versions).
 If a model is older than the current version, it still works for current versions.
-### Updated for 1.7.1
+### Updated for 1.7.2
 ### Pipelines
 * [Basic Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_basic_en_1.6.1_2_1533856444797.zip)
 * [Advanced Pipeline](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pipeline_advanced_en_1.7.0_2_1539460910585.zip)
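As the README hunks above show, the same version string appears in shell commands, Maven coordinates, sbt lines, and pip pins, so a release bump touches many files at once. A small hypothetical Python helper illustrating how those coordinates derive from a single version constant (function names are illustrative, not part of Spark-NLP):

```python
SPARK_NLP_VERSION = "1.7.2"
SCALA_BINARY_VERSION = "2.11"

def spark_packages_arg(version=SPARK_NLP_VERSION):
    """The --packages value passed to spark-shell/pyspark/spark-submit."""
    return f"--packages JohnSnowLabs:spark-nlp:{version}"

def maven_coordinate(version=SPARK_NLP_VERSION, scala=SCALA_BINARY_VERSION):
    """The Maven coordinate used by Zeppelin/Databricks interpreter lists."""
    return f"com.johnsnowlabs.nlp:spark-nlp_{scala}:{version}"

def pip_requirement(version=SPARK_NLP_VERSION):
    """The pip pin for installing the Python package."""
    return f"spark-nlp=={version}"
```

Generating all three from one constant is one way to avoid the version drift that this commit manually corrects across six files.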

build.sbt

Lines changed: 2 additions & 2 deletions

@@ -9,7 +9,7 @@ name := "spark-nlp"

 organization := "com.johnsnowlabs.nlp"

-version := "1.7.1"
+version := "1.7.2"

 scalaVersion in ThisBuild := scalaVer

@@ -138,7 +138,7 @@ assemblyMergeStrategy in assembly := {
 lazy val ocr = (project in file("ocr"))
   .settings(
     name := "spark-nlp-ocr",
-    version := "1.7.1",
+    version := "1.7.2",
     libraryDependencies ++= ocrDependencies ++
       analyticsDependencies ++
       testDependencies,

docs/index.html

Lines changed: 1 addition & 1 deletion

@@ -78,7 +78,7 @@ <h2 class="title">High Performance NLP with Apache Spark </h2>
 </p>
 <a class="btn btn-info btn-cta" style="float: center;margin-top: 10px;" href="mailto:[email protected]?subject=SparkNLP%20Slack%20access" target="_blank"> Questions? Join our Slack</a>
 <b/><p/><p/>
-<p><span class="label label-warning">2018 Oct 19th - Update!</span> 1.7.1 Released! Word embeddings decoupled from annotators and better Windows support</p>
+<p><span class="label label-warning">2018 Oct 19th - Update!</span> 1.7.2 Released! Word embeddings decoupled from annotators and better Windows support</p>
 </div>
 <div id="cards-wrapper" class="cards-wrapper row">
 <div class="item item-green col-md-4 col-sm-6 col-xs-6">

docs/quickstart.html

Lines changed: 12 additions & 12 deletions

@@ -95,35 +95,35 @@ <h2 class="section-title">Requirements & Setup</h2>
 To start using the library, execute any of the following lines
 depending on your desired use case:
 </p>
-<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:1.7.1
-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1
-spark-submit --packages JohnSnowLabs:spark-nlp:1.7.1
+<pre><code class="language-javascript">spark-shell --packages JohnSnowLabs:spark-nlp:1.7.2
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2
+spark-submit --packages JohnSnowLabs:spark-nlp:1.7.2
 </code></pre>
 <div><b>NOTE: </b>Spark's --packages option has been reported to work improperly, particularly in Python, on physical clusters.
 Using --jars is advised. For Python, add Spark-NLP through pip.</div>
 <p/>
 <h3><b>Databricks cloud cluster</b> & <b>Apache Zeppelin</b></h3>
-<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.1</code></pre>
+<pre><code class="language-javascript">com.johnsnowlabs.nlp:spark-nlp_2.11:1.7.2</code></pre>
 <p>
 For Python in <b>Apache Zeppelin</b> you may need to set <i><b>SPARK_SUBMIT_OPTIONS</b></i> with the --packages instruction shown above, like this
 </p>
-<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.1"</code></pre>
+<pre><code class="language-javascript">export SPARK_SUBMIT_OPTIONS="--packages JohnSnowLabs:spark-nlp:1.7.2"</code></pre>
 <h3><b>Python Jupyter Notebook with PySpark</b></h3>
 <pre><code class="language-javascript">export SPARK_HOME=/path/to/your/spark/folder
 export PYSPARK_DRIVER_PYTHON=jupyter
 export PYSPARK_DRIVER_PYTHON_OPTS=notebook

-pyspark --packages JohnSnowLabs:spark-nlp:1.7.1</code></pre>
+pyspark --packages JohnSnowLabs:spark-nlp:1.7.2</code></pre>
 <h3><b>Python without explicit Spark Installation</b></h3>
 <p>Use pip to install (after you pip installed pyspark)</p>
-<pre><code class="language-javascript">pip install spark-nlp==1.7.1</code></pre>
+<pre><code class="language-javascript">pip install spark-nlp==1.7.2</code></pre>
 <p>This way, you will have to start the SparkSession in your Python program manually; here is an example</p>
 <pre><code class="python">spark = SparkSession.builder \
     .appName("ner")\
     .master("local[*]")\
     .config("spark.driver.memory","4G")\
     .config("spark.driver.maxResultSize", "2G") \
-    .config("spark.driver.extraClassPath", "lib/spark-nlp-assembly-1.7.1.jar")\
+    .config("spark.driver.extraClassPath", "lib/spark-nlp-assembly-1.7.2.jar")\
     .config("spark.kryoserializer.buffer.max", "500m")\
     .getOrCreate()</code></pre>
 <h3>S3 based standalone cluster (No Hadoop)</h3>

@@ -145,11 +145,11 @@ <h3>S3 based standalone cluster (No Hadoop)</h3>
 <h3>Pre-Compiled Spark-NLP for download</h3>
 <p>
 The pre-compiled Spark-NLP assembly fat-jar for use in standalone projects may be downloaded
-<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.1.jar">here</a>
+<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-assembly-1.7.2.jar">here</a>
 The non-fat-jar may be downloaded
-<a href="http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.1/spark-nlp_2.11-1.7.1.jar">here</a>
+<a href="http://repo1.maven.org/maven2/com/johnsnowlabs/nlp/spark-nlp_2.11/1.7.2/spark-nlp_2.11-1.7.2.jar">here</a>
 then run spark-shell or spark-submit with the appropriate <b>--jars
-/path/to/spark-nlp_2.11-1.7.1.jar</b> to use the library in spark.
+/path/to/spark-nlp_2.11-1.7.2.jar</b> to use the library in spark.
 </p>
 <p>
 For further alternatives and documentation check out our README page in <a href="https://github.com/JohnSnowLabs/spark-nlp">GitHub</a>.

@@ -435,7 +435,7 @@ <h2 class="section-title">Utilizing Spark-NLP OCR PDF Converter</h2>
 <h3 class="block-title">Installing Spark-NLP OCRHelper</h3>
 <p>
 First, either build from source or download the following standalone jar module (works from both Spark-NLP Python and Scala):
-<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.1.jar">Spark-NLP-OCR</a>
+<a href="https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/spark-nlp-ocr-assembly-1.7.2.jar">Spark-NLP-OCR</a>
 Then add it to your Spark environment (with --jars, or the spark.driver.extraClassPath and spark.executor.extraClassPath configuration).
 Second, if your PDFs don't have a text layer (this depends on how the PDFs were created), the library will use Tesseract 4.0 in the background.
 Tesseract relies on native libraries, so you'll have to have them installed on your system.

src/main/scala/com/johnsnowlabs/util/Build.scala

Lines changed: 1 addition & 1 deletion

@@ -11,6 +11,6 @@ object Build {
     if (version != null && version.nonEmpty)
       version
     else
-      "1.7.0"
+      "1.7.2"
   }
 }
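This Build.scala change is the "wrong build version" bugfix from the changelog: 1.7.1 shipped with a stale hardcoded fallback of "1.7.0", so whenever no version could be read from the build metadata, pretrained() resolved models against the wrong version. A sketch of the same fallback logic in Python (hypothetical, mirroring the Scala above):

```python
def resolve_build_version(manifest_version, fallback="1.7.2"):
    """Prefer the version found in build metadata; otherwise fall back
    to a hardcoded release version. The fallback must be kept in sync
    with each release -- the bug fixed here was exactly a stale fallback."""
    if manifest_version:  # rejects both None and the empty string
        return manifest_version
    return fallback
```

With the stale fallback, `resolve_build_version(None)` would have returned "1.7.0" even in a 1.7.1 build, which is why model version detection broke.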
