spark 3.0.0 with hdp 3.2 #1498

Closed

@avnerl avnerl commented Jul 15, 2020

spark 3.0.0 with hadoop 3.2

test locally:

./gradlew clean
./gradlew updateSHAs build -Pdistro=hadoopYarn
./gradlew updateSHAs build -Pdistro=hadoopYarn3
./gradlew updateSHAs build -Pdistro=hadoopStable

@chengjuzhen

Great work! It works well. 👍

@erpic

erpic commented Sep 5, 2020

Many thanks, this works for me too.

I was able to build:
./spark/sql-30/build/libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar

Which I could use to read/write an Elasticsearch 7.9.0 index as a DataFrame using PySpark 3.0.0.
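
For readers who want to try the same thing, here is a minimal sketch of reading and writing an index from PySpark with the custom-built jar. The jar path is the one quoted above; the host, port, and index names are placeholders I made up, and `es_options` and `copy_index` are hypothetical helper names, not part of the connector.

```python
from typing import Dict

# Path taken from the comment above; adjust to wherever your build put the jar.
ES_SPARK_JAR = "./spark/sql-30/build/libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar"
# Data source name used by the elasticsearch-spark SQL connector.
ES_FORMAT = "org.elasticsearch.spark.sql"


def es_options(nodes: str = "localhost", port: int = 9200) -> Dict[str, str]:
    """Connector settings. The default host and port are assumptions."""
    return {"es.nodes": nodes, "es.port": str(port)}


def copy_index(source_index: str, target_index: str) -> None:
    """Read one index as a DataFrame and append its rows to another index."""
    # Imported lazily so es_options() can be used without PySpark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("es-spark3-smoke-test")
        .config("spark.jars", ES_SPARK_JAR)  # ship the custom-built jar with the job
        .getOrCreate()
    )
    opts = es_options()
    df = spark.read.format(ES_FORMAT).options(**opts).load(source_index)
    df.write.format(ES_FORMAT).options(**opts).mode("append").save(target_index)
    spark.stop()


# Example call (hypothetical index names; needs PySpark and a reachable cluster):
# copy_index("my-source-index", "my-target-index")
```

Passing the jar through `spark.jars` is only one option; `pyspark --jars <path>` on the command line should work as well.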

I don't know anything about the Java toolchain, but here is what I had to do to build the jar I needed:

  • install Java 11 (set JAVA_HOME) and Java 8 (set JAVA8_HOME)
  • at the root of this project, run: ./gradlew -DskipTests=true build --info --stacktrace
    I got a couple of errors about a missing import, "import org.elasticsearch.gradle.testclusters.RestTestRunnerTask". As this seemed to be needed only for some unit tests, I manually edited the files to comment out the offending import, and then commented out all the functions that needed it so I could continue compiling. I did the same for something about ":qa:kerberos". I had to do that 3 or 4 times before I got a JAR that worked for me.
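
The two steps above can be wrapped in a small pre-flight check, sketched here in Python. The variable names and the Gradle command line come from the steps; `missing_java_vars` and `run_build` are hypothetical helper names, and the sketch does nothing about the import errors, which still have to be fixed by hand.

```python
import os
import subprocess
from typing import List, Mapping

# Both variables are named in the steps above.
REQUIRED_VARS = ("JAVA_HOME", "JAVA8_HOME")
# Same command line as in the steps above.
BUILD_COMMAND = ["./gradlew", "-DskipTests=true", "build", "--info", "--stacktrace"]


def missing_java_vars(env: Mapping[str, str]) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def run_build(project_root: str) -> int:
    """Run the Gradle build from `project_root` and return its exit code."""
    missing = missing_java_vars(os.environ)
    if missing:
        raise RuntimeError("Set these before building: " + ", ".join(missing))
    # With cwd set, the relative ./gradlew path is resolved inside project_root
    # on POSIX systems; on other platforms an absolute path may be safer.
    return subprocess.run(BUILD_COMMAND, cwd=project_root).returncode
```

Calling `run_build(".")` from the project root fails fast with a readable message if either variable is missing, instead of failing somewhere inside Gradle.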

Many thanks again.

@jainshashank24

Hi @avnerl

May I know which API version is used for the Elasticsearch sink?
That is, is it the DSv1 or the DSv2 API provided by Spark itself?

@axiangcoding

@erpic can I get your jar? I can't build the source code because I hit the same exception you had.

@scxwhite

@erpic can I get your jar? I can't build the source code because I hit the same exception you had.

I cloned the project from @avnerl and fixed some compilation errors. You can clone elasticsearch-hadoop-spark3.0 directly and run the following steps:

1. Install Java 11 (set JAVA_HOME) and Java 8 (set JAVA8_HOME)
2. Run ./gradlew -DskipTests=true build --info --stacktrace
The built jar path: ./spark/sql-30/build/libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar

Thanks to @avnerl and @erpic.

@jimdowling

Any updates on this?

@erpic

erpic commented Jan 16, 2021

Pasting here:

  • some raw personal notes on the build process that worked for me
  • resulting jar (built using jdk11 on macos, I am really not a Java person, feel free to comment... note that in order to be attached here the jar had to be packaged in a zip archive, you need to uncompress that first)

Hope that helps. Looking forward to this being added to the official branch. Many thanks to all involved!

download source of proposed merge request from: https://github.com/avnerl/elasticsearch-hadoop
run: gradlew
requires java >= 11
use same java version as cluster, 11.0.8
root@master ~ # java -version
openjdk version "11.0.8" 2020-07-14
download jdk11 for macos from:
https://www.oracle.com/java/technologies/javase-jdk11-downloads.html
installed the dmg, now mac correctly says:
➜ ~ java -version
java version "11.0.8" 2020-07-14 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.8+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.8+10-LTS, mixed mode)
restarted command: gradlew
import error: import org.elasticsearch.gradle.testclusters.RestTestRunnerTask
commented out: "import org.elasticsearch.gradle.testclusters.RestTestRunnerTask" and body of functions: apply, createClusterFor in buildSrc/src/main/groovy/org/elasticsearch/hadoop/gradle/fixture/ElasticsearchFixturePlugin.groovy
also commented out: configureIntegrationTestTask in /Users/eric/Downloads/elasticsearch-hadoop-master-spark3/buildSrc/src/main/groovy/org/elasticsearch/hadoop/gradle/BuildPlugin.groovy
$JAVA8_HOME must be set to build ES-Hadoop
JAVA8_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
comment out a couple more things, :qa:kerberos
./gradlew -DskipTests=true build --info --stacktrace
working!
result in :
./spark/sql-30/build/libs/elasticsearch-spark-30_2.12-8.0.0-SNAPSHOT.jar (NOT: ./build/libs/elasticsearch-hadoop-8.0.0-SNAPSHOT.jar)

elasticsearch-spark-30_2.12-8.0.0-CUSTOMBUILD.jar.zip
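
The notes above depend on having Java 11 or newer as the default JDK alongside a separate Java 8 install. As a rough sketch, the major version can be pulled out of `java -version` output like this; `java_major_version` is a hypothetical helper, and the regular expression assumes the `version "..."` layout shown in the outputs quoted above.

```python
import re
from typing import Optional

# Matches the quoted version string, e.g. version "11.0.8" or version "1.8.0_121".
_VERSION_RE = re.compile(r'version "(\d+)(?:\.(\d+))?')


def java_major_version(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output.

    Handles the legacy "1.8.0_121" scheme (major is 8) and the newer
    "11.0.8" scheme (major is 11). Returns None if no version is found.
    """
    match = _VERSION_RE.search(version_output)
    if match is None:
        return None
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)  # legacy 1.x numbering: the second field is the major
    return first
```

Note that `java -version` writes to stderr rather than stdout, so capture that stream if you feed real command output into the function.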

@jbaiera
Member

jbaiera commented Jan 29, 2021

Closing this in favor of #1592 - Thanks for the effort put into testing these changes, but the new PR accounts for all the whacky build changes that went into the project over the last year to better support these upgrades going forward. An additional thanks to everyone's patience on the calls for these version updates.

@jbaiera jbaiera closed this Jan 29, 2021