Hl.export_elasticsearch conflict with scala2.12 #2749

Open
iris-garden opened this issue Apr 30, 2024 · 0 comments
Labels
discourse-old migrated from discuss.hail.is (last updated more than 31 days ago)

Comments

@iris-garden
Owner

mhebrard said:

Hi.

I am installing Hail v0.2.60 on Spark 3.0.0 / Scala 2.12.10:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.12.10 SPARK_VERSION=3.0.0

I am able to load, process and write data without issues.

The problem arises when I try to export my table to Elasticsearch. I get an error that looks like an incompatibility with Scala 2.12.

import hail as hl

# Load table
ht_res = hl.read_table('s3://[...].ht')

hl.export_elasticsearch(
    ht_res,
    "[es-URL]",
    [es-port],
    '[index]',
    'documents',
    100,
    config={
        'es.nodes.wan.only':'true',
        'es.batch.write.retry.wait':'60s',
        'es.batch.write.retry.count':'30'
    },
    verbose=True
)

Hail version: 0.2.60-de1845e1c2f6
Error summary: NoSuchMethodError: scala.Predef$.refArrayOps([Ljava/lang/Object;)Lscala/collection/mutable/ArrayOps;

mhebrard said:

In the meantime, I was able to install Hail v0.2.60 on Spark 2.4.6 / Scala 2.11.12:

sudo make install-on-cluster HAIL_COMPILE_NATIVES=1 SCALA_VERSION=2.11.12 SPARK_VERSION=2.4.6

In this setup, the hl.export_elasticsearch() code above runs without issue.

johnc1231 said:

You’re right, this is a mistake. We hard-code the path to the elasticsearch dependency, and that path points at the build for Spark 2 and Scala 2.11.

I made a github issue here: hail-is/hail#9767

and assigned it to myself. I will try to address it in the next few days.
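
A NoSuchMethodError on a scala.Predef method is the typical symptom of a jar built for one Scala binary version running on another. As a rough way to spot that kind of mismatch, the sketch below scans jar file names for the conventional `_2.xx` suffix and compares it with the Scala version of the cluster. The function names and the jar names in the usage note are hypothetical, and the check relies only on the naming convention, not on anything Hail itself does.

```python
import re

# Scala libraries conventionally carry the Scala binary version as a
# suffix in the artifact name, e.g. "some-library_2.11-x.y.z.jar".
_SCALA_SUFFIX = re.compile(r"_(2\.\d+)(?=[-.]|$)")


def scala_binary_version(full_version):
    """Reduce a full Scala 2 version such as '2.12.10' to '2.12'."""
    major, minor = full_version.split(".")[:2]
    return f"{major}.{minor}"


def mismatched_jars(jar_names, runtime_scala_version):
    """Return jars whose name suffix disagrees with the runtime Scala version.

    Jars without a Scala suffix are ignored, since their names say
    nothing about which Scala version they were built for.
    """
    expected = scala_binary_version(runtime_scala_version)
    bad = []
    for name in jar_names:
        match = _SCALA_SUFFIX.search(name)
        if match and match.group(1) != expected:
            bad.append(name)
    return bad
```

For example, calling `mismatched_jars(["elasticsearch-spark-20_2.11-x.y.z.jar", "hail-all-spark.jar"], "2.12.10")` flags only the first jar, which matches the situation described in this thread: a Scala 2.11 connector on a Scala 2.12 cluster.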

johnc1231 said:

I’m actually not sure how to do this. We use the library here, which is explicitly for spark 2.x:

mvnrepository.com

I’ve done some quick googling and didn’t immediately find anyone doing this with Spark 3, so I’ll have to keep looking.

nawatts said:

There’s an open issue for Spark 3 / Scala 2.12 support in the elasticsearch-hadoop connector.

github.com/elastic/elasticsearch-hadoop

johnc1231 said:

Thanks, nawatts. So I think the answer is that there is currently no way to export to elasticsearch from Spark 3, and until there is, Hail likely won’t support it.

nawatts said:

Also, https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html lists (in the Apache Spark section) supported Spark / Scala versions and their corresponding ES-Hadoop artifact ID.
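
The connector artifacts for Spark follow a naming pattern of the form `elasticsearch-spark-<NN>_<scala binary version>`, for instance `elasticsearch-spark-20_2.11` for Spark 2.x on Scala 2.11. A minimal sketch of building the name that would correspond to a given cluster is shown below. The helper name is hypothetical, and the function only forms a name from the pattern: it is an assumption that an artifact with that name has actually been published, so the result should be checked against the install page linked above. At the time of this thread, per the posts above, no Spark 3 build was available.

```python
def es_spark_artifact_id(spark_version, scala_version):
    """Form the ES-Hadoop Spark artifact ID implied by the naming pattern.

    This does not check that the artifact exists; it only applies the
    'elasticsearch-spark-<major>0_<scala binary>' convention, which is
    assumed here to hold for Spark 2 and later.
    """
    spark_major = int(spark_version.split(".")[0])
    if spark_major < 2:
        # Spark 1.x artifacts do not follow the '<major>0' convention.
        raise ValueError("naming pattern is only assumed for Spark 2 and later")
    scala_binary = ".".join(scala_version.split(".")[:2])
    return f"elasticsearch-spark-{spark_major}0_{scala_binary}"
```

With the two installs from this thread, `es_spark_artifact_id("2.4.6", "2.11.12")` gives `elasticsearch-spark-20_2.11`, and `es_spark_artifact_id("3.0.0", "2.12.10")` gives `elasticsearch-spark-30_2.12`, the name to look for in the supported-versions table.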

@iris-garden iris-garden added the discourse-old migrated from discuss.hail.is (last updated more than 31 days ago) label Apr 30, 2024