[bitnami/spark] Including py4j lib in PYTHONPATH #79059

Open
ShivKJ opened this issue Mar 21, 2025 · 4 comments

ShivKJ commented Mar 21, 2025

Name and Version

docker.io/bitnami/spark:3.5.1-debian-12-r12

What is the problem this feature will solve?

It will enable users to run PySpark applications inside the Docker container.

What is the feature you are proposing to solve the problem?

py4j is not on the PYTHONPATH, so Python Spark applications cannot be run or submitted inside the Docker container. We should add py4j to PYTHONPATH.

What alternatives have you considered?

I am currently using PYTHONPATH=$(ZIPS=(/opt/bitnami/spark/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH to prepend every .zip archive under /opt/bitnami/spark/python/lib to PYTHONPATH.
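For readers who want the workaround spelled out, here is a minimal sketch of the same idea in plain POSIX shell. It is not the change that went into the image. The function names `join_zips` and `prepend_path` and the `SPARK_PY_LIB` variable are hypothetical; only the default directory comes from the issue. Unlike the one-liner, it skips the literal `*.zip` pattern when nothing matches and avoids a dangling `:` when PYTHONPATH starts out empty.

```shell
# Sketch only: prepend every .zip under a lib directory to PYTHONPATH.
# SPARK_PY_LIB is a hypothetical override; its default is the path from the issue.

# Print the .zip files in directory $1 joined with ':' (nothing if there are none).
# The body runs in a subshell so its variables do not leak into the caller.
join_zips() (
  joined=""
  for f in "$1"/*.zip; do
    # An unmatched glob stays as the literal pattern, so skip entries that do not exist.
    [ -e "$f" ] || continue
    joined="${joined:+$joined:}$f"
  done
  printf '%s' "$joined"
)

# Print $1 prepended to $2, without a stray ':' when either side is empty.
prepend_path() {
  if [ -z "$1" ]; then
    printf '%s' "$2"
  elif [ -z "$2" ]; then
    printf '%s' "$1"
  else
    printf '%s:%s' "$1" "$2"
  fi
}

SPARK_PY_LIB="${SPARK_PY_LIB:-/opt/bitnami/spark/python/lib}"
PYTHONPATH="$(prepend_path "$(join_zips "$SPARK_PY_LIB")" "${PYTHONPATH:-}")"
export PYTHONPATH
```

Sourcing this before running `python my_script.py` should have the same effect as the one-liner; if the directory does not exist, PYTHONPATH is left unchanged.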

@github-actions github-actions bot added the triage label Mar 21, 2025

ShivKJ commented Mar 21, 2025

Note that this is NOT a problem when the script is run with the spark-submit command; it only occurs when the python command is used to execute the script directly.
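A quick way to see the difference described above is to inspect what the plain `python` command will see. The sketch below checks whether any PYTHONPATH entry looks like a py4j archive. The helper name `has_py4j_entry` is hypothetical, and the `py4j-*.zip` file name pattern is an assumption about how the bundled archive is named.

```shell
# Sketch only: succeed if any ':'-separated entry of $1 looks like a py4j archive.
# The py4j-*.zip pattern is an assumption about the archive's file name.
has_py4j_entry() (
  IFS=:
  set -f   # disable globbing so path entries are not expanded as patterns
  for entry in $1; do
    # Compare only the file name part of each entry.
    case "${entry##*/}" in
      py4j-*.zip) exit 0 ;;
    esac
  done
  exit 1
)

if has_py4j_entry "${PYTHONPATH:-}"; then
  echo "py4j archive found on PYTHONPATH"
else
  echo "py4j archive not on PYTHONPATH; a plain 'python' run may fail to import py4j"
fi
```

Running `python -c "import py4j"` inside the container is the more direct check; the function above only inspects the environment variable and does not need a Python interpreter.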

@javsalgar
Contributor

Hi,

Thank you so much for reporting. Would you like to submit a PR updating this environment variable? It could be done in the Dockerfile or in the entrypoint, depending on the complexity.

@javsalgar javsalgar changed the title from "Including py4j lib in PYTHONPATH" to "[bitnami/spark] Including py4j lib in PYTHONPATH" Mar 24, 2025

ShivKJ commented Mar 24, 2025

@javsalgar Sure, I will create a PR. I think it would be better to add py4j directly in the Dockerfile, where PYTHONPATH is set, since it is a crucial library that lets Python applications communicate with Spark.


ShivKJ commented Mar 30, 2025

@javsalgar I have created the PR. Initially, I intended to add the changes to the Dockerfile, but it became complex. Therefore, I moved the changes to entrypoint.sh as you suggested.
