-
-
Notifications
You must be signed in to change notification settings - Fork 1.3k
Ignoring package_data when include_package_data is True #1461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I tend to agree that the intuitive behavior would either be:
My preference would be 1, but that is a backwards-incompatible change that could probably cause all kinds of issues with existing packages, and a shocking number of projects don't test their packages as installed. I think we can detect if both are specified and switch it over to a warning for a few releases, then an error. Also, I find |
As a newcomer to the Python packaging world I experienced that confusion first hand |
This is completely confusing and frustrating. Hope to see this resolved very soon |
When `include_package_data=True`, only files listed in MANIFEST.in are included and the `package_data` option is ignored. > If using the setuptools-specific include_package_data argument, files > specified by package_data will not be automatically added to the manifest > unless they are listed in the MANIFEST.in file. see: - https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files - pypa/setuptools#1461 fixes tortoise#439
When `include_package_data=True`, only files listed in MANIFEST.in are included and the `package_data` option is ignored. > If using the setuptools-specific include_package_data argument, files > specified by package_data will not be automatically added to the manifest > unless they are listed in the MANIFEST.in file. see: - https://setuptools.readthedocs.io/en/latest/setuptools.html#including-data-files - pypa/setuptools#1461 fixes #439
I suspect this issue is related to this hacky behavior. |
In jupyter/jupyter-packaging#66, jupyter_packaging switched (back) to using setuptools rather than distutils for sdist. We had problems before with the staging directory not being included when using the setuptools sdist (see conda-forge/jupyterlab-feedstock#235 and jupyter/jupyter-packaging#51). It seems that the real problem we were experiencing was that when you have include_package_data=True, setuptools apparently does not include package_data files that are not given in MANIFEST.in (see pypa/setuptools#1461, for example). This tries to get our packaging working with setuptools. We’ll be able to test this better when we actually do a release. I tested with just running `python setup.py sdist` in a fresh checkout, and it ended up building the javascript in the staging directory, and then it included the build and node_modules directory. Thus I prune these out explicitly here. However, I think our normal build process does not build in that directory, but rather copies over files from the dev_mode directory, so the prune may not be needed in our normal build process. It doesn’t hurt to have it in, though. It may be that we can consolidate these include statements, for example, perhaps we can just do “graft jupyterlab/staging”. However, I went for simplicity in closely mirroring what we have in package_data in setup.py.
See pypa/setuptools#1461 for why this is necessary.
Hi @Tobotimus. I just would like to clarify, that According to some experiments, this is the expected behaviour:
Please notice this considers the extreme case, when the data files are placed inside directories that are not valid Python packages (e.g. missing Also have in mind that "data files" outside the package directory are no longer recommended *2. Finally, I might have done something wrong in the experiments, so any review/correction is appreciated. |
The inconsistency for the `package_data` configuration in sdists when `include_package_data=True` in pypa#1461 have been causing some problems for the community for a while, as also shown in pypa#2835. As pointed out by [@jaraco](pypa#1461 (comment)), this was being caused by a mechanism to break the recursion between the `egg_info` and `sdist` commands. In summary the loop is caused by the following behaviour: - the `egg_info` command uses a subclass of `sdist` (`manifest_maker`) to calculate the MANIFEST, - the `sdist` class needs to know the MANIFEST to calculate the data files when `include_package_data=True` Previously, the mechanism to break this loop was to simply ignore the data files in `sdist` when `include_package_data=True`. The approach implemented in this change was to replace this mechanism, by allowing `manifest_maker` to override the `_safe_data_files` method from `sdist`. --- Please notice [an extensive experiment] (https://github.com/abravalheri/experiment-setuptools-package-data) was carried out to investigate the previous confusing behaviour. There is also [a simplified theoretical analysis] (pyscaffold/pyscaffold#535 (comment)) comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by [@jaraco](pypa#1461 (comment)) (which is being replaced in this change).
The inconsistency for the `package_data` configuration in sdists when `include_package_data=True` in pypa#1461 have been causing some problems for the community for a while, as also shown in pypa#2835. As pointed out by [@jaraco](pypa#1461 (comment)), this was being caused by a mechanism to break the recursion between the `egg_info` and `sdist` commands. In summary the loop is caused by the following behaviour: - the `egg_info` command uses a subclass of `sdist` (`manifest_maker`) to calculate the MANIFEST, - the `sdist` class needs to know the MANIFEST to calculate the data files when `include_package_data=True` Previously, the mechanism to break this loop was to simply ignore the data files in `sdist` when `include_package_data=True`. The approach implemented in this change was to replace this mechanism, by allowing `manifest_maker` to override the `_safe_data_files` method from `sdist`. --- Please notice [an extensive experiment] (https://github.com/abravalheri/experiment-setuptools-package-data) was carried out to investigate the previous confusing behaviour. There is also [a simplified theoretical analysis] (pyscaffold/pyscaffold#535 (comment)) comparing the observed behavior in the experiment and the expected one. This analysis point out to the same offender indicated by [@jaraco](pypa#1461 (comment)) (which is being replaced in this change).
Has there been a fix for this? Im having issues where building an |
Hi @zoj613 can you share a link to your project (or a mock project in the case it is private, replicating its behaviour), so I can have a look? The issue has been fixed for a while. |
@abravalheri here is the project branch: https://github.com/zoj613/polyagamma/tree/remove_poetry This is an attempt to replace I'm really not sure how to move forward now. I've read the docs multiple times and followed the instructions but still not having any luck. The tests in this draft PR seem to pass from an editable installation but the distribution files (sdist and wheel) are completely wrong. |
Hi @zoj613, thank you very much for providing the link. I tried the following: --- /dev/null
+++ b/MANIFEST.in
@@ -0,0 +1,4 @@
+prune .github
+prune examples
+prune scripts
+prune tests
diff --git a/pyproject.toml b/pyproject.toml
index 27180ed..ac7bf8d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -5,6 +5,7 @@ requires = [
"setuptools>=61.0.0",
"numpy==1.21.3; python_version=='3.10'",
"numpy==1.19.0; python_version<='3.9'",
+ "setuptools-scm",
]
build-backend = "setuptools.build_meta"
@@ -32,23 +33,12 @@ classifiers = [
]
[tool.setuptools]
-zip-safe = false
-include-package-data = false
packages = ["polyagamma"]
[tool.setuptools.dynamic]
version = {attr = "polyagamma.__version__"}
-[tools.setuptools.packages.find]
-exclude = [".github*", "examples*", "scripts*", "tests*"]
-include = ["include*"]
-
-[tool.setuptools.package-data]
-polyagamma = ["*.c", "*.pyi", "*.pxd"]
-"*" = ["include*"]
-
[tool.setuptools.exclude-package-data]
-"*" = [".github", "examples", "scripts", "tests"]
polyagamma = ["*.pyx"]
[tool.coverage.run]
As a result, I managed to obtain the following distributions: % unzip -l dist/*.whl
Archive: dist/polyagamma-1.3.3-cp38-cp38-linux_x86_64.whl
Length Date Time Name
--------- ---------- ----- ----
198 2022-06-20 11:06 polyagamma/__init__.pxd
120 2022-06-20 11:30 polyagamma/__init__.py
168 2022-06-20 11:06 polyagamma/__init__.pyi
1847392 2022-06-20 11:56 polyagamma/_polyagamma.cpython-38-x86_64-linux-gnu.so
859 2022-06-20 11:06 polyagamma/_polyagamma.pxd
3270 2022-06-20 11:06 polyagamma/_polyagamma.pyi
1525 2022-06-20 11:56 polyagamma-1.3.3.dist-info/LICENSE
13576 2022-06-20 11:56 polyagamma-1.3.3.dist-info/METADATA
103 2022-06-20 11:56 polyagamma-1.3.3.dist-info/WHEEL
11 2022-06-20 11:56 polyagamma-1.3.3.dist-info/top_level.txt
917 2022-06-20 11:56 polyagamma-1.3.3.dist-info/RECORD
--------- -------
1868139 11 files % tar tf dist/*.tar.gz
polyagamma-1.3.3/
polyagamma-1.3.3/.gitignore
polyagamma-1.3.3/LICENSE
polyagamma-1.3.3/MANIFEST.in
polyagamma-1.3.3/Makefile
polyagamma-1.3.3/PKG-INFO
polyagamma-1.3.3/README.md
polyagamma-1.3.3/include/
polyagamma-1.3.3/include/pgm_density.h
polyagamma-1.3.3/include/pgm_random.h
polyagamma-1.3.3/polyagamma/
polyagamma-1.3.3/polyagamma/__init__.pxd
polyagamma-1.3.3/polyagamma/__init__.py
polyagamma-1.3.3/polyagamma/__init__.pyi
polyagamma-1.3.3/polyagamma/_polyagamma.pxd
polyagamma-1.3.3/polyagamma/_polyagamma.pyi
polyagamma-1.3.3/polyagamma/_polyagamma.pyx
polyagamma-1.3.3/polyagamma.egg-info/
polyagamma-1.3.3/polyagamma.egg-info/PKG-INFO
polyagamma-1.3.3/polyagamma.egg-info/SOURCES.txt
polyagamma-1.3.3/polyagamma.egg-info/dependency_links.txt
polyagamma-1.3.3/polyagamma.egg-info/requires.txt
polyagamma-1.3.3/polyagamma.egg-info/top_level.txt
polyagamma-1.3.3/pyproject.toml
polyagamma-1.3.3/requirements-dev.txt
polyagamma-1.3.3/setup.cfg
polyagamma-1.3.3/setup.py
polyagamma-1.3.3/src/
polyagamma-1.3.3/src/pgm_alternate.c
polyagamma-1.3.3/src/pgm_alternate_trunc_points.h
polyagamma-1.3.3/src/pgm_common.c
polyagamma-1.3.3/src/pgm_common.h
polyagamma-1.3.3/src/pgm_density.c
polyagamma-1.3.3/src/pgm_devroye.c
polyagamma-1.3.3/src/pgm_macros.h
polyagamma-1.3.3/src/pgm_random.c
polyagamma-1.3.3/src/pgm_saddle.c Which seems to be more or less what you want, isn't it? ( You can use If you are still having problems with the build, please try removing the cache with: rm -rf build dist polyagamma.egg-info |
I think the main problem is that all sections with Files outside the package folder need to controlled via |
Hi @abravalheri , thanks a lot your proposed patch works and I can reproduce your results locally. The problem now is that the shared library cannot be used. It appears as though the import statement cannot find it: I get the following error after installing the package via the wheel and sdist:
The shared library is named I have two other tangential questions:
it did not seem to work? |
If I had to guess, I would say that this problem is very likely to be caused because you are trying to test a package that uses a "flat-layout" from the project root. Have you tried The flat-layout is very tricky to use and has several caveats, specially for testing. I always recommend people to use the
I don't think
I believe you can have |
@abravalheri you were right, It worked once I changed to another directory. Thank you so much for the help. I will look into automating versioning via |
Hi, firstly, thanks for the work you maintainers do on this library :)
I could be incorrect in saying this, but it's come to my understanding that setting
include_package_data=True
in setup includes data files specified by MANIFEST.in and only in MANIFEST.in. At the very least, this behaviour is not clearly explained in the docs, and at the worst it is unexpected and implicit behaviour.Here is a sentence from the docs which seems to be trying to say the above, although elsewhere in the docs
include_package_data
is not explained in this way:Doesn't this make
package_data
redundant wheninclude_package_data
is True? Does it really make sense to have a param calledinclude_package_data
which implicitly ignorespackage_data
?In summary, this is the behaviour I seem to have come across when using these options in various combinations:
package_data
withinclude_package_data=False
:package_data
are included as well as everything in MANIFEST.in.package_data
withinclude_package_data=True
:package_data
not specified by MANIFEST.in are excluded.package_data
withinclude_package_data=True
:I'm sorry if I've got this wrong, any correspondence is much appreciated :)
The text was updated successfully, but these errors were encountered: