Skip to content

Addresses #268 : local builds should respect .gitignore #269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

spMohanty
Copy link

@spMohanty spMohanty commented Mar 18, 2018

Fixes #268 (added by @betatim)

tar.add('.', 'src/', filter=_filter_tar)
_exclude_tar = None
if os.path.exists(".gitignore"):
gitignore_fh = open(".gitignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use a context manager here so that the file gets closed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typical docker builds use a separate .dockerignore and ignore .gitignore. Should we follow that pattern? I'd rather be as close to exact for what docker build would normally do as we can.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betatim : sure, I will push in another commit with a bit of refactoring, and also the tests.

@minrk : Thats more of a usability question. If repo2docker wants to be transparent to the user then, adding a separate .dockerignore might add to the confusion. You could add conditionals to check if both .dockerignore and .gitignore exist, and prioritize the .dockerignore, but if the repository just has a .gitignore then you use that instead.

@betatim
Copy link
Member

betatim commented Mar 19, 2018

Could you add a test to check that a file that should get added gets added and one that shouldn't shouldn't?

Do the others have any thoughts on adding a new dependency?

Otherwise LGTM.

tar.add('.', 'src/', filter=_filter_tar)
_exclude_tar = None
if os.path.exists(".gitignore"):
gitignore_fh = open(".gitignore")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typical docker builds use a separate .dockerignore and ignore .gitignore. Should we follow that pattern? I'd rather be as close to exact for what docker build would normally do as we can.

# same to the way `git` excludes files based on `.gitignore`.
#
# https://github.com/cpburnz/python-path-specification/issues/19
return ignorespec.match_file(filepath)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It concerns me a bit that this isn't exact. Is there any possibility that we could end up excluding files that shouldn't be excluded, or does the bug you link exclusively mean that we will include some files we maybe shouldn't?

Copy link
Author

@spMohanty spMohanty Mar 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@minrk : pathspecs was the only pure python implementation I could find. Also, I think it does implement the pathspecs mostly correctly, but the difference comes from how git is a bit more liberal. For example, git is okay with referencing D/* to refer to all the subdirectories of D, whereas pathspecs expect you to write just D/.

In the issue I reported, and in most of the use cases, this will only mean we will forget to exclude some files which were meant to be excluded.

The only use case when a file that is not meant to be excluded is excluded, is when .gitignore specifically ignores a particular file, but the user had force-pushed the file to the repository.

But in the both these use cases, the .gitignore should ideally be modified to include the exceptions. For example, when you force-push say a *.csv file which was ignored by .gitignore, then you are expected to also remember to add a !<path_to_file> to .gitignore.

@betatim
Copy link
Member

betatim commented Mar 19, 2018

Hoisting @minrk's inline comment to here:

Typical docker builds use a separate .dockerignore and ignore .gitignore. Should we follow that pattern? I'd rather be as close to exact for what docker build would normally do as we can.

I hadn't thought of this and it is a very good point. The pragmatist in me would like to merge the git and docker ignore files to make it "just work". But it probably brings more problems than it solves.

On a binderhub we wouldn't ever see this as files in .gitignore would not be in the repo. So that maybe points towards using .dockerignore as it allows people to exclude things later.

@spMohanty
Copy link
Author

spMohanty commented Mar 19, 2018

@betatim : Well true, focusing on the binderhub use case, the .dockerignore solution actually also solves the other problems (like force-pushed data which is ignored by .gitignore will not be excluded).

Going by the discussion here, it seems that while they considered pathspec at point, they moved away from it. But the good thing is, they do package some of the code internally, so dockerignore can be supported without adding an extra dependency, and will always have the same functionality as docker-py.

They seem to seem to handle it here, where the call the docker.utils.tar function here, which uses the create_archive function here.

The only major risk is that, their tar + create_archive wrapper does not use the filter argument, which seems to be used to deal with the caching issue mentioned here

@choldgraf
Copy link
Member

I'm +1 on respecting .dockerignore. repo2docker should try to follow as much pre-existing patterns as possible, and I think it's more intuitive to put pre-existing behavior of .dockerignore into r2d rather than building a different kind of behavior into .gitignore

@choldgraf
Copy link
Member

@spMohanty do you think you'll have time to re-tool this PR to use .dockerignore instead of .gitignore.

@spMohanty
Copy link
Author

@choldgraf : Sure. Happy to give it a go this weekend.

@betatim
Copy link
Member

betatim commented Feb 5, 2020

Closing this as it has been inactive for over a year. We can make a new PR if the need still exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

local builds should respect .gitignore ?
5 participants