Skip to content

Release the network resource (e.g, -p) during checkpoint #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 20, 2015
Merged

Release the network resource (e.g, -p) during checkpoint #3

merged 3 commits into from
May 20, 2015

Conversation

huikang
Copy link

@huikang huikang commented May 19, 2015

Restore failed if network resource not released during checkpoint,
e.g., a container with port open with -p

Signed-off-by: Hui Kang [email protected]

Restore failed if network resource not released during checkpoint,
e.g., a container with port open with -p

Signed-off-by: Hui Kang <[email protected]>
@huikang
Copy link
Author

huikang commented May 19, 2015

@boucher this PR is to fix the error when restore a checkpointed container with -p.
Otherwise, it will report error for restoring such container. Thanks.

  • Hui

@boucher
Copy link
Owner

boucher commented May 19, 2015

So, we shouldn't release the network if LeaveRunning is true, right? We need to just release the network when the process exits.

@huikang
Copy link
Author

huikang commented May 19, 2015

@boucher do you mean that if LeaveRunning is false for checkpoint, the network resource should be released. Right? If so, I will add a condition check and re-submit.

@boucher
Copy link
Owner

boucher commented May 19, 2015

Yes, if LeaveRunning is false, it should be released.

On Tue, May 19, 2015 at 2:29 PM, huikang [email protected] wrote:

@boucher https://github.com/boucher do you mean that if LeaveRunning is
false for checkpoint, the network resource should be released. Right? If
so, I will add a condition check and re-submit.


Reply to this email directly or view it on GitHub
#3 (comment).

@huikang
Copy link
Author

huikang commented May 19, 2015

ok, I will re-submit a PR. Thanks.

@huikang
Copy link
Author

huikang commented May 19, 2015

@boucher could you review the updated PR? Thanks.

@boucher
Copy link
Owner

boucher commented May 19, 2015

Should the network be released after the checkpoint has succeeded, rather than before its attempted?

@huikang
Copy link
Author

huikang commented May 20, 2015

@boucher updated. Thanks.

boucher added a commit that referenced this pull request May 20, 2015
Release the network resource (e.g, -p) during checkpoint
@boucher boucher merged commit 29765ad into boucher:force-restore May 20, 2015
boucher pushed a commit that referenced this pull request Aug 5, 2015
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.

This means two things:

1. If a directory to be created already exists, no error is returned.

2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

[v2: a separate aufs commit is merged into this one]

[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <[email protected]>
boucher pushed a commit that referenced this pull request Oct 1, 2015
TL;DR: stop building static binary that may fail

Linker flag --unresolved-symbols=ignore-in-shared-libs was added
in commit 06d0843 two years ago for the static build case, presumably
to avoid dealing with problem of missing libraries.

For the record, this is what ld(1) man page says:

> --unresolved-symbols=method
>    Determine how to handle unresolved symbols.  There are four
>    possible values for method:
> .........
>    ignore-in-shared-libs
>        Report unresolved symbols that come from regular object files,
>        but ignore them if they come from shared libraries.  This can
>        be useful when creating a dynamic binary and it is known that
>        all the shared libraries that it should be referencing are
>        included on the linker's command line.

Here, the flag is not used for its purpose ("creating a dynamic binary")
and does more harm than good. Instead of complaining about missing symbols
as it should do if some libraries are missing from LIBS/LDFLAGS, it lets
ld create a binary with unresolved symbols, ike this:

 $ readelf -s bundles/1.7.1/binary/docker-1.7.1 | grep -w UND
 ........
 21029: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND dlopen
 .........

Such binary is working just fine -- until code calls one of those
functions, then it crashes (for apparently no reason, i.e. it is
impossible to tell why from the diagnistics printed).

In other words, adding this flag allows to build a static binary
with missing libraries, hiding the problem from both a developer
(who forgot to add a library to #cgo: LDFLAGS -- I was one such
developer a few days ago when I was working on ploop graphdriver)
and from a user (who expects the binary to work without crashing,
and it does that until the code calls a function in one of those
libraries).

Removing the flag immediately unveils the problem (as it should):

	/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libsqlite3.a(sqlite3.o):
	In function `unixDlError':
	(.text+0x20971): undefined reference to `dlerror'
	/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libsqlite3.a(sqlite3.o):
	In function `unixDlClose':
	(.text+0x8814): undefined reference to `dlclose'

The problem is, gosqlite package says:

	#cgo LDFLAGS: -lsqlite3

which is enough for dynamic linking, as indirect dependencies (i.e.
libraries required by libsqlite3.so) are listed in .so file and will be
resolved dynamically by ldd upon executing the binary.

For static linking though, one has to list all the required libraries,
both direct and indirect. For libraries with pkgconfig support the
list of required libraries can be obtained with pkg-config:

	$ pkg-config --libs sqlite3 # dynamic linking case
	-lsqlite3
	$ pkg-config --libs --static sqlite3 # static case
	-lsqlite3 -ldl -lpthread

It seems that all one has to do is to fix gosqlite this way:

	-#cgo LDFLAGS: -lsqlite3
	+#cgo pkg-config: sqlite3

Unfortunately, cmd/go doesn't know that it needs to pass --static
flag to pkg-config in case of static linking
(see golang/go#12058).

So, for one, one has to do one of these things:

1. Patch sqlite.go like this:

	-#cgo LDFLAGS: -lsqlite3
	+#cgo pkg-config: --static sqlite3

(this is exactly what I do in goploop, see
kolyshkin/goploop@e9aa072f51)

2. Patch sqlite.go like this:
	-#cgo LDFLAGS: -lsqlite3
	+#cgo LDFLAGS: -lsqlite3 -ldl -lpthread

(I would submit this patch to gosqlite but it seems that
https://code.google.com/p/gosqlite/ is deserted and not maintained,
and patching it here is not right as it is "vendored")

3. Explicitly add -ldl for the static link case.
This is what this patch does.

4. Fork sqlite to github and maintain it there. Personally I am not
ready for that, as I'm neither a Go expert nor gosqlite user.

Now, #3 doesn't look like a clear solution, but nevertheless it makes
the build much better than it was before.

Signed-off-by: Kir Kolyshkin <[email protected]>
boucher pushed a commit that referenced this pull request Dec 4, 2017
This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt to `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there"

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.

Also, there are a couple of cases then IsExist is handled as
"directory already exist" which is wrong. As a result, some code
that never worked as intended is now removed.

NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).

For more details, a quote from my runc commit 6f82d4b (July 2015):

    TL;DR: check for IsExist(err) after a failed MkdirAll() is both
    redundant and wrong -- so two reasons to remove it.

    Quoting MkdirAll documentation:

    > MkdirAll creates a directory named path, along with any necessary
    > parents, and returns nil, or else returns an error. If path
    > is already a directory, MkdirAll does nothing and returns nil.

    This means two things:

    1. If a directory to be created already exists, no error is
    returned.

    2. If the error returned is IsExist (EEXIST), it means there exists
    a non-directory with the same name as MkdirAll need to use for
    directory. Example: we want to MkdirAll("a/b"), but file "a"
    (or "a/b") already exists, so MkdirAll fails.

    The above is a theory, based on quoted documentation and my UNIX
    knowledge.

    3. In practice, though, current MkdirAll implementation [1] returns
    ENOTDIR in most of cases described in #2, with the exception when
    there is a race between MkdirAll and someone else creating the
    last component of MkdirAll argument as a file. In this very case
    MkdirAll() will indeed return EEXIST.

    Because of #1, IsExist check after MkdirAll is not needed.

    Because of #2 and #3, ignoring IsExist error is just plain wrong,
    as directory we require is not created. It's cleaner to report
    the error now.

    Note this error is all over the tree, I guess due to copy-paste,
    or trying to follow the same usage pattern as for Mkdir(),
    or some not quite correct examples on the Internet.

    [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants