Release the network resource (e.g, -p) during checkpoint #3

huikang · 2015-05-19T21:13:04Z

Restore fails if the network resource is not released during checkpoint,
e.g., for a container with a port published with -p.

Signed-off-by: Hui Kang [email protected]


huikang · 2015-05-19T21:15:52Z

@boucher this PR fixes the error that occurs when restoring a checkpointed container that was started with -p.
Without it, restoring such a container reports an error. Thanks.

Hui

boucher · 2015-05-19T21:22:21Z

So, we shouldn't release the network if LeaveRunning is true, right? We need to just release the network when the process exits.

huikang · 2015-05-19T21:29:06Z

@boucher do you mean that if LeaveRunning is false for the checkpoint, the network resource should be released? If so, I will add a condition check and re-submit.

boucher · 2015-05-19T21:30:31Z

Yes, if LeaveRunning is false, it should be released.


huikang · 2015-05-19T21:31:56Z

ok, I will re-submit a PR. Thanks.


huikang · 2015-05-19T23:30:01Z

@boucher could you review the updated PR? Thanks.

boucher · 2015-05-19T23:43:04Z

Should the network be released after the checkpoint has succeeded, rather than before it's attempted?


huikang · 2015-05-20T19:22:28Z

@boucher updated. Thanks.
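The ordering agreed on in this thread (checkpoint first, then release the network only if the checkpoint succeeded and LeaveRunning is false) can be sketched roughly as follows. This is not the code from the PR: `CheckpointOpts`, `container`, `releaseNetwork`, and the `checkpoint` hook are hypothetical names made up for illustration; only the `LeaveRunning` flag comes from the discussion.

```go
package main

import (
	"errors"
	"fmt"
)

// errDemo is a made-up error used to simulate a failing checkpoint.
var errDemo = errors.New("simulated checkpoint error")

// CheckpointOpts is a hypothetical stand-in for the checkpoint options;
// only the LeaveRunning flag is taken from the discussion.
type CheckpointOpts struct {
	LeaveRunning bool
}

// container is a hypothetical, minimal model of a container.
type container struct {
	networkReleased bool
	checkpoint      func() error // hypothetical hook that performs the checkpoint
}

// releaseNetwork is a placeholder for freeing resources such as ports
// published with -p.
func (c *container) releaseNetwork() {
	c.networkReleased = true
}

// Checkpoint runs the checkpoint first and releases the network only when
// the checkpoint succeeded and the container is not left running.
func (c *container) Checkpoint(opts CheckpointOpts) error {
	if err := c.checkpoint(); err != nil {
		return fmt.Errorf("checkpoint failed: %w", err)
	}
	if !opts.LeaveRunning {
		c.releaseNetwork()
	}
	return nil
}

func main() {
	ok := &container{checkpoint: func() error { return nil }}
	err := ok.Checkpoint(CheckpointOpts{LeaveRunning: false})
	fmt.Println("successful checkpoint: released =", ok.networkReleased, "err =", err)

	failing := &container{checkpoint: func() error { return errDemo }}
	err = failing.Checkpoint(CheckpointOpts{LeaveRunning: false})
	fmt.Println("failed checkpoint: released =", failing.networkReleased, "err =", err)
}
```

Releasing after the checkpoint means a failed checkpoint leaves the still-running container with its ports intact.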


TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.

Quoting MkdirAll documentation:

> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.

This means two things:

1. If a directory to be created already exists, no error is returned.

2. If the error returned is IsExist (EEXIST), it means there exists
   a non-directory with the same name as MkdirAll needs to use for
   a directory. Example: we want to MkdirAll("a/b"), but file "a"
   (or "a/b") already exists, so MkdirAll fails.

The above is a theory, based on quoted documentation and my UNIX
knowledge.

3. In practice, though, current MkdirAll implementation [1] returns
   ENOTDIR in most of the cases described in #2, with the exception
   when there is a race between MkdirAll and someone else creating
   the last component of MkdirAll argument as a file. In this very
   case MkdirAll() will indeed return EEXIST.

Because of #1, IsExist check after MkdirAll is not needed.

Because of #2 and #3, ignoring IsExist error is just plain wrong,
as the directory we require is not created. It's cleaner to report
the error now.

Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.

[v2: a separate aufs commit is merged into this one]

[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <[email protected]>
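A small, self-contained way to observe points 1 and 2 of this commit message, offered as an illustrative sketch rather than anything taken from the commit. The helper name `mkdirAllResults` and the scratch paths are made up; the `os` calls are standard library. The exact errno in the second case depends on the platform and Go version, so the program only prints what it gets.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mkdirAllResults reports what os.MkdirAll returns when
//  1. the target already exists as a directory, and
//  2. a regular file occupies one of the path components.
//
// The third return value is non-nil only if the scratch setup itself failed.
func mkdirAllResults() (error, error, error) {
	tmp, err := os.MkdirTemp("", "mkdirall-demo")
	if err != nil {
		return nil, nil, err
	}
	defer os.RemoveAll(tmp)

	// Case 1: the directory is already there.
	dir := filepath.Join(tmp, "dir")
	if err := os.Mkdir(dir, 0755); err != nil {
		return nil, nil, err
	}
	existingDirErr := os.MkdirAll(dir, 0755)

	// Case 2: a regular file named "a" blocks MkdirAll("a/b").
	file := filepath.Join(tmp, "a")
	if err := os.WriteFile(file, nil, 0644); err != nil {
		return nil, nil, err
	}
	fileInWayErr := os.MkdirAll(filepath.Join(file, "b"), 0755)

	return existingDirErr, fileInWayErr, nil
}

func main() {
	existing, blocked, err := mkdirAllResults()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("existing directory:", existing)
	fmt.Println("file in the way:   ", blocked, "IsExist:", os.IsExist(blocked))
}
```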

TL;DR: stop building static binary that may fail

Linker flag --unresolved-symbols=ignore-in-shared-libs was added
in commit 06d0843 two years ago for the static build case, presumably
to avoid dealing with the problem of missing libraries.

For the record, this is what ld(1) man page says:

> --unresolved-symbols=method
>     Determine how to handle unresolved symbols. There are four
>     possible values for method:
> .........
>     ignore-in-shared-libs
>         Report unresolved symbols that come from regular object files,
>         but ignore them if they come from shared libraries. This can
>         be useful when creating a dynamic binary and it is known that
>         all the shared libraries that it should be referencing are
>         included on the linker's command line.

Here, the flag is not used for its purpose ("creating a dynamic binary")
and does more harm than good. Instead of complaining about missing symbols
as it should do if some libraries are missing from LIBS/LDFLAGS, it lets
ld create a binary with unresolved symbols, like this:

    $ readelf -s bundles/1.7.1/binary/docker-1.7.1 | grep -w UND
    ........
    21029: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND dlopen
    .........

Such a binary works just fine -- until the code calls one of those
functions, then it crashes (for apparently no reason, i.e. it is
impossible to tell why from the diagnostics printed).

In other words, adding this flag allows building a static binary
with missing libraries, hiding the problem from both a developer
(who forgot to add a library to #cgo LDFLAGS: -- I was one such
developer a few days ago when I was working on ploop graphdriver)
and from a user (who expects the binary to work without crashing,
and it does that until the code calls a function in one of those
libraries).

Removing the flag immediately unveils the problem (as it should):

    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libsqlite3.a(sqlite3.o):
    In function `unixDlError':
    (.text+0x20971): undefined reference to `dlerror'
    /usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/libsqlite3.a(sqlite3.o):
    In function `unixDlClose':
    (.text+0x8814): undefined reference to `dlclose'

The problem is, gosqlite package says:

    #cgo LDFLAGS: -lsqlite3

which is enough for dynamic linking, as indirect dependencies (i.e.
libraries required by libsqlite3.so) are listed in the .so file and will
be resolved by the dynamic linker (ld.so) upon executing the binary.

For static linking though, one has to list all the required libraries,
both direct and indirect. For libraries with pkgconfig support the list
of required libraries can be obtained with pkg-config:

    $ pkg-config --libs sqlite3 # dynamic linking case
    -lsqlite3
    $ pkg-config --libs --static sqlite3 # static case
    -lsqlite3 -ldl -lpthread

It seems that all one has to do is to fix gosqlite this way:

    -#cgo LDFLAGS: -lsqlite3
    +#cgo pkg-config: sqlite3

Unfortunately, cmd/go doesn't know that it needs to pass --static
flag to pkg-config in case of static linking (see golang/go#12058).

So, for now, one has to do one of these things:

1. Patch sqlite.go like this:

       -#cgo LDFLAGS: -lsqlite3
       +#cgo pkg-config: --static sqlite3

   (this is exactly what I do in goploop, see
   kolyshkin/goploop@e9aa072f51)

2. Patch sqlite.go like this:

       -#cgo LDFLAGS: -lsqlite3
       +#cgo LDFLAGS: -lsqlite3 -ldl -lpthread

   (I would submit this patch to gosqlite but it seems that
   https://code.google.com/p/gosqlite/ is deserted and not maintained,
   and patching it here is not right as it is "vendored")

3. Explicitly add -ldl for the static link case.
   This is what this patch does.

4. Fork sqlite to github and maintain it there. Personally I am not
   ready for that, as I'm neither a Go expert nor gosqlite user.

Now, #3 doesn't look like a clear solution, but nevertheless it makes
the build much better than it was before.

Signed-off-by: Kir Kolyshkin <[email protected]>
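The `readelf -s ... | grep -w UND` check from this commit message could also be automated, for example in a build script. Below is a minimal sketch under stated assumptions: the helper name `undefinedSymbols` is made up, it only parses text in the column layout that `readelf -s` prints (Num, Value, Size, Type, Bind, Vis, Ndx, Name), and the second sample line is invented for contrast. Only the `dlopen` line comes from the commit message.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// undefinedSymbols scans text in the format printed by `readelf -s` and
// returns the names of symbols whose section index (Ndx column) is UND.
func undefinedSymbols(readelfOutput string) []string {
	var names []string
	sc := bufio.NewScanner(strings.NewReader(readelfOutput))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Expected columns: Num Value Size Type Bind Vis Ndx Name.
		// Rows without a name (such as the initial null symbol) are skipped.
		if len(fields) >= 8 && fields[6] == "UND" {
			names = append(names, fields[7])
		}
	}
	return names
}

func main() {
	// First line is quoted from the commit message; the second is a
	// made-up example of a defined symbol.
	sample := `
 21029: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND dlopen
   100: 0000000000401000 42 FUNC GLOBAL DEFAULT 13 main
`
	fmt.Println(undefinedSymbols(sample))
}
```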

This subtle bug keeps lurking in because error checking for `Mkdir()`
and `MkdirAll()` is slightly different wrt `EEXIST`/`IsExist`:

 - for `Mkdir()`, `IsExist` error should (usually) be ignored
   (unless you want to make sure directory was not there before)
   as it means "the destination directory was already there"

 - for `MkdirAll()`, `IsExist` error should NEVER be ignored.

Mostly, this commit just removes ignoring the IsExist error, as it
should not be ignored.

Also, there are a couple of cases when IsExist is handled as
"directory already exists" which is wrong. As a result, some code
that never worked as intended is now removed.

NOTE that `idtools.MkdirAndChown()` behaves like `os.MkdirAll()`
rather than `os.Mkdir()` -- so its description is amended accordingly,
and its usage is handled as such (i.e. IsExist error is not ignored).

For more details, a quote from my runc commit 6f82d4b (July 2015):

    TL;DR: check for IsExist(err) after a failed MkdirAll() is both
    redundant and wrong -- so two reasons to remove it.

    Quoting MkdirAll documentation:

    > MkdirAll creates a directory named path, along with any necessary
    > parents, and returns nil, or else returns an error. If path
    > is already a directory, MkdirAll does nothing and returns nil.

    This means two things:

    1. If a directory to be created already exists, no error is
    returned.

    2. If the error returned is IsExist (EEXIST), it means there exists
    a non-directory with the same name as MkdirAll needs to use for
    a directory. Example: we want to MkdirAll("a/b"), but file "a"
    (or "a/b") already exists, so MkdirAll fails.

    The above is a theory, based on quoted documentation and my UNIX
    knowledge.

    3. In practice, though, current MkdirAll implementation [1] returns
    ENOTDIR in most of the cases described in #2, with the exception
    when there is a race between MkdirAll and someone else creating
    the last component of MkdirAll argument as a file. In this very
    case MkdirAll() will indeed return EEXIST.

    Because of #1, IsExist check after MkdirAll is not needed.

    Because of #2 and #3, ignoring IsExist error is just plain wrong,
    as the directory we require is not created. It's cleaner to report
    the error now.

    Note this error is all over the tree, I guess due to copy-paste,
    or trying to follow the same usage pattern as for Mkdir(),
    or some not quite correct examples on the Internet.

    [1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go

Signed-off-by: Kir Kolyshkin <[email protected]>
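The two error-handling patterns contrasted in this commit message can be put side by side. This is a sketch with made-up helper names (`ensureDir`, `ensureDirAll`, `runDemo`), not code from the commit: the `Mkdir()` wrapper tolerates `IsExist`, while the `MkdirAll()` wrapper returns every error as-is.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// ensureDir creates a single directory with os.Mkdir and, following the
// usual pattern, treats "already exists" as success.
func ensureDir(path string) error {
	if err := os.Mkdir(path, 0755); err != nil && !os.IsExist(err) {
		return err
	}
	return nil
}

// ensureDirAll creates a directory tree with os.MkdirAll. There is no
// IsExist check: MkdirAll already returns nil for an existing directory,
// so any error means the directory is not there.
func ensureDirAll(path string) error {
	return os.MkdirAll(path, 0755)
}

// runDemo returns the results of: a repeated ensureDir, a repeated
// ensureDirAll, and ensureDirAll through a regular file. The last return
// value is non-nil only if the scratch setup failed.
func runDemo() (error, error, error, error) {
	tmp, err := os.MkdirTemp("", "mkdir-demo")
	if err != nil {
		return nil, nil, nil, err
	}
	defer os.RemoveAll(tmp)

	single := filepath.Join(tmp, "single")
	if err := ensureDir(single); err != nil {
		return nil, nil, nil, err
	}
	mkdirAgain := ensureDir(single)

	nested := filepath.Join(tmp, "x", "y", "z")
	if err := ensureDirAll(nested); err != nil {
		return nil, nil, nil, err
	}
	mkdirAllAgain := ensureDirAll(nested)

	file := filepath.Join(tmp, "f")
	if err := os.WriteFile(file, nil, 0644); err != nil {
		return nil, nil, nil, err
	}
	overFile := ensureDirAll(filepath.Join(file, "sub"))

	return mkdirAgain, mkdirAllAgain, overFile, nil
}

func main() {
	a, b, c, err := runDemo()
	if err != nil {
		fmt.Println("setup failed:", err)
		return
	}
	fmt.Println("Mkdir again:           ", a)
	fmt.Println("MkdirAll again:        ", b)
	fmt.Println("MkdirAll through file: ", c)
}
```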

Release the network resource during checkpoint

7d34de0

Restore fails if the network resource is not released during checkpoint, e.g., for a container with a port published with -p.

Signed-off-by: Hui Kang <[email protected]>

Release network if leaving-running is false for criu

45e7b61

Signed-off-by: Hui Kang <[email protected]>

Release network after checkpoint

203ae83

Signed-off-by: Hui Kang <[email protected]>

boucher added a commit that referenced this pull request May 20, 2015

Merge pull request #3 from huikang/force-restore-hkang

29765ad

Release the network resource (e.g, -p) during checkpoint

boucher merged commit 29765ad into boucher:force-restore May 20, 2015
