When loading the python core dll using LoadLibraryA/W the encodings module cannot load. #126905

Open
AraHaan opened this issue Nov 16, 2024 · 17 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

@AraHaan
Contributor

AraHaan commented Nov 16, 2024

Bug report

Bug description:

I am trying to embed python inside of an exe without having the python core dll in the exe's IMPORT_ADDRESS_TABLE. I expected loading the python core with LoadLibraryA/W from the Windows API to work out of the box, just like it does when the dll is in the IMPORT_ADDRESS_TABLE.

I suspect the problem might be that loading through the IMPORT_ADDRESS_TABLE eventually calls the undocumented and non-exported LdrpHandleTlsData function, while the LoadLibraryA/W APIs might not call it, but I am not entirely sure.

I even went as far as implementing my own copy of LoadLibraryA/W that does call this function in a creative way, and it still does not work; I am not sure why. I expected the normal LoadLibraryA/W functions to work out of the box, without my having to reimplement the logic behind them to test if this was the case.

Why do I want to manually load the python core dll instead of letting Windows load it for me through the IMPORT_ADDRESS_TABLE?

This is because I am embedding python with the python core itself stored in the exe's win32 resource section, and I write it out to disk before attempting to LoadLibraryA/W it. This lets me distribute a single standalone exe file that extracts the python core and then uses the zip file with the standard library, C extensions (memory loaded), and any site-packages, also from within the win32 resource section, to properly load up the python interpreter.

All of the logic behind loading the zip file and its contents from the embedding exe's win32 resource section works flawlessly when the core dll is in the IMPORT_ADDRESS_TABLE. Sadly, I doubt that manually loading the python core and then injecting it into the currently running process's IMPORT_ADDRESS_TABLE via code (to make it use the existing handle from the manual load) would work, as I suspect the OS makes the in-memory IAT of the exe read-only.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Windows

@AraHaan AraHaan added the type-bug An unexpected behavior, bug, or error label Nov 16, 2024
@zooba
Member

zooba commented Nov 18, 2024

What's more likely is that your directory structure doesn't match a default CPython layout and you aren't initializing correctly. The encodings module is usually the first one that relies on having correct search paths set.

Check out https://docs.python.org/3/c-api/init_config.html#init-config and consider initializing with explicit module search paths.

@AraHaan
Contributor Author

AraHaan commented Nov 22, 2024

> What's more likely is that your directory structure doesn't match a default CPython layout and you aren't initializing correctly. The encodings module is usually the first one that relies on having correct search paths set.
>
> Check out https://docs.python.org/3/c-api/init_config.html#init-config and consider initializing with explicit module search paths.

static void my_Py_Initialize() {
  if (Py_IsInitialized() != 0) {
    /* bpo-33932: Calling Py_Initialize() twice does nothing. */
    return;
  }

  PyStatus status;
  PyConfig config;
  PyFile_SetOpenCodeHook((Py_OpenCodeHookFunction)OpenCodeHook, NULL);
  PyConfig_InitIsolatedConfig(&config);
  wchar_t path[MAX_PATH] = { L'\0' };
  GetCurrentDirectoryW(MAX_PATH, path);
  status = PyWideStringList_Append(&config.module_search_paths, path);
  memset(path, 0, sizeof(path));  /* clear the whole wchar_t buffer, not just MAX_PATH bytes */
  GetModuleFileNameW(GetModuleHandle(NULL), path, MAX_PATH);
  status = PyWideStringList_Append(&config.module_search_paths, path);
  wcscat(path, L"/site-packages");
  status = PyWideStringList_Append(&config.module_search_paths, path);
  config.install_signal_handlers = 1;
  // config.site_import = 0;
  config.module_search_paths_set = 1;
  if (PyStatus_Exception(status)) {
      PyConfig_Clear(&config);
      Py_ExitStatusException(status);
  }

  status = Py_InitializeFromConfig(&config);
  PyConfig_Clear(&config);
  if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
  }
}

I call this in my code with my own OpenCodeHook function, and it inserts the current directory, the exe itself, and a site-packages subfolder. The exe has the zipped stdlib in its resource section, so it should locate the normal python scripts just fine, along with any pyd files from the current directory of the process or the configured site-packages folder. Yet it still cannot load the encodings module for some unknown reason.

@zooba
Member

zooba commented Nov 25, 2024

What does your OpenCodeHook function do? As I mentioned, encodings is the first module that gets loaded normally, so it's likely the first one that's going through your hook.

I'm pretty sure we only handle .zip files with ZipImporter. It'll be up to your hook function to handle anything more complex (and really, you should probably use an importer here instead, if only to avoid trying to write .pyc files to invalid locations all the time and wasting perf).

@AraHaan
Contributor Author

AraHaan commented Nov 25, 2024

static PyObject *__cdecl OpenCodeHook(PyObject *path, void *userData) {
  const char *_path = PyUnicode_AsUTF8(path);
  if (strstr(_path, ".exe") != NULL) {
    DWORD size;
    LPVOID resource = InternalLoadResource(GetModuleHandle(NULL), MAKEINTRESOURCEA(LIBS_ZIP), &size);
    PyObject *resource_bytes = PyBytes_FromStringAndSize((const char*)resource, size);
    if (resource_bytes == NULL) {
      printf("%s\n", "Failed to convert the Win32 Resource Data to a 'bytes' object.");
      return NULL; // Error occurred
    }

    PyObject *BytesIO = _PyImport_GetModuleAttrString("_io", "BytesIO");
    if (BytesIO == NULL) {
      printf("%s\n", "Failed to get the 'BytesIO' class from module '_io'.");
      Py_DECREF(resource_bytes);
      return NULL; // Error occurred
    }

    // Create a BytesIO instance using the bytes object
    PyObject *args = PyTuple_New(1);
    Py_INCREF(resource_bytes);
    if (args == NULL || PyTuple_SetItem(args, 0, resource_bytes) < 0) {
      printf("%s\n", "Failed to create the args tuple for creating an 'BytesIO' instance.");
      Py_DECREF(BytesIO);
      Py_XDECREF(args);  // args is NULL when PyTuple_New failed
      Py_DECREF(resource_bytes);
      return NULL; // Error occurred
    }

    // PyObject *args = PyTuple_Pack(1, resource_bytes);
    // if (args == NULL) {
    //   printf("%s\n", "Failed to create the args tuple for creating an 'BytesIO' instance.");
    //   Py_DECREF(BytesIO);
    //   return NULL; // Error occurred
    // }

    PyObject *bytesio_instance = PyObject_CallObject(BytesIO, args);
    Py_DECREF(BytesIO);
    Py_DECREF(args);
    Py_DECREF(resource_bytes);
    if (bytesio_instance == NULL) {
      printf("%s\n", "Failed to create the 'BytesIO' instance.");
      return NULL; // Error occurred
    }

    return bytesio_instance; // This is the BytesIO instance
  }
  else if (PyOS_stricmp(_path, "") != 0) {  // ignore empty string.
    printf("%s\n", _path);
    // FILE *file = fopen(_path, "r");
    // fread
  }

  Py_RETURN_NONE;
}

I probably should finish implementing the else block on this, but have not gotten to that part yet.

@zooba
Member

zooba commented Nov 26, 2024

My suspicion is that the default importer is doing some other checks first and deciding that the file does not exist. It only has to try an os.stat("<path to exe>/Lib/encoding.py") to find out that it doesn't really exist, and so it'll never try to call open_code on it.
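
One way to test this kind of suspicion from Python is to ask the default path-based finder directly whether it accepts a given path entry. The following is a minimal sketch, not taken from the thread: `probe_path_entry`, `demo`, and the file names are hypothetical, and it runs with no open-code hook installed, so it only shows the stock behavior for a real zip archive versus a file that merely carries an .exe name.

```python
import importlib.machinery
import os
import tempfile
import zipfile


def probe_path_entry(path_entry, module_name):
    """Ask the default path-based finder whether path_entry can supply module_name.

    Returns a ModuleSpec, or None when no path hook accepts the entry or the
    module is not found there.
    """
    return importlib.machinery.PathFinder.find_spec(module_name, [path_entry])


def demo():
    """Compare a real zip archive with a non-zip file named like an exe."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "stdlib_demo.zip")
        with zipfile.ZipFile(zip_path, "w") as archive:
            archive.writestr("probe_demo_mod.py", "VALUE = 1\n")

        fake_exe = os.path.join(tmp, "app_demo.exe")
        with open(fake_exe, "wb") as handle:
            handle.write(b"MZ" + b"\0" * 64)  # deliberately not a zip archive

        return (
            probe_path_entry(zip_path, "probe_demo_mod") is not None,
            probe_path_entry(fake_exe, "probe_demo_mod") is not None,
        )
```

Running the same probe inside the embedded interpreter, with the real search path entries, should show which entries the import system is willing to use at all.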

For what you're trying to do, you really are going to need to create an importer. You can look at how I did my DLL packer if you're interested (it's a bit convoluted, but I'm sure you can find your way around), or else the importlib docs and specifically the importlib.abc section should give you the info you need to implement this.
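
As a rough illustration of the importlib.abc route, here is a minimal sketch of a meta path finder that serves pure-Python modules from zip data held in memory. The class name `InMemoryZipFinder` is hypothetical and this is not the DLL packer mentioned above. It also does not address how such an importer would be installed early enough for `encodings`, and it ignores .pyc files and extension modules.

```python
import importlib.abc
import importlib.util
import io
import zipfile


class InMemoryZipFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder/loader for pure-Python modules stored in zip bytes."""

    def __init__(self, zip_bytes):
        self._zip = zipfile.ZipFile(io.BytesIO(zip_bytes))
        self._names = set(self._zip.namelist())

    def _locate(self, fullname):
        # Map "pkg.mod" to "pkg/mod/__init__.py" (package) or "pkg/mod.py" (module).
        base = fullname.replace(".", "/")
        for candidate, is_pkg in ((base + "/__init__.py", True), (base + ".py", False)):
            if candidate in self._names:
                return candidate, is_pkg
        return None

    def find_spec(self, fullname, path=None, target=None):
        found = self._locate(fullname)
        if found is None:
            return None  # let the next finder on sys.meta_path try
        candidate, is_pkg = found
        return importlib.util.spec_from_loader(
            fullname, self, origin=candidate, is_package=is_pkg
        )

    def create_module(self, spec):
        return None  # use the default module creation

    def exec_module(self, module):
        # Read the source from the archive, compile it, and run it in the module.
        member = module.__spec__.origin
        code = compile(self._zip.read(member), member, "exec")
        exec(code, module.__dict__)
```

To use it, append an instance to `sys.meta_path`, for example `sys.meta_path.append(InMemoryZipFinder(data))`, where `data` holds the bytes of the zip resource.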

@AraHaan
Contributor Author

AraHaan commented Nov 26, 2024

> My suspicion is that the default importer is doing some other checks first and deciding that the file does not exist. It only has to try an os.stat("<path to exe>/Lib/encoding.py") to find out that it doesn't really exist, and so it'll never try to call open_code on it.
>
> For what you're trying to do, you really are going to need to create an importer. You can look at how I did my DLL packer if you're interested (it's a bit convoluted, but I'm sure you can find your way around), or else the importlib docs and specifically the importlib.abc section should give you the info you need to implement this.

Strange, it should have looked inside of the zip file that I return as a BytesIO instance in my hook and then used zipimport to check if the module exists inside of the returned zip contents (when it checks the module search path entry that is obtained from GetModuleFileNameW). Somehow it is skipping it, I guess?

@AraHaan
Contributor Author

AraHaan commented Nov 30, 2024

So, I tested it with the normal python314.zip file with my embed code + commented out my own Py_Initialize and used the original Py_Initialize. I also did a dummy _pth file set to that zip file and appended all of my site-packages inside of that zip file as well along with adding that to the _pth file with python314.zip/site-packages followed by the . last in said _pth file.

The result is sadly the same Fatal Python error: Failed to import encodings module.

@AraHaan
Contributor Author

AraHaan commented Nov 30, 2024

Turns out I managed to get more information by appending this to my debug console command args when running my application in debug (on Release config): 1> debug.txt 2>&1

Output in that file:

Fatal Python error: Failed to import encodings module
Python runtime state: core initialized
Traceback (most recent call last):
  File "<frozen zipimport>", line 805, in _get_module_code
  File "<frozen zipimport>", line 690, in _unmarshal_code
  File "<frozen importlib._bootstrap_external>", line 447, in _classify_pyc
ImportError: bad magic number in 'encodings': b'\x13\x0e\r\n'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen zipimport>", line 137, in get_code
  File "<frozen zipimport>", line 819, in _get_module_code
zipimport.ZipImportError: module load failed: bad magic number in 'encodings': b'\x13\x0e\r\n'

For some reason the thing is assuming that the magic number that is freshly compiled is bad. I ran the steps to make an embed output zip file, extracted the python314.zip from that output.zip inside of the cpython build folder, and then copied it (using python's shutil.copyfile) to my program's output folder, and it results in this.

Edit:
I tried again with rmdir externals\cpython and then git clone https://github.com/python/cpython.git --branch main from inside of my project's externals folder, switched back to my project folder, and then called my build_cpython.bat, which builds all the python things I need to build my project. After that it resulted in this when debugging:

Python 3.14.0a2+ (heads/main-dirty:49f15d8667e, Nov 29 2024, 21:28:03) [MSC v.1943 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "_pyrepl\__main__.py", line 6, in <module>
  File "_pyrepl\main.py", line 59, in interactive_console
  File "_pyrepl\simple_interact.py", line 151, in run_multiline_interactive_console
  File "_pyrepl\readline.py", line 389, in multiline_input
  File "_pyrepl\reader.py", line 795, in readline
  File "_pyrepl\historical_reader.py", line 302, in prepare
  File "_pyrepl\reader.py", line 635, in prepare
  File "_pyrepl\windows_console.py", line 317, in prepare
  File "_pyrepl\windows_console.py", line 361, in getheightwidth
OSError: [WinError 6] The handle is invalid.

The OSError might be because I am redirecting stdout and stderr to a text file.

static void my_Py_Initialize() {
  if (Py_IsInitialized() != 0) {
    /* bpo-33932: Calling Py_Initialize() twice does nothing. */
    return;
  }

  PyStatus status;
  PyConfig config;
  PyFile_SetOpenCodeHook((Py_OpenCodeHookFunction)OpenCodeHook, NULL);
  PyConfig_InitIsolatedConfig(&config);
  config.install_signal_handlers = 1;
  config.site_import = 0;
  config.user_site_directory = 0;
  config.module_search_paths_set = 1;
  config.write_bytecode = 0;
  printf("%s%i%s\n", "config.site_import is value: '", config.site_import, "'");
  printf("%s%i%s\n", "config.user_site_directory is value: '", config.user_site_directory, "'");
  PyModuleSearchPath_Append(status, config, applicationFileName)
  PyModuleSearchPath_Append(status, config, applicationSitePackages)
  PyModuleSearchPath_Append(status, config, applicationFileDirectory)
  status = Py_InitializeFromConfig(&config);
  printf("%s%i%s\n", "config.site_import is value: '", config.site_import, "'");
  printf("%s%i%s\n", "config.user_site_directory is value: '", config.user_site_directory, "'");
  PyConfig_Clear(&config);
  if (PyStatus_Exception(status)) {
    Py_ExitStatusException(status);
  }
}

With this however, despite setting BOTH config.site_import and config.user_site_directory to 0, it prints that the value is 0 both BEFORE and AFTER my call to Py_InitializeFromConfig but yet for some reason when I go to run the program I see that the user site-packages folder is wrongly APPENDED to the end of sys.path.
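
To narrow down where such an entry comes from, it can help to dump the relevant state from inside the embedded interpreter. The following is a small diagnostic sketch; `report_site_state` is a hypothetical helper, not something from the thread. It assumes that `config.site_import = 0` and `config.user_site_directory = 0` surface as `sys.flags.no_site` and `sys.flags.no_user_site` being set.

```python
import os
import sys


def report_site_state():
    """Collect the flags and sys.path facts relevant to user site-packages."""
    site_preloaded = "site" in sys.modules  # was site imported during startup?

    # Importing site here is safe: it only runs its path setup at import time
    # when sys.flags.no_site is 0, and in that case startup already imported it.
    import site

    user_site = site.getusersitepackages()  # may be None on some platforms
    on_path = False
    if user_site is not None:
        wanted = os.path.normcase(os.path.abspath(user_site))
        on_path = any(
            entry and os.path.normcase(os.path.abspath(entry)) == wanted
            for entry in sys.path
        )
    return {
        "no_site": bool(sys.flags.no_site),
        "no_user_site": bool(sys.flags.no_user_site),
        "site_preloaded": site_preloaded,
        "user_site": user_site,
        "user_site_on_path": on_path,
    }
```

If `no_user_site` is true and `user_site_on_path` is also true, something other than the standard `site` startup logic is likely adding the entry.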

@zooba
Member

zooba commented Dec 28, 2024

> assuming that the magic number that is freshly compiled

At a guess, you probably updated the CPython repo without regenerating your ZIP file, and someone changed the bytecode version in the meantime. That's very likely to happen when you're pulling from main.
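
A stale archive of this kind can be detected by comparing the first four bytes of each .pyc entry with the running interpreter's `importlib.util.MAGIC_NUMBER`. The following is a minimal sketch; `find_stale_pyc` is a hypothetical helper name, and it has to be run with the same interpreter build that will later load the archive.

```python
import importlib.util
import zipfile


def find_stale_pyc(zip_source):
    """Return the .pyc entries whose magic number differs from this interpreter's.

    zip_source may be a path or a binary file-like object.
    """
    stale = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.endswith(".pyc"):
                continue
            with archive.open(name) as entry:
                magic = entry.read(4)  # a .pyc file starts with the 4-byte magic
            if magic != importlib.util.MAGIC_NUMBER:
                stale.append(name)
    return stale
```

An empty result for python314.zip would suggest the bytecode matches the build; a non-empty one means the archive needs to be regenerated.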

> yet for some reason when I go to run the program I see that the user site-packages folder is wrongly APPENDED to the end of sys.path.

If you're going to insist on building from main, then all you can do is modify Modules\getpath.py to print debugging output (use the warn() function with f-strings) and recompile.

If you try instead with the last release of 3.13, we'll have a better idea of what it's supposed to be doing. Or even the last alpha of 3.14 where we know it passed all the tests. A random commit from main could have any bug in it, and might already be fixed.

@AraHaan
Contributor Author

AraHaan commented Jan 20, 2025

I figured out the TLS problem when trying to LoadLibrary the python core dll after extracting it from the exe file's WIN32 Resource section.

Turns out the DllMain function in the Python Core DLL does not initialize the TLS slots (Thread Local Storage) directly at all and so it magically works only when the dll is present inside of the exe's IAT (IMPORT_ADDRESS_TABLE).

https://learn.microsoft.com/en-us/windows/win32/dlls/dllmain

The best option to fix this is to patch the DllMain to always initialize the proper TLS slots (using the TSS API's) for the python core to use in the spots that msdocs page documents.

@zooba
Member

zooba commented Jan 20, 2025

> I figured out the TLS problem ...

I wasn't aware there was a TLS problem?

We allocate TLS slots when requested through our PyThread APIs. It shouldn't need to be done at DLL load time, and in fact we probably can't, because we haven't allocated anywhere to store the index at that point.

So your proposed fix would be a significant refactor to significant parts of Python's deepest infrastructure impacting all platforms. Before we go looking into that, perhaps you could describe the problem you were seeing?

@AraHaan
Contributor Author

AraHaan commented Jan 21, 2025

> > I figured out the TLS problem ...
>
> I wasn't aware there was a TLS problem?
>
> We allocate TLS slots when requested through our PyThread APIs. It shouldn't need to be done at DLL load time, and in fact we probably can't, because we haven't allocated anywhere to store the index at that point.
>
> So your proposed fix would be a significant refactor to significant parts of Python's deepest infrastructure impacting all platforms. Before we go looking into that, perhaps you could describe the problem you were seeing?

Basically, my exe first checks at runtime whether the python core dll exists in its folder and then does a version check before loading it. If one or both of those checks fail, I write the dll file from the exe's win32 resource section to that folder before calling LoadLibrary on it. The issue with this is that according to the Microsoft docs: If a DLL uses "__declspec(thread)" and might be loaded using LoadLibrary it must properly allocate and initialize its TLS data inside of its DLLMain. That is what I have read on one of its doc pages.

The issue with that is that since Python 3.12, where __declspec(thread) started getting used, LoadLibrary has been broken for loading the dll.

Normally people have the Python Core DLL in their IAT (Import Address Table), unless they want to package their embedding application as a single file with the python core, its C extensions, and the zip of the stdlib all inside of the win32 resource section of their exe, to be written to disk later at runtime. During normal usage (without LoadLibrary), ntdll.dll!LdrpHandleTlsData (a private unexported function) would handle allocating and initializing TLS for __declspec(thread) when the DLL is within the exe/dll's IAT at load time.

The main problem:

The problem starts when LoadLibrary is needed to load the python core instead of having it in the IAT (DelayLoading the python core also suffers from this problem, as it uses LoadLibrary instead of the IAT), because LoadLibrary does not call into this private unexported function from ntdll.dll on Windows. Because of that, loading the python core via LoadLibrary basically loads a broken python core at runtime with non-allocated and non-initialized TLS. That results in being unable to load the default C extensions, which in turn results in being unable to load the encodings module this way, even with proper setup in an exe that embeds python. Disabling DelayLoad and using the python core directly from the IAT immediately fixes the problem, even though that is not ideal for those who want to publish their code as a single file exe which uses python at its core.

Even py2exe suffers from this same problem. It implements its own LoadLibrary function (using MemoryModule), which is also unable to allocate and initialize the TLS data for the python core, for those who want to use it to turn their python code into a single file exe. It is another example that matches the behavior of the normal LoadLibrary function in that it skips properly allocating and initializing the TLS.

@zooba
Member

zooba commented Jan 21, 2025

> The issue with this is that according to the Microsoft docs: If a DLL uses "__declspec(thread)" and might be loaded using LoadLibrary it must properly allocate and initialize its TLS data inside of its DLLMain. That is what I have read on one of its doc pages.
>
> The issue with that is that since Python 3.12, where __declspec(thread) started getting used, LoadLibrary has been broken for loading the dll.

The DLLMain provided by MSVC will always initialise __declspec(thread) variables, it doesn't rely on the loader to do it. If they aren't being initialised, you are probably loading with the "as resource file" option or some other way that loads without executing the main function properly.

@AraHaan
Contributor Author

AraHaan commented Jan 22, 2025

> > The issue with this is that according to the Microsoft docs: If a DLL uses "__declspec(thread)" and might be loaded using LoadLibrary it must properly allocate and initialize its TLS data inside of its DLLMain. That is what I have read on one of its doc pages.
> >
> > The issue with that is that since Python 3.12, where __declspec(thread) started getting used, LoadLibrary has been broken for loading the dll.
>
> The DLLMain provided by MSVC will always initialise __declspec(thread) variables, it doesn't rely on the loader to do it. If they aren't being initialised, you are probably loading with the "as resource file" option or some other way that loads without executing the main function properly.

I am loading it with LoadLibraryA("python314.dll");.

@zooba
Member

zooba commented Jan 22, 2025

Perhaps you can describe what isn't working for you?

All the documentation and source code suggests that __declspec(thread) should be fine, so if you describe your actual problem, we might be able to figure out that it's related to something else.

@AraHaan
Contributor Author

AraHaan commented Jan 28, 2025

So, I think I have an option, however it would require me to modify pcbuild.sln, pyproject.props, and python.props to add static-debug and static-release configurations that:

  • Define Py_NO_ENABLE_SHARED in pythoncore.vcxproj and all the default C extensions when these configurations are set, and emit static libraries (*.lib files) for all pyd's, their supporting dll's, and the python core itself (without any __declspec(dllexport)'s).
    • This is to avoid the need to load the python core and any C extensions at runtime (I could drop the usage of py2exe's zipextimporter and the _memorymodule extension code that I link into my exe for loading pyd files at runtime, which works until I load the python core manually at runtime).
  • Modify Python.h to check if both Py_NO_ENABLE_SHARED and Py_LIMITED_API are defined and undefine Py_LIMITED_API prior to including any other header files.
    • This is because I feel that if Py_NO_ENABLE_SHARED is defined, there is really no point in having Py_LIMITED_API defined, as the python core would get statically linked inside of an exe with such an extension.
  • Modify pyconfig.h to #pragma comment(lib, ...) the pythonX.Y_static lib file for each arch that is supported by pcbuild.sln when Py_NO_ENABLE_SHARED is defined (I honestly think that the point of Py_NO_ENABLE_SHARED is advanced embedding anyway).
  • Modify the building of the externals to build the static versions of tcl/tk, ssl, ffi, crypto, and zlib1 (so _tkinter can be statically linked into a single file exe, to produce a cpython exe installer based on tkinter which could be used to officially install python on Windows in the future). For the tcl directory files, the extension could fall back to finding the specific tcl files via OpenCodeHook when reading from a zip file (when placed inside of pythonX.Y.zip).

Note: if the static configurations are approved, prebuilt lib files for them would never be shipped, as I feel they are only for advanced embedding. Users can simply add the cpython repository as an external to their codebase and build the static configurations themselves.

This works great, unless I happen to depend on packages such as aiohttp, which can have C extension modules that get built and then loaded from within the package folder. The only solutions to that would be:

  • PyImport_AppendInitTab("aiohttp.<extension name>", PyInit_<extension name>);
    • The issue with this is that there is no clear documentation on whether that function currently supports this, so that relative imports like from .<extension name> import * work without raising an ImportError and use the extension that is statically linked into the exe, instead of failing for site-packages that package extensions like that (it would basically make the package "think" the C extension is within the folder of the actual package code).
  • A way to patch pip and the setuptools backend to also emit a static .lib file for packages with C extensions, in addition to the pyd files they normally output, for static linkage.

As long as I can static link in every pyd file that is possible for me to load along with the python core itself, my code should work without problems as a single file exe.

@zooba
Member

zooba commented Jan 30, 2025

We'd love to have static lib build configurations, though as you've recognised it's a significant project. If you want to take it on, our requirements would be that it has to work from a command-line build (i.e. build.bat - building inside Visual Studio is optional), shouldn't interfere with a regular build in the same source tree, and doesn't require any build-time modifications of the header files. Ideally the extension modules would all be optional. Beyond that, we'd have to see the changes involved to judge whether they're going to impact existing users or not.

3 participants