Description
Hi!
This issue summarizes my prototypes for automatically generating Python interface files ('stub files', '.pyi files'). The prototypes are available as #2379 and #2447.
- [Prototype] Python stubs generation #2379 aimed to generate Python stub files entirely at compile time. This is not possible because proc-macros run before type inference and trait analysis, so a proc-macro cannot know whether a type implements a given trait.
- Prototype: Runtime inspection and Pyi generation #2447 aims to generate information at compile time that represents the structure of `#[pyclass]` structs (which methods exist, what arguments they accept), to be read at run time by the stub generator.
I'm presenting the results here to get feedback on the current approach. I'm thinking of extracting parts of the prototypes as standalone features and PRs.
Progress
Accessing type information at runtime:
- Runtime type information for objects crossing the Rust–Python boundary #2490
- Generate precise type information as part of `#[pyclass]`
Accessing structural information at runtime:
- Declare the inspection API
- Generate the inspection API structures as part of `#[pyclass]` and `#[pymethods]`
- Collect the inspection data per module
Python interface generation:
- Generate the Python interface files
- Document how to generate .pyi files in end users' projects
Summary
The final goal is to provide a way for developers who use PyO3 to automatically generate Python interface files (.pyi) with type information and documentation, so that Rust extensions 'feel' like regular Python code to end users through proper integration with various tools (MyPy, IDEs, Python documentation generators).
I have identified the following steps to achieve this goal. Ideally, each step will become its own PR as a standalone feature.
1. Provide a way to extract the full Python type information from any object passed to or retrieved from Python (e.g. `List[Union[str]]`, not just `PyList`).
2. Provide an API to describe Python objects at run time (list of classes, list of methods for these classes, list of arguments of each method, etc.).
3. Improve the macros so they generate the various inspection data structures (the API from step 2) at compile time.
4. Write a run-time .pyi generator based on the inspection API.
Steps 1 and 2 are independent of each other, as are steps 3 and 4.
Full type information
The goal of this task is to provide a simple way to access the string representation of the Python type of any object exposed to Python. This string representation should follow the exact format of normal Python type hints.
First, an enum representing the various types is created (simplified version below, prototype here):
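```rust
use std::fmt;

// Simplified: the prototype has more variants.
enum TypeInfo {
    Any,
    None,
    Optional(Box<TypeInfo>),
    // ...
    Builtin(&'static str),
    Class {
        module: Option<&'static str>,
        name: &'static str,
    },
}

impl fmt::Display for TypeInfo {
    // Convert to a proper Python type hint string
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::None => write!(f, "None"),
            TypeInfo::Optional(inner) => write!(f, "Optional[{}]", inner),
            TypeInfo::Builtin(name) => write!(f, "{}", name),
            TypeInfo::Class { module: Some(m), name } => write!(f, "{}.{}", m, name),
            TypeInfo::Class { module: None, name } => write!(f, "{}", name),
        }
    }
}
```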
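To show how nested type hints could be rendered recursively, here is a self-contained sketch. The `List`, `Union` and `Dict` variants are assumptions added only for illustration. They stand in for the elided `...` variants and are not taken from the prototype. A `Class` with a known module prints as `module.name`.

```rust
use std::fmt;

// Hypothetical, simplified TypeInfo; List/Union/Dict are illustrative
// assumptions, not the prototype's exact variants.
#[derive(Debug, Clone)]
enum TypeInfo {
    Any,
    None,
    Optional(Box<TypeInfo>),
    List(Box<TypeInfo>),
    Union(Vec<TypeInfo>),
    Dict(Box<TypeInfo>, Box<TypeInfo>),
    Builtin(&'static str),
    Class { module: Option<&'static str>, name: &'static str },
}

impl fmt::Display for TypeInfo {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TypeInfo::Any => write!(f, "Any"),
            TypeInfo::None => write!(f, "None"),
            // Generic variants delegate to the Display impl of their parameters
            TypeInfo::Optional(inner) => write!(f, "Optional[{}]", inner),
            TypeInfo::List(inner) => write!(f, "List[{}]", inner),
            TypeInfo::Union(members) => {
                // Join the members with ", " inside Union[...]
                write!(f, "Union[")?;
                for (i, member) in members.iter().enumerate() {
                    if i > 0 {
                        write!(f, ", ")?;
                    }
                    write!(f, "{}", member)?;
                }
                write!(f, "]")
            }
            TypeInfo::Dict(k, v) => write!(f, "Dict[{}, {}]", k, v),
            TypeInfo::Builtin(name) => write!(f, "{}", name),
            TypeInfo::Class { module: Some(m), name } => write!(f, "{}.{}", m, name),
            TypeInfo::Class { module: None, name } => write!(f, "{}", name),
        }
    }
}

fn main() {
    let t = TypeInfo::List(Box::new(TypeInfo::Union(vec![
        TypeInfo::Builtin("str"),
        TypeInfo::Builtin("int"),
    ])));
    println!("{}", t); // List[Union[str, int]]
}
```

Because every variant formats its children through `Display`, arbitrarily nested hints come out in the same syntax Python uses for type annotations.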
PyO3 already has traits that represent conversion to and from Python: `IntoPy` and `FromPyObject`. These traits can be extended to return the type information. The Python convention is that all untyped values should be considered `Any`, so the methods can be added with `Any` as a default to avoid breaking changes (simplified version below, prototype here):
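```rust
pub trait IntoPy<T>: Sized {
    // current API omitted

    /// Python type of the value produced by this conversion.
    fn type_output() -> TypeInfo {
        TypeInfo::Any
    }
}

pub trait FromPyObject<'source>: Sized {
    // current API omitted

    /// Python type accepted by this conversion.
    fn type_input() -> TypeInfo {
        TypeInfo::Any
    }
}
```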
The rationale for adding two different methods is:
- Some structs implement one trait but not the other (e.g. enums that use `derive(FromPyObject)`), so adding the method to only one of the traits would not work in all cases.
- Creating a new trait with a single method would be inconvenient for PyO3 users in general, as it would mean implementing one more trait for each Python-exposed object.
- Both methods have a sensible default and are trivial to implement, so I don't believe there are any downsides.
- Some Python classes should have a different type when appearing as a function input and output, for example `Mapping[K, V]` as input and `Dict[K, V]` as output. Using two different methods supports this use case out of the box.
After this is implemented for built-in types (prototype here), using them becomes as easy as `format!("The type of this value is {}", usize::type_input())`, which gives `"The type of this value is int"`.
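The sketch below models the two proposed methods without depending on PyO3. `ToPython` and `FromPython` are hypothetical stand-ins for `IntoPy` and `FromPyObject`, and the methods return plain `String`s instead of `TypeInfo` to keep the example short. It shows three things: the `Any` default for types that do not override the method, composition through generic impls, and the `Mapping` input / `Dict` output asymmetry for `HashMap`.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for PyO3's IntoPy / FromPyObject; only the new
// type-hint methods are modelled, returning strings for brevity.
trait ToPython {
    fn type_output() -> String {
        "Any".to_string() // default: untyped values are Any
    }
}

trait FromPython {
    fn type_input() -> String {
        "Any".to_string()
    }
}

impl ToPython for usize {
    fn type_output() -> String { "int".to_string() }
}
impl FromPython for usize {
    fn type_input() -> String { "int".to_string() }
}
impl ToPython for String {
    fn type_output() -> String { "str".to_string() }
}
impl FromPython for String {
    fn type_input() -> String { "str".to_string() }
}

// Generic containers compose the hints of their parameters.
impl<T: FromPython> FromPython for Option<T> {
    fn type_input() -> String { format!("Optional[{}]", T::type_input()) }
}

// Accept any mapping as input, but promise a concrete dict as output.
impl<K: FromPython, V: FromPython> FromPython for HashMap<K, V> {
    fn type_input() -> String {
        format!("Mapping[{}, {}]", K::type_input(), V::type_input())
    }
}
impl<K: ToPython, V: ToPython> ToPython for HashMap<K, V> {
    fn type_output() -> String {
        format!("Dict[{}, {}]", K::type_output(), V::type_output())
    }
}

// A user type that implements the trait but keeps the default.
struct Opaque;
impl FromPython for Opaque {}

fn main() {
    println!("The type of this value is {}", usize::type_input());
    println!("{}", <HashMap<String, usize>>::type_input()); // Mapping[str, int]
    println!("{}", <HashMap<String, usize>>::type_output()); // Dict[str, int]
    println!("{}", Option::<Opaque>::type_input()); // Optional[Any]
}
```

Existing implementations that never override the methods keep compiling and simply report `Any`. This is what makes the change non-breaking.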
Inspection API
This step consists of creating an API to describe Python objects.
The main entry point for users would be the `InspectClass` trait (simplified, prototype here):
```rust
pub trait InspectClass {
    fn inspect() -> ClassInfo;
}
```
A similar trait would be created for modules, so it becomes possible to access the list of classes in a module.
This requires creating a structure for each Python language element (`ModuleInfo`, `ClassInfo`, `FieldInfo`, `ArgumentInfo`…, prototype here).
At this point, using this API would require instantiating all structures by hand.
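As an illustration of what writing these structures by hand looks like, here is a minimal sketch. `ModuleInfo`, `ClassInfo`, `FieldInfo`, `ArgumentInfo` and `InspectClass` are the names from the proposal. `FunctionInfo`, every field name, the `Counter` class and the use of pre-rendered type strings are hypothetical simplifications and do not come from the prototype.

```rust
// Hypothetical shapes for the inspection structures; field names are
// illustrative, and type hints are pre-rendered strings for brevity.
#[derive(Debug)]
struct ArgumentInfo {
    name: &'static str,
    ty: &'static str,
}

#[derive(Debug)]
struct FunctionInfo {
    name: &'static str,
    arguments: Vec<ArgumentInfo>, // excludes `self`
    returns: &'static str,
}

#[derive(Debug)]
struct FieldInfo {
    name: &'static str,
    ty: &'static str,
}

#[derive(Debug)]
struct ClassInfo {
    name: &'static str,
    fields: Vec<FieldInfo>,
    methods: Vec<FunctionInfo>,
}

#[derive(Debug)]
struct ModuleInfo {
    name: &'static str,
    classes: Vec<ClassInfo>,
}

trait InspectClass {
    fn inspect() -> ClassInfo;
}

// A hypothetical class whose description is written entirely by hand.
struct Counter;

impl InspectClass for Counter {
    fn inspect() -> ClassInfo {
        ClassInfo {
            name: "Counter",
            fields: vec![FieldInfo { name: "value", ty: "int" }],
            methods: vec![FunctionInfo {
                name: "increment",
                arguments: vec![ArgumentInfo { name: "step", ty: "int" }],
                returns: "None",
            }],
        }
    }
}

fn main() {
    // A module-level view simply collects the ClassInfo of each class.
    let module = ModuleInfo { name: "counters", classes: vec![Counter::inspect()] };
    println!("{:#?}", module);
}
```

Every literal in `inspect()` duplicates something the compiler already knows from the struct and its methods. This redundancy is what the next step removes.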
Compile-time generation
Proc-macros can statically generate all the information needed to implement the inspection API automatically: structural information (fields, etc.) is already known, and type information can simply be delegated to the `IntoPy` and `FromPyObject` traits, since all parameters and return values must implement at least one of them.
Various prototypes:
- 38f0a59: extract classes
- 56b85cf: extract the list of functions
- 8125521: extract a function's kind (function, class method, static method…)
- 4070ad4: extract the function's return type
- 53f2e94: extract attributes annotated with `#[pyo3(get, set)]`
- 003d275: extract argument names and types
This is done via two new traits, `InspectStruct` and `InspectImpl`, which respectively contain the information captured from `#[pyclass]` and `#[pymethods]`. Because of this, the prototype is not compatible with the `multiple-pymethods` feature. I do not know whether it can be made compatible in the future.
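A minimal sketch of how the two generated traits could be combined into `InspectClass` through a blanket impl follows. The method names, the simplified `ClassInfo` and the `Point` example are hypothetical. The comment on the `InspectImpl` impl gives one plausible explanation for the `multiple-pymethods` incompatibility: a trait can be implemented only once per type.

```rust
// Hypothetical sketch: #[pyclass] would emit an InspectStruct impl and
// #[pymethods] an InspectImpl impl; a blanket impl combines them.
trait InspectStruct {
    fn class_name() -> &'static str;
    fn fields() -> Vec<&'static str>;
}

trait InspectImpl {
    fn methods() -> Vec<&'static str>;
}

#[derive(Debug, PartialEq)]
struct ClassInfo {
    name: &'static str,
    fields: Vec<&'static str>,
    methods: Vec<&'static str>,
}

trait InspectClass {
    fn inspect() -> ClassInfo;
}

// Any type with both halves of the information is inspectable.
impl<T: InspectStruct + InspectImpl> InspectClass for T {
    fn inspect() -> ClassInfo {
        ClassInfo {
            name: T::class_name(),
            fields: T::fields(),
            methods: T::methods(),
        }
    }
}

struct Point;

// As if generated by #[pyclass] with #[pyo3(get, set)] fields.
impl InspectStruct for Point {
    fn class_name() -> &'static str { "Point" }
    fn fields() -> Vec<&'static str> { vec!["x", "y"] }
}

// As if generated by a single #[pymethods] block. A second block
// (multiple-pymethods) would emit a second `impl InspectImpl for Point`,
// which Rust rejects as a conflicting implementation (E0119).
impl InspectImpl for Point {
    fn methods() -> Vec<&'static str> { vec!["norm", "scale"] }
}

fn main() {
    println!("{:?}", Point::inspect());
}
```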
Python Interface generator
Finally, a small runtime routine can be provided to generate the .pyi file from the information extracted at compile time (prototype here).
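Here is a sketch of what such a routine could look like for a single class. The data shapes (`ClassInfo`, `MethodInfo`, `ArgumentInfo` with pre-rendered type strings) and the `class_to_pyi` function are hypothetical simplifications, not the prototype's generator. The sketch assumes every method is an instance method taking `self`.

```rust
// Hypothetical, simplified inspection data (types as rendered strings).
struct ArgumentInfo {
    name: &'static str,
    ty: &'static str,
}

struct MethodInfo {
    name: &'static str,
    arguments: Vec<ArgumentInfo>,
    returns: &'static str,
}

struct ClassInfo {
    name: &'static str,
    fields: Vec<(&'static str, &'static str)>, // (name, type)
    methods: Vec<MethodInfo>,
}

// Render one class as .pyi text: annotated attributes, then method stubs
// whose bodies are `...`, as is conventional in stub files.
// Assumes instance methods only (each gets an explicit `self`).
fn class_to_pyi(class: &ClassInfo) -> String {
    let mut out = format!("class {}:\n", class.name);
    for (name, ty) in &class.fields {
        out.push_str(&format!("    {}: {}\n", name, ty));
    }
    for method in &class.methods {
        let mut params = vec!["self".to_string()];
        for arg in &method.arguments {
            params.push(format!("{}: {}", arg.name, arg.ty));
        }
        out.push_str(&format!(
            "    def {}({}) -> {}: ...\n",
            method.name,
            params.join(", "),
            method.returns
        ));
    }
    if class.fields.is_empty() && class.methods.is_empty() {
        out.push_str("    ...\n"); // an empty class body still needs a statement
    }
    out
}

fn main() {
    let counter = ClassInfo {
        name: "Counter",
        fields: vec![("value", "int")],
        methods: vec![MethodInfo {
            name: "increment",
            arguments: vec![ArgumentInfo { name: "step", ty: "int" }],
            returns: "None",
        }],
    };
    print!("{}", class_to_pyi(&counter));
}
```

For the example class this prints `class Counter:`, then the indented lines `value: int` and `def increment(self, step: int) -> None: ...`.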
Thanks to the previous steps, it is possible to retrieve all the information necessary to create a complete, typed interface file with no further annotations from users of the PyO3 library. I think that's pretty much the ideal scenario for this feature, and although it seemed daunting at first, I don't think it's so far-fetched now 😄
The current state of the prototype is described here: #2447 (comment).