Commit c87d768

Starting docs
1 parent 426781e commit c87d768

4 files changed: +71 -39 lines changed


README.rst renamed to README.md

+60 -37
@@ -1,31 +1,33 @@
-OrbitalML
-=======
+# OrbitalML
 
 Convert SKLearn pipelines into SQL queries for execution in a database
 without the need for a Python environment.
 
-See `examples` directory for example pipelines.
+See `examples` directory for example
+pipelines.
 
-**Warning**::
+**Warning**:
 
 This is a work in progress.
 You might encounter bugs or missing features.
 
-**Note**::
+**Note**:
 
 Not all transformations and models can be represented as SQL queries,
 so OrbitalML might not be able to implement the specific pipeline you are using.
 
-Getting Started
-----------------
+## Getting Started
 
-Install OrbitalML::
+Install OrbitalML:
 
+```bash
 $ git clone https://github.com/posit-dev/orbital.git
 $ pip install ./orbital
+```
 
-Prepare some data::
+Prepare some data:
 
+```python
 from sklearn.datasets import load_iris
 from sklearn.model_selection import train_test_split
 
@@ -40,9 +42,11 @@ Prepare some data::
 X_train, X_test, y_train, y_test = train_test_split(
 iris_x, iris.target, test_size=0.2, random_state=42
 )
+```
 
-Define a Scikit-Learn pipeline and train it::
+Define a Scikit-Learn pipeline and train it:
 
+```python
 from sklearn.compose import ColumnTransformer
 from sklearn.linear_model import LinearRegression
 from sklearn.pipeline import Pipeline
@@ -56,9 +60,11 @@ Define a Scikit-Learn pipeline and train it::
 ]
 )
 pipeline.fit(X_train, y_train)
+```
 
-Convert the pipeline to OrbitalML::
+Convert the pipeline to OrbitalML:
 
+```python
 import orbitalml
 import orbitalml.types
 
@@ -68,9 +74,11 @@ Convert the pipeline to OrbitalML::
 "petal_length": orbitalml.types.DoubleColumnType(),
 "petal_width": orbitalml.types.DoubleColumnType(),
 })
+```
 
-You can print the pipeline to see the result::
+You can print the pipeline to see the result:
 
+```python
 >>> print(orbitalml_pipeline)
 
 ParsedPipeline(
@@ -104,73 +112,88 @@ You can print the pipeline to see the result::
 )
 ],
 )
+```
 
-Now we can generate the SQL from the pipeline::
+Now we can generate the SQL from the pipeline:
 
+```python
 sql = orbitalml.export_sql("DATA_TABLE", orbitalml_pipeline, dialect="duckdb")
+```
 
-And check the resulting query::
+And check the resulting query:
 
+```python
 >>> print(sql)
 
 SELECT ("t0"."sepal_length" - 5.809166666666666) * -0.11633479416518255 + 0.9916666666666668 +
 ("t0"."sepal_width" - 3.0616666666666665) * -0.05977785171980231 +
 ("t0"."petal_length" - 3.7266666666666666) * 0.25491374699772246 +
 ("t0"."petal_width" - 1.1833333333333333) * 0.5475959809777828
 AS "variable" FROM "DATA_TABLE" AS "t0"
+```
 
-Once the SQL is generate, you can use it to run the pipeline on a database.
-From here on the SQL can be exported and reused in other places::
+Once the SQL is generate, you can use it to run the pipeline on a
+database. From here on the SQL can be exported and reused in other
+places:
 
+```python
 >>> print("\nPrediction with SQL")
 >>> duckdb.register("DATA_TABLE", X_test)
 >>> print(duckdb.sql(sql).df()["variable"][:5].to_numpy())
 
 Prediction with SQL
 [ 1.23071715 -0.04010441 2.21970287 1.34966889 1.28429336]
+```
 
 We can verify that the prediction matches the one done by Scikit-Learn
-by running the scikitlearn pipeline on the same set of data::
+by running the scikitlearn pipeline on the same set of data:
 
+```python
 >>> print("\nPrediction with SciKit-Learn")
 >>> print(pipeline.predict(X_test)[:5])
 
 Prediction with SciKit-Learn
 [ 1.23071715 -0.04010441 2.21970287 1.34966889 1.28429336 ]
+```
 
-Supported Models
------------------
+## Supported Models
 
 OrbitalML currently supports the following models:
 
-- Linear Regression
-- Logistic Regression
-- Lasso Regression
-- Elastic Net
-- Decision Tree Regressor
-- Decision Tree Classifier
-- Random Forest Classifier
-- Gradient Boosting Regressor
-- Gradient Boosting Classifier
+- Linear Regression
+- Logistic Regression
+- Lasso Regression
+- Elastic Net
+- Decision Tree Regressor
+- Decision Tree Classifier
+- Random Forest Classifier
+- Gradient Boosting Regressor
+- Gradient Boosting Classifier
 
-Testing
--------
+# Testing
 
-Setup testing environment::
+Setup testing environment:
 
+```bash
 $ uv sync --no-dev --extra test
+```
 
-Run Tests::
+Run Tests:
 
+```bash
 $ uv run pytest -v
+```
 
-Try Examples::
+Try Examples:
 
+```bash
 $ uv run examples/pipeline_lineareg.py
+```
 
-Development
------------
+# Development
 
-Setup a development environment::
+Setup a development environment:
 
-$ uv sync --dev
+```bash
+$ uv sync
+```
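
The query printed in the README diff above has a simple shape: each feature is centered on a mean, multiplied by a coefficient, and the terms are summed with an intercept. The sketch below is not OrbitalML's implementation. It is a minimal illustration, using only the Python standard library, of how an expression of that shape could be assembled as a string and checked against a database engine. The helper names (`linear_sql`, `predict_python`, `predict_sqlite`) and the sample row are hypothetical. SQLite stands in for DuckDB as an assumption, so that the example needs no extra packages. The means, coefficients, and intercept are copied from the query in the README.

```python
import sqlite3

# Feature name -> (mean, coefficient), copied from the query shown in the README.
FEATURES = {
    "sepal_length": (5.809166666666666, -0.11633479416518255),
    "sepal_width": (3.0616666666666665, -0.05977785171980231),
    "petal_length": (3.7266666666666666, 0.25491374699772246),
    "petal_width": (1.1833333333333333, 0.5475959809777828),
}
INTERCEPT = 0.9916666666666668


def linear_sql(table, features, intercept):
    """Build a SELECT computing sum((column - mean) * coef) + intercept."""
    terms = [
        f'("t0"."{name}" - {mean!r}) * {coef!r}'
        for name, (mean, coef) in features.items()
    ]
    expr = " + ".join(terms + [repr(intercept)])
    return f'SELECT {expr} AS "variable" FROM "{table}" AS "t0"'


def predict_python(row, features, intercept):
    """Evaluate the same formula in plain Python, for comparison."""
    return intercept + sum(
        (row[name] - mean) * coef for name, (mean, coef) in features.items()
    )


def predict_sqlite(row, features, intercept):
    """Run the generated query against a one-row in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" REAL' for name in features)
    conn.execute(f'CREATE TABLE "DATA_TABLE" ({columns})')
    placeholders = ", ".join("?" for _ in features)
    conn.execute(
        f'INSERT INTO "DATA_TABLE" VALUES ({placeholders})',
        [row[name] for name in features],
    )
    (value,) = conn.execute(linear_sql("DATA_TABLE", features, intercept)).fetchone()
    conn.close()
    return value


# Hypothetical input row, chosen for illustration only.
sample = {
    "sepal_length": 6.1,
    "sepal_width": 2.8,
    "petal_length": 4.7,
    "petal_width": 1.2,
}

print(linear_sql("DATA_TABLE", FEATURES, INTERCEPT))
print(predict_python(sample, FEATURES, INTERCEPT))
print(predict_sqlite(sample, FEATURES, INTERCEPT))
```

The two printed predictions should agree up to floating-point rounding, which mirrors the README's own check that the SQL result matches `pipeline.predict`.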

docs/docs/index.md

+3
@@ -0,0 +1,3 @@
+# Welcome to Orbital
+
+{!../README.md!}
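
The `{!../README.md!}` line looks like the include marker used by the `markdown-include` extension, although this commit does not show that extension being configured, so that reading is an assumption. The sketch below is not the extension's code. It only illustrates the idea that the marker is replaced by the contents of the referenced file, resolved against a base directory. The function name `expand_includes` and the temporary directory layout are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Matches markers of the form {!relative/path!}.
INCLUDE = re.compile(r"\{!\s*(.+?)\s*!\}")


def expand_includes(text, base_dir):
    """Replace each {!path!} marker with the contents of the referenced file."""

    def _load(match):
        return (Path(base_dir) / match.group(1)).read_text(encoding="utf-8")

    return INCLUDE.sub(_load, text)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "docs").mkdir()
    (root / "README.md").write_text("# OrbitalML\n", encoding="utf-8")

    index = "# Welcome to Orbital\n\n{!../README.md!}\n"
    # Paths are resolved relative to the (assumed) docs directory.
    rendered = expand_includes(index, root / "docs")

print(rendered)
```

Which directory the real tooling resolves `../README.md` against depends on its configuration, so the `base_dir` argument here should be treated as a placeholder.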

docs/mkdocs.yml

+4
@@ -0,0 +1,4 @@
+site_name: Orbital
+site_url: https://posit-dev.github.io/orbital
+theme:
+  name: material

pyproject.toml

+4 -2
@@ -10,7 +10,7 @@ name = "OrbitalML"
 version = "0.1.0"
 description = "Allow SKLearn predictions to run on database systems in pure SQL."
 keywords = ["database", "machine learning", "sql"]
-readme = { file = "README.rst", content-type = "text/x-rst" }
+readme = { file = "README.md", content-type = "text/markdown" }
 license = { file = "LICENSE.md" }
 authors = [
 { name = "Alessandro Molina", email = "[email protected]" },
@@ -64,7 +64,9 @@ dev-dependencies = [
 "mypy>=1.11.2",
 "pre-commit",
 "ruff>=0.6.3",
-"sphinx",
+"mkdocs-material",
+"mkdocstrings",
+"mkdocstrings-python",
 "pydot",
 "onnxruntime",
 "onnxscript",
