- OrbitalML
- =======
+ # OrbitalML

  Convert SKLearn pipelines into SQL queries for execution in a database
  without the need for a Python environment.

- See `examples` directory for example pipelines.
+ See `examples` directory for example
+ pipelines.

- **Warning**::
+ **Warning**:

  This is a work in progress.
  You might encounter bugs or missing features.

- **Note**::
+ **Note**:

  Not all transformations and models can be represented as SQL queries,
  so OrbitalML might not be able to implement the specific pipeline you are using.

- Getting Started
- ----------------
+ ## Getting Started

- Install OrbitalML::
+ Install OrbitalML:

+ ```bash
  $ git clone https://github.com/posit-dev/orbital.git
  $ pip install ./orbital
+ ```

- Prepare some data::
+ Prepare some data:

+ ```python
  from sklearn.datasets import load_iris
  from sklearn.model_selection import train_test_split

@@ -40,9 +42,11 @@ Prepare some data::
  X_train, X_test, y_train, y_test = train_test_split(
      iris_x, iris.target, test_size=0.2, random_state=42
  )
+ ```

- Define a Scikit-Learn pipeline and train it::
+ Define a Scikit-Learn pipeline and train it:

+ ```python
  from sklearn.compose import ColumnTransformer
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import Pipeline
@@ -56,9 +60,11 @@ Define a Scikit-Learn pipeline and train it::
      ]
  )
  pipeline.fit(X_train, y_train)
+ ```

- Convert the pipeline to OrbitalML::
+ Convert the pipeline to OrbitalML:

+ ```python
  import orbitalml
  import orbitalml.types

@@ -68,9 +74,11 @@ Convert the pipeline to OrbitalML::
      "petal_length": orbitalml.types.DoubleColumnType(),
      "petal_width": orbitalml.types.DoubleColumnType(),
  })
+ ```

- You can print the pipeline to see the result::
+ You can print the pipeline to see the result:

+ ```python
  >>> print(orbitalml_pipeline)

  ParsedPipeline(
@@ -104,73 +112,88 @@ You can print the pipeline to see the result::
          )
      ],
  )
+ ```

- Now we can generate the SQL from the pipeline::
+ Now we can generate the SQL from the pipeline:

+ ```python
  sql = orbitalml.export_sql("DATA_TABLE", orbitalml_pipeline, dialect="duckdb")
+ ```

- And check the resulting query::
+ And check the resulting query:

+ ```python
  >>> print(sql)

  SELECT ("t0"."sepal_length" - 5.809166666666666) * -0.11633479416518255 + 0.9916666666666668 +
  ("t0"."sepal_width" - 3.0616666666666665) * -0.05977785171980231 +
  ("t0"."petal_length" - 3.7266666666666666) * 0.25491374699772246 +
  ("t0"."petal_width" - 1.1833333333333333) * 0.5475959809777828
  AS "variable" FROM "DATA_TABLE" AS "t0"
+ ```

- Once the SQL is generate, you can use it to run the pipeline on a database.
- From here on the SQL can be exported and reused in other places::
+ Once the SQL is generate, you can use it to run the pipeline on a
+ database. From here on the SQL can be exported and reused in other
+ places:

+ ```python
  >>> print("\nPrediction with SQL")
  >>> duckdb.register("DATA_TABLE", X_test)
  >>> print(duckdb.sql(sql).df()["variable"][:5].to_numpy())

  Prediction with SQL
  [ 1.23071715 -0.04010441  2.21970287  1.34966889  1.28429336]
+ ```

  We can verify that the prediction matches the one done by Scikit-Learn
- by running the scikitlearn pipeline on the same set of data::
+ by running the scikitlearn pipeline on the same set of data:

+ ```python
  >>> print("\nPrediction with SciKit-Learn")
  >>> print(pipeline.predict(X_test)[:5])

  Prediction with SciKit-Learn
  [ 1.23071715 -0.04010441  2.21970287  1.34966889  1.28429336]
+ ```

- Supported Models
- -----------------
+ ## Supported Models

  OrbitalML currently supports the following models:

- - Linear Regression
- - Logistic Regression
- - Lasso Regression
- - Elastic Net
- - Decision Tree Regressor
- - Decision Tree Classifier
- - Random Forest Classifier
- - Gradient Boosting Regressor
- - Gradient Boosting Classifier
+ - Linear Regression
+ - Logistic Regression
+ - Lasso Regression
+ - Elastic Net
+ - Decision Tree Regressor
+ - Decision Tree Classifier
+ - Random Forest Classifier
+ - Gradient Boosting Regressor
+ - Gradient Boosting Classifier

- Testing
- -------
+ # Testing

- Setup testing environment::
+ Setup testing environment:

+ ```bash
  $ uv sync --no-dev --extra test
+ ```

- Run Tests::
+ Run Tests:

+ ```bash
  $ uv run pytest -v
+ ```

- Try Examples::
+ Try Examples:

+ ```bash
  $ uv run examples/pipeline_lineareg.py
+ ```

- Development
- -----------
+ # Development

- Setup a development environment::
+ Setup a development environment:

- $ uv sync --dev
+ ```bash
+ $ uv sync
+ ```
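
The README text in this diff says the generated SQL can be run without a Python ML environment. As a rough illustration of what that query computes, the sketch below runs the query text from the diff against an in-memory SQLite table and compares the result with the same arithmetic written in plain Python. Several things here are assumptions rather than part of the project: SQLite stands in for DuckDB only because it ships with Python, the sample row is illustrative, and the names `TERMS`, `INTERCEPT`, and `predict_in_python` are hypothetical helpers. Reading the constants as per-feature offsets, coefficients, and an intercept is an interpretation of the query, not something the diff states.

```python
import math
import sqlite3

# Query text copied from the README diff (it was generated there for DuckDB).
SQL = '''
SELECT ("t0"."sepal_length" - 5.809166666666666) * -0.11633479416518255 + 0.9916666666666668 +
("t0"."sepal_width" - 3.0616666666666665) * -0.05977785171980231 +
("t0"."petal_length" - 3.7266666666666666) * 0.25491374699772246 +
("t0"."petal_width" - 1.1833333333333333) * 0.5475959809777828
AS "variable" FROM "DATA_TABLE" AS "t0"
'''

# Constants read off the query: (offset, coefficient) per column.
# Treating them as centering offsets and regression weights is an assumption.
TERMS = {
    "sepal_length": (5.809166666666666, -0.11633479416518255),
    "sepal_width": (3.0616666666666665, -0.05977785171980231),
    "petal_length": (3.7266666666666666, 0.25491374699772246),
    "petal_width": (1.1833333333333333, 0.5475959809777828),
}
INTERCEPT = 0.9916666666666668


def predict_in_python(row):
    """Evaluate the same linear formula as the SQL query for one row."""
    return INTERCEPT + sum(
        (row[name] - offset) * coef for name, (offset, coef) in TERMS.items()
    )


# Illustrative measurements, not taken from the diff.
sample = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

columns = list(TERMS)
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "DATA_TABLE" ({})'.format(", ".join(f"{c} REAL" for c in columns))
)
conn.execute(
    'INSERT INTO "DATA_TABLE" VALUES (?, ?, ?, ?)',
    [sample[c] for c in columns],
)

sql_value = conn.execute(SQL).fetchone()[0]
python_value = predict_in_python(sample)
conn.close()

# The two values should agree up to floating-point rounding.
print(math.isclose(sql_value, python_value, abs_tol=1e-9))
```

The query uses only arithmetic and quoted identifiers, which is why a different engine can plausibly evaluate it unchanged. Queries exported for other model types may rely on dialect-specific functions and would need the matching database.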