Skip to content

Commit dae3670

Browse files
committed
Added some fixes, Session definition and tests, updated docs
1 parent c005d34 commit dae3670

12 files changed

+182
-12
lines changed

.isort.cfg

+1-1
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,3 @@
11
[settings]
22
profile=black
3-
include_trailing_comma = true
3+
include_trailing_comma = true

.pre-commit-config.yaml

+7
Original file line numberDiff line numberDiff line change
@@ -26,3 +26,10 @@ repos:
2626
entry: |
2727
make check-codestyle
2828
language: system
29+
30+
- repo: local
31+
hooks:
32+
- id: todo-checker
33+
name: todo-checker
34+
entry: todo_checker.sh
35+
language: script

README.md

+30-1
Original file line numberDiff line numberDiff line change
@@ -69,7 +69,8 @@ connect(hosts="localhost:9200")
6969
```python
7070
# Create and save doc
7171
user = User(name="John", age=20)
72-
user.save(wait_for=True)
72+
user.save(wait_for=True) # wait_for explained below
73+
7374
assert user.id != None
7475

7576
# Update doc
@@ -91,6 +92,24 @@ user.save(wait_for=True)
9192
user.delete(wait_for=True)
9293
```
9394

95+
### Sessions
96+
Sessions are inspired by [SQL Alchemy](https://docs.sqlalchemy.org/en/14/orm/tutorial.html)'s sessions, and are used for simplifying bulk operations using the Elasticsearch client. From what I've seen, the ES client makes it pretty hard to use the bulk API, so they created bulk helpers (which in turn have incomplete/wrong docs).
97+
98+
With an ORM, bulk operations can be exposed neatly through a simple API.
99+
```python
100+
john = User(name="John")
101+
sarah = User(name="Sarah")
102+
103+
session = Session()
104+
105+
session.save(john)
106+
session.save(sarah)
107+
session.commit()
108+
```
109+
110+
The sessions API will also be available through a context manager before the v1.0 release.
111+
112+
94113
### Dynamic Index Support
95114
Pydastic also supports dynamic index specification. The model Metaclass index definition is still mandatory, but if an index is specified when performing operations, that will be used instead.
96115
The model Metaclass index is technically a fallback, although most users will probably be using a single index per model. For some users, multiple indices per model are needed (for example one user index per company).
@@ -102,6 +121,16 @@ user.save(index="my-user", wait_for=True)
102121
user.delete(index="my-user", wait_for=True)
103122
```
104123

124+
125+
### Notes on testing
126+
When writing tests with Pydastic (even applies when writing tests with the elasticsearch client), remember to use the `wait_for=True` argument when executing operations. If this is not used, then the test will continue executing even if Elasticsearch hasn't propagated the change to all nodes, giving you weird results.
127+
128+
For example if you save a document, then try getting it directly after, you'll get a document not found error. This is solved by using the wait_for argument in Pydastic (equivalent to `refresh="wait_for"` in Elasticsearch)
129+
130+
Here is [a reference](https://elasticsearch-py.readthedocs.io/en/v8.2.0/api.html#elasticsearch.Elasticsearch.index) to where this argument is listed in the docs.
131+
132+
It's also supported in the bulk helpers even though its not mentioned in their docs, but you wouldn't figure that out unless you dug into their source and traced back several function calls where `*args` `**kwargs` are just being forwarded across calls.. :)
133+
105134
## Support Elasticsearch Versions
106135

107136
Part of the build flow is running the tests using elasticsearch 7.12.0 DB as well as python client, and using 8.1.2 as well (DB as well as client, as part of a build matrix).

pydastic/__init__.py

+16-3
Original file line numberDiff line numberDiff line change
@@ -18,8 +18,21 @@ def get_version() -> str:
1818

1919
version: str = get_version()
2020

21-
from pydastic.error import NotFoundError
21+
from pydastic.error import (
22+
InvalidElasticsearchResponse,
23+
InvalidModelError,
24+
NotFoundError,
25+
)
2226
from pydastic.model import ESModel
2327
from pydastic.pydastic import PydasticClient, connect
24-
25-
__all__ = ["ESModel", "NotFoundError", "PydasticClient", "connect"]
28+
from pydastic.session import Session
29+
30+
__all__ = [
31+
"ESModel",
32+
"Session",
33+
"NotFoundError",
34+
"InvalidModelError",
35+
"InvalidElasticsearchResponse",
36+
"PydasticClient",
37+
"connect",
38+
]

pydastic/error.py

+4
Original file line numberDiff line numberDiff line change
@@ -8,3 +8,7 @@ class IndexDoesNotFoundError(Exception):
88

99
class InvalidElasticsearchResponse(Exception):
1010
...
11+
12+
13+
class InvalidModelError(Exception):
14+
...

pydastic/model.py

+4-5
Original file line numberDiff line numberDiff line change
@@ -3,8 +3,9 @@
33
from typing import Any, Callable, Dict, Optional, Tuple, Type, TypeVar, Union
44

55
from elasticsearch import NotFoundError as ElasticNotFoundError
6-
from pydantic import BaseModel
7-
from pydantic.main import Field, FieldInfo, ModelMetaclass
6+
from pydantic import BaseModel, Field
7+
from pydantic.fields import FieldInfo
8+
from pydantic.main import ModelMetaclass
89

910
from pydastic.error import InvalidElasticsearchResponse, NotFoundError
1011
from pydastic.pydastic import _client
@@ -187,8 +188,6 @@ def delete(self: Type[M], index: Optional[str] = None, wait_for: Optional[bool]
187188
if not self.id:
188189
raise ValueError("id missing from object")
189190

190-
doc = self.dict(exclude={"id"})
191-
192191
# Allow waiting for shards - useful when testing
193192
refresh = "false"
194193
if wait_for:
@@ -199,6 +198,6 @@ def delete(self: Type[M], index: Optional[str] = None, wait_for: Optional[bool]
199198
index = self.Meta.index
200199

201200
try:
202-
res = _client.client.delete(index=index, id=self.id, refresh=refresh)
201+
_client.client.delete(index=index, id=self.id, refresh=refresh)
203202
except ElasticNotFoundError:
204203
raise NotFoundError(f"document with id {id} not found")

pydastic/session.py

+48
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,48 @@
1+
from typing import Optional
2+
3+
from elasticsearch.helpers import bulk
4+
5+
from pydastic.error import InvalidModelError
6+
from pydastic.model import ESModel
7+
from pydastic.pydastic import _client
8+
9+
10+
class Session:
11+
def __init__(self):
12+
# Initialize state
13+
self._operations = []
14+
15+
def save(self, model: ESModel, index: Optional[str] = None):
16+
# Create save bulk operation
17+
if not index:
18+
index = model.Meta.index
19+
20+
doc = model.dict(exclude={"id"})
21+
op = {"_index": index, "_op_type": "index", **doc}
22+
23+
self._operations.append(op)
24+
25+
def update(self, model: ESModel, index: Optional[str] = None):
26+
if not index:
27+
index = model.Meta.index
28+
29+
if not model.id:
30+
raise InvalidModelError("model id property is required for update operations")
31+
32+
doc = model.dict(exclude={"id"})
33+
op = {"_id": model.id, "_index": index, "_op_type": "update", "_source": {"doc": doc}}
34+
35+
self._operations.append(op)
36+
37+
def commit(self, wait_for: Optional[bool] = False):
38+
refresh = "false"
39+
if wait_for:
40+
refresh = "wait_for"
41+
42+
results = bulk(client=_client.client, actions=self._operations, refresh=refresh)
43+
44+
# TODO: Process errors from operations
45+
pass
46+
47+
def delete(self, model: ESModel, index: Optional[str] = None):
48+
raise NotImplementedError

pyproject.toml

+1-1
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ build-backend = "poetry.core.masonry.api"
55

66
[tool.poetry]
77
name = "pydastic"
8-
version = "0.2.2"
8+
version = "0.3.0"
99
description = "Pydastic is an elasticsearch python ORM based on Pydantic."
1010
readme = "README.md"
1111
authors = ["pydastic <[email protected]>"]

tests/conftest.py

+2-1
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,8 @@
77

88
@pytest.fixture()
99
def es() -> Elasticsearch:
10-
connect(hosts="http://localhost:9200", ssl_show_warn=False)
10+
connect(hosts="http://localhost:9200")
11+
_client.client.delete_by_query(index="_all", body={"query": {"match_all": {}}}, wait_for_completion=True, refresh=True)
1112
return _client.client
1213

1314

tests/test_model.py

+3
Original file line numberDiff line numberDiff line change
@@ -112,6 +112,9 @@ def test_model_save_additional_fields(es: Elasticsearch):
112112
assert dict(user_dict, **extra_fields) == user_dict
113113

114114

115+
# TODO: Test partial updates (to dict, exclude = True) -> maybe provide a partial api?
116+
117+
115118
def test_model_ignores_additional_fields(es: Elasticsearch):
116119
extra_fields = {"name": "John", "location": "Seattle", "manager_ids": ["Pam", "Sam"]}
117120
res = es.index(index=User.Meta.index, body=extra_fields)

tests/test_session.py

+32
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
import pytest
2+
from elasticsearch import Elasticsearch
3+
from user import User
4+
5+
from pydastic import ESModel, Session
6+
7+
8+
def test_session_save(es: Elasticsearch):
9+
user = User(name="John")
10+
11+
session = Session()
12+
13+
session.save(user)
14+
session.commit(wait_for=True)
15+
16+
res = es.search(index=user.Meta.index, body={"query": {"match_all": {}}})
17+
assert len(res["hits"]["hits"]) == 1
18+
19+
model = user.to_es()
20+
assert res["hits"]["hits"][0]["_source"] == model
21+
22+
23+
def test_session_save_with(es: Elasticsearch):
24+
...
25+
26+
27+
def test_session_save_with_bulk_error(es: Elasticsearch):
28+
...
29+
30+
31+
def test_session_update(es: Elasticsearch):
32+
...

todo_checker.sh

+34
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
#!/bin/bash
2+
3+
check_file() {
4+
local file=$1
5+
local match_pattern=$2
6+
7+
local file_changes_with_context=$(git diff -U999999999 -p --cached --color=always -- $file)
8+
9+
# From the diff, get the green lines starting with '+' and including '$match_pattern'
10+
local matched_additions=$(echo "$file_changes_with_context" | grep -C4 $'^\e\\[32m\+.*'"$match_pattern")
11+
12+
if [ -n "$matched_additions" ]; then
13+
echo -e "\n$file additions match '$match_pattern':\n"
14+
15+
for matched_line in $matched_additions
16+
do
17+
echo "$matched_line"
18+
done
19+
20+
echo "Not committing, because $file matches $match_pattern"
21+
exit 1
22+
fi
23+
}
24+
25+
# Actual hook logic:
26+
27+
MATCH='TODO'
28+
for file in `git diff --cached -p --name-status | cut -c3-`; do
29+
for match_pattern in $MATCH
30+
do
31+
check_file $file $match_pattern
32+
done
33+
done
34+
exit

0 commit comments

Comments
 (0)