
Commit 659a546

#382 import loop
1 parent 26b2370 commit 659a546

21 files changed: +168 −44 lines

README.md

+4-4
@@ -32,7 +32,7 @@ This method uses Docker to run the complete application stack.
 > **Note**
 > When running locally, you may need to update one of the ports in the `.env` file if it conflicts with another application on your machine.
 
-3. Build and run the project with `docker-compose build && docker-compose up -d && docker-compose logs -f`
+3. Build and run the project with `docker compose build && docker compose up -d && docker compose logs -f`
 
 ## Installation (Frontend Only)
 
@@ -57,15 +57,15 @@ You'll need to replace `police-data-trust-api-1` with the name of the container
 docker container ls
 CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
 c0cf******** police-data-trust-api "/bin/sh -c '/wait &…" About a minute ago Up About a minute 0.0.0.0:5001->5001/tcp police-data-trust-api-1
-5e6f******** postgres:16.1 "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:5432->5432/tcp police-data-trust-db-1
+5e6f******** postgres:16 "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:5432->5432/tcp police-data-trust-db-1
 dacd******** police-data-trust-web "docker-entrypoint.s…" 3 days ago Up About a minute 0.0.0.0:3000->3000/tcp police-data-trust-web-1
 ```
 
 ### Backend Tests
 
 The current backend tests can be found in the GitHub Actions workflow file [python-tests.yml](https://github.com/codeforboston/police-data-trust/blob/0488d03c2ecc01ba774cf512b1ed2f476441948b/.github/workflows/python-tests.yml)
 
-To run the tests locally, first start the application with docker-compose. Then open up a command line interface to the running container:
+To run the tests locally, first start the application with docker compose. Then open up a command line interface to the running container:
 
 ```
 docker exec -it "police-data-trust-api-1" /bin/bash
@@ -82,7 +82,7 @@ python -m pytest
 
 The current frontend tests can be found in the GitHub Actions workflow file [frontend-checks.yml](https://github.com/codeforboston/police-data-trust/blob/0488d03c2ecc01ba774cf512b1ed2f476441948b/.github/workflows/frontend-checks.yml)
 
-To run the tests locally, first start the application with docker-compose. Then open up a command line interface to the running container:
+To run the tests locally, first start the application with docker compose. Then open up a command line interface to the running container:
 
 ```
 docker exec -it "police-data-trust-web-1" /bin/bash

backend/Dockerfile.cloud

+1-1
@@ -15,7 +15,7 @@ RUN arch=$(arch) && \
     file=pandas-2.2.2-cp312-cp312-manylinux_2_17_${arch}.manylinux2014_${arch}.whl && \
     url="https://pypi.debian.net/pandas/${file}" && \
     wget ${url} && \
-    sed -i "s/pandas==1.5.3/${file}/" prod.txt
+    sed -i "s/pandas==2.2.2/${file}/" prod.txt
 RUN pip install --no-cache-dir -r prod.txt
 
 COPY . .

backend/api.py

+3
@@ -33,6 +33,9 @@ def create_app(config: Optional[str] = None):
     # def _():
     #     db.create_all()
 
+    # start background processor for SQS imports
+
+
     return app
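
The added comment is only a placeholder; this commit does not actually start the importer. One possible way to wire it up is sketched below. The `IMPORT_QUEUE_NAME` variable, the daemon-thread approach, and the dotted module path are all assumptions, not part of this change. Note also that the new package directory is literally named `import`, a Python keyword, so the module has to be loaded by its string name via `importlib` rather than a plain import statement.

```python
# Hypothetical wiring for the placeholder above; not part of this commit.
import os
import threading
from importlib import import_module


def _start_import_loop() -> None:
    queue_name = os.environ.get("IMPORT_QUEUE_NAME")  # assumed config variable
    if not queue_name:
        return  # no queue configured; skip the background importer
    # "import" is a reserved word, so resolve the module by its string name.
    # The dotted path is a guess and depends on how the package is laid out.
    loop = import_module("backend.import.loop")
    importer = loop.Importer(queue_name)
    # A daemon thread keeps the long poll from blocking app startup or shutdown.
    threading.Thread(target=importer.run, daemon=True).start()
```

With something like this available, `create_app` could call `_start_import_loop()` at the placeholder comment.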


backend/import/__init__.py

Whitespace-only changes.

backend/import/loaders/__init__.py

Whitespace-only changes.

backend/import/loop.py

+50
@@ -0,0 +1,50 @@
+from io import BytesIO
+from logging import getLogger
+from time import sleep
+
+import boto3
+import ujson
+
+
+class Importer:
+    def __init__(self, queue_name: str, region: str = "us-east-1"):
+        self.queue_name = queue_name
+        self.session = boto3.Session(region_name=region)
+        self.sqs_client = self.session.client("sqs")
+        self.s3_client = self.session.client("s3")
+        # get_queue_url returns a dict; the URL itself is under "QueueUrl"
+        self.sqs_queue_url = self.sqs_client.get_queue_url(QueueName=self.queue_name)["QueueUrl"]
+        self.logger = getLogger(self.__class__.__name__)
+
+    def run(self):
+        while True:
+            resp = self.sqs_client.receive_message(
+                QueueUrl=self.sqs_queue_url,
+                MaxNumberOfMessages=1,  # one message at a time - we could raise this and parallelize, but no point until there are far more files
+                VisibilityTimeout=600,  # 10 minutes to process a message before it becomes visible to another consumer
+            )
+            # "Messages" is omitted from the response when the queue is empty; wait 10 minutes before the next poll
+            if not resp.get("Messages"):
+                sleep(600)
+                continue
+
+            for message in resp["Messages"]:
+                sqs_body = ujson.loads(message["Body"])
+                for record in sqs_body["Records"]:  # this comes through as a list, but we expect one object
+                    bucket_name = record["s3"]["bucket"]["name"]
+                    key = record["s3"]["object"]["key"]
+                    with BytesIO() as fileobj:
+                        self.s3_client.download_fileobj(bucket_name, key, fileobj)
+                        fileobj.seek(0)
+                        content = fileobj.read()
+
+                    # TODO: we now have an in-memory copy of the s3 file content. This is where we would run the import.
+                    # We want a standardized importer class; we would call something like below:
+                    # loader = Loader(content).load()
+
+                    self.logger.info(f"Imported s3://{bucket_name}/{key}")
+
+
+class Loader:
+    def __init__(self, content: bytes):
+        self.content = content
+
+    def load(self):
+        raise NotImplementedError("unimplemented; extend this class to write a load migration.")
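
The `Loader` class above is deliberately left unimplemented. As a rough illustration of how it might be extended, here is a hypothetical subclass; the class name, module location, relative import, and record handling are assumptions, not part of this commit:

```python
# Hypothetical concrete loader; assumed to live under backend/import/loaders/.
import ujson

from ..loop import Loader  # assumed relative import; adjust to the real package layout


class JsonLoader(Loader):
    """Parses a JSON array downloaded from S3 and handles one record at a time."""

    def load(self):
        records = ujson.loads(self.content.decode("utf-8"))
        for record in records:
            # A real implementation would validate each record and write it to
            # the database; printing is only a stand-in for that step.
            print(record)
```

Inside `Importer.run()`, the TODO would then reduce to something like `JsonLoader(content).load()`.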

backend/scraper/data_scrapers/README.md

+3-3
@@ -14,10 +14,10 @@ You can also run the scraper in Docker:
 
 ```bash
 # From the base of the repository
-docker-compose build api
-docker-compose run -u $(id -u) api flask scrape
+docker compose build api
+docker compose run -u $(id -u) api flask scrape
 # Stop the database service
-docker-compose down
+docker compose down
 ```
 
 You may see several warnings about mixed types. The script could also take several minutes.

backend/scraper/notebooks/cpdp.ipynb

+1-1
@@ -29,7 +29,7 @@
     "\n",
     "```bash\n",
     "# Stop services and remove volumes, rebuild images, start the database, create tables, run seeds, and follow logs\n",
-    "docker-compose down -v && docker-compose up --build -d db api && docker-compose logs -f\n",
+    "docker compose down -v && docker compose up --build -d db api && docker compose logs -f\n",
     "```\n",
     "\n",
     "Then open the notebook with either [VSCode](https://code.visualstudio.com/) or `jupyter notebook`.\n",

backend/scraper/notebooks/mpv.ipynb

+1-1
@@ -23,7 +23,7 @@
     "\n",
     "```bash\n",
     "# Stop services and remove volumes, rebuild images, start the database, create tables, run seeds, and follow logs\n",
-    "docker-compose down -v && docker-compose up --build -d db api && docker-compose logs -f\n",
+    "docker compose down -v && docker compose up --build -d db api && docker compose logs -f\n",
     "```\n",
     "\n",
     "Then open the notebook with either [VSCode](https://code.visualstudio.com/) or `jupyter notebook`.\n",

docker-compose.notebook.yml

-1
@@ -1,4 +1,3 @@
-version: "3"
 services:
   api:
     command: bash -c '/wait && flask psql create && flask psql init && jupyter notebook --allow-root --ip=0.0.0.0 --port=8889'

docker-compose.yml

+1-2
@@ -1,7 +1,6 @@
-version: "3"
 services:
   db:
-    image: postgres:16.2 # AWS RDS latest version
+    image: postgres:16 # AWS RDS latest version
     env_file:
       - ".env"
     volumes:

requirements/Dockerfile

+3-3
@@ -2,7 +2,7 @@
 # requirements, so this image starts with the same image as the database
 # containers and installs the same version of python as the api containers
 
-FROM postgres:16.2 as base
+FROM postgres:16 as base
 
 RUN apt-get update && apt-get install -y \
     make build-essential libssl-dev zlib1g-dev libbz2-dev libreadline-dev \
@@ -15,9 +15,9 @@ SHELL ["bash", "-lc"]
 RUN curl https://pyenv.run | bash && \
     echo 'export PATH="$HOME/.pyenv/shims:$HOME/.pyenv/bin:$PATH"' >> ~/.bashrc
 
-ENV PYTHON_VERSION=3.12.3
+ENV PYTHON_VERSION=3.12.4
 RUN pyenv install ${PYTHON_VERSION} && pyenv global ${PYTHON_VERSION}
-RUN pip install pip-tools
+RUN pip install -U pip-tools
 
 COPY . requirements/
 
requirements/README.md

+1-1
@@ -20,7 +20,7 @@ python -m pip install -r requirements/dev_unix.txt
 
 ```bash
 cd requirements
-docker-compose up --build --force-recreate
+docker compose up --build --force-recreate
 ```
 
 If you run the application natively, first install the pip-compile tool:

requirements/_core.in

+3-1
@@ -1,5 +1,6 @@
 bcrypt==3.2.2
 black
+boto3
 celery
 flake8
 flask
@@ -35,4 +36,5 @@ numpy
 spectree
 jupyter
 mixpanel
-ua-parser
+ua-parser
+ujson

requirements/dev_unix.txt

+16
@@ -44,6 +44,12 @@ bleach==6.1.0
     # via nbconvert
 blinker==1.7.0
     # via flask-mail
+boto3==1.34.133
+    # via -r requirements/_core.in
+botocore==1.34.133
+    # via
+    #   boto3
+    #   s3transfer
 build==1.2.1
     # via pip-tools
 celery==5.3.6
@@ -186,6 +192,10 @@ jinja2==3.1.3
     #   jupyterlab
     #   jupyterlab-server
     #   nbconvert
+jmespath==1.0.1
+    # via
+    #   boto3
+    #   botocore
 json5==0.9.25
     # via jupyterlab-server
 jsonpointer==2.4
@@ -405,6 +415,7 @@ pytest-postgresql==5.1.0
 python-dateutil==2.9.0
     # via
     #   arrow
+    #   botocore
     #   celery
     #   jupyter-client
     #   pandas
@@ -451,6 +462,8 @@ rpds-py==0.18.0
     # via
     #   jsonschema
     #   referencing
+s3transfer==0.10.2
+    # via boto3
 send2trash==1.8.2
     # via jupyter-server
 six==1.16.0
@@ -528,10 +541,13 @@ tzdata==2024.1
     #   pandas
 ua-parser==0.18.0
     # via -r requirements/_core.in
+ujson==5.10.0
+    # via -r requirements/_core.in
 uri-template==1.3.0
     # via jsonschema
 urllib3==1.26.18
     # via
+    #   botocore
     #   mixpanel
     #   requests
 vine==5.1.0

requirements/dev_windows.txt

+16
@@ -44,6 +44,12 @@ bleach==6.1.0
    # via nbconvert
 blinker==1.7.0
    # via flask-mail
+boto3==1.34.133
+    # via -r requirements/_core.in
+botocore==1.34.133
+    # via
+    #   boto3
+    #   s3transfer
 build==1.2.1
    # via pip-tools
 celery==5.3.6
@@ -186,6 +192,10 @@ jinja2==3.1.3
    #   jupyterlab
    #   jupyterlab-server
    #   nbconvert
+jmespath==1.0.1
+    # via
+    #   boto3
+    #   botocore
 json5==0.9.25
    # via jupyterlab-server
 jsonpointer==2.4
@@ -405,6 +415,7 @@ pytest-postgresql==5.1.0
 python-dateutil==2.9.0
    # via
    #   arrow
+    #   botocore
    #   celery
    #   jupyter-client
    #   pandas
@@ -451,6 +462,8 @@ rpds-py==0.18.0
    # via
    #   jsonschema
    #   referencing
+s3transfer==0.10.2
+    # via boto3
 send2trash==1.8.2
    # via jupyter-server
 six==1.16.0
@@ -528,10 +541,13 @@ tzdata==2024.1
    #   pandas
 ua-parser==0.18.0
    # via -r requirements/_core.in
+ujson==5.10.0
+    # via -r requirements/_core.in
 uri-template==1.3.0
    # via jsonschema
 urllib3==1.26.18
    # via
+    #   botocore
    #   mixpanel
    #   requests
 vine==5.1.0

requirements/docker-compose.yml

-1
@@ -1,4 +1,3 @@
-version: "3"
 services:
   pip-compile:
     build:
