Commit c2233e2

evantahler authored and jatinyadav-cc committed
Update metadata-service to latest version + docs (airbytehq#35419)
1 parent bb01db1 commit c2233e2

3 files changed: +91 -28 lines changed

airbyte-ci/connectors/metadata_service/lib/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "metadata-service"
-version = "0.3.3"
+version = "0.3.4"
 description = ""
 authors = ["Ben Church <[email protected]>"]
 readme = "README.md"

airbyte-ci/connectors/metadata_service/orchestrator/README.md

Lines changed: 67 additions & 25 deletions
@@ -1,14 +1,15 @@
 # Connector Orchestrator
-This is the Orchestrator for Airbyte metadata built on Dagster.
 
+This is the Orchestrator for Airbyte metadata built on Dagster.
 
 # Setup
 
 ## Prerequisites
 
 #### Poetry
 
-Before you can start working on this project, you will need to have Poetry installed on your system. Please follow the instructions below to install Poetry:
+Before you can start working on this project, you will need to have Poetry installed on your system.
+Please follow the instructions below to install Poetry:
 
 1. Open your terminal or command prompt.
 2. Install Poetry using the recommended installation method:
@@ -23,125 +24,165 @@ Alternatively, you can use `pip` to install Poetry:
 pip install --user poetry
 ```
 
-3. After the installation is complete, close and reopen your terminal to ensure the newly installed `poetry` command is available in your system's PATH.
+3. After the installation is complete, close and reopen your terminal to ensure the newly installed
+`poetry` command is available in your system's PATH.
 
-For more detailed instructions and alternative installation methods, please refer to the official Poetry documentation: https://python-poetry.org/docs/#installation
+For more detailed instructions and alternative installation methods, please refer to the official
+Poetry documentation: https://python-poetry.org/docs/#installation
 
 ### Using Poetry in the Project
 
-Once Poetry is installed, you can use it to manage the project's dependencies and virtual environment. To get started, navigate to the project's root directory in your terminal and follow these steps:
-
+Once Poetry is installed, you can use it to manage the project's dependencies and virtual
+environment. To get started, navigate to the project's root directory in your terminal and follow
+these steps:
 
 ## Installation
+
 ```bash
 poetry install
 cp .env.template .env
 ```
 
 ## Create a GCP Service Account and Dev Bucket
+
 Developing against the orchestrator requires a development bucket in GCP.
 
 The orchestrator will use this bucket to:
+
 - store important output files (e.g. reports)
 - watch for changes to the `registry` directory in the bucket.
 
 However, all temporary files will be stored in a local directory.
 
 To create a development bucket:
+
 1. Create a GCP Service Account with the following permissions:
-- Storage Admin
-- Storage Object Admin
-- Storage Object Creator
-- Storage Object Viewer
+- Storage Admin
+- Storage Object Admin
+- Storage Object Creator
+- Storage Object Viewer
 2. Create a PUBLIC GCS bucket
 3. Add the service account as a member of the bucket with the following permissions:
-- Storage Admin
-- Storage Object Admin
-- Storage Object Creator
-- Storage Object Viewer
+
+- Storage Admin
+- Storage Object Admin
+- Storage Object Creator
+- Storage Object Viewer
 
 4. Add the following environment variables to your `.env` file:
-- `METADATA_BUCKET`
-- `GCS_CREDENTIALS`
+- `METADATA_BUCKET`
+- `GCS_CREDENTIALS`
 
 Note that `GCS_CREDENTIALS` should be the raw JSON string of the service account credentials.
 
 Here is an example of how to import the service account credentials into your environment:
+
 ```bash
 export GCS_CREDENTIALS=`cat /path/to/credentials.json`
 ```
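
The README's requirement that `GCS_CREDENTIALS` hold the raw JSON string (not a path to the key file) can be checked before starting the orchestrator. The following is a minimal sketch, not part of the metadata service: `load_gcs_credentials` is a hypothetical helper name, and the only assumption is that the variable contains the JSON text of the service account key.

```python
import json
import os


def load_gcs_credentials(env=None):
    """Return the parsed service account JSON from GCS_CREDENTIALS.

    Hypothetical helper for illustration only; `env` defaults to os.environ
    and can be replaced by any mapping for testing.
    """
    env = os.environ if env is None else env
    raw = env.get("GCS_CREDENTIALS", "")
    if not raw.strip():
        raise ValueError("GCS_CREDENTIALS is not set")
    try:
        creds = json.loads(raw)
    except json.JSONDecodeError as exc:
        # A common mistake is exporting the file path instead of its contents.
        raise ValueError(
            "GCS_CREDENTIALS must be the raw JSON string, not a file path"
        ) from exc
    if not isinstance(creds, dict):
        raise ValueError("GCS_CREDENTIALS must be a JSON object")
    return creds
```

Calling `load_gcs_credentials()` with no arguments reads the current environment, so it would pick up the value exported by the `cat /path/to/credentials.json` command above.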
 
 ## The Orchestrator
 
-The orchestrator (built using Dagster) is responsible for orchestrating the various metadata processes.
+The orchestrator (built using Dagster) is responsible for orchestrating the various metadata
+processes.
+
+Dagster has a number of concepts that are important to understand before working on the
+orchestrator.
 
-Dagster has a number of concepts that are important to understand before working on the orchestrator.
 1. Assets
 2. Resources
 3. Schedules
 4. Sensors
 5. Ops
 
-Refer to the [Dagster documentation](https://docs.dagster.io/concepts) for more information on these concepts.
+Refer to the [Dagster documentation](https://docs.dagster.io/concepts) for more information on these
+concepts.
 
 ### Starting the Dagster Daemons
+
 Start the orchestrator with the following command:
+
 ```bash
 poetry run dagster dev
 ```
 
 Then you can access the Dagster UI at http://localhost:3000
 
-Note that it's important to use `dagster dev` instead of `dagit` because `dagster dev` starts additional services that are required for the orchestrator to run, namely the sensor service.
+Note that it's important to use `dagster dev` instead of `dagit` because `dagster dev` starts
+additional services that are required for the orchestrator to run, namely the sensor service.
 
 ### Materializing Assets with the UI
-When you navigate to the orchestrator in the UI, you will see a list of assets that are available to be materialized.
+
+When you navigate to the orchestrator in the UI, you will see a list of assets that are available to
+be materialized.
 
 From here you have the following options:
+
 1. Materialize all assets
 2. Select a subset of assets to materialize
 3. Enable a sensor to automatically materialize assets
 
 ### Materializing Assets without the UI
 
-In some cases you may want to run the orchestrator without the UI. To learn more about Dagster's CLI commands, see the [Dagster CLI documentation](https://docs.dagster.io/_apidocs/cli).
+In some cases you may want to run the orchestrator without the UI. To learn more about Dagster's CLI
+commands, see the [Dagster CLI documentation](https://docs.dagster.io/_apidocs/cli).
 
 ## Running Tests
+
 ```bash
 poetry run pytest
 ```
 
+## Deploying to Dagster Automatically
+
+GitHub Actions is used to automatically deploy the orchestrator to Dagster Cloud
+([GitHub Action](https://github.com/airbytehq/airbyte/blob/master/.github/workflows/metadata_service_deploy_orchestrator_dagger.yml)).
+
+1. Update your code (`../lib`) and bump the version of the package in
+`pyproject.toml`
+1. In this project (`../orchestrator`), run `poetry lock --no-update` to bump the version of the
+requirements you may have changed in
+`airbyte-ci/connectors/metadata_service/orchestrator/poetry.lock`
+1. Push your changes to the `master` branch and the orchestrator will be automatically deployed to
+Dagster Cloud.
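
As a sketch of step 1, the patch bump this commit makes in `lib/pyproject.toml` (`0.3.3` to `0.3.4`) could be scripted roughly as below. `bump_patch` is a hypothetical helper, not part of airbyte-ci, and it assumes a plain `X.Y.Z` version on a line that starts with `version =`; only the first such line is changed.

```python
import re

# Matches a line such as: version = "0.3.3"
_VERSION_LINE = re.compile(r'^(version\s*=\s*")(\d+)\.(\d+)\.(\d+)(")', re.MULTILINE)


def bump_patch(pyproject_text: str) -> str:
    """Return pyproject.toml text with the first X.Y.Z version's patch number + 1."""

    def _bump(match: re.Match) -> str:
        major, minor = match.group(2), match.group(3)
        patch = int(match.group(4)) + 1
        return f"{match.group(1)}{major}.{minor}.{patch}{match.group(5)}"

    new_text, count = _VERSION_LINE.subn(_bump, pyproject_text, count=1)
    if count == 0:
        raise ValueError('no line of the form version = "X.Y.Z" found')
    return new_text
```

The function works on text rather than a file so it can be tried on a copy first; the lock file refresh in step 2 (`poetry lock --no-update`) still has to be run separately.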
+
 ## Deploying to Dagster Cloud manually
-Note: This is a temporary solution until we have a CI/CD pipeline setup.
 
-Getting the CICD setup is currently blocked until we hear back from Dagster on a better way to use relative imports in a Dagster Cloud Deployment.
+This should only be needed if the above (automatic deployment) fails.
 
 ### Installing the dagster-cloud cli
+
 ```bash
 pip install dagster-cloud
 dagster-cloud config
 ```
 
 ### Deploying the orchestrator
+
 ```bash
 cd orchestrator
 DAGSTER_CLOUD_API_TOKEN=<YOUR-DAGSTER-CLOUD-TOKEN> airbyte-ci metadata deploy orchestrator
 ```
 
 # Using the Orchestrator to create a Connector Registry for Development
+
 The orchestrator can be used to create a connector registry for development purposes.
 
 ## Setup
+
 First you will need to set up the orchestrator as described above.
 
 Then you will want to do the following:
 
 ### 1. Mirror the production bucket
-Use the Google Cloud Console to mirror the production bucket (prod-airbyte-cloud-connector-metadata-service) to your development bucket.
+
+Use the Google Cloud Console to mirror the production bucket
+(prod-airbyte-cloud-connector-metadata-service) to your development bucket.
 
 [Docs](https://cloud.google.com/storage-transfer/docs/cloud-storage-to-cloud-storage)
 
 ### 2. Upload any local metadata files you want to test changes with
+
 ```bash
 # assuming your terminal is in the same location as this readme
 cd ../lib
@@ -150,6 +191,7 @@ poetry run metadata_service upload <PATH TO METADATA FILE> <NAME OF YOUR BUCKET>
 ```
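
When several metadata files need uploading, the command shown in the hunk header above can be assembled once per file. This is only a sketch: `build_upload_command` is a hypothetical helper, and the only thing taken from the README is the command shape `poetry run metadata_service upload <PATH TO METADATA FILE> <NAME OF YOUR BUCKET>`.

```python
from pathlib import Path


def build_upload_command(metadata_path, bucket_name):
    """Return the argv list for one `metadata_service upload` call.

    Hypothetical helper: it only builds the command and does not run it.
    """
    if not bucket_name:
        raise ValueError("bucket_name must not be empty")
    return [
        "poetry",
        "run",
        "metadata_service",
        "upload",
        str(Path(metadata_path)),
        bucket_name,
    ]
```

Each returned list could then be passed to `subprocess.run(cmd, check=True, cwd=...)` with `cwd` pointing at the `lib` directory, mirroring the `cd ../lib` step.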
 
 ### 3. Generate the registry
+
 ```bash
 poetry run dagster dev
 open http://localhost:3000

airbyte-ci/connectors/metadata_service/orchestrator/poetry.lock

Lines changed: 23 additions & 2 deletions
Some generated files are not rendered by default.
