Commit 0311db0

📝 Update Builder's Record Selector docs (#37752)
1 parent f890a19 commit 0311db0

File tree

1 file changed

+110
-16
lines changed

docs/connector-development/connector-builder-ui/record-processing.mdx

+110-16
@@ -8,20 +8,33 @@ Connectors built with the connector builder always make HTTP requests, receive t
88
- Do optional post-processing (transformations)
99
- Provide record meta data to the system to inform downstream processes (primary key and declared schema)
1010

11-
## Record selection
11+
## Record Selection
1212

1313
<iframe
14-
width="640"
14+
width="583"
1515
height="393"
16-
src="https://www.loom.com/embed/06d0fe35d79b40c5b1aea29a7fa7f113"
16+
src="https://www.loom.com/embed/f4a36e769a1d4f87a14e3982f59d1fb2"
1717
frameborder="0"
1818
webkitallowfullscreen
1919
mozallowfullscreen
2020
allowfullscreen
2121
></iframe>
2222

23-
When doing HTTP requests, the connector expects the records to be part of the response JSON body. The "Record selector" field of the stream needs to be set to the property of the response object that holds the records.
23+
When doing HTTP requests, the connector expects the records to be part of the response JSON body. The "Record Selector" component of the stream can be used to configure how records should be extracted from the response body.
24+
25+
The Record Selector component contains a few different levers to configure this extraction:
26+
- Field Path
27+
- Record Filter
28+
- Cast Record Fields to Schema Types
29+
30+
These will be explained below.
2431

32+
### Field Path
33+
The Field Path feature lets you define a path into the fields of the response to point to the part of the response which should be treated as the record(s).
34+
35+
Below are a few different examples of what this can look like depending on the API.
36+
37+
#### Top-level key pointing to array
2538
Very often, the response body contains an array of records along with some supplementary information (for example meta data for pagination).
2639

2740
For example the ["Most popular" NY Times API](https://developer.nytimes.com/docs/most-popular-product/1/overview) returns the following response body:
@@ -50,9 +63,9 @@ For example the ["Most popular" NY Times API](https://developer.nytimes.com/docs
5063
}`}
5164
</pre>
5265

53-
**Setting the record selector to `results`** selects the array with the actual records, everything else is discarded.
66+
In this case, **setting the Field Path to `results`** selects the array with the actual records; everything else is discarded.
5467

55-
### Nested objects
68+
#### Nested array
5669

5770
In some cases the array of actual records is nested multiple levels deep in the response, like for the ["Archive" NY Times API](https://developer.nytimes.com/docs/archive-product/1/overview):
5871

@@ -77,9 +90,9 @@ In some cases the array of actual records is nested multiple levels deep in the
7790
}`}
7891
</pre>
7992

80-
**Setting the record selector needs to be set to "`response`,`docs`"** selects the nested array.
93+
In this case, **setting the Field Path to `response`,`docs`** selects the nested array.
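As a rough illustration of what a multi-part Field Path means, the sketch below follows one key per nesting level. It is a minimal sketch in plain Python, not the Builder's actual implementation; the function name `select_by_field_path` and the sample body (a made-up, trimmed-down stand-in for the response above) are assumptions for illustration only.

```python
from typing import Any


def select_by_field_path(response: Any, field_path: list) -> Any:
    """Follow each key in field_path, descending one nesting level per key."""
    current = response
    for key in field_path:
        current = current[key]
    return current


# Hypothetical, simplified response body; the field values are invented.
body = {
    "copyright": "...",
    "response": {
        "docs": [{"_id": "a"}, {"_id": "b"}],
        "meta": {"hits": 2},
    },
}

# Equivalent to a Field Path of `response`,`docs`.
records = select_by_field_path(body, ["response", "docs"])
print(records)
```

A single-element path such as `["results"]` covers the top-level case from the previous example in the same way.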
8194

82-
### Root array
95+
#### Root array
8396

8497
In some cases, the response body itself is an array of records, like in the [CoinAPI API](https://docs.coinapi.io/market-data/rest-api/quotes):
8598

@@ -103,11 +116,11 @@ In some cases, the response body itself is an array of records, like in the [Coi
103116
<b>{`]`}</b>
104117
</pre>
105118

106-
In this case, **the record selector can be omitted** and the whole response becomes the list of records.
119+
In this case, **the Field Path can be omitted** and the whole response becomes the list of records.
107120

108-
### Single object
121+
#### Single object
109122

110-
Sometimes, there is only one record returned per request from the API. In this case, the record selector can also point to an object instead of an array which will be handled as the only record, like in the case of the [Exchange Rates API](https://exchangeratesapi.io/documentation/#historicalrates):
123+
Sometimes, there is only one record returned per request from the API. In this case, the Field Path can also point to an object instead of an array; that object is handled as the only record, as with the [Exchange Rates API](https://exchangeratesapi.io/documentation/#historicalrates):
111124

112125
<pre>
113126
{`{
@@ -128,11 +141,11 @@ Sometimes, there is only one record returned per request from the API. In this c
128141
}`}
129142
</pre>
130143

131-
In this case, a record selector of `rates` will yield a single record which contains all the exchange rates in a single object.
144+
In this case, **setting the Field Path to `rates`** will yield a single record which contains all the exchange rates in a single object.
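The root-array and single-object cases can be pictured as one normalization step: an empty Field Path keeps the whole response, and a selected object is wrapped so it becomes the only record. The following is a minimal sketch under those assumptions, not the Builder's code; `extract_records` and the sample values are hypothetical.

```python
from typing import Any


def extract_records(response: Any, field_path: list) -> list:
    """Select by path, then normalize the selection into a list of records."""
    selected = response
    for key in field_path:  # an empty path leaves the whole response selected
        selected = selected[key]
    if isinstance(selected, list):
        return selected  # an array is already a list of records
    if isinstance(selected, dict):
        return [selected]  # a single object becomes the only record
    return []


# Hypothetical exchange-rate style body; the numbers are invented.
single = {"base": "EUR", "rates": {"USD": 1.1, "GBP": 0.85}}
print(extract_records(single, ["rates"]))

# Root array: the Field Path is omitted (empty).
root = [{"symbol_id": "A"}, {"symbol_id": "B"}]
print(extract_records(root, []))
```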
132145

133-
### Fields nested in arrays
146+
#### Fields nested in arrays
134147

135-
In some cases, records are selected in multiple branches of the response object (for example within each item of an array):
148+
In some cases, records are located in multiple branches of the response object (for example within each item of an array):
136149

137150
```
138151
@@ -153,7 +166,7 @@ In some cases, records are selected in multiple branches of the response object
153166
154167
```
155168

156-
In this case a record selector with a placeholder `*` selects all children at the current position in the path, in this case **`data`, `*`, `record`** will return the following records:
169+
A Field Path with a placeholder `*` selects all children at the current position in the path, so in this case **setting the Field Path to `data`,`*`,`record`** will return the following records:
157170

158171
```
159172
[
@@ -166,6 +179,87 @@ In this case a record selector with a placeholder `*` selects all children at th
166179
]
167180
```
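To make the `*` placeholder concrete, here is a small recursive sketch in which `*` fans out over every child at that position. It is an illustration of the idea only and makes no claim about how the Builder resolves paths internally; the function name and the sample data are hypothetical.

```python
from typing import Any


def select_with_wildcard(node: Any, field_path: list) -> list:
    """Return every value reachable via field_path, where "*" matches all children."""
    if not field_path:
        return [node]
    head, rest = field_path[0], field_path[1:]
    if head == "*":
        if isinstance(node, list):
            children = node
        elif isinstance(node, dict):
            children = list(node.values())
        else:
            children = []
    elif isinstance(node, dict) and head in node:
        children = [node[head]]
    else:
        children = []  # branches without the key are skipped
    results = []
    for child in children:
        results.extend(select_with_wildcard(child, rest))
    return results


# Invented sample: each item of `data` carries its own `record` object.
body = {"data": [{"record": {"id": 1}}, {"record": {"id": 2}}]}
print(select_with_wildcard(body, ["data", "*", "record"]))
```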
168181

182+
### Record Filter
183+
In some cases, certain records should be excluded from the final output of the connector, which can be accomplished through the Record Filter feature within the Record Selector component.
184+
185+
For example, say your API response looks like this:
186+
```
[
  {
    "id": 1,
    "status": "pending"
  },
  {
    "id": 2,
    "status": "active"
  },
  {
    "id": 3,
    "status": "expired"
  }
]
```
202+
and you only want to sync records for which the status is not `expired`.
203+
204+
You can accomplish this by setting the Record Filter to `{{ record.status != 'expired' }}`
205+
206+
Any records for which this expression evaluates to `true` will be emitted by the connector, and any for which it evaluates to `false` will be excluded from the output.
207+
208+
Note that the Record Filter value must be an [interpolated string](/connector-development/config-based/advanced-topics#string-interpolation) with the filtering condition placed inside double curly braces `{{ }}`.
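The effect of the filter on the example response can be sketched in plain Python. The predicate `keep` below is a hypothetical stand-in for the interpolated condition `record.status != 'expired'`; the Builder evaluates the interpolated string itself, so this only mirrors the outcome.

```python
records = [
    {"id": 1, "status": "pending"},
    {"id": 2, "status": "active"},
    {"id": 3, "status": "expired"},
]


def keep(record: dict) -> bool:
    """Plain-Python equivalent of the condition record.status != 'expired'."""
    return record["status"] != "expired"


# Records for which the condition is true are emitted; the rest are dropped.
emitted = [record for record in records if keep(record)]
print(emitted)
```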
209+
210+
### Cast Record Fields to Schema Types
211+
Sometimes the type of a field in the record is not the desired type. If the existing field type can be simply cast to the desired type, this can be solved by setting the stream's declared schema to the desired type and enabling `Cast Record Fields to Schema Types`.
212+
213+
For example, say the API response looks like this:
214+
```
[
  {
    "street": "Kulas Light",
    "city": "Gwenborough",
    "geo": {
      "lat": "-37.3159",
      "lng": "81.1496"
    }
  },
  {
    "street": "Victor Plains",
    "city": "Wisokyburgh",
    "geo": {
      "lat": "-43.9509",
      "lng": "-34.4618"
    }
  }
]
```
234+
Notice that the `lat` and `lng` values are strings despite them all being numeric. If you would rather have these fields contain raw number values in your output records, you can do the following:
235+
- In the Declared Schema tab, disable `Automatically import detected schema`
236+
- Change the `type` of the `lat` and `lng` fields from `string` to `number`
237+
- Enable `Cast Record Fields to Schema Types` in the Record Selector component
238+
239+
This will cause those fields in the output records to be cast to the type declared in the schema, so the output records will now look like this:
240+
```
[
  {
    "street": "Kulas Light",
    "city": "Gwenborough",
    "geo": {
      "lat": -37.3159,
      "lng": 81.1496
    }
  },
  {
    "street": "Victor Plains",
    "city": "Wisokyburgh",
    "geo": {
      "lat": -43.9509,
      "lng": -34.4618
    }
  }
]
```
260+
Note that this casting is performed on a best-effort basis; if you tried to set the `city` field's type to `number` in the schema, for example, it would remain unchanged because those string values cannot be cast to numbers.
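The best-effort behavior can be pictured with the sketch below: a string is converted when the declared type is `number` and the conversion succeeds, and is left untouched otherwise. This is a simplified illustration that handles only `number` and nested `object` types; the function names and the schema literal are assumptions, and it is not the casting logic the Builder actually runs.

```python
from typing import Any


def cast_value(value: Any, field_schema: dict) -> Any:
    """Cast one value toward its declared type, leaving it unchanged on failure."""
    declared = field_schema.get("type")
    if declared == "object" and isinstance(value, dict):
        return cast_record(value, field_schema)
    if declared == "number" and isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value  # best effort: keep the original string
    return value


def cast_record(record: dict, schema: dict) -> dict:
    """Apply cast_value to every field that has an entry in the schema."""
    properties = schema.get("properties", {})
    return {key: cast_value(val, properties.get(key, {})) for key, val in record.items()}


# Hypothetical schema: `city` is deliberately declared as a number to show the fallback.
schema = {
    "type": "object",
    "properties": {
        "city": {"type": "number"},
        "geo": {
            "type": "object",
            "properties": {"lat": {"type": "number"}, "lng": {"type": "number"}},
        },
    },
}
record = {
    "street": "Kulas Light",
    "city": "Gwenborough",
    "geo": {"lat": "-37.3159", "lng": "81.1496"},
}
result = cast_record(record, schema)
print(result)
```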
261+
262+
169263
## Transformations
170264

171265
It is recommended to not change records during the extraction process the connector is performing, but instead load them into the downstream warehouse unchanged and perform necessary transformations there in order to stay flexible in what data is required. However, there are some reasons that require modifying the fields of records before they are sent to the warehouse:
@@ -230,7 +324,7 @@ Setting the "Path" of the remove-transformation to `content` removes these field
230324
}
231325
```
232326

233-
Like in case of the record selector, properties of deeply nested objects can be removed as well by specifying the path of properties to the target field that should be removed.
327+
As with the Record Selector's Field Path, properties of deeply nested objects can be removed as well by specifying the path of properties to the target field that should be removed.
234328

235329
### Removing fields that match a glob pattern
236330
