Description
Tell us about the problem you're trying to solve
I am trying to ingest public US Census data as part of a POC: https://api.census.gov/data/timeseries/eits/mrts/examples.html
However, their responses use a non-standard format, a JSON list of lists whose first row holds the column names: https://www.census.gov/data/developers/guidance/api-user-guide.Core_Concepts.html
e.g.
[
["column_a", "column_b", "column_c"],
["value_a1", "value_b1", "value_c1"],
["value_a2", "value_b2", "value_c2"]
]
Describe the solution you’d like
I would like Airbyte to be able to ingest this data. Currently, when we try the HTTP connector, we get:
E pydantic.error_wrappers.ValidationError: 1 validation error for AirbyteRecordMessage
E data
E value is not a valid dict (type=type_error.dict)
I see 3 potential solutions (thanks marcosmarxm for the idea to build something Census-specific):
- Create a US Census-specific connector that supports their non-conventional responses
- Patch the HTTP connector: Airbyte recognizes "list of list" as a special response format, assumes the first row is the column names, and returns something like:
{"data": [
{"column_a": "value_a1", "column_b": "value_b1", "column_c": "value_c1"},
{"column_a": "value_a2", "column_b": "value_b2", "column_c": "value_c2"}
]}
- OR patch the HTTP connector: do not recognize it as a special format, and return each row wrapped as-is:
[
{"data": ["column_a", "column_b", "column_c"]},
{"data": ["value_a1", "value_b1", "value_c1"]},
{"data": ["value_a2", "value_b2", "value_c2"]}
]
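To make the two patch options concrete, here is a minimal sketch in plain Python. The function names `rows_to_records` and `rows_to_wrapped_lists` are hypothetical and are not part of Airbyte or the HTTP connector; the sketch only assumes the response has already been parsed from JSON into a list of lists.

```python
from typing import Any, Dict, Iterator, List

Row = List[Any]


def rows_to_records(table: List[Row]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper for the 'special format' option:
    treat the first row as column names and yield one dict per data row."""
    if not table:
        return
    header, *rows = table
    for row in rows:
        yield dict(zip(header, row))


def rows_to_wrapped_lists(table: List[Row]) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper for the 'no special format' option:
    wrap every row (header row included) in a dict under the key 'data'."""
    for row in table:
        yield {"data": row}


# Sample response in the shape shown in this issue.
response = [
    ["column_a", "column_b", "column_c"],
    ["value_a1", "value_b1", "value_c1"],
    ["value_a2", "value_b2", "value_c2"],
]

records = list(rows_to_records(response))
wrapped = list(rows_to_wrapped_lists(response))
```

In both cases every emitted item is a dict, which is what the `AirbyteRecordMessage` validation error above is asking for. The first variant loses nothing as long as the header row is reliable; the second keeps the raw rows and leaves the header handling to a downstream step.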
Are you willing to submit a PR?
Yes, I opened a PR that patches the HTTP connector, but I would like to attempt creating a Census-specific connector instead.