Add a US Census connector #3662

Closed
@dmateusp

Description

Tell us about the problem you're trying to solve

I am trying to ingest public census data as part of a POC: https://api.census.gov/data/timeseries/eits/mrts/examples.html

But their responses are non-standard, as described in: https://www.census.gov/data/developers/guidance/api-user-guide.Core_Concepts.html

e.g.

[
  ["column_a", "column_b", "column_c"],
  ["value_a1", "value_b1", "value_c1"],
  ["value_a2", "value_b2", "value_c2"]
]

Describe the solution you’d like

I would like Airbyte to be able to ingest this data. Currently, when we try to use the HTTP connector, we get:

E   pydantic.error_wrappers.ValidationError: 1 validation error for AirbyteRecordMessage
E   data
E     value is not a valid dict (type=type_error.dict)

I see 3 potential solutions (thanks marcosmarxm for the idea to build something census specific):

  1. Create a US Census-specific connector that supports their unconventional responses.
  2. Patch the HTTP connector so that Airbyte recognizes a "list of lists" as a special response format, assumes the first row holds the column names, and returns something like:
{"data": [
  {"column_a": "value_a1", "column_b": "value_b1", "column_c": "value_c1"},
  {"column_a": "value_a2", "column_b": "value_b2", "column_c": "value_c2"}
]}
  3. Or patch the HTTP connector without treating it as a special format, and return each row as-is:
[
  {"data": ["column_a", "column_b", "column_c"]},
  {"data": ["value_a1", "value_b1", "value_c1"]},
  {"data": ["value_a2", "value_b2", "value_c2"]}
]
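
To make the difference between the two HTTP-connector patches concrete, here is a minimal Python sketch of both transformations applied to the example payload. The function names `rows_to_records` and `rows_to_wrapped` are hypothetical and are not part of Airbyte or the HTTP connector; the sketch only assumes the response body is a JSON list of lists as shown earlier.

```python
import json
from typing import Any, Dict, Iterator, List


def rows_to_records(payload: List[List[Any]]) -> Iterator[Dict[str, Any]]:
    """Header-aware variant (hypothetical helper): treat the first row as
    column names and yield one dict per remaining row."""
    if not payload:
        return
    header, *rows = payload
    for row in rows:
        # zip() stops at the shorter sequence, so extra values in a row
        # (or missing ones) are silently dropped rather than raising.
        yield dict(zip(header, row))


def rows_to_wrapped(payload: List[List[Any]]) -> Iterator[Dict[str, Any]]:
    """Format-agnostic variant (hypothetical helper): wrap every row,
    including the header row, under a "data" key."""
    for row in payload:
        yield {"data": row}


# The example response from the issue, parsed from JSON.
raw = """[
  ["column_a", "column_b", "column_c"],
  ["value_a1", "value_b1", "value_c1"],
  ["value_a2", "value_b2", "value_c2"]
]"""
payload = json.loads(raw)

records = list(rows_to_records(payload))
wrapped = list(rows_to_wrapped(payload))
```

In both variants every emitted item is a dict, which is what the validation error above complains about. The header-aware variant produces records that are easier to query downstream, while the format-agnostic variant leaves the interpretation of the first row to the consumer.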

Are you willing to submit a PR?

Yes, I opened a PR that patches the HTTP connector, but I would like to attempt creating a Census-specific connector instead.
