feat(source): support json schema #11797

Merged
merged 29 commits into from
Aug 29, 2023

Conversation

wugouzi
Contributor

@wugouzi wugouzi commented Aug 21, 2023

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

Support JSON Schema through a schema registry. The syntax is:

create table t with (
  connector = 'kafka',
  topic = 'json_schema',
  properties.bootstrap.server = 'message_queue:29092',
  scan.startup.mode = 'earliest'
) FORMAT PLAIN ENCODE JSON (
  schema.registry = 'url',
  [optional] schema.registry.username = 'username',
  [optional] schema.registry.password = 'password',
)
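
The PR description does not show how the registry is queried. As a rough sketch only, assuming a Confluent-compatible schema registry REST API and the common "<topic>-value" subject naming convention (both are assumptions, not confirmed by this PR), the three options above could map onto an HTTP request like this. The helper name build_schema_request and the localhost URL are hypothetical.

```python
import base64
import urllib.request
from typing import Optional
from urllib.parse import quote


def build_schema_request(
    registry_url: str,
    topic: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for the latest value schema of a topic."""
    # Assumption: the subject follows the "<topic>-value" naming convention.
    subject = quote(f"{topic}-value", safe="")
    url = f"{registry_url.rstrip('/')}/subjects/{subject}/versions/latest"
    request = urllib.request.Request(url)
    # Credentials are optional, mirroring the [optional] options in the syntax.
    if username is not None and password is not None:
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


req = build_schema_request(
    "http://localhost:8081", "json_schema", "username", "password"
)
print(req.full_url)
```

The request is only constructed, never sent, so the sketch runs without a registry being available.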

The basic idea is to use jsonschema-transpiler to generate an Avro schema from the JSON Schema.
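
To illustrate what such a translation involves, here is a minimal sketch that maps a small subset of JSON Schema onto Avro types. It is not jsonschema-transpiler's actual algorithm, and the real tool may choose different types or names; the function to_avro, the PRIMITIVES table, and the record name "root" are hypothetical.

```python
import json

# Hypothetical mapping from JSON Schema primitive types to Avro types.
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
    "null": "null",
}


def to_avro(schema: dict, name: str = "root") -> object:
    """Translate a small subset of JSON Schema into an Avro schema."""
    kind = schema.get("type")
    if kind in PRIMITIVES:
        return PRIMITIVES[kind]
    if kind == "array":
        return {"type": "array", "items": to_avro(schema["items"], name + "_item")}
    if kind == "object":
        required = set(schema.get("required", []))
        fields = []
        for field, sub in schema.get("properties", {}).items():
            avro_type = to_avro(sub, field)
            entry = {"name": field, "type": avro_type}
            if field not in required and avro_type != "null":
                # Optional properties become a nullable union with a null default.
                entry["type"] = ["null", avro_type]
                entry["default"] = None
            fields.append(entry)
        return {"type": "record", "name": name, "fields": fields}
    raise ValueError(f"unsupported JSON Schema type: {kind!r}")


json_schema = {
    "type": "object",
    "properties": {"id": {"type": "integer"}, "name": {"type": "string"}},
    "required": ["id"],
}
avro_schema = to_avro(json_schema)
print(json.dumps(avro_schema, indent=2))
```

Anything outside this subset (for example "oneOf" or "$ref") raises an error in the sketch rather than being guessed at.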

Redpanda does not support JSON Schema yet (see redpanda-data/redpanda#1878), so Kafka is used here as a workaround.

References outside the schema are not supported yet.
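
One way to surface this limitation early would be to scan a schema for "$ref" values that do not point inside the same document. The following is a sketch under that assumption, not code from this PR; the helper external_refs and the example URL are hypothetical.

```python
def external_refs(node, found=None):
    """Collect every "$ref" value that does not start with "#" (a local pointer)."""
    if found is None:
        found = []
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and not ref.startswith("#"):
            found.append(ref)
        for value in node.values():
            external_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            external_refs(item, found)
    return found


schema = {
    "type": "object",
    "definitions": {"address": {"type": "string"}},
    "properties": {
        "home": {"$ref": "#/definitions/address"},
        "owner": {"$ref": "https://example.com/schemas/user.json"},
    },
}
print(external_refs(schema))
```

A caller could reject the schema with a clear error whenever the returned list is non-empty.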

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

If this PR includes changes that directly affect users or other significant modifications relevant to the community, kindly draft a release note to provide a concise summary of these changes. Please prioritize highlighting the impact these changes will have on users.

@wugouzi wugouzi added the user-facing-changes Contains changes that are visible to users label Aug 22, 2023
@wugouzi wugouzi requested a review from tabVersion August 22, 2023 10:25
@codecov

codecov bot commented Aug 22, 2023

Codecov Report

Merging #11797 (2f18c5c) into main (181953d) will decrease coverage by 0.02%.
Report is 24 commits behind head on main.
The diff coverage is 50.20%.

@@            Coverage Diff             @@
##             main   #11797      +/-   ##
==========================================
- Coverage   70.21%   70.20%   -0.02%     
==========================================
  Files        1373     1373              
  Lines      228741   228886     +145     
==========================================
+ Hits       160621   160688      +67     
- Misses      68120    68198      +78     
Flag Coverage Δ
rust 70.20% <50.20%> (-0.02%) ⬇️

Files Changed                                          Coverage Δ
src/connector/src/parser/avro/parser.rs                69.66% <ø> (ø)
src/connector/src/parser/avro/schema_resolver.rs       0.00% <0.00%> (-4.23%) ⬇️
...c/connector/src/parser/debezium/debezium_parser.rs  50.00% <0.00%> (-1.25%) ⬇️
src/frontend/src/handler/create_table.rs               89.60% <ø> (+0.37%) ⬆️
src/connector/src/parser/util.rs                       3.22% <7.14%> (+3.22%) ⬆️
src/connector/src/parser/mod.rs                        46.25% <45.45%> (-0.07%) ⬇️
src/frontend/src/handler/create_source.rs              47.91% <51.42%> (+0.15%) ⬆️
src/connector/src/parser/json_parser.rs                83.98% <59.49%> (-5.39%) ⬇️
src/connector/src/parser/canal/simd_json_parser.rs     84.74% <100.00%> (+1.41%) ⬆️
.../connector/src/parser/debezium/simd_json_parser.rs  95.10% <100.00%> (+0.01%) ⬆️
... and 1 more

... and 5 files with indirect coverage changes

@StrikeW
Contributor

StrikeW commented Aug 23, 2023

Is a schema registry required for the JSON format after this PR? IIRC, we previously concluded that a schema registry is required for the Avro format. cc @neverchanje

@wugouzi
Contributor Author

wugouzi commented Aug 23, 2023

Is a schema registry required for the JSON format after this PR? IIRC, we previously concluded that a schema registry is required for the Avro format. cc @neverchanje

It's optional. We still support the JSON format the old way.

@tabVersion
Contributor

Is a schema registry required for the JSON format after this PR?

It is not required. We implemented it at a PoC user's request.
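
Read together with the planner error quoted in this review, the replies suggest that a JSON source gets its schema either from a registry or from explicitly declared columns. A hypothetical sketch of that decision follows; the function resolve_json_schema_source is not from the codebase, and the precedence when both are given is an arbitrary choice here, since the thread does not address it.

```python
from typing import Mapping, Sequence


def resolve_json_schema_source(
    columns: Sequence[str],
    encode_options: Mapping[str, str],
) -> str:
    """Decide where a JSON source's schema comes from (hypothetical helper)."""
    # Assumption: a configured registry takes precedence over declared columns.
    if encode_options.get("schema.registry") is not None:
        return "registry"
    # The old way: the user declares the columns in the CREATE statement.
    if columns:
        return "columns"
    raise ValueError(
        "Invalid input syntax: schema definition is required for ENCODE JSON"
    )


print(resolve_json_schema_source([], {"schema.registry": "url"}))
print(resolve_json_schema_source(["id", "name"], {}))
```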

planner_error: 'Invalid input syntax: schema definition is required for ENCODE JSON'
planner_error: 'Protocol error: missing field ''connector'''

@st1page please double check the change.

@tabVersion tabVersion requested review from shanicky and st1page August 24, 2023 10:13
@tabVersion
Contributor

References outside the schema are not supported yet.

I don't see how you handle this situation.

@wugouzi
Contributor Author

wugouzi commented Aug 25, 2023

References outside the schema are not supported yet.

I don't see how you handle this situation.

Based on my testing, jsonschema-transpiler does not support this.

@wugouzi
Contributor Author

wugouzi commented Aug 28, 2023

Performance

json_parser             time:   [1.7250 s 1.7257 s 1.7266 s]
                        change: [-0.3861% -0.3042% -0.2271%] (p = 0.00 < 0.05)
                        Change within noise threshold.

Contributor

@tabVersion tabVersion left a comment

Basically LGTM; waiting on @st1page for the syntax check.

@wugouzi wugouzi enabled auto-merge August 29, 2023 05:15
@wugouzi wugouzi added this pull request to the merge queue Aug 29, 2023
Merged via the queue into main with commit 61ab2cd Aug 29, 2023
@wugouzi wugouzi deleted the qiao/json-schema branch August 29, 2023 05:26
@BugenZhao
Member

This PR introduces too many outdated and unnecessary dependencies to our workspace, including env_logger and clap 2. We should consider forking the dependency of the executable jsonschema-transpiler and making it a modern library.

Labels
type/feature Type: New feature. user-facing-changes Contains changes that are visible to users
5 participants