Source Mixpanel: "export" stream make line parsing more robust #18846

Merged
grubberr merged 9 commits into master from grubberr/oncall-945-source-mixpanel
Nov 3, 2022

Conversation

grubberr
Contributor

@grubberr grubberr commented Nov 2, 2022

Signed-off-by: Sergey Chvalyuk [email protected]

What

Attempts to fix https://github.com/airbytehq/oncall/issues/945

Makes line parsing for the "export" stream more robust:
The incoming stream of records must be in JSON Lines format.
Occasionally, for an unknown reason, a single record is split across multiple lines.
We combine such split parts into one record only if the parts are adjacent.
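
The combining rule described above can be sketched roughly as follows. This is a minimal standalone illustration, not the connector's actual code: the name `iter_dicts` is borrowed from the test in this PR, while the free-function signature, the type hints, and the assumption that every complete line is a JSON object are mine.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def iter_dicts(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one parsed record per JSON line, rejoining adjacent split parts.

    Hypothetical sketch: assumes each complete line is a JSON object.
    """
    parts: List[str] = []  # consecutive lines that failed to parse on their own
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            # Not valid JSON by itself: it may be a fragment of a split record.
            parts.append(line)
        else:
            # A complete record arrived, so any pending fragments are not
            # adjacent to their other half and are dropped.
            parts = []
            yield record
            continue

        if len(parts) > 1:
            try:
                record = json.loads("".join(parts))
            except ValueError:
                # Still incomplete: keep collecting adjacent fragments.
                continue
            parts = []
            yield record
```

Fragments are only ever joined with the lines immediately next to them, so a stray fragment followed by a complete record is discarded rather than held in the hope that its other half shows up later.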

How

Describe the solution

Recommended reading order

  1. x.java
  2. y.python

🚨 User Impact 🚨

Are there any breaking changes? What is the end result perceived by the user? If yes, please merge this PR with the 🚨🚨 emoji so changelog authors can further highlight this if needed.

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • Grant edit access to maintainers (instructions)
  • Secrets in the connector's spec are annotated with airbyte_secret
  • Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • Code reviews completed
  • Documentation updated
    • Connector's README.md
    • Connector's bootstrap.md. See description and examples
    • Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • Create a non-forked branch based on this PR and test the below items on it
  • Build is successful
  • If new credentials are required for use in CI, add them to GSM. Instructions.
  • /test connector=connectors/<name> command is passing
  • New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit

Put your unit tests output here.

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

Signed-off-by: Sergey Chvalyuk <[email protected]>
@grubberr
Contributor Author

grubberr commented Nov 2, 2022

/test connector=connectors/source-mixpanel

🕑 connectors/source-mixpanel https://github.com/airbytehq/airbyte/actions/runs/3376724250
❌ connectors/source-mixpanel https://github.com/airbytehq/airbyte/actions/runs/3376724250
🐛 https://gradle.com/s/ot2bcu5t4xsay

Build Failed

Test summary info:

=========================== short test summary info ============================
FAILED test_incremental.py::TestIncremental::test_read_sequential_slices[inputs0]
============ 1 failed, 30 passed, 29 warnings in 6963.01s (1:56:03) ============

Signed-off-by: Sergey Chvalyuk <[email protected]>
@github-actions github-actions bot added the area/documentation Improvements or additions to documentation label Nov 2, 2022
@grubberr
Contributor Author

grubberr commented Nov 2, 2022

/test connector=connectors/source-mixpanel

🕑 connectors/source-mixpanel https://github.com/airbytehq/airbyte/actions/runs/3379870009
✅ connectors/source-mixpanel https://github.com/airbytehq/airbyte/actions/runs/3379870009
Python tests coverage:

Name                                         Stmts   Miss  Cover
----------------------------------------------------------------
source_mixpanel/utils.py                         8      0   100%
source_mixpanel/streams/revenue.py              14      0   100%
source_mixpanel/streams/funnels.py              57      0   100%
source_mixpanel/streams/cohorts.py              15      0   100%
source_mixpanel/streams/__init__.py              9      0   100%
source_mixpanel/property_transformation.py      19      0   100%
source_mixpanel/__init__.py                      2      0   100%
source_mixpanel/streams/base.py                 89      2    98%
source_mixpanel/streams/export.py               68      3    96%
source_mixpanel/streams/engage.py               88      6    93%
source_mixpanel/source.py                       79     10    87%
source_mixpanel/streams/annotations.py          16      3    81%
source_mixpanel/streams/cohort_members.py       21      7    67%
source_mixpanel/testing.py                      29     11    62%
----------------------------------------------------------------
TOTAL                                          514     42    92%
	 Name                                                 Stmts   Miss  Cover   Missing
	 ----------------------------------------------------------------------------------
	 source_acceptance_test/base.py                          12      4    67%   16-19
	 source_acceptance_test/config.py                       133      3    98%   87, 93, 230
	 source_acceptance_test/conftest.py                     196     97    51%   35, 41-43, 48, 54, 60, 66, 72-74, 80-95, 100, 105-107, 113-115, 121-122, 127-128, 133, 139, 148-157, 163-168, 232, 238, 244-250, 258-263, 271-284, 289-295, 302-313, 320-336
	 source_acceptance_test/plugin.py                        69     25    64%   22-23, 31, 36, 120-140, 144-148
	 source_acceptance_test/tests/test_core.py              329    106    68%   39, 50-58, 63-70, 74-75, 79-80, 164, 202-219, 228-236, 240-245, 251, 284-289, 327-334, 377-379, 382, 447-455, 484-485, 491, 494, 530-540, 553-578
	 source_acceptance_test/tests/test_incremental.py       145     20    86%   21-23, 29-31, 36-43, 48-61, 224
	 source_acceptance_test/utils/asserts.py                 37      2    95%   57-58
	 source_acceptance_test/utils/common.py                  77     10    87%   15-16, 24-30, 64, 67
	 source_acceptance_test/utils/compare.py                 62     23    63%   21-51, 68, 97-99
	 source_acceptance_test/utils/config_migration.py        23     23     0%   5-37
	 source_acceptance_test/utils/connector_runner.py       112     50    55%   23-26, 32, 36, 39-68, 71-73, 76-78, 81-83, 86-88, 91-93, 96-114, 148-150
	 source_acceptance_test/utils/json_schema_helper.py     105     13    88%   30-31, 38, 41, 65-68, 96, 120, 190-192
	 ----------------------------------------------------------------------------------
	 TOTAL                                                 1479    376    75%

Build Passed

Test summary info:

All Passed

Signed-off-by: Sergey Chvalyuk <[email protected]>
@grubberr grubberr changed the title from 'Source Mixpanel: export stream improve line parsing' to 'Source Mixpanel: "export" stream make line parsing more robust' Nov 2, 2022
@grubberr grubberr self-assigned this Nov 2, 2022
# combine a record from two adjacent parts
assert list(stream.iter_dicts([record_string, record_string[:2], record_string[2:], record_string])) == [record, record, record]
# drop record parts because they are not adjacent
assert list(stream.iter_dicts([record_string, record_string[:2], record_string, record_string[2:]])) == [record, record]
Contributor

Could you do this with pytest.mark.parametrize so that the parameters are more human-readable?
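
One way the suggestion might look is sketched below. It is only an illustration of `pytest.mark.parametrize` with readable case ids: the sample `RECORD`, the case ids, and the module-level `iter_dicts` stand-in (used here in place of `stream.iter_dicts` so the snippet runs on its own) are all assumptions, not code from this PR.

```python
import json

import pytest

# Hypothetical sample record; the real test builds its own record fixture.
RECORD = {"event": "signup", "properties": {"time": 1}}
LINE = json.dumps(RECORD)


def iter_dicts(lines):
    """Stand-in for stream.iter_dicts: rejoin only adjacent split parts."""
    parts = []
    for line in lines:
        try:
            record = json.loads(line)
        except ValueError:
            parts.append(line)
        else:
            parts = []
            yield record
            continue
        if len(parts) > 1:
            try:
                record = json.loads("".join(parts))
            except ValueError:
                continue
            parts = []
            yield record


@pytest.mark.parametrize(
    "lines, expected",
    [
        pytest.param(
            [LINE, LINE[:2], LINE[2:], LINE],
            [RECORD, RECORD, RECORD],
            id="adjacent_parts_are_combined",
        ),
        pytest.param(
            [LINE, LINE[:2], LINE, LINE[2:]],
            [RECORD, RECORD],
            id="separated_parts_are_dropped",
        ),
    ],
)
def test_iter_dicts(lines, expected):
    assert list(iter_dicts(lines)) == expected
```

With explicit ids, a failure names the scenario directly in the pytest report instead of showing an auto-generated parameter label.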

@grubberr
Contributor Author

grubberr commented Nov 3, 2022

/publish connector=connectors/source-mixpanel

🕑 Publishing the following connectors:
connectors/source-mixpanel
https://github.com/airbytehq/airbyte/actions/runs/3384941491


Connector Did it publish? Were definitions generated?
connectors/source-mixpanel

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@grubberr
Contributor Author

grubberr commented Nov 3, 2022

/publish connector=connectors/source-mixpanel

🕑 Publishing the following connectors:
connectors/source-mixpanel
https://github.com/airbytehq/airbyte/actions/runs/3385861170


Connector Did it publish? Were definitions generated?
connectors/source-mixpanel

if you have connectors that successfully published but failed definition generation, follow step 4 here ▶️

@octavia-squidington-iii octavia-squidington-iii temporarily deployed to more-secrets November 3, 2022 14:38 Inactive
@grubberr grubberr merged commit c01b81b into master Nov 3, 2022
@grubberr grubberr deleted the grubberr/oncall-945-source-mixpanel branch November 3, 2022 17:46
Labels
area/connectors Connector related issues area/documentation Improvements or additions to documentation connectors/source/mixpanel
5 participants