feat(qa_check): enable checking connector docs structure via qa check #39326

Merged

44 commits (changes shown from 5 commits)
e33561e
enable CheckDocumentationStructure qa check + extended test suite
darynaishchenko May 31, 2024
8b5993b
added keywords for creds check, valid http statuses, spec for low-code
darynaishchenko May 31, 2024
ae8e1f8
added specific heading for cloud/oss setup
darynaishchenko May 31, 2024
8d082e5
updated _replace_link func
darynaishchenko May 31, 2024
c5648db
updated headers and description templates. added unit tests.
darynaishchenko Jun 6, 2024
c6d52f8
added CheckDocumentationLinks
darynaishchenko Jun 10, 2024
ff7ae9c
refactored documentation qa checks
darynaishchenko Jun 10, 2024
2f10134
refactored changelog checking
darynaishchenko Jun 11, 2024
58f5c34
updated unit tests
darynaishchenko Jun 11, 2024
3aa5061
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Jun 11, 2024
e96ebea
fix tests
darynaishchenko Jun 11, 2024
ce250d7
bump version
darynaishchenko Jun 11, 2024
d39e7f6
refactor type hints
darynaishchenko Jun 11, 2024
f118b61
updated connector_spec_file_content comment
darynaishchenko Jun 21, 2024
122cb39
deleted separete doc templates
darynaishchenko Jun 21, 2024
c3bf40b
deleted documentation utils from common utils
darynaishchenko Jun 21, 2024
3bba5be
added documentation models and helpers
darynaishchenko Jun 21, 2024
3769314
refactor documentation checks
darynaishchenko Jun 21, 2024
9685475
added one standard template
darynaishchenko Jun 21, 2024
8966149
deleted old documentation file
darynaishchenko Jun 21, 2024
92844a2
added templates for checks descriptions
darynaishchenko Jun 21, 2024
0a0609d
generated qa-checks doc
darynaishchenko Jun 21, 2024
075de5c
updated init.py
darynaishchenko Jun 21, 2024
b5bc9bf
updated unit tests
darynaishchenko Jun 21, 2024
27a6f27
fix bugs in qa checks
darynaishchenko Jun 25, 2024
ca0bd36
format fix
darynaishchenko Jun 25, 2024
77b61a6
renamed templates
darynaishchenko Jul 3, 2024
bbfb429
refactor code
darynaishchenko Jul 4, 2024
8d549db
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Jul 4, 2024
c3372fe
updated qa-checks.md
darynaishchenko Jul 4, 2024
ec08f63
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Jul 8, 2024
cf571cc
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Aug 12, 2024
9316069
bump versions
darynaishchenko Aug 12, 2024
05e48ea
fixed docs
darynaishchenko Aug 12, 2024
00f4289
updated link to template
darynaishchenko Aug 12, 2024
4e1ee54
removed invalid link
darynaishchenko Aug 12, 2024
0fd84a4
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Aug 12, 2024
7a2a9d3
updated CheckDocumentationLinks to skip example urls and 406 status code
darynaishchenko Aug 12, 2024
3ea9b9a
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Aug 13, 2024
86bce76
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Aug 14, 2024
4611ef2
fixed documentation
darynaishchenko Aug 14, 2024
8bd8d65
updated links validation
darynaishchenko Aug 14, 2024
5e1d905
fixed tests
darynaishchenko Aug 14, 2024
c9c609d
Merge branch 'master' into daryna/move-TestConnectorDocumentation-fro…
darynaishchenko Aug 14, 2024
airbyte-ci/connectors/connector_ops/connector_ops/utils.py (16 additions & 0 deletions)
@@ -3,6 +3,7 @@
#

import functools
import json
import logging
import os
import re
@@ -381,6 +382,21 @@ def metadata(self) -> Optional[dict]:
return None
return yaml.safe_load((self.code_directory / METADATA_FILE_NAME).read_text())["data"]

@property
def connector_spec(self) -> Optional[dict]:
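        """Return the connector specification from spec.yaml, spec.json, or the low-code manifest, or None if none is found."""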
yaml_spec = Path(self.python_source_dir_path / "spec.yaml")
json_spec = Path(self.python_source_dir_path / "spec.json")

if yaml_spec.exists():
return yaml.safe_load(yaml_spec.read_text())
elif json_spec.exists():
with open(json_spec) as f:
return json.load(f)
elif self.manifest_path.exists():
return yaml.safe_load(self.manifest_path.read_text())["spec"]

return None

@property
def language(self) -> ConnectorLanguage:
if Path(self.code_directory / self.technical_name.replace("-", "_") / "manifest.yaml").is_file():
doc_templates/for_airbyte_cloud.txt (new file)
@@ -0,0 +1,5 @@

1. [Log into your Airbyte Cloud](https://cloud.airbyte.com/workspaces) account.
2. Click Sources and then click + New source.
3. On the Set up the source page, select {connector_name} from the Source type dropdown.
4. Enter a name for the {connector_name} connector.
doc_templates/for_airbyte_open_source.txt (new file)
@@ -0,0 +1,2 @@

1. Navigate to the Airbyte Open Source dashboard.
doc_templates/source.txt (new file)
@@ -0,0 +1,6 @@

<HideInUI>

This page contains the setup guide and reference information for the [{connector_name}]({docs_link}) source connector.

</HideInUI>
doc_templates/supported_sync_modes.txt (new file)
@@ -0,0 +1,2 @@

The {connector_name} source connector supports the following [sync modes](https://docs.airbyte.com/cloud/core-concepts/#connection-sync-modes):
doc_templates/tutorials.txt (new file)
@@ -0,0 +1,2 @@

Now that you have set up the {connector_name} source connector, check out the following {connector_name} tutorials:
@@ -1,11 +1,26 @@
# Copyright (c) 2023 Airbyte, Inc., all rights reserved.

import re
import textwrap
from pathlib import Path
from threading import Thread
from typing import List

import requests
from connector_ops.utils import Connector, ConnectorLanguage # type: ignore
from connectors_qa import consts
from connectors_qa.models import Check, CheckCategory, CheckResult
from connectors_qa.utils import (
description_end_line_index,
documentation_node,
header_name,
prepare_headers,
prepare_lines_to_compare,
reason_missing_titles,
reason_titles_not_match,
remove_not_required_step_headers,
remove_step_from_heading,
required_titles_from_spec,
)
from pydash.objects import get # type: ignore


@@ -113,6 +128,62 @@ class CheckDocumentationStructure(DocumentationCheck):
"## Changelog",
]

PREREQUISITES = "Prerequisites"
HEADING = "heading"
CREDENTIALS_KEYWORDS = ["account", "auth", "credentials", "access", "client"]
CONNECTOR_SPECIFIC_HEADINGS = "<Connector-specific features>"

    def _get_template_headings(self, connector_name: str) -> tuple[tuple[str, ...], tuple[str, ...]]:
        """
        Returns the template headings in the required order, plus the subset of headings that are optional.
        """
all_headings = (
connector_name,
"Prerequisites",
"Setup guide",
f"Set up {connector_name}",
"For Airbyte Cloud:",
"For Airbyte Open Source:",
self.CONNECTOR_SPECIFIC_HEADINGS,
f"Set up the {connector_name} connector in Airbyte",
"For Airbyte Cloud:",
"For Airbyte Open Source:",
self.CONNECTOR_SPECIFIC_HEADINGS,
"Supported sync modes",
"Supported Streams",
self.CONNECTOR_SPECIFIC_HEADINGS,
"Performance considerations",
"Data type map",
"Limitations & Troubleshooting",
self.CONNECTOR_SPECIFIC_HEADINGS,
"Tutorials",
"Changelog",
)
not_required_heading = (
f"Set up the {connector_name} connector in Airbyte",
"For Airbyte Cloud:",
"For Airbyte Open Source:",
self.CONNECTOR_SPECIFIC_HEADINGS,
"Performance considerations",
"Data type map",
"Limitations & Troubleshooting",
"Tutorials",
)
return all_headings, not_required_heading

    def _headings_description(self, connector_name: str) -> dict[str, Path]:
        """
        Returns a mapping from heading to the file containing its template description.
        """
descriptions_paths = {
connector_name: Path(__file__).parent / "doc_templates/source.txt",
"For Airbyte Cloud:": Path(__file__).parent / "doc_templates/for_airbyte_cloud.txt",
"For Airbyte Open Source:": Path(__file__).parent / "doc_templates/for_airbyte_open_source.txt",
"Supported sync modes": Path(__file__).parent / "doc_templates/supported_sync_modes.txt",
"Tutorials": Path(__file__).parent / "doc_templates/tutorials.txt",
}
return descriptions_paths

def check_main_header(self, connector: Connector, doc_lines: List[str]) -> List[str]:
errors = []
if not doc_lines[0].lower().startswith(f"# {connector.metadata['name']}".lower()):
@@ -121,40 +192,194 @@ def check_main_header(self, connector: Connector, doc_lines: List[str]) -> List[str]:
)
return errors

    def validate_links(self, docs_content: str) -> List[str]:
        valid_status_codes = [200, 403, 401, 405, 429, 503]  # some 4xx/5xx codes are accepted because linked pages may require auth or throttle automated requests
        links = re.findall(r"(https?://[^\s\`)]+)", docs_content)
invalid_links = []
threads = []

def request_link(docs_link):
try:
response = requests.get(docs_link)
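                # note: no explicit request timeout is set here; the per-thread join below waits at most 30s per link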
if response.status_code not in valid_status_codes:
invalid_links.append(f"{docs_link} with {response.status_code} status code")
except requests.exceptions.SSLError:
pass

for link in links:
process = Thread(target=request_link, args=[link])
process.start()
threads.append(process)

for process in threads:
            process.join(timeout=30)  # wait at most 30s per link; slower links are skipped instead of blocking the check
process.is_alive()

        errors = []
        for link in invalid_links:
            errors.append(f"Link {link} is invalid in the connector documentation.")

        return errors

    def check_docs_structure(self, docs_content: str, connector_name: str) -> List[str]:
"""
test_docs_structure gets all top-level headers from source documentation file and check that the order is correct.
The order of the headers should follow our standard template https://hackmd.io/Bz75cgATSbm7DjrAqgl4rw.
_get_template_headings returns tuple of headers as in standard template and non-required headers that might nor be in the source docs.
CONNECTOR_SPECIFIC_HEADINGS value in list of required headers that shows a place where should be a connector specific headers,
which can be skipped as out of standard template and depends on connector.
"""
        errors = []

        heading_names = prepare_headers(docs_content)
        template_headings, non_required_heading = self._get_template_headings(connector_name)

heading_names_len, template_headings_len = len(heading_names), len(template_headings)
heading_names_index, template_headings_index = 0, 0

while heading_names_index < heading_names_len and template_headings_index < template_headings_len:
heading_names_value = heading_names[heading_names_index]
template_headings_value = template_headings[template_headings_index]
            # the template slot allows connector-specific headings, so the actual heading is not validated against it
if template_headings_value == self.CONNECTOR_SPECIFIC_HEADINGS:
                # if the actual heading is not one of the template headings, treat it as connector-specific
if heading_names_value not in template_headings:
                    heading_names_index += 1  # consume the actual heading; there may be several connector-specific headings in a row
continue
else:
                    # the actual heading is a required one, so move past the placeholder and validate it against the template
template_headings_index += 1
continue
            # strict comparison of the actual heading with the template heading
if heading_names_value == template_headings_value:
                # headings match, advance both the template and the actual headings
heading_names_index += 1
template_headings_index += 1
continue
            # the headings differ; the template heading may be optional and can be skipped
if template_headings_value in non_required_heading:
                # skip the optional template heading and re-check the same actual heading
template_headings_index += 1
continue
            # nothing matched: the actual heading is unexpected or out of order
errors.append(reason_titles_not_match(heading_names_value, template_headings_value, template_headings))
return errors
        # the template was not fully consumed, so some required headings are missing
if template_headings_index != template_headings_len:
errors.append(reason_missing_titles(template_headings_index, template_headings))
return errors

        return errors

def check_prerequisites_section_has_descriptions_for_required_fields(
self, actual_connector_spec: dict, connector_documentation: str, docs_path: str
) -> List[str]:
        errors = []
if not actual_connector_spec:
return errors

node = documentation_node(connector_documentation)
header_line_map = {header_name(n): n.map[1] for n in node if n.type == self.HEADING}
headings = tuple(header_line_map.keys())

if not header_line_map.get(self.PREREQUISITES):
return [f"Documentation does not have {self.PREREQUISITES} section."]

prereq_start_line = header_line_map[self.PREREQUISITES]
prereq_end_line = description_end_line_index(self.PREREQUISITES, headings, header_line_map)

with open(docs_path, "r") as docs_file:
prereq_content_lines = docs_file.readlines()[prereq_start_line:prereq_end_line]
            # join with a "|" separator so that concatenating lines cannot accidentally form a string that matches a required title
prereq_content = "|".join(prereq_content_lines).lower()
spec = actual_connector_spec.get("connectionSpecification") or actual_connector_spec.get("connection_specification")
required_titles, has_credentials = required_titles_from_spec(spec)

for title in required_titles:
if title not in prereq_content:
errors.append(
f"Required '{title}' field is not in {self.PREREQUISITES} section "
f"or title in spec doesn't match name in the docs."
)

if has_credentials:
            # credentials are checked via keywords because the docs describe this step in many different ways
credentials_validation = [k in prereq_content for k in self.CREDENTIALS_KEYWORDS]
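            # e.g. a Prerequisites line such as "An API key for your account" passes via the "account" keyword (illustrative example)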
            if not any(credentials_validation):
errors.append(f"Required description for 'credentials' field is not in {self.PREREQUISITES} section.")

return errors

def check_docs_descriptions(self, docs_path: str, connector_documentation: str, connector_name: str) -> List[str]:
errors = []
template_descriptions = self._headings_description(connector_name)

node = documentation_node(connector_documentation)
header_line_map = {header_name(n): n.map[1] for n in node if n.type == self.HEADING}
actual_headings = tuple(header_line_map.keys())

for heading, description in template_descriptions.items():
if heading in actual_headings:

description_start_line = header_line_map[heading]
description_end_line = description_end_line_index(heading, actual_headings, header_line_map)

with open(docs_path, "r") as docs_file, open(description, "r") as template_file:

docs_description_content = docs_file.readlines()[description_start_line:description_end_line]
template_description_content = template_file.readlines()

for d, t in zip(docs_description_content, template_description_content):
d, t = prepare_lines_to_compare(connector_name, d, t)
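                        # prepare_lines_to_compare (imported helper, not shown in this diff) is assumed to substitute
                        # the {connector_name} placeholder in the template line before the comparison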
if d != t:
errors.append(f"Description for '{heading}' does not follow structure.\nExpected: {t} Actual: {d}")

return errors

def _run(self, connector: Connector) -> CheckResult:
connector_type, sl_level = connector.connector_type, connector.ab_internal_sl
if connector_type == "source" and sl_level >= 300 and connector.language != ConnectorLanguage.JAVA:

if not connector.documentation_file_path or not connector.documentation_file_path.exists():
return self.fail(
connector=connector,
message="Could not check documentation structure as the documentation file is missing.",
)

doc_lines = [line.lower() for line in connector.documentation_file_path.read_text().splitlines()]

if not doc_lines:
return self.fail(
connector=connector,
message="Documentation file is empty",
)

docs_content = connector.documentation_file_path.read_text().rstrip()

errors = []
errors.extend(self.check_main_header(connector, doc_lines))
errors.extend(self.validate_links(docs_content))
errors.extend(self.check_docs_structure(docs_content, connector.name_from_metadata))
errors.extend(
self.check_prerequisites_section_has_descriptions_for_required_fields(
connector.connector_spec, docs_content, connector.documentation_file_path
)
)
errors.extend(self.check_docs_descriptions(connector.documentation_file_path, docs_content, connector.name_from_metadata))

            if errors:
                return self.fail(
                    connector=connector,
                    message=f"Connector documentation does not follow the guidelines: {'. '.join(errors)}",
                )
            return self.pass_(
                connector=connector,
                message="Documentation guidelines are followed",
            )

        return self.skip(
            connector=connector,
            reason="Check does not apply to sources with support level (ab_internal_sl) < 300 or to Java connectors",
        )


@@ -201,6 +426,6 @@ def _run(self, connector: Connector) -> CheckResult:
ENABLED_CHECKS = [
CheckMigrationGuide(),
CheckDocumentationExists(),
CheckDocumentationStructure(),
CheckChangelogEntry(),
]
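
Below is a hedged usage sketch, not part of this PR, showing how the newly enabled check could be exercised directly from this module. It assumes the connector_ops package is installed, that Connector accepts a connector's technical name, and that CheckResult exposes a message attribute; in practice the check runs through the connectors-qa tooling rather than being called like this.

from connector_ops.utils import Connector  # type: ignore

# "source-faker" is an arbitrary example connector; any connector with docs in the repo would do.
result = CheckDocumentationStructure()._run(Connector("source-faker"))
print(result.message)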