Skip to content

Make CI pass #3671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Apr 10, 2025
Merged

Make CI pass #3671

merged 9 commits into from
Apr 10, 2025

Conversation

se7entyse7en
Copy link
Member

@se7entyse7en se7entyse7en commented Mar 6, 2025

An update in the yarl package broke aiohttp

aiohttp relies on the yarl library for URL parsing and building. When running the tests, the majority of them fail due to a version of yarl that is incompatible with aiohttp.

The yarl requirement in aiohttp==3.8.4 (current version we have in our requirements) is very loose: https://github.com/aio-libs/aiohttp/blob/v3.8.4/setup.cfg#L56

Indeed when inspecting the Docker container we can check using pip freeze that the version of yarl installed is 1.18.3. This is the output of pipdeptree:

root@f7718d7499e7:/server# pipdeptree --reverse --packages yarl
yarl==1.18.3
└── aiohttp==3.8.4 [requires: yarl>=1.0,<2.0]
    ├── aiohttp-cors==0.7.0 [requires: aiohttp>=1.1]
    ├── aiohttp-jinja2==1.5.1 [requires: aiohttp>=3.6.3]
    ├── gcloud-aio-auth==4.2.1 [requires: aiohttp>=3.3.0,<4.0.0]
    │   ├── gcloud-aio-kms==4.2.0 [requires: gcloud-aio-auth>=3.1.0,<5.0.0]
    │   └── gcloud-aio-pubsub==5.4.0 [requires: gcloud-aio-auth>=3.3.0,<5.0.0]
    └── pytest-aiohttp==1.0.4 [requires: aiohttp>=3.8.1]

But after version 1.13 yarl changed how URLs are parsed in a way that port number cannot be part anymore of the host param when using URL.build(...host=host), but needs to be used as URL.build(...authority=host): aio-libs/aiohttp#9307

This has been fixed in this PR: aio-libs/aiohttp#9309

and has been shipped in version 3.10.7 which is the one being used in this PR.

numpy>=1.24's expired deprecations affect the way we initialize Dataframes

Currently we're using numpy==1.26.4. This is the release notes for numpy>=1.24: https://numpy.org/doc/2.2/release/1.24.0-notes.html#expired-deprecations

In particular:

Ragged array creation will now always raise a ValueError unless dtype=object is passed. This includes very deeply nested sequences.

This is the PR: numpy/numpy#22004

We'd need to update how we use md.DataFrame everywhere and include dtype=object or make medvedi handle it. For now, to minimize the changes I'm downgrading to the latest 1.23 version.

A test is still using pandas instead of medvedi

I don't know exactly the chronology of how this happens, but we have a test (tests/internal/miners/github/test_check_run_benchmark.py) that still relies on pandas.

With the same philosophy of trying to just make things work as they used to, I just added pandas as a test requirement and convert the pd.DataFrame to a md.DataFrame to reuse the downstream functions that have been already ported to be used with medvedi.


This change is Reviewable

@se7entyse7en se7entyse7en requested a review from a team as a code owner March 6, 2025 15:55
@vmarkovtsev
Copy link
Contributor

vmarkovtsev commented Mar 6, 2025

Let me take care of the stuff related to md.DataFrame. Is there an error example where it wants dtype=object?

Upd.: I guess, this is an example

/home/runner/.local/lib/python3.11/site-packages/medvedi/dataframe.py:1354: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    r = np.array(value, dtype, copy=copy)

This is so weird, since the CI of medvedi passed on 1.26.4 🤔

@@ -1,5 +1,5 @@
[build-system]
requires = ["setuptools==75.3.0", "wheel", "Cython==3.0.12", "numpy==1.26.4"]
requires = ["setuptools==75.3.0", "wheel", "Cython==3.0.12", "numpy==1.23.5"]
Copy link
Contributor

@vmarkovtsev vmarkovtsev Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd try hard to avoid this, since the latest wheels of medvedi build with 1.26.4, so we will lose the ablity to upgrade to Python 3.12.

@se7entyse7en
Copy link
Member Author

Let me take care of the stuff related to md.DataFrame. Is there an error example where it wants dtype=object?

Upd.: I guess, this is an example

/home/runner/.local/lib/python3.11/site-packages/medvedi/dataframe.py:1354: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    r = np.array(value, dtype, copy=copy)

This is so weird, since the CI of medvedi passed on 1.26.4 🤔

Hey there! That would be much appreciated!

This is an example: https://github.com/athenianco/athenian-api/blob/master/server/tests/internal/features/github/test_deployment_metrics.py#L51

The tests were failing on initializing this.

I replicated it on Ipython:

Python 3.11.11 (main, Mar  1 2025, 10:20:33) [GCC 11.4.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 9.0.1 -- An enhanced Interactive Python. Type '?' for help.
Tip: Use `--theme`, or the `%colors` magic to change ipython themes and colors.

In [1]: import medvedi as md

In [2]: import numpy as np

In [3]: np.__version__
Out[3]: '1.26.4'

In [4]: !pip freeze | grep medvedi
/usr/local/lib/python3.11/dist-packages/IPython/core/interactiveshell.py:2637: UserWarning: You executed the system command !pip which may not work as expected. Try the IPython magic %pip instead.
  warnings.warn(
medvedi==0.1.68

In [5]: from athenian.api.internal.miners.types import DeploymentFacts

In [6]: md.DataFrame(
   ...:         {
   ...:             DeploymentFacts.f.pr_authors: [[1, 2, 3], [1, 4, 5], [2, 4, 6], [], [3]],
   ...:             DeploymentFacts.f.commit_authors: [[1, 2, 3], [1, 4, 5, 6], [2, 4, 6], [7], [3]],
   ...:             DeploymentFacts.f.release_authors: [[], [], [1, 2], [], [7]],
   ...:             DeploymentFacts.f.commits_overall: [[1, 1], [1, 1], [0, 1], [1], [1]],
   ...:             "environment": ["1", "2", "1", "3", "3"],
   ...:             DeploymentFacts.f.repositories: [["1", "2"], ["1", "3"], ["2", "3"], ["1"], ["3"]],
   ...:         },
   ...:     )
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[6], line 1
----> 1 md.DataFrame(
      2         {
      3             DeploymentFacts.f.pr_authors: [[1, 2, 3], [1, 4, 5], [2, 4, 6], [], [3]],
      4             DeploymentFacts.f.commit_authors: [[1, 2, 3], [1, 4, 5, 6], [2, 4, 6], [7], [3]],
      5             DeploymentFacts.f.release_authors: [[], [], [1, 2], [], [7]],
      6             DeploymentFacts.f.commits_overall: [[1, 1], [1, 1], [0, 1], [1], [1]],
      7             "environment": ["1", "2", "1", "3", "3"],
      8             DeploymentFacts.f.repositories: [["1", "2"], ["1", "3"], ["2", "3"], ["1"], ["3"]],
      9         },
     10     )

File /usr/local/lib/python3.11/dist-packages/medvedi/dataframe.py:340, in DataFrame.__init__(self, data, columns, index, copy, dtype, check)
    338     if columns is not None:
    339         raise ValueError("Either `data` or `columns` must be specified")
--> 340     built_columns = {k: self._asarray(v, dtype.get(k), copy) for k, v in data.items()}
    341 else:
    342     if columns is None:

File /usr/local/lib/python3.11/dist-packages/medvedi/dataframe.py:340, in <dictcomp>(.0)
    338     if columns is not None:
    339         raise ValueError("Either `data` or `columns` must be specified")
--> 340     built_columns = {k: self._asarray(v, dtype.get(k), copy) for k, v in data.items()}
    341 else:
    342     if columns is None:

File /usr/local/lib/python3.11/dist-packages/medvedi/dataframe.py:1354, in DataFrame._asarray(value, dtype, copy)
   1352     r[:] = value
   1353 else:
-> 1354     r = np.array(value, dtype, copy=copy)
   1355 return r

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (5,) + inhomogeneous part.

If you strongly think that it's better to not downgrade numpy it would be great if you can check it out, for the test still relying on pandas since it's just a test dependency I think it's not that much of a big deal.

Both me and @lwsanty are trying to do the minimum necessary to keep things afloat. 😅

If you change medvedi to make it transparent to the API that would be cool!

@se7entyse7en se7entyse7en force-pushed the ci-tests-fixes branch 2 times, most recently from 7e4c93f to 41d62a2 Compare March 21, 2025 23:51
@se7entyse7en
Copy link
Member Author

@vmarkovtsev all tests seem fine now! 🥳

The "Custom check" I think/hope that will be fixed once "packaging" is upgraded as well together with pip and setuptools, but since it's a change in the yaml definition cannot be tested in this PR. I tested it on an ubuntu server and worked.

We're also starting to really need this as a customer is experiencing some issues that we'd need to investigate. Is the concern around numpy only about not being able to upgrade to python3.12? or it may cause issues with medvedi that are not caught by the tests?

@se7entyse7en se7entyse7en force-pushed the ci-tests-fixes branch 3 times, most recently from 5493f72 to 9f3980e Compare March 27, 2025 23:05
@se7entyse7en
Copy link
Member Author

Merging this PR as we deployed it on prod on March 28 and everything worked fine. 🥳

@se7entyse7en se7entyse7en changed the title [WIP] Make CI pass Make CI pass Apr 10, 2025
@se7entyse7en se7entyse7en merged commit 620ec6d into master Apr 10, 2025
17 of 18 checks passed
@se7entyse7en se7entyse7en deleted the ci-tests-fixes branch April 10, 2025 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants