
Commit 74fb3b8

Release Preparation (#194)
* removed some statistics, added more tests, doc update
* bugfix
* small ci fix
* changed sec policy
* added more tests for re
* lint
* harden ci
1 parent cd56edd commit 74fb3b8

24 files changed, +325 -253 lines changed

.github/workflows/codeql.yml

+1
@@ -43,6 +43,7 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

  - name: Checkout repository
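Every workflow touched by this commit gains `disable-sudo: true` on its `step-security/harden-runner` step. The following is a minimal sketch of a check for that convention, assuming each workflow file has already been parsed into a plain dict (for example with PyYAML, not shown). The function name is hypothetical and not part of this repository.

```python
from typing import Any


def harden_runner_steps_missing_flag(workflow: dict[str, Any]) -> list[str]:
    """List "job/step" labels of Harden Runner steps lacking `disable-sudo: true`.

    `workflow` is assumed to be an already-parsed GitHub Actions workflow.
    """
    missing = []
    for job_name, job in (workflow.get("jobs") or {}).items():
        for index, step in enumerate(job.get("steps") or []):
            # Compare the action name only, ignoring the pinned ref after "@".
            action = step.get("uses", "").split("@", 1)[0]
            if action != "step-security/harden-runner":
                continue
            settings = step.get("with") or {}
            if settings.get("disable-sudo") is not True:
                missing.append(f"{job_name}/{step.get('name', index)}")
    return missing


workflow = {
    "jobs": {
        "analyze": {
            "steps": [
                {
                    "name": "Harden Runner",
                    "uses": "step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf",
                    "with": {"disable-sudo": True, "egress-policy": "audit"},
                },
                {"name": "Checkout repository", "uses": "actions/checkout@v4"},
            ]
        }
    }
}
print(harden_runner_steps_missing_flag(workflow))  # []
```

YAML `true` loads as Python `True`, which is why the sketch compares with `is not True` rather than accepting any truthy string.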

.github/workflows/dependency-review.yml

+5-4
@@ -6,7 +6,7 @@
  # PRs introducing known-vulnerable packages will be blocked from merging.
  #
  # Source repository: https://github.com/actions/dependency-review-action
- name: 'Dependency Review'
+ name: "Dependency Review"
  on: [pull_request]

  permissions:
@@ -19,9 +19,10 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

- - name: 'Checkout Repository'
+ - name: "Checkout Repository"
  uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
- - name: 'Dependency Review'
- uses: actions/dependency-review-action@0659a74c94536054bfa5aeb92241f70d680cc78e # v4
+ - name: "Dependency Review"
+ uses: actions/dependency-review-action@ce3cf9537a52e8119d91fd484ab5b8a807627bf8 # v4.6.0
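This hunk moves the pinned `actions/dependency-review-action` commit from the one annotated `# v4` to the one annotated `# v4.6.0`. As with the other actions in these workflows, the reference is a full 40-character commit SHA, with the human-readable tag kept in a trailing comment. A small sketch of a check for that pinning convention; the regular expression and the `is_pinned` name are illustrative assumptions, not something the repository ships:

```python
import re

# owner/repo[/sub/path]@<40 lowercase hex characters>
PINNED_USES = re.compile(r"[\w.-]+/[\w.-]+(?:/[\w./-]+)?@[0-9a-f]{40}")


def is_pinned(uses: str) -> bool:
    """True if a `uses:` value references an action by full commit SHA."""
    # Drop a trailing "# vX.Y.Z" comment if the raw YAML value was passed in.
    ref = uses.split("#", 1)[0].strip()
    return PINNED_USES.fullmatch(ref) is not None


print(is_pinned("actions/dependency-review-action@ce3cf9537a52e8119d91fd484ab5b8a807627bf8 # v4.6.0"))  # True
print(is_pinned("actions/dependency-review-action@v4"))  # False
```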

.github/workflows/package.yml

+2-1
@@ -30,6 +30,7 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

  - name: Check out repository
@@ -82,7 +83,7 @@ jobs:
  with:
  flags: smart-tests
  verbose: true
- file: ./coverage.xml
+ files: ./coverage.xml
  fail_ci_if_error: true
  env:
  CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

.github/workflows/publish.yml

+5-4
@@ -17,13 +17,14 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

  - name: Check-out repository
  uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
  with:
  fetch-depth: 0
-
+
  - name: Set up Python
  uses: actions/setup-python@8d9ed9ac5c53483de85588cdf95a591a75ab9f55 # v5.5.0
  with:
@@ -33,8 +34,8 @@ jobs:
  id: cached-poetry
  uses: actions/cache@5a3ec84eff668545956fd18022155c47e93e2684 # v4.2.3
  with:
- path: ~/.local # the path depends on the OS
- key: poetry # increment to reset cache
+ path: ~/.local # the path depends on the OS
+ key: poetry # increment to reset cache

  - name: Install poetry
  if: steps.cached-poetry.outputs.cache-hit != 'true'
@@ -56,4 +57,4 @@ jobs:
  PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
  run: |
  poetry config pypi-token.pypi $PYPI_TOKEN
- poetry publish --build
+ poetry publish --build

.github/workflows/scorecards.yml

+2-1
@@ -10,7 +10,7 @@ on:
  # To guarantee Maintained check is occasionally updated. See
  # https://github.com/ossf/scorecard/blob/main/docs/checks.md#maintained
  schedule:
- - cron: '20 7 * * 2'
+ - cron: "20 7 * * 2"
  push:
  branches: ["main"]

@@ -33,6 +33,7 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

  - name: "Checkout code"
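The schedule change is only a quote-style switch; the expression `20 7 * * 2` itself means minute 20, hour 7, any day of month, any month, day of week 2 (Tuesday, with cron counting Sunday as 0), and GitHub Actions evaluates scheduled workflows in UTC. A sketch that computes the next matching time for this one fixed weekly pattern; it is not a general cron parser, and the helper name is hypothetical:

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(after: datetime, minute: int = 20, hour: int = 7, cron_dow: int = 2) -> datetime:
    """Next UTC time strictly after `after` (timezone-aware) matching "minute hour * * cron_dow"."""
    after = after.astimezone(timezone.utc)
    # cron counts Sunday as 0; datetime.weekday() counts Monday as 0.
    target_weekday = (cron_dow - 1) % 7
    candidate = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    candidate += timedelta(days=(target_weekday - candidate.weekday()) % 7)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


# 2025-03-30 is a Sunday, so the next run is the following Tuesday.
print(next_weekly_run(datetime(2025, 3, 30, 12, 0, tzinfo=timezone.utc)))  # 2025-04-01 07:20:00+00:00
```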

.github/workflows/stale.yml

+5-4
@@ -16,14 +16,15 @@ jobs:
  - name: Harden Runner
  uses: step-security/harden-runner@c6295a65d1254861815972266d5933fd6e532bdf # v2.11.1
  with:
+ disable-sudo: true
  egress-policy: audit

  - uses: actions/stale@5bef64f19d7facfb25b37b414482c7164d639639 # v9.1.0
  with:
- stale-issue-message: 'This issue has been marked as stale because it has been open for 8 weeks with no activity. Please remove the stale label or comment or this issue will be closed in 1 week.'
- close-issue-message: 'This issue was closed because it has been inactive for 2 months with no activity.'
- stale-pr-message: 'This pull request has been marked as stale because it has been open for 13 weeks with no activity. Please remove the stale label or comment or this pull request will be closed in 1 week.'
- close-pr-message: 'This pull request was closed because it has been inactive for 6 months with no activity.'
+ stale-issue-message: "This issue has been marked as stale because it has been open for 8 weeks with no activity. Please remove the stale label or comment or this issue will be closed in 1 week."
+ close-issue-message: "This issue was closed because it has been inactive for 2 months with no activity."
+ stale-pr-message: "This pull request has been marked as stale because it has been open for 13 weeks with no activity. Please remove the stale label or comment or this pull request will be closed in 1 week."
+ close-pr-message: "This pull request was closed because it has been inactive for 6 months with no activity."
  days-before-issue-stale: 56
  days-before-issue-close: 7
  days-before-pr-stale: 91

README.md

+4-5
@@ -29,7 +29,6 @@
  * Coefficient of Range;
  * Coefficient of Variation;
  * Cole's Index of Dispersion;
- * Dispersion Ratio;
  * Fisher's Index of Dispersion;
  * Gini Mean Difference;
  * Linear Coefficient of Variation;
@@ -41,13 +40,11 @@
  * Standard Quantile Absolute Deviation;
  * Studentized Range.
  - Collection of measures of skewness - `obscure_stats/skewness`:
- * Area Under the Skewness Curve (weighted and unweighted);
+ * Area Under the Skewness Curve;
  * Bickel Mode Skewness Coefficient;
  * Bowley Skewness Coefficient;
- * Cumulative Skewness Coefficient;
  * Forhad-Shorna Rank Skewness Coefficient;
  * Groeneveld Range Skewness Coefficient;
- * Groeneveld Skewness Coefficient;
  * Hossain-Adnan Skewness Coefficient;
  * Kelly Skewness Coefficient;
  * L-Skewness Coefficient;
@@ -67,13 +64,15 @@
  * Staudte Kurtosis.
  - Collection of measures of association - `obscure_stats/association`:
  * Blomqvist's Beta;
- * Chatterjee Xi Correlation Coefficient (original and symmetric versions);
+ * Chatterjee Xi Correlation Coefficient;
  * Concordance Correlation Coefficient;
  * Concordance Rate;
  * Fechner Correlation Coefficient;
  * Gaussian Rank Correlation Coefficient;
+ * Normalized Chatterjee Xi Correlation Coefficient;
  * Quantile Correlation Coefficient;
  * Rank Minrelation Coefficient;
+ * Symmetric Chatterjee Xi Correlation Coefficient;
  * Tanimoto Similarity;
  * Tukey's Correlation Coefficient;
  * Winsorized Correlation Coefficient;

SECURITY.md

+1-1
@@ -9,7 +9,7 @@ Authors of `obscure_stats` library take the security of the open source code rep
  | Version | Supported |
  | ------- | ------------------ |
  | 0.1.x | :x: |
- | 0.2.x | :white_check_mark: |
+ | 0.2.x | :x: |
  | 0.3.x | :white_check_mark: |
  | 0.4.x | :white_check_mark: |

pyproject.toml

+1-1
@@ -24,7 +24,7 @@ classifiers = [

  [tool.poetry.dependencies]
  python = ">=3.10,<3.14"
- numpy = "^1.23.5"
+ numpy = "^2.0.0"
  scipy = "^1.9.1"

  [tool.poetry.group.dev.dependencies]
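The dependency bump replaces `numpy = "^1.23.5"` with `numpy = "^2.0.0"`. In Poetry's caret syntax, `^2.0.0` allows `>=2.0.0,<3.0.0`, so NumPy 1.x installs no longer satisfy the constraint. A small sketch of the caret rule for plain three-part `X.Y.Z` versions (no pre-release or local tags); the helper names are hypothetical and this is not Poetry's own implementation:

```python
def parse(version: str) -> tuple[int, int, int]:
    """Parse a plain "X.Y.Z" version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def caret_range(spec: str) -> tuple[tuple[int, int, int], tuple[int, int, int]]:
    """Inclusive lower and exclusive upper bound for a caret requirement ^X.Y.Z."""
    lower = parse(spec)
    # The upper bound bumps the left-most non-zero component (the patch if all are zero).
    idx = next((i for i, part in enumerate(lower) if part != 0), 2)
    upper = list(lower[: idx + 1])
    upper[idx] += 1
    upper += [0] * (2 - idx)
    return lower, (upper[0], upper[1], upper[2])


def satisfies(version: str, spec: str) -> bool:
    lower, upper = caret_range(spec)
    return lower <= parse(version) < upper


print(caret_range("2.0.0"))  # ((2, 0, 0), (3, 0, 0))
print(satisfies("1.26.4", "2.0.0"), satisfies("2.2.4", "2.0.0"))  # False True
```

Under the same rule, the unchanged `scipy = "^1.9.1"` still means `>=1.9.1,<2.0.0`.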

src/obscure_stats/association/__init__.py

+2
@@ -7,6 +7,7 @@
  concordance_rate,
  fechner_correlation,
  gaussain_rank_correlation,
+ normalized_chatterjee_xi,
  quantile_correlation,
  rank_minrelation_coefficient,
  symmetric_chatterjee_xi,
@@ -23,6 +24,7 @@
  "concordance_rate",
  "fechner_correlation",
  "gaussain_rank_correlation",
+ "normalized_chatterjee_xi",
  "quantile_correlation",
  "rank_minrelation_coefficient",
  "symmetric_chatterjee_xi",

src/obscure_stats/association/association.py

+104-26
@@ -73,7 +73,8 @@ def chatterjee_xi(x: np.ndarray, y: np.ndarray) -> float:
  underlying distributions of the variable.

  It ranges from 0 (variables are completely independent) to 1
- (one is a measurable function of the other).
+ (one is a measurable function of the other). But a lot of the times the maximum
+ value of the coefficient is lower than 1.

  This implementation does not break ties at random, instead
  it break ties depending on order. This makes it dependent on
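The new docstring sentence notes that Chatterjee's Xi often peaks below 1. For untied data the estimator is xi_n = 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1), where r_i is the rank of y in the i-th pair after sorting the pairs by x. If y is strictly increasing in x, the ranks run 1, 2, ..., n, the sum equals n - 1, and xi_n = (n - 2) / (n + 1), which only approaches 1 as n grows. A standalone sketch of the untied formula (the helper name `xi_no_ties` is hypothetical; this is not the library's implementation):

```python
import numpy as np


def xi_no_ties(x: np.ndarray, y: np.ndarray) -> float:
    """Chatterjee's xi_n for untied data: 1 - 3 * sum|r_{i+1} - r_i| / (n^2 - 1)."""
    n = len(x)
    y_by_x = y[np.argsort(x, kind="stable")]
    # r_i: rank (1..n) of each y value, listed in order of increasing x.
    r = np.argsort(np.argsort(y_by_x, kind="stable"), kind="stable") + 1
    return float(1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n**2 - 1))


x = np.arange(10, dtype=float)
print(round(xi_no_ties(x, x**3), 3))  # 0.727, i.e. (10 - 2) / (10 + 1)
```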
@@ -212,10 +213,10 @@ def concordance_rate(x: np.ndarray, y: np.ndarray) -> float:
  sem_y = np.std(y, ddof=0) / n**0.5
  return float(
  (
- np.sum((x > mean_x + sem_x) & (y > mean_y + sem_y))
- + np.sum((x < mean_x - sem_x) & (y > mean_y + sem_y))
- - np.sum((x < mean_x - sem_x) & (y < mean_y - sem_y))
- - np.sum((x > mean_x + sem_x) & (y < mean_y - sem_y))
+ np.sum((x >= mean_x + sem_x) & (y >= mean_y + sem_y))
+ - np.sum((x <= mean_x - sem_x) & (y >= mean_y + sem_y))
+ + np.sum((x <= mean_x - sem_x) & (y <= mean_y - sem_y))
+ - np.sum((x >= mean_x + sem_x) & (y <= mean_y - sem_y))
  )
  / n
  )
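After this change a point counts as "high" or "low" when it lies at least one standard error of the mean (population standard deviation divided by sqrt(n)) above or below the mean, now inclusively (`>=`, `<=`). Both concordant quadrants (high/high and low/low) are added and both discordant quadrants are subtracted; the previous expression added the low-x/high-y quadrant and subtracted the low/low one. A standalone restatement of the new expression, with a hypothetical function name:

```python
import numpy as np


def concordance_rate_sketch(x, y) -> float:
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    # Standard error of the mean, using the population standard deviation (ddof=0).
    sem_x = np.std(x, ddof=0) / n**0.5
    sem_y = np.std(y, ddof=0) / n**0.5
    high_x, low_x = x >= x.mean() + sem_x, x <= x.mean() - sem_x
    high_y, low_y = y >= y.mean() + sem_y, y <= y.mean() - sem_y
    concordant = np.sum(high_x & high_y) + np.sum(low_x & low_y)
    discordant = np.sum(low_x & high_y) + np.sum(high_x & low_y)
    return float((concordant - discordant) / n)


x = [1, 2, 3, 4, 5]
print(concordance_rate_sketch(x, x), concordance_rate_sketch(x, x[::-1]))  # 0.8 -0.8
```

With the old signs, the perfectly increasing case would have scored 2/5 - 2/5 = 0 instead of a positive value.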
@@ -228,7 +229,8 @@ def symmetric_chatterjee_xi(x: np.ndarray, y: np.ndarray) -> float:
  underlying distributions of the variable.

  It ranges from 0 (variables are completely independent) to 1
- (one is a measurable function of the other).
+ (one is a measurable function of the other). But a lot of the times the maximum
+ value of the coefficient is lower than 1.

  This implementation does not break ties at random, instead
  it break ties depending on order. This makes it dependent on
@@ -311,28 +313,27 @@ def zhang_i(x: np.ndarray, y: np.ndarray) -> float:

  References
  ----------
- Zhang, Q. (2023).
- On relationships between Chatterjee's and Spearman's correlation coefficients.
- arXiv preprint arXiv:2302.10131.
-
- Notes
- -----
- This measure is assymetric: (x, y) != (y, x).
+ Zhang, Q. (2025).
+ On the extensions of the Chatterjee-Spearman test.
+ Journal of Nonparametric Statistics, 1-30.

  See Also
  --------
  scipy.stats.spearmanr - Spearman R coefficient.
- obscure_stats.associaton.chatterjee_xi - Chatterjee Xi coefficient.
+ obscure_stats.associaton.symmetric_chatterjee_xi - Chatterjee Xi coefficient.
  """
  if _check_arrays(x, y):
  return np.nan
  x, y = _prep_arrays(x, y)
  if _check_arrays(x, y):
  return np.nan
  return float(
- max(
- abs(stats.spearmanr(x, y, nan_policy="omit")[0]),
- 2.5**0.5 * chatterjee_xi(x, y),
+ min(
+ 1.0,
+ max(
+ abs(stats.spearmanr(x, y, nan_policy="omit")[0]),
+ 2.5**0.5 * symmetric_chatterjee_xi(x, y),
+ ),
  )
  )
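The updated `zhang_i` takes the larger of |Spearman's rho| and sqrt(2.5) times the symmetric Xi, then caps the result at 1.0. Since sqrt(2.5) is about 1.58, the scaled Xi alone can exceed 1 for strongly dependent data. A self-contained sketch of that combination; `symmetric_xi_sketch` is an assumption (the larger of Xi computed in both directions, untied data) and may not match how `obscure_stats` implements `symmetric_chatterjee_xi`:

```python
import numpy as np
from scipy import stats


def xi_no_ties(x: np.ndarray, y: np.ndarray) -> float:
    """Chatterjee's xi_n for untied data (y as a function of x)."""
    n = len(x)
    r = stats.rankdata(y[np.argsort(x, kind="stable")])
    return float(1.0 - 3.0 * np.sum(np.abs(np.diff(r))) / (n**2 - 1))


def symmetric_xi_sketch(x: np.ndarray, y: np.ndarray) -> float:
    # Assumption: "symmetric" means the larger of the two directional coefficients.
    return max(xi_no_ties(x, y), xi_no_ties(y, x))


def zhang_i_sketch(x, y) -> float:
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    rho = abs(stats.spearmanr(x, y)[0])
    # Larger of |rho| and the scaled Xi, capped at 1.0 as in the new code.
    return float(min(1.0, max(rho, 2.5**0.5 * symmetric_xi_sketch(x, y))))


x = np.arange(20, dtype=float)
print(zhang_i_sketch(x, x))  # 1.0 (the uncapped value would be about 1.36)
```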

@@ -596,13 +597,11 @@ def tukey_correlation(x: np.ndarray, y: np.ndarray) -> float:
  s_y = gini_mean_difference(y)
  x_norm = x / s_x
  y_norm = y / s_y
- return float(
- 0.25
- * (
- gini_mean_difference(x_norm + y_norm) ** 2
- - gini_mean_difference(x_norm - y_norm) ** 2
- )
+ coef = 0.25 * (
+ gini_mean_difference(x_norm + y_norm) ** 2
+ - gini_mean_difference(x_norm - y_norm) ** 2
  )
+ return float(max(min(coef, 1.0), -1.0))


  def gaussain_rank_correlation(x: np.ndarray, y: np.ndarray) -> float:
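`tukey_correlation` builds a correlation from the Gini mean difference (GMD): with u = x / GMD(x) and v = y / GMD(y) it returns 0.25 * (GMD(u + v)^2 - GMD(u - v)^2). Unlike the standard deviation, GMD is not derived from an inner product, so the Cauchy-Schwarz argument that keeps Pearson's r inside [-1, 1] does not carry over directly; the updated code now clamps the result. A sketch with a hand-written GMD (mean absolute difference over all pairs i < j), assuming but not verifying that it agrees with the library's `gini_mean_difference`:

```python
import numpy as np


def gini_mean_difference(x: np.ndarray) -> float:
    """Mean of |x_i - x_j| over all pairs i < j, computed from sorted values."""
    s = np.sort(np.asarray(x, dtype=float))
    n = len(s)
    # s[k] is the larger element in k pairs and the smaller one in n - 1 - k pairs.
    weights = 2 * np.arange(n) - n + 1
    return float(2.0 * np.sum(weights * s) / (n * (n - 1)))


def tukey_correlation_sketch(x, y) -> float:
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    u = x / gini_mean_difference(x)
    v = y / gini_mean_difference(y)
    coef = 0.25 * (gini_mean_difference(u + v) ** 2 - gini_mean_difference(u - v) ** 2)
    # Clamp into [-1, 1], as the updated tukey_correlation does.
    return float(max(min(coef, 1.0), -1.0))


x = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
print(tukey_correlation_sketch(x, x), tukey_correlation_sketch(x, -x))  # close to 1.0 and -1.0
```

If GMD is rescaled by any constant that depends only on n, u and v shrink by that same factor and the GMD terms in the final expression are unchanged, so the normalization chosen here does not affect the result.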
@@ -640,10 +639,10 @@ def gaussain_rank_correlation(x: np.ndarray, y: np.ndarray) -> float:
  norm_factor = 1 / (n + 1)
  x_ranks_norm = (np.argsort(x) + 1) * norm_factor
  y_ranks_norm = (np.argsort(y) + 1) * norm_factor
- return float(
- np.sum(stats.norm.ppf(x_ranks_norm) * stats.norm.ppf(y_ranks_norm))
- / np.sum(stats.norm.ppf(np.arange(1, n + 1) * norm_factor) ** 2)
+ coef = np.sum(stats.norm.ppf(x_ranks_norm) * stats.norm.ppf(y_ranks_norm)) / np.sum(
+ stats.norm.ppf(np.arange(1, n + 1) * norm_factor) ** 2
  )
+ return float((coef - 0.5) * 2)


  def quantile_correlation(x: np.ndarray, y: np.ndarray, q: float = 0.5) -> float:
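For reference, the Gaussian rank correlation is usually defined on ranks R(.) scaled into (0, 1) and mapped through the standard normal quantile function Phi^-1: sum_i Phi^-1(R(x_i)/(n+1)) * Phi^-1(R(y_i)/(n+1)), divided by sum_{i=1..n} Phi^-1(i/(n+1))^2. The sketch below computes ranks with `scipy.stats.rankdata` (average ranks for ties). The diffed code instead derives its rank positions from `np.argsort` and applies an extra `(coef - 0.5) * 2` rescaling, so the two should not be assumed interchangeable. This is a textbook sketch with a hypothetical name, not the library's implementation:

```python
import numpy as np
from scipy import stats


def gaussian_rank_correlation_sketch(x, y) -> float:
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    n = len(x)
    # Normal scores: ranks scaled into (0, 1), then mapped through the normal quantile function.
    score_x = stats.norm.ppf(stats.rankdata(x) / (n + 1))
    score_y = stats.norm.ppf(stats.rankdata(y) / (n + 1))
    # Normalizer: the value the numerator takes when both rankings agree exactly.
    denom = np.sum(stats.norm.ppf(np.arange(1, n + 1) / (n + 1)) ** 2)
    return float(np.sum(score_x * score_y) / denom)


x = np.array([0.3, 1.2, -0.7, 2.5, 0.9])
print(gaussian_rank_correlation_sketch(x, np.exp(x)))  # 1.0 up to rounding
print(gaussian_rank_correlation_sketch(x, -x))  # -1.0 up to rounding
```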
@@ -689,3 +688,82 @@ def quantile_correlation(x: np.ndarray, y: np.ndarray, q: float = 0.5) -> float:
  np.mean((q - (y < np.quantile(y, q=q))) * (x - np.mean(x)))
  / (((q - q**2) * np.var(x)) ** 0.5)
  )
+
+
+ def normalized_chatterjee_xi(x: np.ndarray, y: np.ndarray) -> float:
+ """Calculate normalizd Xi correlation coefficient.
+
+ Another variation of rank correlation which does not make any assumptions about
+ underlying distributions of the variable.
+
+ It ranges from 0 (variables are completely independent) to 1
+ (one is a measurable function of the other). This variant normalizes Chatterjee Xi,
+ so it's maximum will always be 1.0.
+
+ This implementation does not break ties at random, instead
+ it break ties depending on order. This makes it dependent on
+ data sorting, which could be useful in application like time
+ series.
+
+ The arrays will be flatten before any calculations.
+
+ Parameters
+ ----------
+ x : array_like
+ Input array.
+ y : array_like
+ Input array.
+
+ Returns
+ -------
+ nxi : float.
+ The value of the normalized xi correlation coefficient.
+
+ References
+ ----------
+ Dalitz, C.; Arning, J.; Goebbels, S. (2024).
+ A Simple Bias Reduction for Chatterjee's Correlation.
+ J Stat Theory Pract 18, 51.
+
+ Notes
+ -----
+ This measure is assymetric: (x, y) != (y, x).
+ """
+ if _check_arrays(x, y):
+ return np.nan
+ x, y = _prep_arrays(x, y)
+ if _check_arrays(x, y):
+ return np.nan
+ n = len(x)
+ # y ~ f(x)
+ y_forward_ordered = y[np.argsort(x)]
+ _, y_unique_indexes, y_counts = np.unique(
+ y_forward_ordered, return_inverse=True, return_counts=True
+ )
+ right_xy = np.cumsum(y_counts)[y_unique_indexes]
+ left_xy = np.cumsum(y_counts[::-1])[len(y_counts) - y_unique_indexes - 1]
+ # y ~ f(y)
+ y_ordered = y[np.argsort(y)]
+ _, y_unique_indexes, y_counts = np.unique(
+ y_ordered, return_inverse=True, return_counts=True
+ )
+ right_yy = np.cumsum(y_counts)[y_unique_indexes]
+ left_yy = np.cumsum(y_counts[::-1])[len(y_counts) - y_unique_indexes - 1]
+ # divide one by another
+ return float(
+ max(
+ -1,
+ (
+ 1
+ - 0.5
+ * np.sum(np.abs(np.diff(right_xy)))
+ / np.mean(left_xy * (n - left_xy))
+ )
+ / (
+ 1
+ - 0.5
+ * np.sum(np.abs(np.diff(right_yy)))
+ / np.mean(left_yy * (n - left_yy)),
+ ),
+ )
+ )
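`normalized_chatterjee_xi` evaluates the tie-aware Xi statistic twice: once with y ordered by x (`right_xy`, `left_xy`) and once with y ordered by itself (`right_yy`, `left_yy`), then divides the first by the second. Here "right" counts r_i = #{j : y_j <= y_i} and "left" counts l_i = #{j : y_j >= y_i}, and 1 - 0.5 * sum|diff(r)| / mean(l * (n - l)) equals Chatterjee's 1 - n * sum|r_{i+1} - r_i| / (2 * sum l_i (n - l_i)). Ordering y by itself makes r non-decreasing, which minimizes sum|r_{i+1} - r_i|, so the denominator is the largest Xi attainable for that y and an exact functional dependence maps to 1.0. A compact sketch of the same ratio that counts with `np.searchsorted` instead of `np.unique`; the helper names are hypothetical, it uses a stable sort for ties in x (the diff calls `np.argsort` with its default algorithm), and it omits the `max(-1, ...)` floor:

```python
import numpy as np


def xi_with_ties(x: np.ndarray, y: np.ndarray) -> float:
    """Chatterjee's xi_n with the tie-aware denominator; ties in x keep input order."""
    n = len(y)
    y_sorted = np.sort(y)
    y_by_x = y[np.argsort(x, kind="stable")]
    right = np.searchsorted(y_sorted, y_by_x, side="right")  # r_i = #{j : y_j <= y_i}
    left = n - np.searchsorted(y_sorted, y_by_x, side="left")  # l_i = #{j : y_j >= y_i}
    return float(1.0 - n * np.sum(np.abs(np.diff(right))) / (2.0 * np.sum(left * (n - left))))


def normalized_xi_sketch(x, y) -> float:
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    # Divide by xi_n(y, y), the largest value the statistic can take for this y.
    return xi_with_ties(x, y) / xi_with_ties(y, y)


x = np.arange(10, dtype=float)
print(round(xi_with_ties(x, x**3), 4), normalized_xi_sketch(x, x**3))  # 0.7273 1.0
```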
