Create giotto-tda v0.3.0 #517

Merged
merged 199 commits into from
Oct 9, 2020
Conversation

ulupo
Contributor

@ulupo ulupo commented Oct 6, 2020

No description provided.

ulupo and others added 30 commits June 2, 2020 10:00
* Add pyflagser >= 0.4.0 to requirements, GH pages installation instructions, and README.rst

* Add FlagserPersistence

* Add tests for FlagserPersistence

* Add FlagserPersistence to doc index

* Add See Also entries in simplicial

* Linting in ripser_interface

* Update See also for Cubical

* Fix max_edge_length docs in other simplicial transformers

* Add tests to check max_edge and infinity_values

* Fix low infinity value bug and put dimension padding in _utils

* Add test for low infinity values

* Simplify code in homology/_utils.py

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: Umberto <[email protected]>
Co-authored-by: Wojciech Reise <[email protected]>
* Refactor GraphGeodesicDistance

- Change return type to list for compatibility with homology transformers
- Use scipy's shortest_path instead of sklearn's graph_shortest_path
- Make clearer rules on the role of zero entries, infinity entries, and non-stored values
- Add parameters directed, unweighted, and method
- Support masked arrays
- Rewrite docstring
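
For orientation, the scipy routine named above can be called directly. The following is a minimal sketch, not the transformer's actual code; the toy graph is made up for illustration, and only `scipy.sparse.csgraph.shortest_path` and its `method`, `directed` and `unweighted` arguments are taken from scipy itself.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import shortest_path

# Toy weighted path graph 0 - 1 - 2 - 3 (illustrative data only).
# In sparse format, entries that are not stored mean "no edge".
rows = [0, 1, 2]
cols = [1, 2, 3]
weights = [1.0, 2.0, 3.0]
adjacency = csr_matrix((weights, (rows, cols)), shape=(4, 4))

# Undirected, weighted geodesic distances; method="auto" lets scipy choose.
dist = shortest_path(adjacency, method="auto", directed=False, unweighted=False)

# With unweighted=True every stored edge counts as length 1.
hops = shortest_path(adjacency, method="auto", directed=False, unweighted=True)

# With directed=True only the stored direction i -> j can be traversed,
# so vertex 0 is unreachable from vertex 3.
one_way = shortest_path(adjacency, method="auto", directed=True)
```

Unreachable pairs come back as `np.inf`, which is one reason the handling of infinity entries has to be spelled out explicitly.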

* Fix See Also in KNeighborsGraph and TransitionGraph

* Prevent using FW algorithms when some edges are zero

- Work around scipy/scipy#12424
- Introduce user warnings when algorithm cannot be chosen automatically or set to FW
- Fix test ground truth
- Improve array conversion in GraphGeodesicDistance.transform

* Add GraphGeodesicDistance unit tests on sparse/masked array input and more dtypes
Extend list of ignored extensions to other Python bytecode-type files
* Modify check_point_cloud to allow for sparse input

- Allow for sparse input in VietorisRipsPersistence and FlagserPersistence when metric is precomputed.
- Fix docstrings to reflect the changes.
- Fix some typos and wording issues.

* Make docstrings more consistent across simplicial transformers

* Fix reference for FlagserPersistence

* Remove warning for different point cloud embedding dimensions

* Eliminate tests for different embedding dimension warnings

* Convert list input to 3D ndarray when possible

* Update my email address

* Accept sparse inputs even in non-precomputed case for VietorisRips and SparseRips

* Cover sparse cases and more metrics in simplicial unit tests

Co-authored-by: Umberto <[email protected]>
* Update @lewtun's and @ulupo's emails in CODE_AUTHORS.rst

* Update @lewtun's and @ulupo's emails in GOVERNANCE.rst
* Fix mmap settings used by joblib.Parallel in HeatKernel and PersistenceImage

* Add tests

* Slice X first inside parallel calls to _subdiagrams

* Minor simplifications in plot_diagram

* Add @NickSale to CODE_AUTHORS.rst

Co-authored-by: Umberto Lupo <[email protected]>
* Remove **input_layout kwargs, refactor layouts in plot_diagram

- Fix #409
- Fix calculation of the maximum filtration parameter which incorrectly included the homology dimensions
- Simplify code
…tion (#420)

* Transpose output shape in PairwiseDistance.transform

* Update tests

* Simplify _parallel_pairwise

Co-authored-by: Umberto Lupo <[email protected]>
* Remove `_sort` and refactor `filter` in gtda/diagrams/_utils.py

Arrays are no longer sorted by lifetime before filtering.

* Fix test for Filtering

* Improve documentation, begin addressing #233

* Small improvement to `_subdiagrams`
* Refactor of plotting API

- Make all functions in gtda/plotting return figures (or tuples of figures for betti_surfaces) instead of showing them
- `plot` class methods also return figures instead of showing them
- `transform_plot` and `fit_transform_plot` still show figures and only return transformed data
- Add `plot_params` kwarg throughout to allow user customisability of output figures (subtlety: one of the keys can be either "trace" when the output figure only has one trace, or "traces" when it has several)

* Suppress user warnings on graph geodesic distance algorithms in tests

* Resolve overflow warnings in mapper filter tests

* Resolve numpy DeprecationWarning in test_validation
…vertext, improve docs (#445)

* Add pullback set ID to mapper hovertext

* Add partial cluster label to Mapper hovertext

* Improve docs for plot_static_mapper_graph and plot_interactive_mapper_graph

* Fix tests after changes

* Slight wording improvement

* Further wording clarification

* Make wording on edges even clearer
…e, add `store_edge_elements` kwarg to Nerve and make_mapper_pipeline, add Nerve and ParallelClustering to docs (#447)

* Refactor igraph.Graph output, add Nerve and ParallelClustering to docs, add store_edge_elements kwarg to Nerve and make_mapper_pipeline

- Store node metadata not as a graph-level dictionary, but as vertex attributes accessible by graph.vs[attr_name][node_id] or graph.vs[node_id][attr_name] for attr_name in ["pullback_set_label", "partial_cluster_label", "node_elements"]
- Remove "node_id" from node attributes as it always coincided with the igraph.Graph node number anyway.
- Automatically store sizes of intersections as edge weights, accessible by graph.es["weight"].
- Add "store_edge_elements" kwarg to Nerve and make_mapper_pipeline to allow storing indices of node intersections as edge attributes, accessible via graph.es["edge_elements"].
- Simplify logic of Nerve.fit_transform code
- Change the attributes stored by Nerve.fit. Now the entire graph is stored as graph_ instead.
- Improve documentation of make_mapper_pipeline
- Expose ParallelClustering and Nerve in __init__ and docs.
- Adapt tests, mapper quickstart notebook, and mapper plotting functions.

* Add two tests for the behaviour of store_edge_elements and min_intersection

* Remove check for shape of `layout` in _calculate_graph_data

`layout` can only be a string or a callable, not an ndarray

* Improve test coverage of mapper visualization modules

* Create tests for plot_betti_surfaces and plot_betti_curves

* Add plotly_params to remaining plot methods in diagrams/representations, missed out in #441 

* Fix some linting and docstrings

* Minor improvements

* Avoid shadowing range function in plot_diagram
* Fix bug introduced in 4bc90b2

np.max should have been np.min in plot_diagram for minimum birth and death calculation

* Clean/simplify plot_diagram further to be more ready for extended persistence and better behave under plotly HTML autoscaling

* Make `store_edge_elements` work with MapperPipeline.get_mapper_params and MapperPipeline.set_params, missed out in 4bc90b2
…s example notebook (#448)

* Remove displayed SegmentLocal from classifying_shapes example notebook

* Fix doc typos
* Fix pybind11 broken master

pybind11 checkout version v2.5.0, some issues are observed on master

Signed-off-by: julian <[email protected]>
…ogy_dimension_ix (#452)

* Rename homology_dimension_ix to homology_dimension_idx, fix bug
* Add normalization of persistence entropy via `normalize` kwarg in gtda.diagrams.PersistenceEntropy

* Use entropy from scipy.stats in gtda.diagrams.PersistenceEntropy, gtda.time_series.PermutationEntropy and gtda.mapper.Entropy

* Rename _entropy hidden method in gtda.diagrams.PersistenceEntropy

* Add tests for normalize=True

* Add `fill_nan_value` kwarg to gtda.diagrams.PersistenceEntropy

- See #450 (comment)
- Adapt pipeline tests to use this kwarg
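
As a rough illustration of what delegating to `scipy.stats.entropy` looks like, here is a minimal sketch on made-up birth-death pairs. It is not the code of `PersistenceEntropy`; the `normalize` and `fill_nan_value` behaviour is deliberately left out, and the choice of base 2 is an assumption made for the example.

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical birth-death pairs of one persistence subdiagram.
diagram = np.array([[0.0, 1.0], [0.0, 1.0], [0.5, 2.5]])
lifetimes = diagram[:, 1] - diagram[:, 0]

# scipy.stats.entropy rescales its input to sum to 1 before computing
# -sum(p * log(p)); base=2 expresses the result in bits.
persistence_entropy = entropy(lifetimes, base=2)
```

Here the lifetimes are 1, 1 and 2, so the rescaled weights are 0.25, 0.25 and 0.5 and the entropy is 1.5 bits.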

Co-authored-by: Umberto Lupo <[email protected]>
)

* Fix y-axis in HeatKernel.plot

* Add titles to figures generated via plot_heatmap in plot class methods

* PEP8 E133 improvements
* First attempt at fix of #438

* Fix y-axis in `PersistenceImage` plots, extending #453

* Represent multiplicity of persistence pairs in hovertext in `plot_diagram`

* Change `mode` of `gaussian_filter` from "reflect" to "constant"

Affects `HeatKernel` and `PersistenceImage`
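
The practical difference between the two boundary modes can be seen by calling `scipy.ndimage.gaussian_filter` directly. This is a small sketch on a made-up image, not the code path of either transformer.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A single unit spike in the corner of a small image (illustrative data).
image = np.zeros((5, 5))
image[0, 0] = 1.0

# "reflect" mirrors the image at its borders, so mass that would leak
# outside is folded back in and the total is preserved.
reflected = gaussian_filter(image, sigma=1.0, mode="reflect")

# "constant" pads with cval (0 by default), so mass near the border
# can leave the image and the total drops.
padded = gaussian_filter(image, sigma=1.0, mode="constant", cval=0.0)

print(reflected.sum(), padded.sum())
```

With "constant", points near the edge of the sampled region no longer receive extra weight from their own mirror image.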

* Fix `PersistenceLandscape` plot method

- Only the figure corresponding to the first seen homology dimension was returned
- Output is now a figure with subplots, a main title, and one subtitle per plot (homology dimension)

* Improve tests for plot methods in gtda.diagrams.representations

Cover use of `plotly_params`

* Minor docstring linting

* Miscellaneous docstring improvements in gtda/diagrams

* Fix validation dictionary for `metric_params` in the case of `PersistenceImage`

* Change default value of `order` in Amplitude, from 2. to None (vector features)

* Change meaning of default None for `weight_function` in `PersistenceImage`

- ``None`` corresponding to the identity means that there really is a non-trivial weighting in that case. Semantically, this does not seem correct ("None" should mean no weighting at all)

* Improve code style and clarity in plot methods in gtda.diagrams.representations

* Refactor gtda/diagrams/_metrics.py to fix several bugs

- Change computation of heat/persistence image distances and amplitudes to yield the continuum limit when `n_bins` tends to infinity.
- Make `sigma` in persistence-image-- and heat-kernel--based representations/distances/amplitudes measured in the same units as the filtration parameter (not in pixels), thus decoupling its effect from the effect of `n_bins`. Also change the default value in PairwiseDistance, Amplitude, HeatKernel and PersistenceImage from 1. to 0.1.
- Remove trivial points from diagrams before creating a sampled image in `heats` and `persistence_images`. This ensures in particular that the trivial points really give no contribution regardless of the weight function used for persistence images.
- Finish fixing #438 and similar issues with PairwiseDistance when the metric is persistence-image--based.
- Ensure `silhouettes` does not create NaNs when a subdiagram is trivial.
- Change default value of `power` in `silhouette_amplitudes` and `silhouette_distances` to 1., to agree with Amplitude and PairwiseDistance docs.
- Fix use of np.array_equal in `persistence_image_distances` and `heat_distances` -- so far the boolean check for equal inputs always failed due to the in-place modifications of `diagrams_1`.
- No longer store weights in `effective_metric_params_` attribute of PairwiseDistance and Amplitude when the metric is persistence-image--based.
- Remove _heat from gtda.diagrams._metrics.
- Remove identity from gtda.diagrams._utils and make default `weight_function` kwargs equal to np.ones_like everywhere to agree with default in `PersistenceImage`.
- Other style improvements (variable names, linting)
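
To make the change of units for `sigma` concrete: a Gaussian filter applied to a sampled image takes its width in pixels, so a width given in filtration units has to be divided by the grid spacing. The helper below is hypothetical (its name, signature and the uniform-grid assumption are not taken from the codebase) and only sketches that conversion.

```python
def sigma_in_pixels(sigma, min_value, max_value, n_bins):
    """Convert a width in filtration units to pixel units (hypothetical helper)."""
    # Spacing between consecutive sample points of a uniform grid
    # with n_bins points between min_value and max_value (assumed layout).
    step = (max_value - min_value) / (n_bins - 1)
    return sigma / step


# The same sigma = 0.1 in filtration units on a coarse and on a fine grid.
coarse = sigma_in_pixels(0.1, 0.0, 1.0, n_bins=11)   # grid spacing 0.1
fine = sigma_in_pixels(0.1, 0.0, 1.0, n_bins=101)    # grid spacing 0.01
```

The pixel width grows with `n_bins` while the width in filtration units stays fixed, which is the decoupling described in the list above.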

* Fix trace name when homology dimension is np.inf in `BettiCurve` and `Silhouette`

* Adapt `test_all_pts_the_same` to new behaviour of `heats_` in gtda.diagrams._metrics

* Improve test coverage of `Amplitude` and `PairwiseDistance`

- Make sure silhouettes and persistence images are covered throughout
- Cover `order` parameter throughout

* Add test of zero `weight_function` for `PersistenceImage`

* Make behaviour of `Scaler.fit` when the metric is persistence image the same as `Amplitude`

Accordingly add more combinations of metrics and metric_params in tests for Scaler

* Delete never-used `_matrix_wrapper` and `_arrays_wrapper` functions

* Remove `_pad` from gtda.diagrams._utils as it is never used

* Make `copy=True` in calls to check_diagrams in Scaler.transform and Scaler.inverse_transform

* Make `homology_dimensions_` attributes tuples instead of lists, with integers when possible

* Improve code style

* Hard-code zero array outputs by `heats` and `persistence_images` when step sizes are zero

* Add `homology_dimensions` kwarg to `_bin`

Achieves beautification of self._samplings and self.samplings_ (ints shown instead of floats) in several transformers (and saves some computation)

* Adapt choices of `min_values`, `max_values` and `sigma` in hypothesis-based tests

New meaning of sigma led to overflow issues in existing tests.

* Make all homology dimensions equal in `test_hk_big_sigma`

Also extend this test to `PersistenceImage`, and rename it accordingly

* Cover use of `plotly_params` kwarg in diagram preprocessing classes plot methods

* Extract some common logic from plot methods in gtda.diagrams.representations

* Silence expected warnings from image transformers in test_common

* Implement @wreise's suggestion to abstract away sorting and integer conversion of fit hom dims

* Add tests for `Amplitude` and `PairwiseDistance` to check that a zero weight function yields identically zero amplitudes/distances.

* Refactor `_subdiagrams` to throw informative errors on unfulfilled input properties
ulupo and others added 24 commits September 17, 2020 09:35
* Remove check_collection from docs

* Fix references in takens and simplicial

* Change todo in weak alpha filtration reference to the section name, remove empty line from release

* Add check_collection again

* Reference ripser [1]

* Move the references before the full stops

* Replace missing glossary entries with TODOs, following Umberto's comment

* Add missing references

* Flag weak_alpha_complex as missing entry

* Reference in the documentation

* Add a target for testing notebooks in the docs

* Ensure `make html` does not fail on missing versions

* Remove spaces after references, check that publication names are in italic

* Remove unused import and sort the remaining ones

* Provide an explicit name for the substitution

* Copy images from examples to notebooks

* Remove the use of imgconverter, change the logo back to svg and add supported image types list

* Name for substitution did not work

* Typo fix in notebook

* Fix links to time series classification notebook by @lewtun

* Move datasets.py and gravitational-wave-signals.npy to new data subfolder, and rename them

* Fix errors in check_point_clouds docstrings

* Try easier paths in See also

Test for @wreise

* Create captions across all notebooks

* Small wording improvement in persistent_homology_graphs.ipynb

* Change the path (sub)package in the index

* Implement double backticks consistently

* Simplify some captions

* Slightly reduce size of plots by plot_point_cloud

* Fix some errors in make_mapper_pipeline docs

* Fix typo in Nerve docs

* Change citation style for consistency

* Remove new lines in array printouts in docs

* Make citation style more consistent throughout

* Reorder notebooks, modularize Makefile

* Reformat the references

* More link and backtick fixes in mapper_quickstart

* More RST fixes in notebooks

* Fix linting

* Fix an error in WeakAlphaPersistence Notes

* Fix URLs for GUDHI

* Fix TakensEmbedding docstrings

* Linting

* Try changing some SVG attributes for better display

* Further SVG improvements

Co-authored-by: Umberto Lupo <[email protected]>
…nd FirstHistogramGap (#412)

* Add first tests for plot_interactive_mapper_graph

* Change deprecated 'overflow_y' to 'overflow' property in_logging, and remove unnecessary warning catching

* Avoid name clashes with Python built-ins throughout mapper/tests

* Slightly change meaning of `max_fraction` in FirstSimpleGap and FirstHistogramGap

Make the default 1. instead of None, and give it a simpler interpretation: (the floor of) max_fraction * n_samples is the maximum number of clusters the algorithm can return
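
The stated interpretation amounts to a one-line computation. The helper name below is hypothetical and is not part of the library; it only restates the rule from the commit message.

```python
import math


def max_n_clusters(max_fraction, n_samples):
    """Upper bound on the number of clusters (hypothetical helper)."""
    # (The floor of) max_fraction * n_samples, as described above.
    return math.floor(max_fraction * n_samples)


default_bound = max_n_clusters(1.0, 10)    # default max_fraction: up to 10 clusters
quarter_bound = max_n_clusters(0.25, 10)   # floor(2.5) = 2 clusters at most
```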

* Implement a looser criterion for the hk_pi_big_sigma test

Co-authored-by: Umberto Lupo <[email protected]>
- `continuous_updates` is deprecated from ToggleButton
- `np.stack` should not take generators
- should explicitly pass dtype=object for ragged arrays
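
The two NumPy points can be illustrated as follows. This is a minimal sketch with made-up data, not code from the package.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6]]

# Build a list (not a generator) of arrays before handing it to np.stack.
stacked = np.stack([np.asarray(row) for row in rows])

# For ragged input, request an object array explicitly instead of
# relying on implicit conversion.
ragged = np.array([[1, 2, 3], [4, 5]], dtype=object)
```

`stacked` is a regular 2 x 3 array, while `ragged` is a one-dimensional object array holding two Python lists of different lengths.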
* Add NumberOfPoints

* Make See alsos more consistent throughout diagrams.features

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: wreise <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
* Fix validate_params bug and improve behaviour when both numeric and list-like types are allowed

* Add a test

* Fix mistake in validate_params docs
* Add MODULE keyword in CMakeLists.txt to explicitly show expectation

* Update collapser to latest standard in the CMake file

* Add pybind11 as a submodule

* Remove downloading pybind11 from setup.py

Signed-off-by: julian <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
* Update ripser & collapser bindings to allow pass by reference with numpy

* Update making the distance matrix triangular more memory friendly

* Remove unnecessary dict inherited from ripser.py cython

Signed-off-by: julian <[email protected]>
* Add MNIST image classification/full blown ML example notebook

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: Lewis Tunstall <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
* Add ComplexPolynomial

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
Co-authored-by: wreise <[email protected]>
* Fix precomputed behaviour in KNeighborsGraph

- Avoid warnings when metric is passed as 'precomputed'
- Fix errors in docstrings and improve wording
- Improve tests
Implement suggestions in #501 (comment)
- Use scipy's squareform function for fast extraction of the upper diagonal part of dm
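
For reference, this is how `scipy.spatial.distance.squareform` extracts the entries above the diagonal of a symmetric distance matrix. The matrix is made up, and the variable name `dm` simply follows the commit message.

```python
import numpy as np
from scipy.spatial.distance import squareform

# Symmetric distance matrix with zero diagonal (illustrative data).
dm = np.array([[0.0, 1.0, 2.0],
               [1.0, 0.0, 3.0],
               [2.0, 3.0, 0.0]])

# Condensed form: the entries above the diagonal, read row by row.
condensed = squareform(dm)

# squareform is its own inverse: a condensed vector maps back to the matrix.
restored = squareform(condensed)
```

By default `squareform` checks that its input is symmetric with a zero diagonal, so it raises an error on matrices that are not valid distance matrices.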
* Add DensityFiltration

* Remove effective_metric_params in radial

* Replace warnings in image subpackage by ValueErrors for input dim > 4

* Remove unnecessary _is_fitted from ImageToPointCloud

* Improve See alsos in images/filtrations.py

* Add tests for bad input shapes in image subpackage

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
…or (#495)

* Add gtda/metaestimators with CollectionTransformer meta-estimator 

* Fix docstring for make_mapper_pipeline

* Change parallel_backend_prefer default to None in ParallelClustering and make_mapper_pipeline

* Cross-reference two time series notebooks with See also

* Improve time series classification notebook by making use of CollectionTransformer
…513)

Also make mapper tests more lenient to reduce number of flaky tests

Co-authored-by: ulupo <[email protected]>
… features from them (#480)

* Add curves subpackage with StandardFeatures

* Reshape output of PersistenceLandscape so that it's a multi channel curve

* Add curves to main init and to rst docs

* Add Feature extraction subtitle in curves.rst

* Split simplicial homology into undirected and directed subsections

* Fix typo in validate_params

* Raise deadlines for some mapper tests

* Remove notebook tests even in macOS jobs

Signed-off-by: Guillaume Tauzin <[email protected]>
Co-authored-by: Umberto Lupo <[email protected]>
* Add curves.Derivative

* Fix curves.StandardFeatures docs

Signed-off-by: Guillaume Tauzin <[email protected]>

Co-authored-by: Umberto Lupo <[email protected]>
Co-authored-by: wreise <[email protected]>
…equired (#508)

* Improve test coverage of mapper visualization tools

* Make pandas part of test requirements in setup.py

* Add pandas installation in Linux Azure jobs

* Avoid notebooks checks even in macOS unless notebooks_checks is true

* Make clusterer a required parameter in ParallelClustering, add ParallelClustering tests

* Change kind of error for bad transformers in CollectionTransformer

* Improve check_diagrams and its tests

* Use validate_params in plot_point_cloud

* Cover plotting of diagrams with infinite deaths in simplicial tests
@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@CLAassistant

CLAassistant commented Oct 6, 2020

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
5 out of 6 committers have signed the CLA.

✅ ulupo
✅ lewtun
✅ wreise
✅ gtauzin
✅ reds-heig
❌ MonkeyBreaker

@ulupo ulupo changed the title from "Release 0.3.0" to "Create giotto-tda v0.3.0" Oct 6, 2020
* Bump version to 0.3.0

* Add release notes

* Minor corrections in notebooks

Co-authored-by: wreise <[email protected]>
@ulupo ulupo marked this pull request as ready for review October 8, 2020 16:18
@ulupo ulupo merged commit c826b5b into 0.3.X Oct 9, 2020
7 participants