eXplainable tool for scientists - Deep Insight And Neural Networks Analysis (DIANNA) #48
Hello, I was wondering if this pre-submission form has been noticed?
Hi @elboyran, and welcome to the pyOpenSci community. @lwasser and I have discussed whether the package is in scope. The short answer is: yes, we think DIANNA could be in scope, but we do have a couple of questions. We are still learning how we can best support the scientific community using Python. It's clear from your inquiry and others like it that there's a need for tooling that supports researchers using deep learning methods. However, these kinds of tools are not yet as common in the rOpenSci space, and we have modeled ourselves on that community, so we are figuring this out as we go. Here are the areas of scope where we think the goals of DIANNA fit into pyOpenSci:
I'll explain each.

data visualization
This one you identified, and it is fairly self-evident, although reasonable people could disagree about whether XAI methods are really visualizing data per se. But no one can deny that scientists in particular rely on visualizations to validate and tune the algorithms they apply to data. So I would call this one "clearly in scope".

reproducibility
As you state, one of your goals is to replicate existing methods, with the overall goal of assisting domain scientists. The additional goal of achieving this through the ONNX standard would also increase reproducibility/replicability. There is precedent for this being in scope: we previously provided review for

data extraction
I think meeting the ONNX standard could fall under this heading; you would make it possible for researchers to use XAI methods in a framework-agnostic manner, if I am understanding correctly. So DIANNA would allow researchers to "extract data" (trained weights) from models.

So, with that said, here are our questions:
So overall we do feel that DIANNA could be in scope, and we would be very interested in providing review. As of right now, though, I think we would need to see the requirement met that the package is near a "maturing" state before initiating a review, unless @lwasser feels very strongly otherwise. Maybe it would be best to contact us again when you feel you are nearing that state?
Dear David (@NickleDave) and @lwasser, thank you very much for your time and detailed feedback. First, the questions covered by short answers:
Data visualization and, in a way, reproducibility are matching categories. I am not sure Data extraction is a good category, and maybe you will agree with me after my answers to your questions.
Some of the work explained above has happened in a sibling repo (we are considering merging them), but all the motivation and planning is described in my initial funding proposal, which I could send if you are interested. I am not sure I understand "So DIANNA would allow researchers to 'extract data' (trained weights) from models." Doesn't any (X)AI library allow that? Also, imho, this is not the main function of an XAI library; see the sketch below, which shows that reading weights out of an ONNX file is generic functionality rather than anything XAI-specific. I will be glad to hear your opinion on the scope given these extra explanations. Is it possible to expand your list (as your repo documentation suggests) and include e.g. data analytics? Also, please send me your thoughts on how we can connect on an organizational level.
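(For concreteness: extracting the trained weights from an ONNX file is generic functionality of the onnx package, not something XAI-specific. A minimal sketch, assuming a saved model.onnx; the file name is hypothetical and this is not DIANNA's API:)

```python
import onnx
from onnx import numpy_helper

# Load a serialized ONNX model (hypothetical file name).
model = onnx.load("model.onnx")

# The trained weights are stored as initializers in the graph;
# convert each one to a NumPy array.
weights = {init.name: numpy_helper.to_array(init)
           for init in model.graph.initializer}

for name, array in weights.items():
    print(name, array.shape, array.dtype)
```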
Hi @elboyran, and thank you for your clear point-by-point response. Everything you are saying makes total sense to me. However, another factor we would need to consider is that XAI is an active research area. I'm sure you are aware of this, as your statement about "skeptical scientists" makes clear. We would really need to make sure we have the right reviewers, who could address questions about XAI methods, especially with respect to fairness, bias, interpretability, and usability. I do not think we can provide you with this kind of review at this time. As we state in our guide:
You are also right that this would be venturing into the territory of "data analytics". I discussed this with @lwasser, who also felt that adding an analytics category would go beyond our intended scope. We are following the rOpenSci model, where our central focus is on "tooling for the data life cycle". That community has only very recently added statistics packages; as a relatively new organization, we are just not able to extend our scope the same way. You may be better served by going through review at a traditional journal. Please let me know how that sounds. I am happy to discuss this with you further.
Closing this for now, but again, we are happy to discuss further.
Submitting Author: Elena Ranguelova (@elboyran)
Package Name: dianna
One-Line Description of Package: an eXplainable AI (XAI) python library targeted at scientists
Repository Link (if existing): https://github.com/dianna-ai/dianna
Description
Modern scientific challenges are often tackled with (Deep) Neural Networks (DNNs). Despite their high predictive accuracy, DNNs lack inherent explainability. Many DNN users, especially scientists, do not harness the power of DNNs because they lack trust in, and understanding of, how these models work.
Meanwhile, eXplainable AI (XAI) methods offer some post-hoc interpretability and insight into the DNN's reasoning. This is done by quantifying the relevance of individual features (image pixels, words in a text, etc.) with respect to the prediction. These "relevance heatmaps" indicate how the network has reached its decision directly in the input modality (images, text, speech, etc.) of the data.
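To make the idea concrete, below is a minimal, illustrative sketch of the simplest perturbation-based relevance method (occlusion). It is not DIANNA's API; `predict` is a hypothetical callable mapping a batch of images to class scores, and zero is assumed to be a neutral masking value:

```python
import numpy as np

def occlusion_heatmap(predict, image, target_class, patch=8):
    """Relevance of each image patch: occlude it and measure the drop
    in the target-class score. A larger drop means a more relevant patch."""
    base_score = predict(image[None])[0, target_class]
    h, w = image.shape[:2]
    heatmap = np.zeros((h // patch, w // patch))
    for i in range(0, (h // patch) * patch, patch):
        for j in range(0, (w // patch) * patch, patch):
            masked = image.copy()
            masked[i:i + patch, j:j + patch] = 0.0  # zero out one patch
            score = predict(masked[None])[0, target_class]
            heatmap[i // patch, j // patch] = base_score - score
    return heatmap
```

Methods such as RISE and LIME refine this basic idea: RISE averages the scores over many random masks, while LIME fits a local surrogate model to the perturbed predictions.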
There are many Open Source Software (OSS) implementations of these methods; alas, each supports a single DNN format, and the libraries are known mostly to AI experts. The DIANNA library supports the best XAI methods in the context of scientific usage, providing their OSS implementation based on the ONNX standard and demonstrations on benchmark datasets. Visually representing the knowledge captured by the AI system can become a source of (scientific) insights.
It is work in progress; for now DIANNA supports a subset of AI explainability methods chosen by objective criteria, such as RISE, LIME, and DeepLIFT SHAP (under development), for ONNX models, which makes it AI-framework agnostic.
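For example, once a model is saved in ONNX format, a tool needs only the generic ONNX runtime to run it, regardless of the framework it was trained in. A minimal sketch; the file name and input shape below are assumptions:

```python
import numpy as np
import onnxruntime as ort

# Any model exported to ONNX runs the same way,
# no matter which framework produced it.
session = ort.InferenceSession("model.onnx")  # hypothetical file name
input_name = session.get_inputs()[0].name

x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # example input shape
scores = session.run(None, {input_name: x})[0]
print(scores.shape)
```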
Scope
Please indicate which category or categories this package falls under:
Explain how and why the package falls under these categories (briefly, 1-2 sentences). Please note any areas you are unsure of:
A possible category could be Data visualization, but the actual one is missing from the list. It would fall under the broad category of Data Analytics and, more precisely, eXplainable AI.
Who is the target audience and what are the scientific applications of this package?
Scientists in any domain (especially those who are not (X)AI experts), but also any other AI users who want to open the AI "black boxes". It is also very much aimed at XAI developers who want to use the benchmarks we offer to study the properties of their methods and compare them against the state of the art. The scientific application potential is enormous and not limited to any science domain. Examples are given, e.g., in these publications:
Explainable Machine Learning for Scientific Insights and Discoveries
Explaining Deep Neural Networks and Beyond: A Review of Methods and Applications (a nice summary illustration is in Fig. 20 on page 268)
Are there other Python packages that accomplish similar things? If so, how does yours differ?
As mentioned in the Description, there are many packages (shap, lime, etc.) and even libraries (e.g. Captum, iNNvestigate, etc.) implementing either a single XAI method or a group of methods (chosen without clearly motivated criteria other than the authors' own research), each for a single DNN format (e.g. only PyTorch or Keras/TensorFlow). With its careful, objective selection of XAI methods and its support for the ONNX standard, DIANNA is the only library applicable to any trained AI model, independent of the framework, hence especially useful for domain scientists. We also aim to support many data domains: images, text, and in the future time series, tabular, and graph data.
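To illustrate the framework-agnostic workflow: a researcher exports a trained model from their framework of choice to ONNX once, and any ONNX-based tool can then consume it. A minimal sketch using PyTorch's built-in exporter; the model, file name, and input shape are illustrative:

```python
import torch
import torchvision.models as models

# Train in any framework, export once to ONNX;
# an ONNX-based tool never needs to know the source framework.
model = models.resnet18().eval()           # illustrative model
dummy_input = torch.randn(1, 3, 224, 224)  # example input shape
torch.onnx.export(model, dummy_input, "resnet18.onnx",
                  input_names=["input"], output_names=["logits"])
```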
Any other questions or issues we should be aware of:
It is work in progress, but we are already thinking about how to disseminate it. We want to reach, and be useful to, as many domain scientists as possible. That is the mission of our organization, the Netherlands eScience Center.
The question is: does it fit within the scope of pyOpenSci?
P.S. Have feedback/comments about our review process? Leave a comment here