[joss] In the statement of need, how does it compare with OSS annotation tools? #7
Dear @kinow
Hi @muctadir! What about https://web.hypothes.is/? This is one that I have seen added to Open Source tools to annotate text, and also used in previous companies where I worked. I think it is one of the most popular tools for annotating text, and it appears to have many overlapping features with LaMa. Thank you!
Dear @kinow, I am looking into the tool that you mentioned and trying to find out what feature set it provides. For now, I understand that it can annotate texts from various sources and share these annotations across multiple users. I am now trying to find out what other features it provides. Is there a documentation page that you know of? Furthermore, I have a feeling that this tool is about sharing knowledge through something they call "social annotation". If that is the case, I think the use-case is still different.
Hi @muctadir, I think hypothesis could do with a simpler web page that lets users find out more about its features with fewer clicks. I had a look and found these resources that should be useful, I hope, for you to see how to get started with it:
I think hypothesis has similar features, including some interesting ones that could be useful in LaMa (moderation, a browser extension), but it could also lack features that are important in LaMa (like conflict resolution). This paper, for example, mentions brat and hypothesis, and explains why KAT is still important. I think the LaMa paper does a great job explaining that there are commercial tools but that complex collaboration is simpler in LaMa. However, after reading the paper I am still left with the question of whether there are Open Source tools that could be used instead (especially important for LaMa's paper, IMHO, as it's being published in JOSS). Hypothesis should check the boxes for Cost and Data access and privacy, but maybe the collaboration workflow doesn't match your use case? Or maybe there are other Open Source annotation tools that have the complex collaboration but lack the data access and privacy, or tools that have everything that LaMa does but are not maintained, etc. I think a short paragraph about it would be enough for the LaMa paper. Cheers
I agree that this is missing. I will add a paragraph based on what we investigated initially before developing LaMa.
In light of the current word count of the paper, I have added a sentence referring to https://labelstud.io/, which we investigated prior to developing LaMa.
I wasn't aware of Label Studio. Thanks for mentioning it and updating the paper. Looking at this commit, ac348f9, the text below lists the cons of the solutions (including Label Studio). The first is "Cost: As these are commercial tools", which is not correct for Label Studio? They appear to have a commercial SaaS version, but the code is open source (like LaMa's code, also using a permissive license, ALv2). The second point is about data access and privacy. Label Studio also has a page about security (https://labelstud.io/guide/security.html), but I think in a wider sense, including database access. In their documentation you can also find more about granting permissions to different users (https://labelstud.io/guide/signup.html#Invite-collaborators-to-a-project). So I think they also provide data access and privacy, and I guess it could be well tested since they run a commercial operation. Label Studio also seems to offer extra features like image labelling, ML-assisted labelling, and other features related to the third point in the list in the paper, about the complex collaboration workflow, e.g.
I have not dug into their issues & code, nor signed up for their demo, nor tried running it locally. Before doing that, could you elaborate on how it compared to LaMa, and how your team identified that Label Studio was not sufficient to use? Moreover, given that Cost is one of the three items raised as the motivation for LaMa, I think mentioning a single Open Source tool is not enough to drive the need for a new tool. It would be better to expand on that in the paper too. We can also ping the editor, @fboehm, for another opinion here, as well as the other reviewer, @luxaritas.
Somehow I missed fixing this text. I have just updated the paper with the correct text here. To answer the remainder of the comment, I would like to refer to one of my previous comments, and I would like to quote parts of that reply:
In light of these two comments I made earlier, you can already see how Label Studio has a different use-case, which is about annotating data. You also mentioned ML-assisted labeling, which is not what we wanted for LaMa.
You focused on cost in your comment, and as I mentioned earlier, all three points are equally important. You are indeed, to some extent, correct about the first two points. However, collaboration is also a key motivation, which includes features such as collaborative labeling and conflict detection and resolution. To the best of my knowledge, Label Studio does not have such features.
I think this might be a good idea.
I haven't spent a ton of time on this, but after looking a little at hypothes.is and Label Studio, while they're powerful annotation tools, it does not appear to me that they're well suited for thematic analysis, at least in the context of the intended workflow of LaMa. Those tools are all about "pick out a portion of the data that contains some signal" or "classify this piece of data in some existing categories". LaMa however is focused on "we have these pieces of data, and we want to come up with a taxonomy that describes them, coming to a consensus on this taxonomy with other individuals performing the coding". It's a distinctly different type of "annotation" from my understanding of the process. So, my position here would be that not only may those tools have some deficiencies in the three primary points listed in the paper, they are also likely not suited for the task in general, so there is still an unmet need here.
Thank you, @luxaritas!
I think you are right that those tools have a different target audience, with similar features but still not identical to LaMa.
The three primary points being cost / data access and privacy / complex collaboration workflow, I don't believe Label Studio or hypothes.is fails at the first two. In fact, one could claim that, as commercial software with open source code, Label Studio could have even better privacy and data access, since its source code is open and it runs a commercial service that attackers could try to exploit, so its security gets tested in practice. (Digressing a bit from the main discussion, but "With commercial tools, control over the access of the research data of the storage are often unavailable" might depend on the research area, and nowadays many commercial tools are also open source. One tool I worked with recently, Arvados, is open source with commercial support, and the data access/storage location/privacy & security are documented/provided, and certified by HIPAA. But I don't think we need to modify that 👍)
I think what the other tools lack is the last item, the "complex collaboration workflow", but my first point here was that the text previously compared no other Open Source tools, which would still require at least a sentence saying that there are no Open Source tools for doing thematic labelling the way LaMa does. I believe the paper has been updated to address the cost aspect, but IMHO it would be key to express exactly what you said above: that there are other commercial and open source tools that perform similar tasks, but they lack in handling the complexity of certain annotation workflows, or lack support for controlled ontologies/vocabularies/domains for annotations & labelling, or lack collaborative data curation, or do not handle thematic analysis, etc. (all that, without making the text very long).
Yeah, I think that makes sense.
Hi @luxaritas, @kinow, and @muctadir - thanks for the thoughtful discussion here. I especially appreciate the comment from @kinow:
Do you all feel that the current version of the manuscript satisfies this request? Thanks again!
Hi @luxaritas @kinow @fboehm
@muctadir I just had a look at the Markdown source and it's looking better! I was trying to preview the PDF, but I think the bot is not updating it. I'll comment in the other issue, preview the PDF, and update this issue & the checklist after that if it looks OK (from what I could see of the PDF, it looked fine to me). Cheers
@kinow Thanks already. I was able to get the latest paper from https://github.com/muctadir/lama/actions/runs/4604583613. Is it not accessible for you?
@muctadir I thought it would be re-generated by the bot in the pull request. The latest message in the review PR is from Feb 8 (openjournals/joss-reviews#5135 (comment)), but I can't recall if that's how it worked in the past for JOSS reviews, or if I am confusing it with another pull request somewhere... I will wait for @fboehm's reply. Thanks!
@editorialbot generate pdf |
Oops, sorry about that. I intended to comment in the review thread.
Hi,
Part of openjournals/joss-reviews#5135. I see you mentioned commercial tools in the statement of need of the JOSS paper. The first item in your list of trade-offs is the cost. However, that statement of need seems to ignore the existence of other OSS tools that could be compared to LaMa.
Could you consider adding other OSS tools, please? For example:
Cheers,
-Bruno