[joss] In the statement of need, how does it compare with OSS annotation tools? #7

kinow · 2023-02-09T15:24:13Z

Hi,

Part of openjournals/joss-reviews#5135. I see you mentioned commercial tools in the statement of need of the JOSS paper. The first item in your list of trade offs is the cost. However, that statement of need seems to ignore the existence of other OSS tools that could be compared to LaMa.

Could you consider adding other OSS tools, please? For example:

Cheers,
-Bruno

muctadir · 2023-03-01T08:25:21Z

Dear @kinow
Thank you very much for your comments. As we explained in the statement of need, LaMa was developed to aid with the thematic analysis process, which is a method for qualitative analysis. Although many of the tools you mentioned are about text annotation, which is a core part of thematic analysis, many of them are ML based (e.g., https://github.com/dataqa/nlp-labelling, https://github.com/BrikerMan/Kashgari). The use-case of LaMa is mostly about manual labeling. Some of the tools you mentioned are about generating/annotating datasets (https://github.com/argilla-io/argilla, https://github.com/RTIInternational/SMART) to be used for different ML algorithms, which is a very different use-case compared to what LaMa tries to solve. The commercial tools that we mentioned in the paper are extremely popular and are developed for qualitative analysis, and that is why we mentioned them. At the same time, we did not want this paper to turn into a tool comparison paper, and therefore we left out other text annotation tools that we investigated. Furthermore, because of the word limit for the paper, we focused on presenting the most important information about the tool itself. Moreover, in the paper, we mentioned 3 points in the statement of need. One of them is indeed cost. However, all three points contributed equally to our motivation for developing LaMa.
I hope this explains our motivation for the content of the statement of need.

kinow · 2023-03-01T09:20:22Z

Hi @muctadir !

What about https://web.hypothes.is/? This is one that I have seen added to Open Source tools to annotate text, and also used in previous companies where I worked. I think that's one of the most popular tools used to annotate text, and appears to have many overlapping features with Lama.

Thank you!
-Bruno

muctadir · 2023-03-02T10:44:19Z

Dear @kinow

I am looking into the tool that you mentioned and trying to find out the feature set that it provides. For now, I understand that it can annotate texts from various sources and share these annotations across multiple users. I am now trying to find out what other features it provides. Is there a documentation page that you know of? Furthermore, I have a feeling that this tool is about sharing knowledge by something they call "social annotation". And if that is the case, I think the use-case is still different.
Please, let me know what you think about my observation and if I am missing something.

kinow · 2023-03-05T11:37:32Z

Hi @muctadir,

I think hypothesis could do with a simpler web page that lets users find out more about its features with fewer clicks. I had a look and found these resources that should be useful, I hope, for you to see how to get started with it:

I think hypothesis has similar features, some interesting features that could be useful in lama (moderation, browser extension), but also it could lack features that are important in lama (like conflict resolution). This paper, for example, mentions brat and hypothesis, and explains why KAT is still important.

I think the lama paper is doing a great job explaining that there are commercial tools but the complex collaboration is simpler in lama. However, after reading the paper I am still left with the question whether there are Open Source tools that could be used instead (especially important for lama's paper, IMHO, as it's being published in the JOSS).

Hypothesis should check the boxes for Cost and Data access and privacy, but maybe the collaboration workflow doesn't match your use case? Or maybe there are other Open Source annotation tools that have the complex collaboration, but lack the data access and privacy, or tools that have everything that lama does, but are not maintained, etc. I think a short paragraph about it would be enough for the lama paper.

Cheers
Bruno

muctadir · 2023-03-06T15:03:37Z

> However, after reading the paper I am still left with the question whether there are Open Source tools that could be used instead

> I think a short paragraph about it would be enough for the lama paper.

I agree that this is missing. I will add a paragraph based on what we investigated initially before developing LaMa.

muctadir · 2023-03-16T06:54:43Z

In light of the current word count in the paper, I have added a sentence to refer to https://labelstud.io/ which we investigated prior to developing LaMa.


kinow · 2023-03-17T17:22:07Z

I wasn't aware of Label Studio. Thanks for mentioning it and updating the paper. Looking at this commit, ac348f9, the text below lists the cons of the solutions (including Label Studio). The first being "Cost: As these are commercial tools", which is not correct for Label Studio? They appear to have a commercial SaaS version, but the code is open source (like LaMa's code, also using a permissive license - ALv2).

The second point is about data access and privacy. Label Studio also has a page about security (https://labelstud.io/guide/security.html), but I think in a wider sense, including database access. But in their documentation you can find more about granting permissions to different users (https://labelstud.io/guide/signup.html#Invite-collaborators-to-a-project). So I think they also provide data access and privacy, and I guess it could be well tested since they are in a commercial operation.

Label Studio also seems to offer extra features like image labelling, ML assisted labelling, and other features related to the third point in the list in the paper, about complex collaboration workflow, e.g.

relations between annotations - https://labelstud.io/guide/labeling.html#Add-relations-between-annotations
label with collaborators - https://labelstud.io/guide/labeling.html#Label-with-collaborators

I have not dug into their issues & code, nor signed up for their demo, or tried running it locally. Before doing that, could you elaborate more on how it was compared to LaMa, and how your team determined that Label Studio was not sufficient? Moreover, given that Cost is one of the three items raised as the motivation for LaMa, I think a single Open Source tool is not enough to drive the need for a new tool. It would be better to expand that in the paper too.

We can also ping the editor to have another opinion here, @fboehm, as well as the other reviewer @luxaritas

muctadir · 2023-03-20T13:00:13Z

> The first being "Cost: As these are commercial tools", which is not correct for Label Studio?

Somehow I missed fixing this text. I just updated the paper with the correct text here.

To answer the remainder of the comment, I would like to refer to one of my previous comments. And I would like to quote parts of that reply:

> LaMa was developed to aid with the thematic analysis process which is a method for qualitative analysis. Although many of the tools you mentioned are about text annotation, which is a core part of thematic analysis, many of them are ML based.

> The use-case of LaMa is mostly about manual labeling. Some of the tools you mentioned are about generating/annotating dataset (https://github.com/argilla-io/argilla, https://github.com/RTIInternational/SMART) to be used for different ML algorithms, which is a very different use-case compared to what LaMa tries to solve.

In light of these two comments I made earlier, you can already see how Label Studio has a different use-case, which is about annotating data. You also mentioned ML-assisted labeling, which is not what we wanted for LaMa.

> Moreover, in the paper, we mentioned 3 points in the statement of need. One of them is indeed cost. However, all the three points equally contributed to our motivation for developing LaMa.

You focused on cost in your comment and, as I mentioned earlier, all 3 points are equally important. You are indeed, to some extent, correct about the first two points. However, collaboration is also a key motivation, which includes features such as collaborative labeling and conflict detection and resolution. To the best of my knowledge, Label Studio does not have such features.

> We can ping also the editor to have another opinion here, @fboehm, as well the other reviewer @luxaritas

I think this might be a good idea.

luxaritas · 2023-03-27T01:00:39Z

I haven't spent a ton of time on this, but after looking a little at hypothes.is and label studio, while they're powerful annotation tools, it does not appear to me that they're well suited for thematic analysis, at least in the context of the intended workflow of LaMa. Those tools are all about "pick out a portion of the data that contains some signal" or "classify this piece of data in some existing categories". LaMa however is focused on "we have these pieces of data, and we want to come up with a taxonomy that describes them, coming to a consensus on this taxonomy with other individuals performing the coding". It's a distinctly different type of "annotation" from my understanding of the process.

So, my position here would be that not only may those tools have some deficiencies in the three primary points listed in the paper, they are also likely not suited for the task in general, so there is still an unmet need here.

kinow · 2023-03-27T12:30:52Z

Thank you @luxaritas !

> it does not appear to me that they're well suited for thematic analysis, at least in the context of the intended workflow of LaMa.

I think you are right that those tools have a different target audience, with similar features but still not identical to LaMa.

> So, my position here would be that not only may those tools have some deficiencies in the three primary points listed in the paper,

The three primary points being cost, data access and privacy, and complex collaboration workflow, I don't believe Label Studio or hypothes.is fails at the first two. In fact, one could argue that Label Studio, being commercial software, could have better privacy and data access, since its source code is open and it runs a commercial service that attackers could try to exploit.

(Digressing a bit on the main discussion, but "With commercial tools, control over the access of the research data of the storage are often unavailable" might depend on the research area, and nowadays many commercial tools are also open source. One tool I worked with recently, Arvados, is open source with a commercial support, and the data access/storage location/privacy & security are documented/provided, and certified by HIPAA. But I don't think we need to modify that 👍)

> they are also likely not suited for the task in general, so there is still an unmet need here.

I think what other tools lack is the last item, the "complex collaboration workflow", but my first point here was that before, the text had no other Open Source tools being compared, which would still require at least a sentence saying that there are no Open Source tools for doing thematic labelling as LaMa does.

I believe the paper has been updated to address that cost, but IMHO it would be key to express exactly what you said above. That there are other commercial and open source tools that perform similar tasks, but they fall short in handling the complexity of certain annotation workflows, or lack support for controlled ontologies/vocabularies/domains for annotations & labelling, or are lacking in collaborative data curation, or do not handle thematic analysis, etc. (that, without making the text very long).

luxaritas · 2023-03-27T14:08:40Z

Yeah, I think that makes sense.

fboehm · 2023-03-29T18:12:27Z

hi, @luxaritas @kinow and @muctadir - Thanks for the thoughtful discussion here. I especially appreciate the comment from @kinow:

> I believe the paper has been updated to address that cost, but IMHO it would be key to express exactly what you said above. That there are other commercial and open source tools that perform similar tasks, but they lack in handling the complexity of certain annotation workflows, or lack support to controlled ontologies/vocabularies/domains for annotations & labelling, or lack in collaborative data curation, or do not handle thematic analysis, etc. (that, without making the text very long).

Do you all feel that the current version of the manuscript satisfies this request? Thanks again!

muctadir · 2023-04-04T06:24:12Z

Hi @luxaritas @kinow @fboehm
Thanks for your comments. I think it indeed makes sense to be explicit about the use case. I have now updated the paper to include an additional point in the statement of need to address this. To answer @fboehm, I believe the current version of the paper satisfies the request.

kinow · 2023-04-05T10:42:08Z

@muctadir I just had a look at the Markdown source and it's looking better! I was trying to preview the PDF, but I think the bot is not updating it. I'll comment in the other issue, preview the PDF, and update this issue & the checklist after that if it's looking OK (from looking at the PDF it was looking fine to me). Cheers

muctadir · 2023-04-05T12:31:51Z

@kinow Thanks already. I was able to get the latest paper from https://github.com/muctadir/lama/actions/runs/4604583613. Is it not accessible for you?

kinow · 2023-04-05T12:35:51Z

@muctadir I thought it would be re-generated by the bot in the pull request. The latest message in the review PR is from Feb 8 (openjournals/joss-reviews#5135 (comment)), but I can't recall if that's how it worked in the past for JOSS reviews, or if I am confusing it with another pull request somewhere... will wait for @fboehm 's reply. Thanks!

fboehm · 2023-04-05T16:19:59Z

@editorialbot generate pdf

fboehm · 2023-04-05T16:20:31Z

oops. Sorry about that. I intended to comment in the review thread

JarlJansen123 assigned muctadir Feb 15, 2023

kinow mentioned this issue Feb 16, 2023

[REVIEW]: LaMa: a thematic labelling web application openjournals/joss-reviews#5135

Closed

muctadir closed this as completed Mar 1, 2023

muctadir reopened this Mar 2, 2023

muctadir added a commit that referenced this issue Mar 16, 2023

add opensource alternative to resolve #7

035eb08

muctadir closed this as completed Mar 16, 2023

muctadir added a commit that referenced this issue Mar 16, 2023

Additional detail on prior versions #20 (#21)

ac348f9

* Additional detail on prior versions #20 * add opensource alternative to resolve #7

muctadir added a commit that referenced this issue Mar 20, 2023

fixing text based on comment on #7

98f22f9
