Incorrect Labels #2

Open
michaelcalvinwood opened this issue Apr 11, 2024 · 4 comments

@michaelcalvinwood

michaelcalvinwood commented Apr 11, 2024

First, thank you for the effort put into RAGTruth. There is a tremendous need for such a dataset.

Unfortunately, some of the labels are sorely inaccurate. Consider Response ID 11898 as one example. This response is annotated with three supposed hallucinations, each with implicit_true set to false.

Consider the first:

  • Stated Hallucination: "Cons include potentially earning less than those with graduate degrees."
  • Annotator Explanation: "Passages have no mention of this earning less than those with graduate degrees."
  • Supporting Text in Passage: "graduates who are able to find work end up making a lot more than their undergraduate counterparts"

In other words, the provided passage does state that those with graduate degrees can earn more than their undergraduate counterparts, which means that undergraduates can earn less than those with graduate degrees. Hence, the annotation is incorrect.

Consider the second:

  • Stated Hallucination: "earning a higher income upon graduation"
  • Annotator Explanation: "Passages have no mention of this detail."
  • Supporting Text in Passage: "the graduates who are able to find work end up making a lot more than their undergraduate counterparts; the median annual salary plus bonus for a person fresh out of grad school with an MBA is $105,000"

Yet, "fresh out of grad school" is equivalent to "upon graduation." And the whole context is "earning a higher income" ("making a lot more than their undergraduate counterparts"). Hence, the annotation is incorrect.

Finally, consider the third:

  • Stated Hallucination: "gaining practical experience"
  • Annotator Explanation: "Passages have no mention of this tip."
  • Supporting Text in Passage: None

Hence, this annotation is correct.
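
For anyone who wants to re-check annotations like these, here is a minimal sketch of pulling the labels for one response ID out of a JSONL file. The field names (`id`, `labels`, `text`, `meta`, `implicit_true`) are my assumption about the layout of the response file, and the sample records are hypothetical stand-ins that only reuse the span texts quoted above, so adjust both to the actual data.

```python
import json

# Hypothetical records; field names are assumed, not confirmed against the dataset.
SAMPLE_RECORDS = [
    {"id": "11898", "labels": [
        {"text": "earning a higher income upon graduation",
         "meta": "Passages have no mention of this detail.",
         "implicit_true": False},
        {"text": "gaining practical experience",
         "meta": "Passages have no mention of this tip.",
         "implicit_true": False},
    ]},
    {"id": "0", "labels": []},
]
# One JSON object per line, mimicking a JSONL file read into memory.
SAMPLE_JSONL = "\n".join(json.dumps(record) for record in SAMPLE_RECORDS)


def labels_for(jsonl_text, response_id):
    """Return the list of label dicts for one response ID, or [] if not found."""
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        # Compare as strings so that 11898 and "11898" both match.
        if str(record.get("id")) == str(response_id):
            return record.get("labels", [])
    return []


def describe(labels):
    """Format each label as 'span | annotator note | implicit_true' for review."""
    return [
        f'{label.get("text")} | {label.get("meta")} | implicit_true={label.get("implicit_true")}'
        for label in labels
    ]


if __name__ == "__main__":
    for row in describe(labels_for(SAMPLE_JSONL, 11898)):
        print(row)
```

Printing the span next to the annotator's note makes it easier to compare each claim against the source passage by hand, which is how the three cases above were checked.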

Naturally, the value of the dataset is directly proportional to the correctness of the annotations. While I recognize the immense effort that has gone into this dataset, there's still a need for additional annotators to fix errant labels (and there are a lot of errant labels).

Kindly consider fixing the errant labels to make RAGTruth the incredible resource that it can be.

@sgfuiwshlkahr

Hello Michael, thank you for your detailed review. We acknowledge that the examples you pointed out did not meet the expected standards of accuracy, and we appreciate that you brought them to our attention. We want to highlight that we are committed to the quality of the dataset and the version presented was the outcome developed through multiple rounds of review. Due to the size, it was challenging to maintain uniform accuracy among all annotators across all annotations. However, we will be conducting another round of thorough review, aiming to have the dataset reflect its true intent and utility in supporting the value of our research.

@michaelcalvinwood

michaelcalvinwood commented Apr 24, 2024

Thank you.

Now that I know that you are committed to this dataset, I'll gladly add examples here when I come across them in order to help out.

There truly is a great need for an accurate hallucination corpus. :-)

@ogencoglu

> Hello Michael, thank you for your detailed review. We acknowledge that the examples you pointed out did not meet the expected standards of accuracy, and we appreciate that you brought them to our attention. We want to highlight that we are committed to the quality of the dataset and the version presented was the outcome developed through multiple rounds of review. Due to the size, it was challenging to maintain uniform accuracy among all annotators across all annotations. However, we will be conducting another round of thorough review, aiming to have the dataset reflect its true intent and utility in supporting the value of our research.

Sounds LLM-generated to be honest.

Any updates?

@jwmueller

I'd also like to thank you for putting together RAGTruth. It has great potential for benchmarking RAG & hallucination-detection techniques!

Unfortunately I'm also unable to rely on the current dataset for benchmarking due to too many label errors, so I agree that another round of annotation would be extremely valuable.

Below are some misc label errors beyond those listed by @michaelcalvinwood above (I observed tons more label errors when glancing through preliminary results). Each of these examples is labeled as "Correct Answer" in the dataset, even though these LLM answers are clearly wrong.

Example 1

Question: how to get rid of ticks from dogs
Answer: Unable to answer based on given passages.

Passages (only listing relevant part):
passage 1: Be sure to check inside and behind his ears and around his eyes, all favorite tick hiding places. To remove a tick, follow these steps: Use a pair of tweezers to grasp the head of the tick where it attaches to the skin. Wear gloves if you plan to use your fingers to remove the tick. Pull on the tick gently and steadily. If you yank the tick away from your dog too quickly, you’ll leave part of the tick’s mouth behind, which can cause an infection.

Example 2

Question: which is better exercise cycling or aerobic
Answer: Based on the given passages, both exercise cycling and aerobic exercise are beneficial for overall fitness.

Passages (only listing relevant part):
passage 1: Unless you are a competitive strongman, aerobic exercise, like running or cycling, is good for you. You should look at a wide variety of physical endeavors as challenges to be accepted and find your fitness Zen.
passage 2: Cycling. Cycling is another type of aerobic exercise with wide appeal and value. You can use a stationary or regular bike. Cycling may be ideal for individuals who, due to arthritic or other orthopedic problems, are unable to walk for an extended period of time without pain or difficulty.
passage 3: The indoor exercise, or stationary, bike provides you with the lowest impact workout compared to the treadmill and rowing machine, according to WebMD. This is because the bike supports the majority of your weight.

Example 3

Question: how to fold a quilt
Answer: Unable to answer based on given passages.

Passages (only listing relevant part):
passage 2: The Timepiece Quilt used in this tutorial measures 56 inches by 70 inches, but this method can be used on any size quilt. 1 Step 1: Lay your quilt on a bed, table or other flat surface. 2 Step 2: Fold the lower right corner up toward the center of the quilt making sure the fold is on the bias.
passage 3: Step 3: Fold the lower left corner up toward the center of the quilt making sure the fold is on the bias. Step 4: Fold the top left corner down toward the center of the quilt making sure the fold is on the bias. Step 5: Making sure the fold is on the bias, fold the top right corner down toward the center of the quilt.

Example 4

Question: what is dmso used for
Answer: Based on the provided passages, DMSO (dimethyl sulfoxide) is used for various purposes including: 1. As an antifreeze in industrial applications. 2. As a solvent for a wide range of chemicals in industrial settings. 3. In medicine, DMSO relieves pain, reduces swelling, inflammation, and has been used to treat a variety of medical conditions such as burns, scleroderma, and minor cuts and burns. 4. Promoting healing in cases of soft tissue damage, local tissue death, skin ulcers, and burns. Therefore, the answer to the question "What is DMSO used for?" is: 1. Industrial applications (antifreeze and solvent). 2. Medical applications (pain relief, inflammation reduction, and treatment of various medical conditions). 3. Promoting healing in cases of soft tissue damage, local tissue death, skin ulcers, and burns. Unable to answer based on the given passages: Providing detailed information on the specific medical conditions or injuries that DMSO has been used to treat, as this information is not explicitly stated in the provided passages.

Passages (only listing relevant part):
passage 1: DMSO is dimethyl sulfoxide, a by-product in converting trees to paper. It has industrial uses as an antifreeze and as a solvent for a wide range of chemicals. There are also a wide range of medical uses. It has a remarkable capacity to penetrate skin and tissue and enter the bloodstream.
passage 2: DMSO relieves pain, reduces swelling, inflammation and much more. See 30 DMSO uses, from burns to scleroderma, plus directions, side effects, safety. DMSO pain relief is most frequently for muscles and joints. Applied to the skin, DMSO is an excellent anti-inflammatory and has many uses as a general pain reliever. It has been used with varying degrees of success in a great variety of medical ailments. Possible applications include:
passage 3: DMSO as long been used to promote healing. People who have it on hand often use it for minor cuts and burns and report that recovery is speedy. Several studies have documented DMSO use with soft tissue damage, local tissue death, skin ulcers, and burns.18-21.
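
Cases like Examples 1 and 3, where the model refuses to answer and the response still carries no hallucination label, could probably be surfaced for re-annotation with a simple filter. The following is only a rough sketch: the record layout (`id`, `response`, `labels`) and the sample IDs are hypothetical, and the refusal string is taken from the answers quoted above. A flagged response is not necessarily mislabeled, since the refusal is legitimate when the passages really lack the answer, so each hit still needs a manual check against the passages.

```python
# Refusal phrase copied from the answers quoted above, lowercased for matching.
REFUSAL_MARKER = "unable to answer based on given passages"

# Hypothetical records; IDs and layout are illustrative assumptions only.
SAMPLE_RECORDS = [
    {"id": "ex1",
     "response": "Unable to answer based on given passages.",
     "labels": []},
    {"id": "ex2",
     "response": "Based on the given passages, both exercise cycling and "
                 "aerobic exercise are beneficial for overall fitness.",
     "labels": []},
    {"id": "ex3",
     "response": "Unable to answer based on given passages.",
     "labels": [{"text": "hypothetical annotated span"}]},
]


def flag_unlabeled_refusals(records):
    """Return IDs of responses that refuse to answer yet have no labels attached."""
    flagged = []
    for record in records:
        response = record.get("response", "").lower()
        has_labels = bool(record.get("labels"))
        if REFUSAL_MARKER in response and not has_labels:
            flagged.append(record["id"])
    return flagged


if __name__ == "__main__":
    print(flag_unlabeled_refusals(SAMPLE_RECORDS))
```

This only catches refusals with the exact phrasing shown; answers like Example 2, which sidestep the question without refusing, would need a different check.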
