
Iris: Add FAQ consistency check #61

Open
wants to merge 60 commits into
base: main
Changes from 57 commits
1415aa7
First draft of inconsistency pipeline
cremertim Feb 26, 2025
6c53ff8
Refactored to seperate inconsistencies
cremertim Feb 26, 2025
6534b5d
Added newlines
cremertim Feb 27, 2025
fac3799
Finished draft of implementation for FAQ inconsistencies
cremertim Feb 27, 2025
a859c76
Merge branch 'iris/feature/faq/add-rewrite-consistency' from multiple…
bassner Feb 27, 2025
b8849c2
Ensure language
cremertim Mar 14, 2025
f1c1aba
Ensure proper callback
cremertim Mar 14, 2025
29e9307
Ensure proper callback
cremertim Mar 14, 2025
eb1940c
Delete .idea/.gitignore
cremertim Mar 14, 2025
6f1e775
remove whitespace
cremertim Mar 14, 2025
14de2af
Adjust course chat for presentation
cremertim Mar 14, 2025
58ec964
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim Mar 14, 2025
1dd3bce
Adjust prompt
cremertim May 9, 2025
8b3278e
Merge remote-tracking branch 'origin/iris/feature/faq/add-rewrite-con…
cremertim May 9, 2025
921e328
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim May 9, 2025
67b0797
Revert wrong import changes
cremertim May 13, 2025
df88e12
Revert wrong import changes
cremertim May 13, 2025
f003225
Revert wrong import changes
cremertim May 13, 2025
b3f6418
Fix doc
cremertim May 13, 2025
4fb9038
Changes should fix linter
cremertim May 16, 2025
72864ca
Fix imports
cremertim May 16, 2025
1d10bba
Changes should fix linter
cremertim May 17, 2025
fd1c7dd
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim May 21, 2025
62e406a
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim May 22, 2025
61865f9
changed type of prompt
cremertim May 22, 2025
cad0598
Merge remote-tracking branch 'origin/iris/feature/faq/add-rewrite-con…
cremertim May 22, 2025
7febfaa
lock consistency check result
cremertim May 22, 2025
b4e9fa5
log consistency check result
cremertim May 22, 2025
a5c1598
test stripping
cremertim May 22, 2025
801d5cc
consistency result
cremertim May 22, 2025
f32516c
consistency result fix
cremertim May 22, 2025
ea479ab
consistency result fix
cremertim May 22, 2025
2585ae9
prompt fix
cremertim May 23, 2025
db69f5c
inconsistencies fix
cremertim May 23, 2025
7b80422
adapted latest changes
cremertim May 27, 2025
de0d417
adapted latest changes
cremertim May 27, 2025
d7ef783
Fix import
cremertim May 27, 2025
60aa3ab
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim May 28, 2025
508bf87
Fix import
cremertim May 28, 2025
9ec527f
Hopefully fix variants
cremertim May 28, 2025
c681ec3
Remove import
cremertim May 30, 2025
73c06f1
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim May 30, 2025
ba21bd9
Add consistency check once more
cremertim May 30, 2025
34788c5
Merge remote-tracking branch 'origin/iris/feature/faq/add-rewrite-con…
cremertim May 30, 2025
ba38183
parse faq inconsistencies once more, reformat code
cremertim May 30, 2025
8cb86b4
remove logging
cremertim May 30, 2025
9421ae2
adjust prompt a bit
cremertim May 30, 2025
44d9be9
adjust prompt a bit
cremertim May 30, 2025
a89b6c4
adjust prompt a bit
cremertim May 30, 2025
aeaf685
adjust prompt a bit
cremertim Jun 2, 2025
137b469
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim Jun 2, 2025
26be703
adjust prompt a bit
cremertim Jun 2, 2025
fdb85b8
Remove \n
cremertim Jun 2, 2025
e05fe9f
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim Jun 2, 2025
181ba71
add patricks modification again
cremertim Jun 4, 2025
778e89f
Linter
cremertim Jun 9, 2025
5871c47
black
cremertim Jun 9, 2025
4ab9f4f
precommit
cremertim Jun 9, 2025
88f5276
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim Jun 12, 2025
8e644f4
Merge branch 'main' into iris/feature/faq/add-rewrite-consistency
cremertim Jun 23, 2025
1 change: 1 addition & 0 deletions iris/src/iris/domain/rewriting_pipeline_execution_dto.py
@@ -5,4 +5,5 @@

class RewritingPipelineExecutionDTO(BaseModel):
execution: PipelineExecutionDTO
course_id: int = Field(alias="courseId")
to_be_rewritten: str = Field(alias="toBeRewritten")
4 changes: 4 additions & 0 deletions iris/src/iris/domain/status/rewriting_status_update_dto.py
@@ -1,5 +1,9 @@
from iris.domain.status.status_update_dto import StatusUpdateDTO
from typing import List


class RewritingStatusUpdateDTO(StatusUpdateDTO):
result: str = ""
suggestions: List[str] = []
inconsistencies: List[str] = []
improvement: str = ""
56 changes: 56 additions & 0 deletions iris/src/iris/pipeline/prompts/faq_consistency_prompt.py
@@ -0,0 +1,56 @@
faq_consistency_prompt = """
You are an AI assistant responsible for verifying the consistency of information.
### Task:
You have been provided with a list of FAQs and a final result. Your task is to determine whether the
final result is consistent with the given FAQs. Please compare each FAQ with the final result separately.

### Instructions:
Carefully distinguish between semantically different terms.
For example, do not treat "exam" and "make-up exam" as identical — they refer to different concepts.
Only treat content as consistent if it refers to the same concept using either the same wording or clearly
synonymous expressions within the course context. Do not assume equivalence between terms unless explicitly
stated.

Secondly, identify the language of the course. The language of the course is either German or English. You can
extract the language from the existing FAQs. Your output should be in the same language as the course language.

If you are unsure, choose English.

### Given FAQs:
{faqs}

### Final Result:
{final_result}

### Output:

Generate the following response dictionary:
"type": "consistent" or "inconsistent"
The following four entries are optional and should only be set if inconsistencies are detected.

"faqs" must be a JSON array of objects. Each entry must be a JSON dictionary with exactly the following fields:
"faq_id" (string or number)
"faq_question_title" (string)
"faq_question_answer" (string)
Do not return strings like "faq_id: 1, faq_question_title: ..., ..." — return actual JSON objects.
Assume that existing FAQs are correct, so the new final_result is inconsistent.
Include only FAQs that contradict the final_result. Do not include FAQs that are consistent with the final_result.

"message": "The provided text was rephrased, however it contains inconsistent information with existing FAQs."

-Make sure to always insert two new lines after the last character of this sentence.
The "faqs" field should contain only inconsistent FAQs with their faq_id, faq_question_title, and faq_question_answer.
Make sure to not include any additional FAQs that are consistent with the final_result.

-"suggestion": This entry is a list of strings, each string represents a suggestion to improve the final result.
- Each suggestion should focus on a different inconsistency.
- Each suggestion highlights what the inconsistency is and how it can be resolved.
- Do not mention the term final result, call it provided text
- Please ensure that the number of suggestions always equals the number of inconsistencies.
- Highlight how you can improve the rewritten text to be consistent with the existing FAQs.
Both lists should have the same number of entries.

-"improved version": This entry should be a string that represents the improved version of the final result.

Do NOT provide any explanations or additional text.
"""
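
A reply that follows the output format in this prompt could be checked roughly as sketched below. The helper name `validate_consistency_response` and the sample FAQ are hypothetical and not part of this PR. The key names (`type`, `faqs`, `suggestion`, `improved version`) and the three FAQ fields are taken from the prompt text.

```python
import json

# Field names copied from the prompt; everything else here is illustrative.
FAQ_FIELDS = {"faq_id", "faq_question_title", "faq_question_answer"}


def validate_consistency_response(raw: str) -> dict:
    """Parse a model reply and check it against the shape the prompt describes."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("reply is not a JSON object")
    if data.get("type") not in ("consistent", "inconsistent"):
        raise ValueError(f"unexpected type: {data.get('type')!r}")
    if data["type"] == "consistent":
        return data

    faqs = data.get("faqs", [])
    suggestions = data.get("suggestion", [])
    for faq in faqs:
        # The prompt asks for JSON objects with exactly these three fields.
        if not isinstance(faq, dict) or set(faq) != FAQ_FIELDS:
            raise ValueError(f"malformed faq entry: {faq!r}")
    # The prompt requires one suggestion per inconsistency.
    if len(suggestions) != len(faqs):
        raise ValueError("suggestion count differs from inconsistency count")
    return data


# Hypothetical reply, made up for illustration.
example_reply = json.dumps(
    {
        "type": "inconsistent",
        "message": "The provided text was rephrased, however it contains "
        "inconsistent information with existing FAQs.",
        "faqs": [
            {
                "faq_id": 1,
                "faq_question_title": "When is the exam?",
                "faq_question_answer": "The exam takes place in July.",
            }
        ],
        "suggestion": ["Align the exam date in the provided text with FAQ 1."],
        "improved version": "The exam takes place in July.",
    }
)
checked = validate_consistency_response(example_reply)
```

A check like this would surface replies where the model returns FAQ entries as strings or a suggestion list of the wrong length, both of which the prompt explicitly tries to prevent.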
148 changes: 116 additions & 32 deletions iris/src/iris/pipeline/rewriting_pipeline.py
@@ -1,5 +1,6 @@
import json
import logging
from typing import List, Literal, Optional
from typing import Literal, Optional, List, Dict

from langchain.output_parsers import PydanticOutputParser
from langchain_core.prompts import (
@@ -8,7 +9,6 @@

from iris.common.pipeline_enum import PipelineEnum
from iris.common.pyris_message import IrisMessageRole, PyrisMessage
from iris.domain import FeatureDTO
from iris.domain.data.text_message_content_dto import TextMessageContentDTO
from iris.domain.rewriting_pipeline_execution_dto import (
RewritingPipelineExecutionDTO,
@@ -17,14 +17,21 @@
CompletionArguments,
ModelVersionRequestHandler,
)
from iris.llm.external.model import LanguageModel
from iris.pipeline import Pipeline
from iris.pipeline.prompts.rewriting_prompts import (
system_prompt_faq,
system_prompt_problem_statement,
)
from iris.web.status.status_update import RewritingCallback

from ..llm.external.model import LanguageModel
from ..domain import FeatureDTO

from .prompts.faq_consistency_prompt import faq_consistency_prompt
from ..vector_database.database import VectorDatabase

from ..retrieval.faq_retrieval import FaqRetrieval

logger = logging.getLogger(__name__)


@@ -48,36 +55,11 @@ def __init__(
):
super().__init__(implementation_id="rewriting_pipeline_reference_impl")
self.callback = callback
self.db = VectorDatabase()
self.request_handler = ModelVersionRequestHandler(version="gpt-4.1")
self.tokens = []
self.variant = variant

@classmethod
def get_variants(cls, available_llms: List[LanguageModel]) -> List[FeatureDTO]:
"""
Returns available variants for the RewritingPipeline based on available LLMs.
This pipeline supports 'faq' and 'problem_statement' variants.

Args:
available_llms: List of available language models

Returns:
List of FeatureDTO objects representing available variants
"""
# We could use available_llms to determine if we have LLMs capable of handling each variant
# For now, we'll just return both variants regardless of available LLMs
return [
FeatureDTO(
id="faq",
name="FAQ Variant",
description="FAQ rewriting variant.",
),
FeatureDTO(
id="problem_statement",
name="Problem Statement Variant",
description="Problem statement rewriting variant.",
),
]
self.faq_retriever = FaqRetrieval(self.db.client)

def __call__(
self,
@@ -92,7 +74,6 @@ def __call__(
"faq": system_prompt_faq,
"problem_statement": system_prompt_problem_statement,
}
print(variant_prompts[self.variant])
prompt = variant_prompts[self.variant].format(
rewritten_text=dto.to_be_rewritten,
)
@@ -115,4 +96,107 @@ def __call__(
response = response.strip()

final_result = response
self.callback.done(final_result=final_result, tokens=self.tokens)
inconsistencies = []
improvement = ""
suggestions = []

if self.variant == "faq":
faqs = self.faq_retriever.get_faqs_from_db(
course_id=dto.course_id, search_text=response, result_limit=10
)
consistency_result = self.check_faq_consistency(faqs, final_result)
faq_type = consistency_result.get("type", "").lower()
if "inconsistent" in faq_type:
logger.warning("Detected inconsistencies in FAQ retrieval.")
inconsistencies = parse_faq_inconsistencies(
consistency_result.get("faqs", [])
)
improvement = consistency_result.get("improved version", "")
suggestions = consistency_result.get("suggestion", [])

final_result = response
self.callback.done(
final_result=final_result,
tokens=self.tokens,
inconsistencies=inconsistencies,
improvement=improvement,
suggestions=suggestions,
)

def check_faq_consistency(
self, faqs: List[dict], final_result: str
) -> Dict[str, str]:
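"""
Checks whether the provided final_result is consistent with the given FAQs.
Returns a dictionary whose "type" entry is "consistent" or "inconsistent". For inconsistent
results it may also contain "message", "faqs", "suggestion" and "improved version".

:param faqs: List of retrieved FAQs.
:param final_result: The result to compare the FAQs against.
:return: Dictionary with the parsed consistency result.
"""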
properties_list = [entry["properties"] for entry in faqs]

if not faqs:
return {"type": "consistent", "message": "No FAQs to check"}

consistency_prompt = faq_consistency_prompt.format(
faqs=properties_list, final_result=final_result
)

prompt = PyrisMessage(
sender=IrisMessageRole.SYSTEM,
contents=[TextMessageContentDTO(text_content=consistency_prompt)],
)

response = self.request_handler.chat(
[prompt], CompletionArguments(temperature=0.0), tools=None
)

self._append_tokens(response.token_usage, PipelineEnum.IRIS_REWRITING_PIPELINE)
result = response.contents[0].text_content

if result.startswith("```json"):
result = result.removeprefix("```json").removesuffix("```").strip()
elif result.startswith("```"):
result = result.removeprefix("```").removesuffix("```").strip()

data = json.loads(result)

result_dict = {}
keys_to_check = ["type", "message", "faqs", "suggestion", "improved version"]
for key in keys_to_check:
if key in data:
result_dict[key] = data[key]
return result_dict

@classmethod
def get_variants(cls, available_llms: List[LanguageModel]) -> List[FeatureDTO]:
"""
Returns available variants for the RewritingPipeline based on available LLMs.

Args:
available_llms: List of available language models

Returns:
List of FeatureDTO objects representing available variants
"""
return [
FeatureDTO(
id="faq",
name="Default FAQ Variant",
description="Default FAQ rewriting variant.",
),
FeatureDTO(
id="problem_statement",
name="Default Variant",
description="Default Problem statement rewriting variant.",
),
]


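def parse_faq_inconsistencies(inconsistencies: List[Dict[str, str]]) -> List[str]:
    # Format each inconsistent FAQ as one human-readable line.
    # Single quotes inside the f-string keep this valid on Python versions before 3.12.
    parsed_inconsistencies = [
        f"FAQ ID: {entry['faq_id']}, Title: {entry['faq_question_title']}, "
        f"Answer: {entry['faq_question_answer']}"
        for entry in inconsistencies
    ]
    return parsed_inconsistencies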
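
`check_faq_consistency` strips a Markdown code fence and then calls `json.loads` directly, so a reply with leading whitespace before the fence, or one that is not valid JSON, would raise. A more defensive variant might look like the sketch below. `parse_llm_json` is a hypothetical helper and not part of this PR. Returning an empty dict on failure is an assumption about the desired fallback: the caller's `.get("type", "")` would then report no inconsistencies.

```python
import json
from typing import Any, Dict

# Built at runtime so this snippet contains no literal triple backtick.
FENCE = "`" * 3


def parse_llm_json(raw: str) -> Dict[str, Any]:
    """Strip an optional Markdown code fence and parse the rest as a JSON object."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Remove the opening fence (with or without the "json" tag) and the closing fence.
        text = text.removeprefix(FENCE + "json").removeprefix(FENCE)
        text = text.removesuffix(FENCE).strip()
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Assumed fallback: treat an unparseable reply as "nothing detected".
        return {}
    # Anything other than a JSON object is also treated as "nothing detected".
    return data if isinstance(data, dict) else {}


fenced_reply = FENCE + 'json\n{"type": "consistent"}\n' + FENCE
parsed = parse_llm_json(fenced_reply)
```

Whether a silent fallback or a logged error is preferable depends on how the client should react to a failed check; the sketch only shows where the failure can be caught.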
47 changes: 45 additions & 2 deletions iris/src/iris/retrieval/faq_retrieval.py
@@ -3,9 +3,9 @@

from langsmith import traceable
from weaviate import WeaviateClient
from weaviate.collections.classes.filters import Filter

from iris.common.pipeline_enum import PipelineEnum

from ..common.pipeline_enum import PipelineEnum
from ..common.pyris_message import PyrisMessage
from ..pipeline.prompts.faq_retrieval_prompts import (
faq_retriever_initial_prompt,
@@ -76,3 +76,46 @@ def __call__(
for obj in response_hyde.objects
]
return merge_retrieved_chunks(basic_retrieved_faqs, hyde_retrieved_faqs)

def get_faqs_from_db(
self,
course_id: int,
search_text: str = None,
result_limit: int = 10,
hybrid_factor: float = 0.75,
) -> List[dict]:
"""
Retrieves FAQs from the database, optionally with a similarity search on question_title and question_answer.

:param course_id: ID of the course to fetch FAQs for a specific course.
:param search_text: Optional search text used for semantic search.
:param result_limit: Number of FAQs to return.
:param hybrid_factor: Weighting between vector-based and keyword-based results.
:return: List of retrieved FAQs.
"""
filter_weaviate = Filter.by_property("course_id").equal(course_id)

if search_text:
vec = self.llm_embedding.embed(search_text)

response = self.collection.query.hybrid(
query=search_text,
vector=vec,
alpha=hybrid_factor,
return_properties=self.get_schema_properties(),
limit=result_limit,
filters=filter_weaviate,
)
else:

response = self.collection.query.fetch_objects(
filters=filter_weaviate,
limit=result_limit,
return_properties=self.get_schema_properties(),
)

faqs = [
{"id": obj.uuid.int, "properties": obj.properties}
for obj in response.objects
]
return faqs
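
`get_faqs_from_db` passes `hybrid_factor` as the `alpha` argument of Weaviate's hybrid query, where an alpha of 1 is a pure vector search and an alpha of 0 is a pure keyword search. The sketch below only illustrates that weighting with a plain weighted sum over scores assumed to be normalized already. It is not Weaviate's actual fusion algorithm, and `blend_scores` and the sample scores are made up for illustration.

```python
from typing import Dict


def blend_scores(
    vector_scores: Dict[str, float],
    keyword_scores: Dict[str, float],
    alpha: float = 0.75,
) -> Dict[str, float]:
    """Weighted blend: alpha=1.0 uses only vector scores, alpha=0.0 only keyword scores."""
    doc_ids = set(vector_scores) | set(keyword_scores)
    return {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }


# Hypothetical normalized scores for two FAQs.
vector_scores = {"faq-a": 1.0, "faq-b": 0.2}
keyword_scores = {"faq-a": 0.0, "faq-b": 1.0}

blended = blend_scores(vector_scores, keyword_scores, alpha=0.75)
best = max(blended, key=blended.get)
```

With the default of 0.75 the semantically closest FAQ ranks first even if it shares no keywords with the rewritten text, which suits a consistency check against paraphrased content.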
18 changes: 12 additions & 6 deletions iris/src/iris/web/status/status_update.py
@@ -1,10 +1,9 @@
import logging
from abc import ABC
from typing import List, Optional

from typing import Optional, List
import requests
from sentry_sdk import capture_exception, capture_message

from sentry_sdk import capture_exception, capture_message
from iris.common.token_usage_dto import TokenUsageDTO
from iris.domain.chat.course_chat.course_chat_status_update_dto import (
CourseChatStatusUpdateDTO,
@@ -116,6 +115,8 @@ def done(
tokens: Optional[List[TokenUsageDTO]] = None,
next_stage_message: Optional[str] = None,
start_next_stage: bool = True,
inconsistencies: Optional[List[str]] = None,
improvement: Optional[str] = None,
):
"""
Transition the current stage to DONE and update the status.
@@ -128,6 +129,11 @@ def done(
self.status.tokens = tokens or self.status.tokens
if hasattr(self.status, "suggestions"):
self.status.suggestions = suggestions

if hasattr(self.status, "inconsistencies"):
self.status.inconsistencies = inconsistencies
if hasattr(self.status, "improvement"):
self.status.improvement = improvement
next_stage = self.get_next_stage()
if next_stage is not None:
self.stage = next_stage
@@ -139,6 +145,8 @@ def done(
self.status.result = None
if hasattr(self.status, "suggestions"):
self.status.suggestions = None
if hasattr(self.status, "inconsistencies"):
self.status.inconsistencies = None

def error(
self,
@@ -240,9 +248,7 @@ def __init__(
name="Checking available information",
),
StageDTO(
weight=10,
state=StageStateEnum.NOT_STARTED,
name="Creating suggestions",
weight=10, state=StageStateEnum.NOT_STARTED, name="Creating suggestions"
),
]
status = ExerciseChatStatusUpdateDTO(stages=stages)
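
The `done` changes in `status_update.py` rely on `hasattr` guards so that one callback base class can serve status DTOs with and without the new fields. A minimal sketch of that pattern, using dataclass stand-ins rather than the real DTOs, could look like this. `PlainStatus`, `RewritingStatus` and `apply_optional_fields` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class PlainStatus:
    """Stand-in for a status DTO that lacks the rewriting-specific fields."""

    result: Optional[str] = None


@dataclass
class RewritingStatus(PlainStatus):
    """Stand-in for RewritingStatusUpdateDTO with the fields added in this PR."""

    suggestions: Optional[List[str]] = None
    inconsistencies: Optional[List[str]] = None
    improvement: Optional[str] = None


def apply_optional_fields(status: Any, **values: Any) -> None:
    """Set each value only if the status object declares an attribute of that name."""
    for name, value in values.items():
        if hasattr(status, name):
            setattr(status, name, value)


plain = PlainStatus()
rewriting = RewritingStatus()
for status in (plain, rewriting):
    apply_optional_fields(
        status,
        inconsistencies=["FAQ ID: 1, Title: ..., Answer: ..."],
        improvement="Improved text",
    )
```

The guard keeps unrelated pipelines unaffected by the new keyword arguments, at the cost of silently ignoring a misspelled field name.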