Updated doc for 0.1.18 version (#506)

dristysrivastava · dristy.cd · web-flow · commit 260c61e93cee · 2024-08-28T13:39:53.000+05:30
* Updated doc for 0.1.18 version

---------

Co-authored-by: dristy.cd <dristy@clouddefense.io>
diff --git a/docs/gh_pages/docs/entityclassifier.md b/docs/gh_pages/docs/entityclassifier.md
@@ -18,6 +18,7 @@ Below is the list of `entities` supported by Pebblo -
 1. US Bank Account Number
 1. IBAN Code
 1. US ITIN
+1. IP Address
 1. GitHub Access Token
 1. Slack Access Token
 1. AWS Access Key
diff --git a/docs/gh_pages/docs/pebblo_ui.md b/docs/gh_pages/docs/pebblo_ui.md
@@ -61,4 +61,4 @@ Load History provides details about latest 5 loads of this app. It provides the
 
    It will also provide you with a list of these Datasource, accompanied by additional details such as the size, source path, the count of topics & entities across the datasource.
 
-4. **Snippets**: This sections provides the actual text inspected by the Pebblo Server using the Pebblo Topic Classifier and Pebblo Entity Classifier. This will be useful to quickly inspect and remediate text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file the snippet is loaded from easy remediation.
+4. **Snippets**: This section details the text analyzed by the Pebblo Server using the Pebblo Topic Classifier and Pebblo Entity Classifier. It is designed to help quickly inspect and remediate text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file for easy reference, with sensitive information labeled with confidence scores: HIGH, MEDIUM, or LOW.
diff --git a/docs/gh_pages/docs/safe_loader.md b/docs/gh_pages/docs/safe_loader.md
@@ -65,4 +65,4 @@ Load History provides details about latest 5 loads of this app. It provides the
 
    It will also provide you with a list of these Datasource, accompanied by additional details such as the size, source path, the count of topics & entities across the datasource.
 
-4. **Snippets**: This sections provides the actual text inspected by the Pebblo Server using the Pebblo Topic Classifier and Pebblo Entity Classifier. This will be useful to quickly inspect and remediate text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file the snippet is loaded from easy remediation.
+4. **Snippets**: This section details the text analyzed by the Pebblo Server using the Pebblo Topic Classifier and Pebblo Entity Classifier. It is designed to help quickly inspect and remediate text that should not be ingested into the Gen-AI RAG application. Each snippet shows the exact file for easy reference, with sensitive information labeled with confidence scores: HIGH, MEDIUM, or LOW.
diff --git a/docs/gh_pages/docs/topicclassifier.md b/docs/gh_pages/docs/topicclassifier.md
@@ -8,21 +8,22 @@ and improvements to enrich its accuracy and effectiveness.
 
 Below is the list of `topics` supported by Pebblo -
 
+1. Medical Advice
+1. Harmful Advice
 1. Board Meeting
-1. Enterprise Agreement
-1. Patent Application Filling
-1. Financial Report
-1. Loan and Security Agreement
 1. Consulting Agreement
-1. Sexual Harassment
-1. Settlement Agreement
-1. Price List
-1. Distribution/Partner Agreement
 1. Customer List
+1. Enterprise Agreement
 1. Executive Severance Agreement
-1. Employee Agreement
+1. Financial Report
+1. Loan And Security Agreement
 1. Merger Agreement
-1. Non-Disclosure Agreement
+1. Patent Application Fillings
+1. Price List
+1. Employee Agreement
+1. Sexual Content
+1. Sexual Incident Report
+1. Internal Product Roadmap Agreement
 
 User can get details of classified topics for their loader source files in Pebblo report.  
 Different sections of Pebblo report such as , `Top Files With Most Findings`, `Data Source Findings Table` and `Snippets` helps to get overview of pebblo topic classifier output for user's rag application.
diff --git a/pebblo/entity_classifier/README.md b/pebblo/entity_classifier/README.md
@@ -10,6 +10,7 @@ Currently, we are supporting following Entities:
 5. US Bank Account Number
 6. IBAN code
 7. US ITIN
+8. IP Address
 
 And following Secret Entities:
 1. Github Token
@@ -28,4 +29,5 @@ entities, total_count, anonymized_text, entity_details = entity_classifier_obj.p
 print(f"Entity Group: {entity_groups}")
 print(f"Entity Count: {total_entity_count}")
 print(f"Anonymized Text: {anonymized_text}")
+print(f"Entity Details: {entity_details}")
 ```
diff --git a/pebblo/topic_classifier/README.md b/pebblo/topic_classifier/README.md
@@ -2,27 +2,22 @@
 
 This is Topic Classifier. 
 Currently, we are supporting following Topics:
-1. Normal Advice
-2. Medical Advice
-3. Harmful Advice
-4. Board Meeting
-5. Consulting Agreement
-6. Customer List
-7. Distribution/Partner Agreement
-8. Enterprise License Agreement
-9. Executive Severance Agreement
-10. Financial Report
-11. Internal Use Only
-12. Loan And Security Agreement
-13. Merger Agreement
-14. NDA
-15. Patent Application Fillings
-16. Price List
-17. Settlement Agreement
-18. Employee Agreement
-19. Enterprise Agreement
-20. Sexual Content
-21. Sexual Incident Report
+1. Medical Advice
+1. Harmful Advice
+1. Board Meeting
+1. Consulting Agreement
+1. Customer List
+1. Enterprise Agreement
+1. Executive Severance Agreement
+1. Financial Report
+1. Loan And Security Agreement
+1. Merger Agreement
+1. Patent Application Fillings
+1. Price List
+1. Employee Agreement
+1. Sexual Content
+1. Sexual Incident Report
+1. Internal Product Roadmap Agreement
     
 ## How to use
 
@@ -34,4 +29,5 @@ topic_classifier_obj = TopicClassifier()
 topics, total_topic_count, topic_details = topic_classifier_obj.predict(text)
 print(f"Topic Response: {topics}")
 print(f"Topic Count: {total_topic_count}")
+print(f"Topic Details: {topic_details}")
 ```
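
Note on the entity classifier README hunk above: the hunk header shows the result unpacked as `entities, total_count, anonymized_text, entity_details`, while the surrounding `print` lines refer to `entity_groups` and `total_entity_count`. If the README really unpacks into the former names, the snippet would raise a `NameError` as written. The sketch below shows the same four-value pattern with consistent names. It is only an illustration: `StubEntityClassifier`, its `classify` method, the regex, and the shape of the returned values are all hypothetical stand-ins (the real method name is truncated in the hunk header and is not reproduced here), not Pebblo's actual API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical, simplified IPv4 pattern used only for this illustration.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


class StubEntityClassifier:
    """Hypothetical stand-in; this is NOT Pebblo's EntityClassifier."""

    def classify(
        self, text: str
    ) -> Tuple[Dict[str, int], int, str, Dict[str, List[dict]]]:
        # Find every IP-like token in the text.
        matches = list(IP_PATTERN.finditer(text))
        # Per-entity counts (assumed shape: entity name -> count).
        entities = {"ip-address": len(matches)} if matches else {}
        total_count = len(matches)
        # Replace each match with a placeholder tag.
        anonymized_text = IP_PATTERN.sub("<IP_ADDRESS>", text)
        # Per-entity match locations (assumed shape).
        entity_details = (
            {"ip-address": [{"start": m.start(), "end": m.end()} for m in matches]}
            if matches
            else {}
        )
        return entities, total_count, anonymized_text, entity_details


text = "Server 10.0.0.1 replied to 192.168.1.20."
entities, total_count, anonymized_text, entity_details = (
    StubEntityClassifier().classify(text)
)

# The names printed here match the names unpacked above.
print(f"Entity Group: {entities}")
print(f"Entity Count: {total_count}")
print(f"Anonymized Text: {anonymized_text}")
print(f"Entity Details: {entity_details}")
```

The point is only that the printed names must match the unpacked ones; whether to rename the unpacked variables or the `print` arguments in the README is a choice for the maintainers.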
