Adding documentation for Pebblo Topic and Entity Classifier #186

Merged 2 commits on Feb 20, 2024
4 changes: 4 additions & 0 deletions docs/gh_pages/development.md
@@ -20,6 +20,10 @@ brew install pango
```
sudo apt-get install libpango-1.0-0 libpangoft2-1.0-0
```


> Note <sup>1</sup>: _The Pebblo Daemon supports Python 3.9 and later. Use Python 3.9 or a later version to ensure compatibility._

## Build, Install and Run

Fork and clone the pebblo repo. From within the pebblo directory, create a Python virtual environment, build the pebblo package (in `wheel` format), then install and run it.
31 changes: 31 additions & 0 deletions docs/gh_pages/entityclassifier.md
@@ -0,0 +1,31 @@
# Pebblo Entity Classifier

The `Pebblo Entity Classifier` automatically scans your loader source files and pinpoints sensitive entities within them. By highlighting these entities, it supports compliance, data security, and privacy protection in your data processing pipeline.

Pebblo Entity Classifier uses the `Presidio Analyzer` Python library to identify and classify entities in textual data.
The classifier also welcomes contributions from the open-source community to improve its functionality and reliability.

# Entities Supported By Pebblo Entity Classifier

Pebblo supports the following `entities`:

1. US Social Security Number
2. US Passport Number
3. US Driver's License
4. US Credit Card Number
5. US Bank Account Number
6. IBAN Code
7. US ITIN
8. GitHub Access Token
9. Slack Access Token
10. AWS Access Key
11. AWS Secret Key
12. Azure Key ID
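
To give a feel for what detecting a few of the entities listed above involves, here is a minimal standalone sketch that uses only Python's `re` module. It is not Pebblo's implementation and it does not use Presidio Analyzer: the patterns, the `PATTERNS` table, and the `find_entities` helper are simplified assumptions made for illustration, and real recognizers apply additional validation and context checks.

```python
import re

# Simplified, illustrative patterns (assumptions, not Pebblo's or Presidio's recognizers).
PATTERNS = {
    "US Social Security Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "AWS Access Key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "GitHub Access Token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}


def find_entities(text):
    """Return (entity_label, matched_text, start_offset) tuples, ordered by position."""
    findings = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((label, match.group(), match.start()))
    return sorted(findings, key=lambda finding: finding[2])


# Hypothetical sample text containing a fake SSN and AWS's documented example key ID.
sample = "SSN 123-45-6789, key AKIAIOSFODNN7EXAMPLE"
for label, value, offset in find_entities(sample):
    print(f"{label} at offset {offset}: {value}")
```

Pattern matching alone tends to produce false positives (any nine digits in the right shape look like an SSN), which is one reason to rely on a dedicated library rather than hand-written regexes.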


Users can find details of the classified entities for their loader source files in the Pebblo report.
Report sections such as `Top Files with Most Findings`, `Data Source Findings Table`, and `Snippets` give an overview of the Pebblo entity classifier output for the user's RAG application.

For more details, refer to [Reports](reports.md).
4 changes: 4 additions & 0 deletions docs/gh_pages/installation.md
@@ -14,6 +14,10 @@ brew install pango
```
sudo apt-get install libpango-1.0-0 libpangoft2-1.0-0
```


> Note <sup>1</sup>: _The Pebblo Daemon supports Python 3.9 and later. Use Python 3.9 or a later version to ensure compatibility._
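
If you want to confirm the interpreter version before installing, a check along these lines may help. This is a minimal sketch using only the standard library; the `python_is_supported` helper and `MIN_PYTHON` constant are names made up for this example and are not part of Pebblo.

```python
import sys

# Minimum version taken from the note above.
MIN_PYTHON = (3, 9)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, ...) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print(f"Python {found} is supported")
    else:
        print(f"Python {found} is too old; {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is needed")
```

Run it with the same interpreter you plan to use for the virtual environment, since several Python versions can coexist on one machine.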

### Pebblo Daemon

10 changes: 10 additions & 0 deletions docs/gh_pages/sidebars.ts
@@ -37,6 +37,16 @@ const sidebars: SidebarsConfig = {
      id: "troubleshooting", // document ID
      label: "Troubleshooting Guide", // sidebar label
    },
    {
      type: "doc",
      id: "entityclassifier", // document ID
      label: "Pebblo Entity Classifier", // sidebar label
    },
    {
      type: "doc",
      id: "topicclassifier", // document ID
      label: "Pebblo Topic Classifier", // sidebar label
    },
  ],
};

32 changes: 32 additions & 0 deletions docs/gh_pages/topicclassifier.md
@@ -0,0 +1,32 @@

# Pebblo Topic Classifier

The `Pebblo Topic Classifier` analyzes loader source files and identifies the topics they contain.
It uses a machine learning model trained to identify and categorize topics within textual data. This model is open to contributions from the open-source community, allowing collaborative improvements to its accuracy and effectiveness.

# Topics Supported By Pebblo Topic Classifier

Pebblo supports the following `topics`:

1. Board Meeting
2. Enterprise Agreement
3. Patent Application Filing
4. Financial Report
5. Loan and Security Agreement
6. Consulting Agreement
7. Sexual Harassment
8. Settlement Agreement
9. Price List
10. Distribution/Partner Agreement
11. Customer List
12. Executive Severance Agreement
13. Employee Agreement
14. Merger Agreement
15. Non-Disclosure Agreement


Users can find details of the classified topics for their loader source files in the Pebblo report.
Report sections such as `Top Files with Most Findings`, `Data Source Findings Table`, and `Snippets` give an overview of the Pebblo topic classifier output for the user's RAG application.

For more details, refer to [Reports](reports.md).
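
As a rough illustration of what a view like `Top Files with Most Findings` summarizes, the sketch below tallies classifier findings per source file. The `findings` list, the file paths, and the `top_files` helper are hypothetical and do not reflect Pebblo's actual report format or internal data structures.

```python
from collections import Counter

# Hypothetical classifier output: (source file, detected topic) pairs.
findings = [
    ("contracts/nda.pdf", "Non-Disclosure Agreement"),
    ("contracts/nda.pdf", "Employee Agreement"),
    ("finance/q4.pdf", "Financial Report"),
    ("contracts/nda.pdf", "Non-Disclosure Agreement"),
]


def top_files(findings, limit=5):
    """Rank files by their number of findings, highest first."""
    counts = Counter(path for path, _topic in findings)
    return counts.most_common(limit)


for path, count in top_files(findings):
    print(f"{path}: {count} finding(s)")
```

The same tally works for entity findings by swapping the topic label for an entity label.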