Add Readme and code for Document localisation, OCR, SLID, Text analysis #12

Open
wants to merge 1 commit into
base: main
86 changes: 86 additions & 0 deletions python/document-localization/README.md
@@ -0,0 +1,86 @@
# Reverie Language Translation API Example

This is a simple Python script demonstrating how to use the Reverie Language Translation API to upload, translate, and download documents.

## Prerequisites

* **Python 3.6+:** Ensure you have Python installed on your system.
* **Required Libraries:** Install the necessary libraries using pip:

```bash
pip install requests
```

* **Reverie API Credentials:** You will need a Reverie API key, application ID, and application name. Obtain these from your Reverie developer account.

## Setup

1. **Clone or Download:** Clone this repository or download the Python script (`main.py`).
2. **API Credentials:** Replace the placeholder API credentials in the script with your actual credentials (`main.py` already sets `REV_APPNAME` to `"nmt"`):

```python
REV_APP_ID = "your_app_id" # Replace with your Reverie App ID
REV_APPNAME = "your_app_name" # Replace with your Reverie App Name
REV_API_KEY = "your_api_key" # Replace with your Reverie API Key
```

3. **Input File:** Place the document you want to translate (e.g., a `.pdf` or `.docx` file) in the same directory as the script. Change the `file_path` variable in the script to the name of your file.

```python
file_path = 'your_document.pdf' # Replace with your file name
```

4. **Source and Target Languages:** Specify the source and target languages. The supported languages are:

* english
* bengali
* gujarati
* hindi
* kannada
* malayalam
* odia
* punjabi
* sanskrit
* tamil
* telugu

Change the `source_lang` and `target_lang` variables in the script:

```python
source_lang = 'english' # Source language (e.g., English)
target_lang = 'hindi' # Target language (e.g., Hindi)
```
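
The configuration steps above can be made less error-prone by reading the credentials from the environment and checking the language names before anything is uploaded. The following is a minimal sketch, not part of `main.py`: the environment variable names `REV_APP_ID`, `REV_APPNAME`, and `REV_API_KEY` and the helpers `load_credentials` and `validate_language` are assumptions introduced here for illustration.

```python
import os

# Language names copied from the supported-languages list above.
SUPPORTED_LANGUAGES = {
    "english", "bengali", "gujarati", "hindi", "kannada", "malayalam",
    "odia", "punjabi", "sanskrit", "tamil", "telugu",
}


def load_credentials(environ=os.environ):
    """Build the request headers from environment variables (names are assumptions)."""
    missing = [name for name in ("REV_APP_ID", "REV_API_KEY") if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "REV-APP-ID": environ["REV_APP_ID"],
        # "nmt" is the value that main.py presets for REV_APPNAME.
        "REV-APPNAME": environ.get("REV_APPNAME", "nmt"),
        "REV-API-KEY": environ["REV_API_KEY"],
    }


def validate_language(name):
    """Normalise a language name and reject anything outside the list above."""
    normalised = name.strip().lower()
    if normalised not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {name!r}")
    return normalised
```

Calling `validate_language(source_lang)` and `validate_language(target_lang)` before the upload would catch a misspelled language name locally instead of after a round trip to the API.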

## Running the Script

1. **Open a Terminal or Command Prompt:** Navigate to the directory containing the script.
2. **Execute the Script:** Run the Python script:

```bash
python main.py
```

3. **Output:** The script will:
* Upload the file to the Reverie API.
* Check the translation status periodically.
* Download the translated document to the `output` directory.
* Print status messages to the console.

4. **Translated Document:** The translated document is saved in the `output` directory as `translated_<filename>`, where `<filename>` is the file name the API returns in its export response.
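
The periodic status check described above runs in `main.py` as an unbounded loop with a fixed 2-second sleep. A bounded variant could look like the sketch below. The helper name `poll_until` and the 600-second default timeout are assumptions chosen for illustration, and `check` stands for any function that returns `True` once the translation is complete.

```python
import time


def poll_until(check, interval=2.0, timeout=600.0, sleep=time.sleep, clock=time.monotonic):
    """Call check() every `interval` seconds until it returns True or `timeout` elapses."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            # Give up instead of looping forever when the job never completes.
            return False
        sleep(interval)
```

Passing `sleep` and `clock` as parameters keeps the helper easy to exercise without real waiting. In the script, `check` would wrap a single status request rather than the whole `while True` loop.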

## Example Usage

To translate a file named `my_document.pdf` from English to Tamil:

1. Place `my_document.pdf` in the same directory as the script.
2. Set `file_path = 'my_document.pdf'`, `source_lang = 'english'`, and `target_lang = 'tamil'`.
3. Run the script: `python main.py`.
4. The translated document will be saved in the `output` directory, with a `translated_` prefix on the file name returned by the API.

## Important Notes

* Ensure you have a stable internet connection.
* The translation time depends on the size of the document and the Reverie API's processing load.
* If you encounter any errors, check the console output for error messages.
* Ensure your API keys are correct.
* This example is for educational purposes only. In a production environment, handle exceptions and errors properly.
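
As one possible reading of "handle exceptions and errors properly", the sketch below retries a failing call with exponential backoff. It is an illustration rather than part of the script: `call_with_retries` is a hypothetical helper, and the defaults (3 attempts, 1-second base delay) are arbitrary. It catches `OSError` by default, which also covers `requests.exceptions.RequestException`, because that class derives from `IOError`.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, retry_on=(OSError,), sleep=time.sleep):
    """Call func() and retry with exponential backoff when it raises one of `retry_on`."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                # Out of attempts: let the caller see the original error.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)
```

The helper only helps for calls that raise on failure, for example a `requests.post(...)` followed by `response.raise_for_status()`. The functions in `main.py` print an error and return `None` instead, so they would need that change first.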
107 changes: 107 additions & 0 deletions python/document-localization/main.py
@@ -0,0 +1,107 @@
import os
import time
import requests

REV_APP_ID = "<YOUR-APP-ID>"
REV_APPNAME = "nmt"
REV_API_KEY = "<YOUR-API-KEY>"

UPLOAD_URL = "https://revapi.reverieinc.com/translate_doc_import"
STATUS_URL = "https://revapi.reverieinc.com/translate_doc_status"
DOWNLOAD_URL = "https://revapi.reverieinc.com/translate_doc_export"

OUTPUT_DIR = "output"
os.makedirs(OUTPUT_DIR, exist_ok=True)

def upload_file(file_path, source_lang, target_lang):
    """Uploads a document for translation."""
    with open(file_path, "rb") as file:
        files = {"projectFiles": file}
        data = {"sourceLanguage": source_lang, "targetLanguage": target_lang}
        headers = {
            "REV-APP-ID": REV_APP_ID,
            "REV-APPNAME": REV_APPNAME,
            "REV-API-KEY": REV_API_KEY,
        }

        # Send the request while the file handle is still open.
        response = requests.post(UPLOAD_URL, headers=headers, files=files, data=data)

    if response.status_code == 200:
        json_response = response.json()
        print("File uploaded successfully.")
        return json_response.get("projectId")
    else:
        print("Error in uploading file:", response.text)
        return None

def check_status(doc_id):
    """Checks the translation status until it is reported as completed."""
    while True:
        params = {"doc_id": doc_id}
        headers = {
            "REV-APP-ID": REV_APP_ID,
            "REV-APPNAME": REV_APPNAME,
            "REV-API-KEY": REV_API_KEY,
        }

        response = requests.get(STATUS_URL, headers=headers, params=params)
        # Stop on an HTTP error instead of polling forever (e.g. bad credentials).
        if response.status_code != 200:
            print("Error checking status:", response.text)
            return False
        json_response = response.json()

        if json_response.get("success") and json_response.get("message") == "completed":
            print("Translation completed!")
            return True
        else:
            print("Translation in progress... Checking again in 2 seconds.")
            time.sleep(2)

def download_translation(doc_id, target_lang):
    """Downloads the translated document and saves it to the output folder."""
    headers = {
        "Content-Type": "application/json",
        "REV-APP-ID": REV_APP_ID,
        "REV-APPNAME": REV_APPNAME,
        "REV-API-KEY": REV_API_KEY,
    }

    json_data = {"unitId": doc_id, "targetLanguages": [target_lang]}
    response = requests.post(DOWNLOAD_URL, headers=headers, json=json_data)

    if response.status_code == 200:
        json_response = response.json()
        if json_response.get("success"):
            # targetURLS maps each file name to a {language: URL} dictionary.
            target_urls = json_response.get("data", {}).get("targetURLS", {})
            for filename, lang_urls in target_urls.items():
                file_url = lang_urls.get(target_lang)
                if file_url:
                    download_and_save(file_url, filename)
                    print('File downloaded successfully')
                else:
                    print(f"Target language '{target_lang}' not found for file '{filename}'.")
        else:
            # Previously a 200 response with success == False was ignored silently.
            print("Export request was not successful:", response.text)
    else:
        print("Error downloading translation:", response.text)


def download_and_save(file_url, filename):
    """Downloads the file from the URL and saves it."""
    response = requests.get(file_url, stream=True)

    if response.status_code == 200:
        output_path = os.path.join(OUTPUT_DIR, f"translated_{filename}")
        # Stream the body to disk in 8 KiB chunks.
        with open(output_path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
        print(f"Saved translated file to: {output_path}")
    else:
        print("Failed to download translated document.")


if __name__ == "__main__":
    file_path = '<YOUR-FILE-PATH>'
    source_lang = '<SOURCE-LANGUAGE>'
    target_lang = '<TARGET-LANGUAGE>'

    doc_id = upload_file(file_path, source_lang, target_lang)

    if doc_id:
        if check_status(doc_id):
            download_translation(doc_id, target_lang)
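
The nested lookup in `download_translation` can be separated from the network call so that it is testable on its own. The sketch below assumes the response shape that `main.py` reads (`success`, `data`, `targetURLS`); the helper `extract_download_urls` and the sample payload are hypothetical and are not taken from the API documentation.

```python
def extract_download_urls(json_response, target_lang):
    """Return {filename: url} for target_lang, following the keys main.py reads."""
    if not json_response.get("success"):
        return {}
    target_urls = json_response.get("data", {}).get("targetURLS", {})
    return {
        filename: lang_urls[target_lang]
        for filename, lang_urls in target_urls.items()
        if lang_urls.get(target_lang)
    }


# Hypothetical payload, shaped only by what main.py reads from the export response.
sample = {
    "success": True,
    "data": {
        "targetURLS": {
            "report.docx": {"hindi": "https://example.com/report_hindi.docx"},
            "notes.docx": {"tamil": "https://example.com/notes_tamil.docx"},
        }
    },
}

print(extract_download_urls(sample, "hindi"))
```

Files that have no entry for the requested language are skipped, which mirrors the "not found" branch in `download_translation`.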
47 changes: 47 additions & 0 deletions python/language-identification-voice/README.md
@@ -0,0 +1,47 @@
# REVERIE LANGUAGE TECHNOLOGIES SLID API Client

This Python script provides a client for interacting with the Spoken Language Identification (SLID) API. It allows you to upload an audio file and identify the spoken language.

## Prerequisites

* Python 3.6 or higher
* `requests` library (Install using: `pip install requests`)

## Installation

1. Clone this repository or download the `main.py` file.

## Usage

1. **Obtain API Credentials:** You need to have a valid API Key and App ID provided by Reverie.
2. **Install Dependencies:** If you haven't already, install the necessary Python library:
```bash
pip install requests
```
3. **Configure the Script:**
* Open `main.py` in a text editor.
* Replace `<YOUR-API-KEY>` and `<YOUR-APP-ID>` with your actual API credentials.
* Modify the `audio_file` variable to point to the path of the audio file you want to analyze.
* If your audio file is not in the default format (WAV, Signed 16 bit, 16,000 Hz), specify the format with the `audio_format` parameter of the `identify_language` method.
4. **Run the Script:**
```bash
python main.py
```
5. **View the Output:** The script will print the JSON response from the API, which includes the detected language and confidence score.

## Code Details

* The `SLIDApiClient` class encapsulates the API interaction.
* The `identify_language` method sends a POST request to the SLID API endpoint with the audio file and necessary headers.
* Error handling is implemented using `try-except` blocks to catch potential exceptions during the API request.
* The script demonstrates how to use the client with the default audio format and includes a commented-out example that passes a specific format.

## Supported Audio Formats

The API supports various audio formats, including:

* WAV (default): Signed 16 bit, 16,000 Hz (16k\_int16), Unsigned 8 bit, 16,000 Hz (16k\_uint8), Signed 16 bit, 8,000 Hz (8k\_int16), Unsigned 8 bit, 8,000 Hz (8k\_uint8)
* MP3
* FLAC
* OGG Vorbis
* OGG Opus
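
For WAV input, the format code can be derived from the file header rather than typed by hand. The sketch below uses the standard-library `wave` module and maps sample rate and sample width to the four WAV codes listed above. The helper `detect_wav_format` is hypothetical, and the mapping assumes plain PCM WAV files, where 8-bit samples are unsigned and 16-bit samples are signed.

```python
import wave

# (sample rate in Hz, sample width in bytes) -> format code from the list above.
WAV_FORMAT_CODES = {
    (16000, 2): "16k_int16",
    (16000, 1): "16k_uint8",
    (8000, 2): "8k_int16",
    (8000, 1): "8k_uint8",
}


def detect_wav_format(path):
    """Return the matching format code for a PCM WAV file, or None if none matches."""
    with wave.open(path, "rb") as wav:
        key = (wav.getframerate(), wav.getsampwidth())
    return WAV_FORMAT_CODES.get(key)
```

The result could be passed as `audio_format` to `identify_language`. A return value of `None` signals that the file should be converted first. The sketch does not inspect the channel count.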
65 changes: 65 additions & 0 deletions python/language-identification-voice/main.py
@@ -0,0 +1,65 @@
import requests
import os

class SLIDApiClient:
    def __init__(self, api_key, app_id):
        """
        Initializes the SLID API client.

        Args:
            api_key (str): The API key provided by Reverie.
            app_id (str): The App ID provided by Reverie.
        """
        self.api_key = api_key
        self.app_id = app_id
        self.base_url = "https://revapi.reverieinc.com/upload"

    def identify_language(self, audio_file_path, audio_format=None):
        """
        Identifies the language spoken in the given audio file.

        Args:
            audio_file_path (str): Path to the audio file.
            audio_format (str, optional): Format of the audio file. Defaults to None.

        Returns:
            dict: The JSON response from the API, or None if the request failed.
        """
        headers = {
            "REV-API-KEY": self.api_key,
            "REV-APP-ID": self.app_id,
            "REV-APPNAME": "slid",
        }

        if audio_format:
            headers["format"] = audio_format

        try:
            # The context manager closes the file even if the request fails.
            with open(audio_file_path, "rb") as audio:
                files = {"audio_file": audio}
                response = requests.post(self.base_url, headers=headers, files=files)
            response.raise_for_status()
            return response.json()
        except (OSError, requests.exceptions.RequestException) as e:
            # OSError also covers a missing or unreadable audio file.
            print(f"Error: {e}")
            return None

if __name__ == "__main__":
    api_key = "<YOUR-API-KEY>"
    app_id = "<YOUR-APP-ID>"

    client = SLIDApiClient(api_key, app_id)
    audio_file = "<PATH-TO-YOUR-FILE>"

    result = client.identify_language(audio_file)
    if result:
        print("Language Identification Result:")
        print(result)

    # Example usage with a specific audio format (e.g., mp3)
    # result_mp3 = client.identify_language(audio_file, audio_format="mp3")
    # if result_mp3:
    #     print("Language Identification Result (MP3):")
    #     print(result_mp3)