Although more than 7,000 languages are spoken worldwide, just 10 of them dominate internet usage, accounting for over 70% of global internet users according to Statista, one of the leading providers of market and consumer data.
This poses a significant challenge for the development of conversational AI tools for the remaining 6,900+ languages, which collectively account for less than 30% of internet usage and are considered low-resource languages.
Africa is home to more than 2,000 languages, a third of the world's spoken languages according to The African Language Program at Harvard.
This means that speakers of these languages may be even less likely to have access to accurate and relevant healthcare information online.
Objective of this project
This project aims to develop a Kiswahili ASR (Automatic Speech Recognition) model that contributes to solving the problem of documenting patient-doctor consultations (conversations).
- Africa has the highest burden of disease, according to a 2019 report.
- Healthcare systems in Africa are often overwhelmed and underfunded.
Conversational AI tools can be used for tasks such as:
- Symptom checking
- Disease diagnosis
- Treatment recommendations
- Robust documentation
The negative health outcomes of this lack of language representation in the digital space include:
- A lack of accurate diagnosis
- Inadequate treatment
- Missing out on important medical information
We used an open-source Swahili dataset from the Common Voice website that is available on the Hugging Face dataset hub.
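The snippet below is a minimal sketch, assuming the `datasets` library, of how a Common Voice Swahili split can be loaded from the hub; the dataset ID and configuration name are assumptions, so check the notebooks for the exact release used.

```python
# Minimal sketch: load a Common Voice Swahili split from the Hugging Face hub.
# The dataset ID and config below are assumptions; Common Voice releases also
# require accepting the terms on the dataset page before downloading.
from datasets import load_dataset, Audio

common_voice = load_dataset("mozilla-foundation/common_voice_11_0", "sw", split="train")

# Resample every clip to 16 kHz, the rate expected by Wav2Vec2/XLS-R models.
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16_000))

print(common_voice[0]["sentence"])  # transcript of the first clip
```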
We hosted the model on the Hugging Face Hub. You can upload a Swahili clip from your files, or record one in the browser, to get a transcription. (You may experience some errors in the transcription; we are working to make the model more accurate.)
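For programmatic use, a transcription can also be obtained with the `transformers` ASR pipeline, as in the sketch below; the checkpoint name is a placeholder rather than our actual model ID.

```python
# Minimal sketch: transcribe a local Swahili clip with the transformers
# ASR pipeline. The model ID is a placeholder, not our actual checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/wav2vec2-xls-r-swahili",  # placeholder model ID
)

result = asr("swahili_clip.wav")  # path to any local audio file
print(result["text"])
```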
- Notebooks in this repo:
  - Audio pre-processing and EDA: a series of functions for converting raw audio to MFCCs (a minimal sketch of this step appears after the lists below).
  - Fine_tuning_(a_pretrained_model)_for_Swahili_ASR: pre-processes the data from the Hugging Face hub.
  - Fine_tuning_XLS_R_Wav2Vec2_with_Swahili_corpus_v1: the initial attempt to train on Google Colab, unsuccessful due to limited computing resources.
  - Fine_tuning_XLS-R with swahili corpus (version 1): fine-tuning with checkpoints stored on Google Drive.
  - Fine_tuning_XLS-R with swahili corpus (version 2): fine-tuning with model checkpoints pushed to the Hugging Face Hub (see the fine-tuning sketch after the lists).
  - Real_Time_Speech_Recognition_on_Gradio: the first attempt to host our model on Gradio (see the Gradio sketch after the lists).
- Write-ups that accompany this work:
  - A narrative on literature review
  - A second narrative on data preparation
  - A third narrative on model development
  - The overall technical report
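The audio pre-processing and EDA notebook converts raw audio into MFCCs. The sketch below shows one common way to do this with `librosa`; the file name and frame parameters are illustrative, not necessarily the values used in the notebook.

```python
# Minimal sketch: raw audio -> MFCC features with librosa.
# File name and parameter values are illustrative only.
import librosa

# Load the clip and resample it to 16 kHz.
signal, sr = librosa.load("swahili_clip.wav", sr=16_000)

# 13 MFCCs per frame; n_fft/hop_length give a 25 ms window with a 10 ms stride at 16 kHz.
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)

print(mfccs.shape)  # (13, number_of_frames)
```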
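The fine-tuning notebooks follow the usual Wav2Vec2/XLS-R CTC recipe: build a character-level tokenizer from the Swahili transcripts, load a pretrained XLS-R checkpoint with a fresh CTC head, and train with checkpoints saved to Drive (version 1) or pushed to the Hugging Face Hub (version 2). The condensed sketch below illustrates that setup; the `vocab.json` file, the 300m checkpoint, the output repo name, and every hyperparameter are assumptions rather than the notebooks' actual values.

```python
# Condensed sketch of the XLS-R CTC fine-tuning setup. vocab.json is a
# character vocabulary built from the Swahili transcripts; all checkpoint
# names and hyperparameters here are illustrative assumptions.
from transformers import (
    Wav2Vec2CTCTokenizer,
    Wav2Vec2FeatureExtractor,
    Wav2Vec2Processor,
    Wav2Vec2ForCTC,
    TrainingArguments,
)

tokenizer = Wav2Vec2CTCTokenizer(
    "vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|"
)
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1, sampling_rate=16_000, padding_value=0.0, return_attention_mask=True
)
processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

# Start from a multilingual XLS-R checkpoint and add a CTC head sized to the
# Swahili character vocabulary.
model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-xls-r-300m",  # assumed checkpoint size
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)
model.freeze_feature_encoder()  # keep the convolutional feature encoder frozen

# push_to_hub=True is what sends checkpoints to the Hugging Face Hub
# (version 2); version 1 instead saved them to Google Drive.
training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-swahili",  # placeholder repo name
    per_device_train_batch_size=8,
    num_train_epochs=30,
    learning_rate=3e-4,
    save_steps=400,
    push_to_hub=True,
)
# A Trainer would then be built with a CTC data collator and the prepared
# Common Voice splits (omitted here for brevity).
```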
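Finally, the Gradio notebook puts the model behind a simple web interface that accepts an uploaded or recorded clip. A minimal sketch, again with a placeholder model ID:

```python
# Minimal sketch of the Gradio demo: upload or record a Swahili clip and
# get a transcription back. The model ID is a placeholder; note that older
# Gradio releases use source= (singular) instead of sources=.
import gradio as gr
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="your-username/wav2vec2-xls-r-swahili",  # placeholder model ID
)

def transcribe(audio_path):
    # Gradio hands us the path of the uploaded/recorded file; the pipeline
    # takes care of decoding and resampling.
    return asr(audio_path)["text"]

demo = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(sources=["upload", "microphone"], type="filepath"),
    outputs="text",
    title="Swahili Speech Recognition",
)

demo.launch()
```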