This project is an interactive desktop application that lets users record audio in real time, save the recordings, and transcribe the audio to text using the Whisper AI model. The interface is built with Tkinter, so starting and stopping recordings and generating transcriptions takes only a few clicks.
This Python-based GUI application allows you to record audio, save it as a .wav file, and transcribe it into text using OpenAI's Whisper model. Built with tkinter, it features buttons to start/stop recording and transcribe audio, saving the recording as output.wav and the transcription as transcription.txt.
- Real-Time Audio Recording: Users can start and stop audio recordings with the click of a button. Audio is captured using PyAudio.
- Audio File Management: Recorded audio is saved in WAV format for compatibility and quality retention. The file is saved automatically when the recording stops.
- AI-Powered Transcription: Uses Whisper, a speech-to-text model, to transcribe recorded audio. Transcription results are saved to a text file for easy access and further use.
- User-Friendly Interface: Built with Tkinter, the application provides a simple, clean, and responsive interface. Buttons are styled for ease of use and accessibility.
- Fork this repository: Fork the CowTheGreat/Real-Time-Audio-Recorder-and-Transcriber-using-Whisper-AI repository. See GitHub's documentation on how to fork a repository if you have not done this before.
- Clone the project:
git clone git@github.com:your-username/Real-Time-Audio-Recorder-and-Transcriber-using-Whisper-AI
- Required downloads:
cd Real-Time-Audio-Recorder-and-Transcriber-using-Whisper-AI
pip install -r requirements.txt
- Alternative way to install the dependencies: install Python 3.8+ and the required packages (pyaudio and openai-whisper, which provides the whisper module) using
pip install pyaudio openai-whisper
- Whisper downloads its models on the first run; ensure an active internet connection. Customize the Whisper model by modifying whisper.load_model("small") in the code (options: tiny, base, small, medium, large).
- For more on Whisper, visit https://github.com/openai/whisper.
- Running the project: Run the application with:
python ui.py
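Before launching ui.py, it can help to confirm that the interpreter version and the imports are in place. The following is a minimal sketch and not part of the project: the helper names missing_modules and python_is_supported are hypothetical, and the module list assumes the app imports pyaudio, whisper, and tkinter. Note that the whisper module is provided by the openai-whisper package on PyPI.

```python
import importlib.util
import sys

# Assumed import names used by the app (hypothetical list for illustration).
REQUIRED_MODULES = ("pyaudio", "whisper", "tkinter")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


# Report anything that would prevent the app from starting.
if not python_is_supported():
    print("Python 3.8 or newer is required.")
for name in missing_modules():
    print(f"Missing module: {name}")
```

If the check prints nothing, the environment has the pieces the installation steps describe.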
- Start Recording: Click the "Start Recording" button to begin capturing audio. The button will be disabled while recording is in progress.
- Stop Recording: Click the "Stop Recording" button to end the audio capture. The application saves the audio to a file and enables the transcription feature.
- Transcribe Audio: Click the "Transcribe" button to convert the recorded audio into text. The transcribed text is saved to a file named transcription.txt.
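The last step above, turning the saved recording into transcription.txt, can be sketched as follows. This is an illustration under stated assumptions rather than the code in ui.py: the function name transcribe_to_file is hypothetical, and the model argument is assumed to be any object with a transcribe(path) method that returns a dict with a "text" key, which is how Whisper models loaded with whisper.load_model behave.

```python
from pathlib import Path


def transcribe_to_file(model, audio_path="output.wav", text_path="transcription.txt"):
    """Transcribe `audio_path` with `model` and write the text to `text_path`.

    `model` is assumed to expose transcribe(path) -> {"text": ...},
    as Whisper model objects do.
    """
    result = model.transcribe(str(audio_path))
    text = result["text"].strip()
    Path(text_path).write_text(text, encoding="utf-8")
    return text
```

With the real library this would be called as transcribe_to_file(whisper.load_model("small")). Passing the model in, instead of loading it inside the function, keeps the slow model load out of the button handler and makes the function easy to test with a stand-in object.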
Contributions are always welcome! Whether you want to report an issue, suggest a feature, or submit a pull request, your input is greatly appreciated.
Welcome, contributors! Here's how you can help improve this project:
- Fork the Repository: Click "Fork" at the top right of the GitHub repository page.
- Clone Your Fork:
git clone https://github.com/your-username/Real-Time-Audio-Recorder-and-Transcriber-using-Whisper-AI.git
- Create a Feature Branch:
git checkout -b feature/your-feature-name
- Follow PEP 8 style guidelines for Python code.
- Add comments for complex logic.
- Write tests for new features (if applicable).
- Test changes locally before submitting a pull request.
- Ensure your branch is updated with the latest main branch.
- Submit a PR to the original repository's main branch.
- Describe your changes clearly in the PR description.
- Address any review feedback promptly.
Modify ui.py to use a different model (e.g., "medium"):
model = whisper.load_model("medium") # Instead of "small"
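If the model size ever comes from user input or a setting, it may be worth validating it before handing it to whisper.load_model. The sketch below is hypothetical (pick_model_size is not a function in this project) and only checks against the five sizes listed in this README.

```python
# Model sizes listed in this README; Whisper itself may offer more variants.
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")


def pick_model_size(requested=None, default="small"):
    """Normalize a requested model size and reject unknown names."""
    name = (requested or default).strip().lower()
    if name not in MODEL_SIZES:
        raise ValueError(
            f"Unknown model size {name!r}; expected one of {', '.join(MODEL_SIZES)}"
        )
    return name
```

In ui.py this could be used as model = whisper.load_model(pick_model_size("medium")), so that a typo fails with a clear message before any model load is attempted.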
Change where files are saved by editing these lines in ui.py:
# For audio files
wf = wave.open("custom_folder/output.wav", 'wb')
# For transcriptions
with open("custom_folder/transcription.txt", "w") as f:
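Neither wave.open nor open creates missing directories, so custom_folder has to exist before these lines run. A minimal sketch of a save helper that creates the folder first is shown below. The name save_wav and the default audio parameters (mono, 16-bit samples, 44100 Hz) are assumptions for illustration; use whatever values the recording code in ui.py actually passes to PyAudio.

```python
import wave
from pathlib import Path


def save_wav(frames, path, channels=1, sample_width=2, rate=44100):
    """Write raw audio `frames` (a list of bytes objects) to a WAV file.

    The parent folder is created if needed. The default parameters are
    assumptions and must match the settings used while recording.
    """
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(b"".join(frames))
    return path
```

For example, save_wav(frames, "custom_folder/output.wav") writes the recording and returns the path that was written.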
Q: I get errors installing PyAudio on Windows.
A: Try installing using pre-built binaries:
pip install pipwin
pipwin install pyaudio
Q: Whisper fails to download models.
A: Ensure you have an active internet connection. If blocked by a firewall, manually download the model from OpenAI's repository and place it in ~/.cache/whisper/.
Q: "Permission denied" when saving files.
A: Run the app as administrator/root, or change the save directory to a location with write permissions.
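For the permission problem in particular, one option is to pick a save directory at startup by trying a few candidates in order. The sketch below is only an illustration: first_writable_dir is a hypothetical helper, and falling back to the system temporary directory is an assumption about what is acceptable for this app.

```python
import tempfile
from pathlib import Path


def first_writable_dir(preferred, *fallbacks):
    """Return the first directory that can be created and written to.

    Tries `preferred`, then each fallback, then the system temp directory.
    """
    candidates = [Path(preferred), *map(Path, fallbacks), Path(tempfile.gettempdir())]
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            # Creating a throwaway file proves that writing really works here.
            with tempfile.TemporaryFile(dir=candidate):
                pass
        except OSError:
            continue
        return candidate
    raise PermissionError("No writable directory found for saving files")
```

Probing with a real temporary file is more reliable than checking permission bits, because it also catches read-only file systems and paths blocked by an existing file.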
- User Feedback: The GUI shows error messages in alerts instead of console logs.
- Exception Handling: Contributors should wrap risky operations in try-except blocks:
from tkinter import messagebox

try:
    ...  # Risky code
except Exception as e:
    # Show the failure in a dialog instead of the console
    messagebox.showerror("Error", str(e))
- Logging: Consider adding logging for debugging (PRs welcome!).
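The exception-handling and logging points above can be combined in one small decorator. This is a sketch of one possible approach, not existing project code: the names report_errors and configure_logging, the logger name "recorder", and the log file name app.log are all hypothetical. The show_error argument is expected to be a callable taking a title and a message, such as tkinter's messagebox.showerror.

```python
import functools
import logging

logger = logging.getLogger("recorder")  # hypothetical logger name


def configure_logging(log_file="app.log"):
    """Send log records to a file; call once at startup (file name is an assumption)."""
    logging.basicConfig(
        filename=log_file,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )


def report_errors(show_error):
    """Decorator factory: log any exception and pass its message to `show_error`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                # logger.exception records the full traceback for debugging.
                logger.exception("%s failed", func.__name__)
                show_error("Error", str(exc))
                return None
        return wrapper
    return decorator
```

A button callback could then be declared with @report_errors(messagebox.showerror) above its definition, which keeps the try-except boilerplate out of each handler while still showing the user an alert.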
- Real-time transcription while recording.
- Support for MP3 and other audio formats.
- Language selection for transcription.
- Progress bar during transcription.
- GUI dark/light theme toggle (WIP by @contributor).
A very big thanks to all the contributors for helping this project grow. Your efforts are greatly appreciated!