Oratio2 is a Python-based script that allows you to convert audio files into text using OpenAI's Whisper model. This script is designed to be simple, fast, and user-friendly, providing a straightforward way to transcribe your audio into written text. Below you will find detailed instructions on how to use it, along with descriptions of all the parameters.
Oratio2 transcribes MP4, WAV, and other supported audio formats into text. It uses OpenAI's Whisper, a state-of-the-art speech recognition model. The script is easy to use, lets you choose among different Whisper models, and shows progress indicators so you can keep track of the transcription.
Before using Oratio2, make sure you have the following installed:
- Python 3.6 or higher
- pip (Python package installer)
- Whisper: To transcribe the audio (pip install openai-whisper)
- Torch: Required for Whisper (pip install torch)
- tqdm: For progress bars (pip install tqdm)
To install the necessary packages, run the following commands:
pip install openai-whisper
pip install torch
pip install tqdm
These commands will ensure all dependencies are installed and ready to use with Oratio2.
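To confirm the installation worked before running a transcription, a small check like the one below can help. It is only a sketch and not part of Oratio2: the helper name `missing_modules` is hypothetical, and it assumes the packages are imported under the names `whisper`, `torch`, and `tqdm` (the pip package `openai-whisper` is imported as `whisper`).

```python
from importlib.util import find_spec

# Import names of the packages listed above. The pip package
# "openai-whisper" is imported as "whisper".
REQUIRED_MODULES = ("whisper", "torch", "tqdm")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found for import."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If the check reports a missing package, rerun the matching pip command in the same Python environment you use to run the script.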
To use the Oratio2 script, you will need to provide an audio file and specify an output file for the transcribed text. Below is the command syntax:
python3 Oratio2.py -i <input_audio_file> -o <output_text_file> -m <model>
For example:
python3 Oratio2.py -i first_meeting.mp4 -o transcription.txt -m base
This command will transcribe the audio file first_meeting.mp4 into a text file named transcription.txt using the "base" Whisper model.
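The command-line interface described here could be modelled with Python's standard `argparse` module roughly as follows. This is a sketch under assumptions, not the script's actual source: the `dest` names (`input`, `output`, `model`) and the helper `build_parser` are hypothetical, and only the documented short flags `-i`, `-o`, and `-m` are used.

```python
import argparse

# Model names taken from the options documented for -m.
MODELS = ("tiny", "base", "small", "medium", "large")


def build_parser():
    """Build a parser that mirrors the documented -i, -o and -m flags."""
    parser = argparse.ArgumentParser(
        prog="Oratio2.py",
        description="Transcribe an audio file to text with Whisper.",
    )
    parser.add_argument("-i", dest="input", required=True,
                        help="path of the input audio file")
    parser.add_argument("-o", dest="output", required=True,
                        help="path of the output text file")
    parser.add_argument("-m", dest="model", choices=MODELS, default="base",
                        help="Whisper model to use (default: base)")
    return parser


# Parse an example command line; -m is omitted, so the default applies.
args = build_parser().parse_args(["-i", "first_meeting.mp4", "-o", "transcription.txt"])
print(args.input, args.output, args.model)
# prints: first_meeting.mp4 transcription.txt base
```

Declaring `choices` makes `argparse` reject an unknown model name with a usage message before any audio is loaded.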
- Description: The -i option specifies the path of the input audio file to transcribe.
- Format: The audio file can be in MP4, WAV, or any other format supported by Whisper.
- Required: Yes.
- Description: The -o option specifies the path of the output text file where the transcription will be saved.
- Format: Provide a .txt file where you want the results.
- Required: Yes.
- Description: The -m option specifies which Whisper model to use.
- Options: tiny, base, small, medium, large.
  - tiny: Fastest, lower accuracy.
  - base: Good balance of speed and accuracy.
  - small: More accurate but slower.
  - medium: More accurate than small, but slower still.
  - large: Highest accuracy but slowest.
- Default: base.
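For readers curious how the three parameters fit together, the sketch below shows one way a transcription step can be written with Whisper's Python API (`whisper.load_model` and `model.transcribe`). It is an illustration under assumptions rather than Oratio2's actual code: the function name `transcribe_to_file` and its checks are hypothetical.

```python
from pathlib import Path

# Model names taken from the options documented for -m.
MODELS = ("tiny", "base", "small", "medium", "large")


def transcribe_to_file(input_path, output_path, model_name="base"):
    """Transcribe `input_path` with Whisper and write the text to `output_path`."""
    if model_name not in MODELS:
        raise ValueError(f"Unknown model {model_name!r}; choose one of {', '.join(MODELS)}")
    if not Path(input_path).is_file():
        raise FileNotFoundError(f"The input file does not exist: {input_path}")

    # Imported here so the argument checks above run even if Whisper is missing.
    import whisper

    model = whisper.load_model(model_name)      # downloads the weights on first use
    result = model.transcribe(str(input_path))  # returns a dict with a "text" key
    text = result["text"].strip()
    Path(output_path).write_text(text + "\n", encoding="utf-8")
    return text
```

Calling `transcribe_to_file("first_meeting.mp4", "transcription.txt", "base")` would correspond to the example command shown earlier, provided the input file exists.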
- Model Selection: Use tiny or base for fast transcriptions when high accuracy isn't crucial. Use medium or large for detailed and accurate transcriptions.
- Audio Quality: For the best results, ensure your audio file is clear, with minimal background noise.
- Environment: If you have a GPU available, Whisper can run significantly faster. Make sure your torch installation is configured to use CUDA if applicable.
- Output File Location: Always provide a full path for the output file if running from different directories to avoid confusion.
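To check whether your torch installation can actually see a GPU, as the Environment tip suggests, a quick test like this may be useful. It is a sketch: the helper name `pick_device` is hypothetical and is not taken from Oratio2.

```python
def pick_device():
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # without torch there is nothing to run on a GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print("Whisper would run on:", pick_device())
```

Whisper's `load_model` accepts a `device` argument, so the result can be passed along as `whisper.load_model("base", device=pick_device())`.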
The following is an example of how to use Oratio2 to transcribe an audio file:
python3 Oratio2.py -i /path/to/audio.mp4 -o /path/to/output.txt -m large
This command will transcribe the given audio file with the large model for the best possible accuracy.
- File Not Found: If you encounter an error stating "Error: The input file does not exist", verify the file path you provided for the input audio.
- Slow Processing: If the transcription is slow, consider using a smaller model (tiny or base) or running the script on a machine with a GPU.
- Warnings: If you see warnings about FP16 not being supported on CPU, it is because the script defaults to FP32 when running on CPU. This warning is safe to ignore.
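If you call Whisper from your own Python code and would rather not see the FP16 warning at all, you can request FP32 explicitly through the `fp16` option of `model.transcribe`. The helper below is a hypothetical sketch; whether Oratio2 itself exposes such a setting is not stated here.

```python
def decode_options(device):
    """Return keyword arguments for `model.transcribe` suited to `device`."""
    # Half precision only helps on a GPU; asking for FP32 on CPU avoids the warning.
    return {"fp16": device != "cpu"}


print(decode_options("cpu"))
# prints: {'fp16': False}
```

The options can then be passed along as `model.transcribe(path, **decode_options("cpu"))`.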
For any further questions or issues, feel free to open an issue on the GitHub repository.
Enjoy using Oratio2 to easily convert your audio files to text!