How It Works

01

Open Subtitle Picker

While a video is playing, tap the subtitle icon and select "AI Generated Subtitles" from the track list.

02

Pick Engine & Model

Choose Vosk for speed or Whisper for accuracy. Download your model once — it's stored locally.

03

Watch with Live Subtitles

Subtitles are generated in real time and synced to what you hear. Tap Off at any time to switch back.

Two Engines, One Goal

Different content needs different tradeoffs. Pick the engine that fits how you watch.

⚡ Fast · Default
Vosk
True streaming: subtitles arrive as the dialogue is spoken
Sync delay: ~500 ms
Models: 2 (en-US + en-IN)
Model size: ~40 MB each
Runs offline: ✓ Yes
Best for: Hollywood, Bollywood, English shows

🎯 Accurate
Whisper
Transformer-based: higher quality on difficult audio
Sync delay: 2–6 seconds
Models: 4 (tiny → small; one multilingual)
Model size: 31–181 MB
Runs offline: ✓ Yes
Best for: Noisy audio, multiple speakers, multilingual

Available Models

Download models directly from the app. They're stored locally and reused across sessions.

vosk-model-small-en-us (Vosk)
Disk: ~40 MB
Sync delay: ~500 ms
Best for: American English (Hollywood, US creators)

vosk-model-small-en-in (Vosk)
Disk: ~40 MB
Sync delay: ~500 ms
Best for: Indian English (Bollywood, Indian creators)

ggml-tiny.en-q5_1 (Whisper)
Disk: 31 MB
Sync delay: ~2 s
RAM: ~150 MB
Language: English
Best for: English, fastest Whisper variant

ggml-base.en-q5_1 (Whisper)
Disk: 57 MB
Sync delay: ~5 s
RAM: ~210 MB
Language: English
Best for: English, balanced quality and speed

ggml-base-q5_1 (Whisper)
Disk: 57 MB
Sync delay: ~5 s
RAM: ~210 MB
Language: Multilingual
Best for: Non-English content, automatic language detection

ggml-small.en-q5_1 (Whisper)
Disk: 181 MB
Sync delay: ~6 s
RAM: ~470 MB
Language: English
Best for: English, highest quality on phone
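
To compare the models against a storage or latency budget, the figures listed above can be put into a small lookup table. This is only an illustrative sketch: the `Model` class, the `CATALOG` list, and the `models_within` helper are hypothetical names and not part of Torrvilla, and RAM is left as `None` for the Vosk models because no RAM figure is listed for them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Model:
    name: str
    engine: str
    disk_mb: int             # approximate download size
    delay_s: float           # approximate sync delay
    ram_mb: Optional[int]    # None where no RAM figure is listed
    language: str


# Values copied from the model list above (hypothetical structure).
CATALOG: List[Model] = [
    Model("vosk-model-small-en-us", "Vosk", 40, 0.5, None, "American English"),
    Model("vosk-model-small-en-in", "Vosk", 40, 0.5, None, "Indian English"),
    Model("ggml-tiny.en-q5_1", "Whisper", 31, 2.0, 150, "English"),
    Model("ggml-base.en-q5_1", "Whisper", 57, 5.0, 210, "English"),
    Model("ggml-base-q5_1", "Whisper", 57, 5.0, 210, "Multilingual"),
    Model("ggml-small.en-q5_1", "Whisper", 181, 6.0, 470, "English"),
]


def models_within(max_delay_s: float, max_disk_mb: int) -> List[str]:
    """Return names of models that fit both the delay and disk budgets."""
    return [
        m.name
        for m in CATALOG
        if m.delay_s <= max_delay_s and m.disk_mb <= max_disk_mb
    ]


if __name__ == "__main__":
    # Models with at most ~2 s delay that take no more than 50 MB on disk.
    print(models_within(max_delay_s=2.0, max_disk_mb=50))
```

With a 2-second delay limit and a 50 MB disk limit, the helper returns both Vosk models and `ggml-tiny.en-q5_1`.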

See It In Action

Torrvilla AI Subtitle Models screen showing Vosk and Whisper models available for download
Torrvilla AI Subtitle setup dialog — choose between Vosk Fast and Whisper Accurate engines
Torrvilla live AI-generated subtitles displayed over a playing video

Built for Privacy & Reliability

On-Device Privacy

100% local processing. No audio is ever sent to a server. Your viewing stays private.

Works Offline

Download a model once on Wi-Fi. After that, subtitles work with no internet connection at all.

Perfect Sync

An AV-shifting delay buffers the audio so that generated cues appear exactly when the words are spoken.
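
How Torrvilla implements this delay is not described here. As a rough illustration of the general idea only, playback can be held back by a fixed-length queue so that audio is released no sooner than the engine's latency. The `DelayLine` class, the `frames_needed` helper, and the 100 ms frame length below are assumptions made for the sketch.

```python
import math
from collections import deque
from typing import Any, Deque, Optional


def frames_needed(engine_latency_ms: int, frame_ms: int = 100) -> int:
    """Number of whole frames required to cover the engine latency."""
    return math.ceil(engine_latency_ms / frame_ms)


class DelayLine:
    """Releases each pushed frame only after `delay_frames` later pushes."""

    def __init__(self, delay_frames: int) -> None:
        self._delay = delay_frames
        self._buf: Deque[Any] = deque()

    def push(self, frame: Any) -> Optional[Any]:
        """Queue a new frame; return the frame now due for playback, if any."""
        self._buf.append(frame)
        if len(self._buf) > self._delay:
            return self._buf.popleft()
        return None  # still filling the buffer, nothing to play yet


if __name__ == "__main__":
    # ~500 ms latency (the Vosk figure) with assumed 100 ms frames.
    line = DelayLine(frames_needed(500))
    for i in range(7):
        print(i, line.push(f"frame-{i}"))
```

In this sketch the first five pushes return nothing while the buffer fills, and from the sixth push onward each call returns the frame that was queued five frames earlier, which gives the recognizer that much time to produce a cue.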

Two Engines

Vosk for near-instant captions, Whisper for higher accuracy. Switch per session without leaving the player.

Clean Integration

AI subs appear in the standard subtitle picker. Selecting any other track instantly disables AI mode.

Beta & Improving

Actively developed. More models, languages, and accuracy improvements are on the roadmap for v1.0.

v0.9 — Latest Release

Get Torrvilla with AI Subtitles

Free for Android. Download the latest version and enable AI subtitles from the subtitle picker inside any video.

Frequently Asked Questions

Does it need internet to work?

No. Models are downloaded once (31–181 MB depending on your choice) and then run entirely on your device. No audio ever leaves your phone.

Which engine should I choose?

Use Vosk for most content: it syncs within ~500 ms and handles English movies and series well. Use Whisper for noisy audio, multiple speakers, non-English content, or when you need higher transcription accuracy.
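
The rule of thumb above can be written down as a tiny decision function. This is a hypothetical helper for illustration only, not an API exposed by the app; the parameter names are assumptions.

```python
def recommend_engine(
    noisy_audio: bool = False,
    multiple_speakers: bool = False,
    non_english: bool = False,
) -> str:
    """Suggest an engine following the guidance in this FAQ answer."""
    # Whisper is suggested for difficult audio and for non-English content
    # (the multilingual model is Whisper-only in the model list).
    if noisy_audio or multiple_speakers or non_english:
        return "Whisper"
    # Otherwise prefer the default low-latency engine.
    return "Vosk"


if __name__ == "__main__":
    print(recommend_engine())                   # clear English dialogue
    print(recommend_engine(noisy_audio=True))   # difficult audio
```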

How accurate are the AI subtitles?

This is a beta feature. Accuracy is good for clear speech, but you may see occasional errors with heavy accents, background music, or overlapping speakers. We're actively improving it.

What languages are supported?

Currently American English (Vosk en-US), Indian English (Vosk en-IN), English via the Whisper English models, and other languages with automatic detection via the Whisper base multilingual model (ggml-base-q5_1). More languages are planned.