Ultimate Vocal Remover (UVR) is a free, open-source desktop application that uses deep neural networks to separate vocals, instrumentals, drums, and other stems from audio files. Available for Windows, Mac, and Linux with GPU acceleration and multiple AI model architectures including MDX-Net, VR Architecture, and Demucs v4.
Ultimate Vocal Remover (UVR) is a free, open-source desktop application developed by Anjok07 that uses deep neural networks to separate vocals, instrumentals, drums, bass, and other stems from any audio file. Unlike browser-based tools with file size caps and subscription paywalls, UVR runs entirely on your local machine — processing audio using your GPU without sending data to any server.
UVR is best suited for music producers, DJs, audio engineers, and content creators who need high-quality stem separation without recurring costs. Its MDX23C-InstVoc HQ model consistently outperforms most paid alternatives for vocal isolation on complex rock and pop recordings. It supports multiple AI architectures (VR Architecture, MDX-Net, Demucs v3/v4) and allows GPU-accelerated batch processing.
If you need a lightweight, no-install, browser-based solution or require cloud processing for mobile workflows, UVR is not the right fit — it demands a moderately powerful GPU (minimum Nvidia GTX 1060 6GB) and a desktop environment on Windows, Mac, or Linux.
| Feature | Ultimate Vocal Remover (UVR) |
|---|---|
| Primary use case | Vocal separation, stem splitting, audio source separation |
| Best for | Music producers, DJs, audio engineers, remix creators |
| Access type | Desktop application (Windows, Mac, Linux) |
| AI model | Multiple: VR Architecture (custom-trained), MDX-Net, Demucs v3/v4; MDX23C-InstVoc HQ is the flagship model |
| Input formats | WAV, FLAC, MP3 (requires FFmpeg for non-WAV) |
| Output formats | WAV, FLAC, MP3 |
| Max file / duration | No hard cap; processing time depends on hardware and segment size settings |
| Processing type | Local batch processing; GPU-accelerated (not real-time) |
| Output quality | Up to source quality; output sample rate mirrors input |
| Language support | N/A (audio processing tool, no speech recognition) |
| API availability | No public API; CLI usage possible via Python scripts |
| DAW / plugin support | Standalone only; no VST/AU plugin |
| Collaboration | No |
| Pricing model | Free and open-source (MIT license) |
| Free plan | Yes — fully free, no feature restrictions |
| Paid plans | None; voluntary donations accepted via Buy Me a Coffee |
UVR bundles three distinct AI architectures — VR Architecture (custom-trained by core devs), MDX-Net (based on Kuielab's research), and Facebook's Demucs v3/v4 — all accessible from a single GUI. You can switch models without reinstalling anything and run ensemble processing using outputs from multiple models for improved separation quality on difficult tracks. This level of model flexibility is rare even in paid tools.
UVR processes all audio locally on your machine using Nvidia CUDA (or Apple M1 MPS for Mac users). There are no file size limits imposed by a server, no subscription throttling, and no privacy concerns about uploading proprietary audio tracks. For studios working with unreleased material, this is a significant operational advantage over cloud-based alternatives.
Demucs v4 integration enables 4-stem separation (vocals, drums, bass, other) in a single pass, while custom MDX-Net models can isolate specific instruments across a range of genres. Additional tools include time-stretching and pitch-shifting via the Rubber Band library, making UVR useful beyond simple vocal removal — it functions as a full audio decomposition workstation.
Solo creator / audio engineer: $0 indefinitely. Processing cost is your local electricity and hardware amortization only. For a 5-minute track on a mid-range GPU, processing typically takes 30–120 seconds depending on the model and segment size settings.
Small studio: $0 in recurring costs. Multiple team members can run their own instances; there are no per-seat licensing fees. Hardware investment (GPU upgrade) is the primary cost consideration.
High-volume developer: UVR has no API, so automated pipelines require wrapping the Python CLI directly. The cost remains $0 in software fees, but engineering time for integration is a real consideration. Cloud-based APIs like AssemblyAI or Demucs-as-a-service alternatives may be more operationally efficient at scale.
Pricing information is based on the official UVR GitHub repository and website as of May 2026. No paid tiers exist.
.exe for Windows (must install to C:\), .dmg for Mac (M1 arm64 or Intel x86_64), or follow the manual Python setup for Linux.Pro tip: If you hear artifacts in the separated vocals, try adjusting the "Segment Size" slider — lower values reduce memory use but may change artifact patterns; running the same track through two different models and merging outputs in your DAW often yields cleaner results than any single model alone.
UVR has earned a strong reputation in the audio production community, with users on Reddit consistently rating it above paid alternatives. The MDX23C-InstVoc HQ model in particular draws praise for its vocal isolation quality on rock and pop recordings.
What users praise: Zero cost, local GPU processing with no upload requirement, excellent vocal isolation quality with MDX23C-InstVoc HQ, support for multiple AI architectures, and batch processing capabilities.
Common frustrations: Installation complexity on Mac (especially Sonoma), GPU memory errors on lower-end cards requiring segment size tuning, no browser or mobile access, and occasional artifacts on complex mixes with dense layering.
Have you tried Ultimate Vocal Remover? Share your experience in the review section below to help other audio creators make the right choice.
Yes — UVR is completely free and open-source under the MIT license. All models, output formats, and processing features are available at no cost. The developers accept optional donations via Buy Me a Coffee, but no features are paywalled.
UVR natively supports WAV, FLAC, and MP3 as both input and output formats. Processing non-WAV files requires FFmpeg to be installed separately. Output format (WAV, FLAC, or MP3) is selectable in the interface.
No — UVR processes audio in batch mode only. It is not a real-time plugin. For a 5-minute track, processing typically takes 30–120 seconds on a mid-range Nvidia GPU, depending on the model and segment size configuration.
No public API is available. The application runs as a standalone GUI. Developers can invoke the underlying Python scripts directly for CLI-style automation, but there is no documented REST or SDK API for integration into external pipelines.
On mainstream pop and rock recordings, UVR's MDX23C-InstVoc HQ model produces separation quality that rivals or exceeds many paid tools. Accuracy degrades on heavily layered electronic music and complex orchestral arrangements — this limitation applies to all AI stem separators, not just UVR. For these genres, running multiple models and comparing outputs is recommended.
Yes, with caveats. UVR is actively used in professional workflows for remix production, sample clearance, and audio forensics. The lack of DAW integration (no VST/AU) means stems must be file-exported and re-imported, adding friction compared to plugin-based tools like iZotope RX. Studios with unreleased material benefit from UVR's fully local processing model.
Transform content creation with user-friendly editing, AI-powered tools, and effortless collaboration.
Enhance audio quality with machine learning-based repair and instant cleanup.
Achieve clear online communication with AI-powered noise reduction and live meeting transcription.