AIVA — Classical/cinematic composition engine with MIDI workflows.
ProducerAI — “Music agent” layer built on top of frontier models.
Boomy
Soundraw
Mubert
Endlesss (loop‑based, collaborative)
Amper Music (legacy; acquired by Shutterstock in 2020)
Jukebox‑based commercial derivatives (various)
Suno (v3, v4)
Udio
Open‑Source / Research‑Grade Music Models
These are self‑hostable or research‑oriented.
Stable Audio (1.x, 2.x) — Stability AI’s hosted text‑to‑music service; its weights are not released, so it is not self‑hostable (the open‑weights sibling follows).
Stable Audio Open — Open‑weights version of Stability’s text‑to‑audio system.
AudioCraft (Meta: MusicGen, AudioGen, EnCodec)
MusicGen (Meta) — Text‑to‑music model distributed as part of AudioCraft.
AudioLDM/AudioLDM2 — Latent diffusion for text‑to‑audio/music.
Magenta — Google’s long‑running music ML project (MelodyRNN, MusicVAE, etc.).
Jukebox — OpenAI’s pioneering hierarchical VQ‑VAE music generator.
MuseNet — OpenAI (legacy; no longer generally available).
MusicLM (unofficial) — Community implementations of Google’s MusicLM.
Riffusion (and Riffusion OSS) — Spectrogram‑diffusion music generator.
Mustango — Controllable text‑to‑music model.
DiffRhythm / DiffRhythm 2 — DiffRhythm 2 is a 2026 open‑source model; strong but still behind commercial systems.
JEN‑1 and other academic text‑to‑music models
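MusicGen predicts several EnCodec codebook streams with a single transformer. One of the interleaving schemes described in the MusicGen paper is a "delay" pattern: codebook k is shifted right by k steps, so each generation step emits one token per codebook while the finer codebooks for a frame are still produced after the coarser ones. The sketch below assumes tokens arrive as K equal-length lists of integers and uses a hypothetical PAD value of -1; it illustrates the layout only and is not AudioCraft's implementation.

```python
from typing import List

PAD = -1  # hypothetical placeholder for slots that hold no real codebook token


def apply_delay_pattern(codes: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Shift codebook row k right by k steps (the "delay" interleaving).

    codes: K rows (one per codebook), each of length T.
    Returns K rows of length T + K - 1; column t is one generation step and
    holds codebook k's token for frame t - k.
    """
    num_codebooks = len(codes)
    length = len(codes[0])
    out = []
    for k, row in enumerate(codes):
        if len(row) != length:
            raise ValueError("all codebook rows must have the same length")
        # k leading pads, the row itself, then trailing pads so all rows match
        out.append([pad] * k + list(row) + [pad] * (num_codebooks - 1 - k))
    return out


def undo_delay_pattern(delayed: List[List[int]]) -> List[List[int]]:
    """Invert apply_delay_pattern by slicing each row back into frame alignment."""
    num_codebooks = len(delayed)
    length = len(delayed[0]) - (num_codebooks - 1)
    return [row[k:k + length] for k, row in enumerate(delayed)]


if __name__ == "__main__":
    codes = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
    for row in apply_delay_pattern(codes):
        print(row)
    assert undo_delay_pattern(apply_delay_pattern(codes)) == codes
```

The payoff of this layout is that a sequence of T frames needs only T + K - 1 decoding steps instead of T * K, at the cost of the padded corners.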
Background‑Music / Content‑Creator Platforms
These focus on safe licensing and mood‑based generation.
Mubert — API‑first generative music for apps and platforms.
Soundraw — Customizable music for video creators.
Boomy — Consumer‑friendly quick‑generation tool.
Sound‑Effects / Audio‑to‑Audio Models
Text‑to‑audio: models that generate audio from a text prompt, usually sound effects, ambience, environmental sounds, or other non‑musical audio.
Audio‑to‑audio: Models that transform existing audio (inpainting, style transfer, editing).
Most modern SFX models do both, so the categories overlap.
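Inpainting-style editing regenerates only a chosen time range and keeps everything outside it unchanged. How the model itself fills the gap differs between systems, so the sketch below covers only the surrounding bookkeeping, under stated assumptions: audio is a list of float samples, `generated` is a hypothetical model output already time-aligned with the clip, and a short linear crossfade on each side hides the seams.

```python
from typing import List


def splice_region(original: List[float], generated: List[float],
                  start: int, end: int, fade: int) -> List[float]:
    """Keep `original` except for samples [start, end), which come from `generated`.

    A linear crossfade of `fade` samples on each side of the region hides the seams.
    Both inputs are assumed to be time-aligned and of equal length.
    """
    if len(original) != len(generated):
        raise ValueError("original and generated must have the same length")
    if not 0 <= start < end <= len(original):
        raise ValueError("invalid region")
    if fade < 0:
        raise ValueError("fade must be non-negative")
    out = list(original)
    lo, hi = max(0, start - fade), min(len(original), end + fade)
    for i in range(lo, hi):
        if start <= i < end:
            w = 1.0                                      # inside the edited region
        elif i < start:
            w = (i - (start - fade) + 1) / (fade + 1)    # ramp up toward the region
        else:
            w = (end + fade - i) / (fade + 1)            # ramp down after the region
        out[i] = (1.0 - w) * original[i] + w * generated[i]
    return out


if __name__ == "__main__":
    clip = [0.0] * 10        # stand-in for the untouched recording
    model_out = [1.0] * 10   # stand-in for a hypothetical model's regenerated audio
    print([round(s, 2) for s in splice_region(clip, model_out, start=4, end=6, fade=2)])
```

Samples outside the region plus its fades are copied verbatim, which is the property that lets these edits preserve the surrounding audio.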
AudioGen - Meta’s text‑to‑audio model for SFX, ambience, and general audio.
AudioLDM2‑SFX — Diffusion‑based SFX generator; strong for environmental and synthetic sounds
Stable Audio Open — Open‑weights general audio generator (not music‑specific).
GANSynth — Legacy GAN model from Magenta for synthesizing musical‑instrument notes with controllable timbre; historically important.
Beatoven SFX — Commercial SFX generator with licensing‑safe output.
Voice / Speech Models
These are not strictly “music models” but are essential for vocals, singing, and voice cloning.
Speech editing and voice cloning: some models can replace specific words in an existing recording while preserving the surrounding audio.
Audio Codecs & Tokenizers
These are the building blocks for modern audio LLMs.
A neural codec compresses audio into sequences of discrete tokens; audio language models operate on these tokens, just as text LLMs operate on text tokens.
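The token budget follows from simple arithmetic: a codec emitting F frames per second, each quantized by K codebooks of N entries, produces F * K tokens per second and F * K * log2(N) bits per second. The example uses 75 frames/s and 1024-entry codebooks, the figures reported for Meta's 24 kHz EnCodec model; treat them as assumptions to check against the model card.

```python
import math


def codec_token_rate(frame_rate_hz: float, num_codebooks: int) -> float:
    """Discrete tokens per second emitted by a multi-codebook codec."""
    return frame_rate_hz * num_codebooks


def codec_bitrate_bps(frame_rate_hz: float, num_codebooks: int, codebook_size: int) -> float:
    """Bits per second if every token is stored in log2(codebook_size) bits."""
    return codec_token_rate(frame_rate_hz, num_codebooks) * math.log2(codebook_size)


if __name__ == "__main__":
    # Assumed EnCodec-24kHz-like settings: 75 frames/s, 1024-entry codebooks.
    for k in (2, 4, 8, 16, 32):
        kbps = codec_bitrate_bps(75, k, 1024) / 1000
        print(f"{k:2d} codebooks: {codec_token_rate(75, k):5.0f} tokens/s, {kbps:4.1f} kbps")
```

With 8 codebooks, one second of audio becomes 600 tokens at 6 kbps, which is why a few minutes of music already yields a long sequence for a language model.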
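Opus — Open, royalty‑free, highly versatile audio codec; a conventional (non‑neural) codec listed as a reference point, since it does not produce discrete tokens for language models.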
EnCodec — Meta’s neural audio codec; compresses audio into discrete tokens (via residual vector quantization) at very low bitrates while preserving quality.
DAC — Descript Audio Codec (.dac), a high‑fidelity, general‑purpose neural audio codec introduced in the paper “High‑Fidelity Audio Compression with Improved RVQGAN”.
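EnCodec and DAC both discretize their latent frames with residual vector quantization (RVQ): the first codebook quantizes the latent vector, each later codebook quantizes whatever error the previous stages left, and the token for each stage is the index of the nearest code vector. A minimal pure-Python sketch with tiny hand-made codebooks (hypothetical values; real codecs learn them in training and wrap this step in an encoder and decoder network):

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry with the smallest squared Euclidean distance to x."""
    return min(range(len(codebook)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)))


def rvq_encode(x: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Return one token (codebook index) per quantization stage."""
    residual = list(x)
    tokens = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        tokens.append(idx)
        # The next stage only sees the error this stage failed to capture.
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return tokens


def rvq_decode(tokens: Sequence[int], codebooks: Sequence[Sequence[Vector]]) -> Vector:
    """Reconstruct by summing the selected entry from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, cb in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


if __name__ == "__main__":
    codebooks = [
        [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],    # coarse stage
        [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],  # finer stage
    ]
    tokens = rvq_encode([1.2, 0.9], codebooks)
    print(tokens, rvq_decode(tokens, codebooks))
```

Dropping later stages still yields a usable, coarser reconstruction, which is how a single RVQ codec can trade bitrate for quality by varying how many codebooks it keeps.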