AI Music

Reference-conditioned music, three honest modes.

Bring a song, vocal, or instrumental you already love. Give it new lyrics. AresGen renders a new track that keeps the original sonic identity, powered by MiniMax Music-01.

One engine · Three modes · MP3 output

1 tuned engine: MiniMax Music-01 via aimlapi.com
3 honest modes: song / voice / instrumental
Reference + lyrics input: file upload or URL
MP3 download: single file per render

Routed engine

aimlapi.com → MiniMax Music-01

One tuned engine, three honest modes (song, voice, and instrumental), all reference-conditioned.

By the numbers

One engine. Three modes. Two reference inputs. MP3 out.

1 Engine: MiniMax Music-01 (single platform)
3 Modes: song / voice / instrumental
2 Reference inputs: file upload or URL
MP3 Output: single file, instant download

Backend surfaces

Three honest modes, one engine.

Song, voice, instrumental: each mode routes through the same reference-conditioned engine. No vendor drift between renders.

Song

Upload a reference song. Provide new lyrics. AresGen re-renders the track with the new lyrics while preserving the genre, tempo, and overall sonic identity of your reference.

mp3 / wav / ogg upload (max 10 MB) or audio URL
Lyrics required: custom text drives the new render
Output preserves reference tone and arrangement

Voice

Upload a reference vocal track. Provide new lyrics. AresGen generates a new vocal performance in the same vocal character, ready to drop into an existing instrumental.

Reference must be a clean vocal sample
Output preserves vocal timbre and character
Pair with the Instrumental mode for full-song workflows

Instrumental

Upload a reference instrumental. Provide lyrics. AresGen layers a generated vocal over the instrumental, matched to the tempo and key of the reference.

Reference must be vocal-free or vocal-light
New vocal layer aligned to the instrumental
No separate vocal stem in output: single mixed MP3

One platform

One tuned engine, not a multi-engine race.

aimlapi.com → MiniMax Music-01 · Reference-conditioned music generation

Single provider, intentional. No multi-engine drift between renders.

Capabilities

Honest gaps. No marketing fog.

Vocal character preservation applies only to modes whose reference contains a vocal (Song and Voice), not to Instrumental. Every other capability is listed below, drawn from the same facts module the audit gate enforces.

Music capabilities across 3 surfaces. Filled dot indicates support.
Capability	Song	Voice	Instrumental
Reference-conditioned	●	●	●
Custom lyrics	●	●	●
Audio file upload (mp3/wav/ogg)	●	●	●
Reference via URL	●	●	●
MP3 download	●	●	●
Vocal character preserve	●	●	—
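
For readers who want the matrix in machine-readable form, here is a minimal Python sketch of the same table as a lookup structure. The names `CAPABILITIES`, `supports`, and `gaps` are hypothetical and chosen for illustration; they are not part of any AresGen API.

```python
# Capability matrix copied from the table above.
# All names here are illustrative assumptions, not an AresGen interface.
MODES = ("song", "voice", "instrumental")

CAPABILITIES = {
    "Reference-conditioned": {"song": True, "voice": True, "instrumental": True},
    "Custom lyrics": {"song": True, "voice": True, "instrumental": True},
    "Audio file upload (mp3/wav/ogg)": {"song": True, "voice": True, "instrumental": True},
    "Reference via URL": {"song": True, "voice": True, "instrumental": True},
    "MP3 download": {"song": True, "voice": True, "instrumental": True},
    "Vocal character preserve": {"song": True, "voice": True, "instrumental": False},
}


def supports(mode: str, capability: str) -> bool:
    """Return True if the given mode has a filled dot for the capability."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return CAPABILITIES[capability][mode]


def gaps(mode: str) -> list[str]:
    """List the capabilities a mode does not support."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    return [name for name, row in CAPABILITIES.items() if not row[mode]]
```

Calling `gaps("instrumental")` returns only the vocal character row, which matches the single dash in the table.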

How a render runs

Three steps, one contract

Click each step to see exactly what AresGen sends to MiniMax Music-01. No auto-advance. The chip you tap is the step you see.

Step: Upload reference

Drop in an mp3, wav, or ogg up to 10 MB, or paste a public audio URL. The reference is uploaded to the aimlapi.com purpose-tagged endpoint as either voice or instrumental input.
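
The input rules in this step can be sketched as a pre-upload check. This is a minimal Python illustration, not AresGen's actual validation: the function name `check_reference` is hypothetical, and reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".mp3", ".wav", ".ogg"}  # formats named on this page
MAX_BYTES = 10 * 1024 * 1024  # assumption: "10 MB" read as 10 MiB


def check_reference(source: str, size_bytes: Optional[int] = None) -> str:
    """Return "url" or "file" if the reference looks acceptable.

    Raises ValueError otherwise. Hypothetical helper for illustration only.
    """
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        # A public audio URL: only check that it has a host.
        if not parsed.netloc:
            raise ValueError("URL has no host")
        return "url"

    # Otherwise treat the source as a local file path.
    ext = Path(source).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or 'none'}")
    if size_bytes is None:
        size_bytes = Path(source).stat().st_size  # read the size from disk
    if size_bytes > MAX_BYTES:
        raise ValueError("file exceeds the 10 MB limit")
    return "file"
```

Passing `size_bytes` explicitly lets the check run before the file is written to disk, for example on a browser upload where only the declared size is known.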

Pricing

Metered, predictable, included.

Music renders are metered at 0.05 credits per generation, included in every paid AresGen plan. Free trial accounts start with enough credits for a handful of test renders. No separate music subscription.
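
To make the metering concrete, here is a small Python sketch of the arithmetic at 0.05 credits per generation. The helper names `credits_for` and `renders_for` are hypothetical; the only figure taken from this page is the per-render rate.

```python
from decimal import Decimal

# Rate stated on this page; Decimal avoids binary floating-point rounding.
CREDITS_PER_RENDER = Decimal("0.05")


def credits_for(renders: int) -> Decimal:
    """Credits consumed by a given number of renders."""
    if renders < 0:
        raise ValueError("renders must be non-negative")
    return CREDITS_PER_RENDER * renders


def renders_for(balance: Decimal) -> int:
    """Whole renders that fit in a credit balance."""
    if balance < 0:
        raise ValueError("balance must be non-negative")
    return int(balance // CREDITS_PER_RENDER)
```

Under this rate, 40 renders consume 2.00 credits, and a balance of 1 credit covers 20 renders.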

Use cases

Four honest workflows our customers run today.

Re-lyric an existing brand jingle without losing the original character.

Upload your brand jingle in voice mode and supply the new campaign lyrics. The vocal character carries over while the message updates.

Iterate lyrics on a reference track before booking studio time.

Use song mode to test different lyric drafts against your demo. Same arrangement, different message, every render.

Add a custom vocal layer over a loop you already own.

Drop a loop or instrumental into instrumental mode and supply the lyrics. Output is a mixed MP3 with the new vocal sitting on the loop.

Reuse a signature vocal style across multiple lyrics.

Upload a vocal-only reference in voice mode once, then render different lyrics against it for episodic content.

FAQ

Six answers before you ask.

What audio formats can I upload as a reference?

AI Music accepts mp3, wav, or ogg files up to 10 MB. You can also pass a public audio URL, and AresGen will fetch the file and forward it to the model.

What languages does MiniMax Music-01 handle for lyrics?

Lyrics are passed through to MiniMax Music-01 as-is. The model supports several major languages including English, Mandarin, Japanese, Korean, Spanish, French, and German. Results vary by genre and language combination; English is consistently strongest.

Who owns the rights to the reference I upload?

You do. AresGen does not store, redistribute, or grant licensing over your uploaded reference. You are responsible for having the rights to use any audio you upload: original recordings, properly licensed samples, or material you have explicit clearance for.

What does AresGen return per render?

A single MP3 file, decoded from the model response and saved to your music library. The output is a single mixed track. AresGen does not return separate vocal, drum, or instrumental layers; the model does not produce them.

Can I use the output commercially?

AresGen does not grant or restrict commercial rights beyond what aimlapi.com and MiniMax specify in their terms. Review the current aimlapi.com terms and any applicable MiniMax licensing before using output in paid campaigns or broadcast. For sensitive campaigns, confirm with the provider directly.

Why one engine, not a marketplace?

AresGen routes AI Music through MiniMax Music-01 because the surface is reference-conditioned, not prompt-to-song. The honest job-to-be-done is: bring a song, vocal, or instrumental you already like, give it new lyrics, get back a track that keeps the original sonic identity. Multi-engine orchestration here would mean tone drift between providers: the opposite of what reference-conditioned regeneration promises. One tuned engine, three honest modes, predictable output.

Pair this with

Pair narration with music

Use Voiceover for spoken narration and AresGen AI Music for the soundbed in the same workspace.

Score a generated video

Render a video with AI Video Studio, then bring it back through AI Music to add a custom score.