Voice Cloning

Clone a voice from an audio sample and use it for text-to-speech generation.

Several FairStack voice models support voice cloning: given a short audio reference, they generate speech that sounds like a specific person.

Upload a reference

Upload a 10-30 second audio sample of the voice you want to clone:

curl -X POST https://api.fairstack.ai/v1/voice/reference \
  -H "Authorization: Bearer $FAIRSTACK_API_KEY" \
  -F "file=@voice-sample.mp3"

Generate with the cloned voice

Pass the URL of the uploaded reference as ref_audio_url in the request body:

{
  "model": "indextts2",
  "prompt": "This is my cloned voice speaking.",
  "ref_audio_url": "https://media.fairstack.ai/voice-ref/abc123.mp3"
}
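
The request body above can be sent with curl in the same way as the upload. The following is a minimal sketch, not a definitive client: the function names build_clone_body and generate_with_clone and the variable FAIRSTACK_GENERATE_URL are hypothetical, and this page does not name the generation endpoint, so the sketch expects you to supply it from the API reference. It also assumes the endpoint accepts a JSON body with the same Bearer token used for uploads.

```shell
# Build the JSON body shown above from a prompt ($1) and a reference URL ($2).
# Assumption: neither argument contains characters that need JSON escaping
# (double quotes, backslashes, newlines).
build_clone_body() {
  printf '{"model": "indextts2", "prompt": "%s", "ref_audio_url": "%s"}' "$1" "$2"
}

# Hypothetical helper: POST the body to the generation endpoint.
# FAIRSTACK_GENERATE_URL is an assumed variable name; this page does not
# state the endpoint, so it must be set by the caller.
generate_with_clone() {
  if [ -z "${FAIRSTACK_GENERATE_URL:-}" ]; then
    echo "Set FAIRSTACK_GENERATE_URL to the generation endpoint" >&2
    return 1
  fi
  curl -sS -X POST "$FAIRSTACK_GENERATE_URL" \
    -H "Authorization: Bearer $FAIRSTACK_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(build_clone_body "$1" "$2")"
}

# Usage (not run here):
# generate_with_clone "This is my cloned voice speaking." \
#   "https://media.fairstack.ai/voice-ref/abc123.mp3"
```

Keeping the body construction in its own function makes it easy to inspect the payload before any request is sent.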

Quality tips

  • Use clean audio without background noise
  • A 10-30 second sample is optimal; longer samples do not necessarily improve results
  • Speak naturally at a consistent volume
  • IndexTTS2 has the best raw cloning fidelity
  • Qwen3-TTS supports both design (description-based) and clone modes
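
The duration guideline above can be checked locally before uploading. The sketch below is one possible approach and assumes FFmpeg's ffprobe is installed; the function names ref_duration and classify_duration are hypothetical, and the labels it prints are illustrative rather than anything the API returns.

```shell
# Print a file's duration in seconds using ffprobe (part of FFmpeg).
ref_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Classify a duration in seconds against the 10-30 second guideline.
# awk is used so that fractional values such as 12.5 compare numerically.
classify_duration() {
  awk -v d="$1" 'BEGIN {
    if (d < 10) print "short"
    else if (d > 30) print "long"
    else print "ok"
  }'
}

# Usage (not run here):
# classify_duration "$(ref_duration voice-sample.mp3)"
```

A "long" result is not an error, since the guideline only says that more audio is not always better; trimming to the cleanest 10-30 seconds is one reasonable response.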

Next steps