Voice Text to Speech fal.ai

VibeVoice

VibeVoice costs $0.0040/req on FairStack. It is a text-to-speech model for podcast-style conversations, dialogue generation, and multi-character audiobooks. No subscription is required: you pay per generation and get full REST API access. FairStack applies a transparent 20% margin on infrastructure cost, so you always see the real price.

FairStack price
$0.0040/req
Last updated 2026-05-13

What is VibeVoice?

VibeVoice is a multi-speaker dialogue generation model that turns script-format input into conversational audio with up to 4 distinct voices. You write the dialogue with speaker labels, and the model assigns a voice to each speaker and renders the whole conversation in a single pass, including turn-taking. This replaces the usual workflow of generating a separate TTS track per speaker and mixing the tracks afterward, and it is the main advantage over TTS models that handle only a single speaker.

The model includes 8 built-in voice presets: 5 English speakers (Alice, Carter, Frank, Maya, Mary) and 3 Chinese speakers (Bowen, Xinran, Anchen). The script format will be familiar to anyone who has written a screenplay or dialogue.

Output is billed at $0.0007 per second, or approximately $0.04 per minute. VibeVoice is best suited to podcast-style conversations, dialogue generation, multi-character audiobooks, and interview-format content where natural multi-speaker dialogue is needed. It is available on FairStack at infrastructure cost plus a 20% platform fee.

Key Features

Multi-speaker dialogue — up to 4 distinct voices
Script format input — "Speaker X: line" syntax
8 built-in voice presets (5 English, 3 Chinese)
Automatic speaker voice assignment
Affordable at $0.04/min (~$0.0007/sec)
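
The script format and the 4-speaker limit described above can be sketched as a small helper. This is only an illustration: `build_script` is a hypothetical function, and the use of one turn per line (newline-separated) and of labels such as "Speaker 1" are assumptions, since this page only states the "Speaker X: line" syntax.

```python
from typing import Iterable, List, Tuple

MAX_SPEAKERS = 4  # limit stated on this page


def build_script(turns: Iterable[Tuple[str, str]]) -> str:
    """Join (speaker, line) pairs into 'Speaker X: line' script text.

    Hypothetical helper: newline-separated turns are an assumption.
    """
    lines: List[str] = []
    speakers: List[str] = []
    for speaker, text in turns:
        speaker, text = speaker.strip(), text.strip()
        if not speaker or not text:
            raise ValueError("each turn needs a speaker label and a line")
        if speaker not in speakers:
            speakers.append(speaker)
        if len(speakers) > MAX_SPEAKERS:
            raise ValueError(f"at most {MAX_SPEAKERS} distinct speakers are supported")
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)


script = build_script([
    ("Speaker 1", "Welcome back to the show."),
    ("Speaker 2", "Thanks, glad to be here."),
    ("Speaker 1", "Let's get started."),
])
print(script)
```

Checking the speaker count locally is a design choice that avoids paying for a request the model would not be able to voice as written.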

What are VibeVoice's strengths?

Best multi-speaker dialogue model available
Natural conversation flow between speakers
Built-in presets for quick start
Very affordable per-minute pricing

What are VibeVoice's limitations?

Limited to 4 speakers maximum
Only English and Chinese presets
Script format required — not free-form text

What is VibeVoice best for?

Podcast-style conversations
Dialogue generation
Multi-character audiobooks
Interview-format content

How much does VibeVoice cost?

| Metric | FairStack | Details |
| --- | --- | --- |
| Price per generation | $0.0040 | Includes 20% margin |
| Per-second rate | $0.0007/sec | Billed per second of output |
| Subscription | None | Pay per generation only |
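
As a rough guide, the per-second rate above can be turned into a cost estimate for a clip of known length. The sketch below assumes billing is strictly linear in output duration, with no rounding, minimum charge, or interaction with the per-generation price; none of that is specified on this page, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal

# USD per second of output, taken from the pricing listed on this page.
PER_SECOND_RATE = Decimal("0.0007")


def estimate_cost(duration_seconds: float) -> Decimal:
    """Estimated USD cost for a clip of the given length.

    Assumes purely linear per-second billing (an assumption, not a
    documented billing rule).
    """
    if duration_seconds < 0:
        raise ValueError("duration must be non-negative")
    return PER_SECOND_RATE * Decimal(str(duration_seconds))


for seconds in (60, 600, 1800):
    print(f"{seconds:>5} s -> ${estimate_cost(seconds)}")
```

Sixty seconds at $0.0007 per second is $0.042, which matches the quoted figure of approximately $0.04 per minute.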

How does VibeVoice perform across capabilities?

Estimated scores — VibeVoice. Unique multi-speaker capability. Limited preset library.

Naturalness: 82%
Emotion range: 70%
Cloning accuracy: 35%
Multilingual: 50%
Latency: 75%

How do I use the VibeVoice API?

curl
curl -X POST https://api.fairstack.ai/v1/generations/voice \
  -H "Authorization: Bearer $FAIRSTACK_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vibevoice",
    "prompt": "Your prompt here"
  }'
Python
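import os

import requests

# Read the API key from the environment, as the curl example does.
FAIRSTACK_API_KEY = os.environ["FAIRSTACK_API_KEY"]

response = requests.post(
    "https://api.fairstack.ai/v1/generations/voice",
    headers={
        "Authorization": f"Bearer {FAIRSTACK_API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "vibevoice",
        "prompt": "Your prompt here",
    },
)
response.raise_for_status()  # stop on HTTP errors before parsing the body

result = response.json()
print(result["url"])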
Node.js
const response = await fetch(
  "https://api.fairstack.ai/v1/generations/voice",
  {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FAIRSTACK_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "vibevoice",
      prompt: "Your prompt here",
    }),
  }
);

const result = await response.json();
console.log(result.url);

What parameters does VibeVoice support?

| Parameter | Type | Default | Details |
| --- | --- | --- | --- |
| script | string | | |
| speakers | array | | |
| seed | integer (optional) | | |
| cfg_scale | float | 1.3 | Range: 1–2 |
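
The parameters listed above could be assembled and validated before a request is sent, along the following lines. This is a sketch under several assumptions that this page does not confirm: that the parameters are top-level JSON fields next to `model`, that `speakers` is an array of preset names, and that `script` is used in place of the `prompt` field shown in the request examples. `build_payload` is a hypothetical helper, and no network call is made.

```python
import json
from typing import List, Optional

CFG_SCALE_MIN, CFG_SCALE_MAX = 1.0, 2.0  # range from the parameter list
CFG_SCALE_DEFAULT = 1.3                  # default from the parameter list
MAX_SPEAKERS = 4                         # limit stated on this page


def build_payload(
    script: str,
    speakers: List[str],
    seed: Optional[int] = None,
    cfg_scale: float = CFG_SCALE_DEFAULT,
) -> dict:
    """Build a request body; the field layout is an assumption."""
    if not script.strip():
        raise ValueError("script must not be empty")
    if not 1 <= len(speakers) <= MAX_SPEAKERS:
        raise ValueError(f"speakers must contain 1 to {MAX_SPEAKERS} entries")
    if not CFG_SCALE_MIN <= cfg_scale <= CFG_SCALE_MAX:
        raise ValueError("cfg_scale must be between 1 and 2")
    payload = {
        "model": "vibevoice",
        "script": script,
        "speakers": speakers,
        "cfg_scale": cfg_scale,
    }
    if seed is not None:  # seed is optional, so omit it unless provided
        payload["seed"] = seed
    return payload


payload = build_payload(
    "Speaker 1: Hello.\nSpeaker 2: Hi there.",
    ["Alice", "Frank"],  # preset names listed on this page
    seed=42,
)
print(json.dumps(payload, indent=2))
```

Passing a fixed `seed` is presumably useful for reproducing a result, though this page does not describe its exact effect.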

Frequently Asked Questions

How much does VibeVoice cost?

VibeVoice costs $0.0040/req on FairStack as of 2026-05-13. This price includes FairStack's transparent 20% margin on infrastructure cost. No subscription or monthly fee — you pay per generation only. Minimum deposit is $1.

What is VibeVoice and what is it best for?

VibeVoice is a multi-speaker dialogue generation model that produces conversational audio from script-format input with up to 4 distinct voices. It includes 8 built-in voice presets (5 English, 3 Chinese) and renders the whole conversation in a single pass, so there is no need to generate and mix a separate TTS track per speaker. It is best for podcast-style conversations, dialogue generation, multi-character audiobooks, and interview-format content. It is available through FairStack's REST API, with request examples for curl, Python, and Node.js.

Does VibeVoice have an API?

Yes. VibeVoice is available via FairStack's REST API at api.fairstack.ai. Send a POST request to /v1/generations/voice with your API key and prompt. Works with curl, Python requests, Node.js fetch, and any HTTP client. No SDK installation required.

How does VibeVoice compare to other voice models?

VibeVoice is a text-to-speech model priced at $0.0040/req on FairStack. It excels at podcast-style conversations, dialogue generation, and multi-character audiobooks. Key strengths: multi-speaker dialogue generated in a single pass and natural conversation flow between speakers. Compare all voice models at fairstack.ai/models.

What makes VibeVoice stand out from other voice models?

VibeVoice stands out for multi-speaker dialogue with natural conversation flow between speakers. Generation typically completes in 5-15 seconds.

What are the known limitations of VibeVoice?

Key limitations: a maximum of 4 speakers, presets in English and Chinese only, and required script-format input (free-form text is not supported). FairStack documents these limitations so you can choose the right model for your workflow.

How fast is VibeVoice?

VibeVoice typically completes a generation in 5-15 seconds. This balances output quality and processing speed for most production workflows.

What features does VibeVoice support?

VibeVoice offers multi-speaker dialogue, script-format input, 8 built-in voice presets (5 English, 3 Chinese), and automatic speaker voice assignment. All capabilities are accessible through both the FairStack web interface and the REST API.

Start using VibeVoice today

$0.0040/req. Full API access. No subscription.
