Debugging voice AI at scale is a listening problem. When your system handles thousands of calls a day, you can’t replay recordings one by one to find where the agent interrupted a caller, escalated a frustrated customer, or misread intent. Mise solves this by indexing not just what was said, but how it was said — making every conversation in your corpus searchable by acoustic signal.

The problem with transcript-only observability

Most observability tools treat a phone call as a sequence of words. They index transcripts, measure call duration, and flag keywords. That approach misses much of the signal. A caller who says “fine, whatever” may be satisfied or deeply frustrated. An agent that speaks faster after a long pause may be interrupting. Sarcasm, resignation, and confusion don’t appear in transcripts; they live in the prosody, tone, and rhythm of the conversation.

What others index

Words and timestamps. Keyword matches. Call duration. Transcript sentiment estimated from text alone.

What Mise indexes

Tone, prosody, tension, rhythm, and intent — captured at every turn from the audio itself, not inferred from words.

Acoustic indexing, not just transcription

Mise processes every turn of every call and extracts five acoustic dimensions:
  • Tone — Sentiment, irony, and sarcasm as expressed in the voice, not the text
  • Prosody — Pace, pauses, and emphasis that carry meaning beyond words
  • Tension — Frustration and escalation signals as they develop across a call
  • Rhythm — Cadence, interruptions, and overlap between speakers
  • Intent — What the caller is actually trying to accomplish, inferred from acoustic context
These features are indexed per turn and aggregated across your entire call corpus. When you query Mise, you’re searching acoustic space — not just a text database.
Mise indexes audio archived per turn and as full call recordings. Transcripts are one input to the index, not the index itself.
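
To make "indexed per turn and aggregated across your corpus" concrete, here is a minimal sketch of what a turn-level record and a simple roll-up could look like. Everything in it is an assumption for illustration: the `TurnFeatures` record, its field names, the 0-to-1 score scales, and `mean_tension_by_call` are hypothetical and do not describe Mise's actual schema or API.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class TurnFeatures:
    """Hypothetical per-turn record; field names and 0-1 scales are assumptions."""
    call_id: str
    turn_index: int
    speaker: str      # "agent" or "caller"
    tone: float       # assumed scale: 0 = negative, 1 = positive
    tension: float    # assumed scale: 0 = calm, 1 = highly escalated
    overlap_ms: int   # assumed: milliseconds of speech overlapping the previous turn


def mean_tension_by_call(turns):
    """Aggregate turn-level tension into one mean score per call."""
    by_call = {}
    for t in turns:
        by_call.setdefault(t.call_id, []).append(t.tension)
    return {call_id: mean(scores) for call_id, scores in by_call.items()}


# Made-up sample data: two turns from one call, one turn from another.
turns = [
    TurnFeatures("call-1", 0, "caller", 0.6, 0.2, 0),
    TurnFeatures("call-1", 1, "agent", 0.5, 0.4, 350),
    TurnFeatures("call-2", 0, "caller", 0.3, 0.8, 0),
]
print(mean_tension_by_call(turns))
```

The point of the sketch is the shape of the data: features live on individual turns, and call-level or corpus-level views are derived from them rather than stored in place of them.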

Searching your call corpus

Instead of filtering dashboards or writing SQL against call metadata, you express what you’re looking for in natural language:

    voice.query("calls where the agent interrupted the caller")
    # → 1,284 matches → clustered into 6 defect signatures

Mise returns ranked matches and automatically clusters them into defect signatures — recurring patterns that represent distinct failure modes. This is the difference between finding individual bad calls and understanding a systemic problem.
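
As a rough illustration of the difference between individual matches and defect signatures, the sketch below groups a handful of hits by a signature label and orders the groups by size. The `Match` record, its fields, and the signature names are hypothetical; the source does not describe the shape of the results Mise returns, and real clustering would operate on acoustic features rather than on pre-assigned labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """Hypothetical search hit; not the shape Mise actually returns."""
    call_id: str
    turn_index: int
    score: float      # assumed relevance score, higher is better
    signature: str    # assumed label of the defect signature for this hit


def group_by_signature(matches):
    """Group hits by signature: largest group first, best hit first within a group."""
    groups = defaultdict(list)
    for m in matches:
        groups[m.signature].append(m)
    for hits in groups.values():
        hits.sort(key=lambda m: m.score, reverse=True)
    return sorted(groups.items(), key=lambda kv: len(kv[1]), reverse=True)


# Made-up sample hits for illustration only.
matches = [
    Match("call-7", 3, 0.91, "barge-in after long pause"),
    Match("call-2", 5, 0.88, "talks over confirmation"),
    Match("call-9", 1, 0.95, "barge-in after long pause"),
]
for signature, hits in group_by_signature(matches):
    print(signature, len(hits), hits[0].call_id)
```

Reading results this way puts the most common failure mode at the top, with a representative call to listen to first.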

Who Mise is built for

Mise is designed for engineering and product teams running voice AI at scale — typically 10,000 or more calls per day. It integrates directly with the stacks these teams already use:

  • LiveKit: Real-time voice pipelines
  • Twilio: Programmable telephony
  • Telnyx: Cloud communications
  • Pipecat: Voice AI orchestration
  • Datadog: Metrics and alerting

How it compares

| Approach | What you get | What you miss |
| --- | --- | --- |
| Keyword search | Calls containing specific words | Everything expressed acoustically |
| Transcript analytics | Text-level sentiment, topic detection | Tone, prosody, frustration, sarcasm |
| Time-series dashboards | Aggregated metrics over time | Turn-level signal, defect clustering |
| Mise | Acoustic features indexed at every turn, searchable across your corpus | |

Getting access

Mise is in private alpha. Teams are admitted based on stack and call volume.

Request access at sf-voice.sh/sign-up. Your team will be reviewed and onboarded with support from the Mise team.