
Voice

Started: February 12, 2025

Talk in, magic out.


What It Is

A collection of voice experiments. Text-to-speech, speech-to-text, voice cloning attempts, audio processing — anything involving spoken word and code.

Why Voice Fascinates Me

Voice is the most natural interface. Typing is a compromise we've accepted because technology demanded it. As voice tech improves, we'll type less and talk more.

Experiments Include

  • TTS comparisons — Testing different voices and services
  • Transcription accuracy — How good is Whisper really?
  • Voice cloning — Ethical experiments with consent
  • Audio effects — Real-time voice modification
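For the transcription accuracy experiment, one common yardstick is word error rate (WER): the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The sketch below is an illustration and not necessarily the metric used in this project. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumption: lowercase + whitespace split is enough normalization here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing in output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)
```

A WER of 0.0 means the output matches the reference word for word. Values can exceed 1.0 when the output contains many extra words.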

Key Learnings

1. Whisper changed everything

Open-source, high-quality transcription. Before Whisper, accurate STT mostly meant paying for cloud APIs. Now the model is free to download and runs locally.
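As a rough illustration of what running locally looks like, here is a minimal sketch using the open-source `openai-whisper` Python package. It assumes the package and `ffmpeg` are installed. The helper name `transcribe_file` and the default `"base"` model size are choices made for this example, not something taken from the project repo.

```python
from pathlib import Path


def transcribe_file(audio_path: str, model_name: str = "base") -> str:
    """Transcribe one audio file on the local machine and return its text."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"no audio file at {path}")

    # Imported lazily so this module still loads where whisper is not installed.
    import whisper

    # Downloads the model weights on first use, then reads them from a local cache.
    model = whisper.load_model(model_name)
    result = model.transcribe(str(path))
    return result["text"].strip()
```

Calling `transcribe_file("memo.wav")` (a hypothetical file) would return the transcript as a plain string. Larger model sizes generally trade speed for accuracy.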

2. Voice cloning is too good

A few minutes of audio is enough to clone a voice convincingly. The tech has outpaced the ethics discussion.

3. Latency is the killer

Real-time voice apps need sub-second response times. Anything slower breaks the illusion of conversation.
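One way to keep an eye on that budget is to time each stage of a voice pipeline and compare the total against a ceiling. The sketch below is a hedged illustration: the `LatencyTracker` class, the stage names, and the reading of "sub-second" as a 1.0 s ceiling are all assumptions for the example.

```python
import time
from contextlib import contextmanager

BUDGET_SECONDS = 1.0  # assumption: "sub-second" treated as a 1.0 s ceiling


class LatencyTracker:
    """Records wall-clock time per pipeline stage and checks the total."""

    def __init__(self, budget_seconds: float = BUDGET_SECONDS):
        self.budget_seconds = budget_seconds
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        # perf_counter is a monotonic, high-resolution clock suited to timing.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = time.perf_counter() - start

    @property
    def total(self) -> float:
        return sum(self.stages.values())

    def within_budget(self) -> bool:
        return self.total < self.budget_seconds
```

In use, each step is wrapped in its own block, for example `with tracker.stage("stt"): ...` followed by `with tracker.stage("tts"): ...`, and `tracker.stages` then shows which step is eating the budget.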


Agent Quick Start

# Voice

Voice experiments. TTS, STT, cloning, effects.

## Experiments
- Text-to-speech comparisons
- Transcription testing
- Voice cloning (ethical)
- Real-time audio processing

## Tools Used
- Whisper (transcription)
- ElevenLabs (TTS)
- Web Audio API (effects)

## Links
- Repo: https://github.com/sergiopesch/voice