Talk in, magic out.
What It Is
A collection of voice experiments: text-to-speech, speech-to-text, voice cloning attempts, audio processing, and anything else that combines the spoken word with code.
Why Voice Fascinates Me
Voice is the most natural interface. Typing is a compromise we've accepted because technology demanded it. As voice tech improves, we'll type less and talk more.
Experiments Include
- TTS comparisons — Testing different voices and services
- Transcription accuracy — How good is Whisper really?
- Voice cloning — Ethical experiments with consent
- Audio effects — Real-time voice modification
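One common way to put a number on the transcription accuracy question is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below is a minimal illustration and not necessarily how the experiments in this repo are scored. The function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for the example.

```python
from __future__ import annotations


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Assumption: text is normalized only by lowercasing and whitespace
    splitting. Real evaluations usually also strip punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)
```

A dropped word in a four-word reference, for example `word_error_rate("the quick brown fox", "the quick fox")`, gives 0.25. WER can exceed 1.0 when the hypothesis contains many inserted words.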
Key Learnings
1. Whisper changed everything
Whisper is open-source and transcribes at high quality. Before it, accurate STT mostly meant paying for cloud APIs. Now the model is free to download and runs locally.
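As a rough illustration of what running it locally looks like, here is a minimal sketch using the `openai-whisper` Python package (`whisper.load_model` and `model.transcribe`). It assumes the package and `ffmpeg` are installed. The helper names `transcribe_locally` and `format_segments`, and the choice of the `"base"` model, are hypothetical choices for this example rather than code from the repo.

```python
from __future__ import annotations


def transcribe_locally(audio_path: str, model_name: str = "base") -> dict:
    """Transcribe an audio file on the local machine with Whisper.

    Assumption: `pip install openai-whisper` has been run and ffmpeg is
    on PATH. The import is done lazily so the rest of this module can be
    used without the package installed.
    """
    import whisper

    model = whisper.load_model(model_name)  # downloads weights on first use
    return model.transcribe(audio_path)     # dict with "text" and "segments"


def format_segments(segments: list[dict]) -> list[str]:
    """Render Whisper-style segments as timestamped lines."""
    return [
        f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text'].strip()}"
        for seg in segments
    ]


# Usage (requires a real audio file, here a hypothetical "clip.wav"):
#   result = transcribe_locally("clip.wav")
#   print(result["text"])
#   print("\n".join(format_segments(result["segments"])))
```

Larger models are generally more accurate but slower, so the model name is left as a parameter.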
2. Voice cloning is too good
A few minutes of audio is enough to clone a voice convincingly. The technology has outpaced the ethics discussion.
3. Latency is the killer
Real-time voice apps need sub-second response times. Anything slower breaks the illusion of conversation.
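To make the sub-second target concrete, it can help to time each stage of a voice pipeline and check the total against a budget. The sketch below is one way to do that with the standard library. The 1.0 second budget is an assumption taken from "sub-second" above, and the stage names in the usage note are hypothetical.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Assumption: "sub-second" is read as a hard 1.0 s end-to-end budget.
LATENCY_BUDGET_S = 1.0


def timed(stage: Callable[[], T]) -> tuple[T, float]:
    """Run one pipeline stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = stage()
    return result, time.perf_counter() - start


def within_budget(
    stage_latencies: dict[str, float],
    budget_s: float = LATENCY_BUDGET_S,
) -> bool:
    """Return True if the summed stage latencies stay under the budget."""
    return sum(stage_latencies.values()) < budget_s
```

For example, wrapping hypothetical stages as `text, stt_s = timed(lambda: transcribe(chunk))` and collecting the timings into `{"stt": stt_s, "processing": proc_s, "tts": tts_s}` shows which stage is using up the budget when `within_budget` returns `False`.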
Agent Quick Start
# Voice
Voice experiments. TTS, STT, cloning, effects.
## Experiments
- Text-to-speech comparisons
- Transcription testing
- Voice cloning (ethical)
- Real-time audio processing
## Tools Used
- Whisper (transcription)
- ElevenLabs (TTS)
- Web Audio API (effects)
## Links
- Repo: https://github.com/sergiopesch/voice