Raspberry Pi Pico W Text to Speech Using Wit.ai

by CircuitDigest

This project turns a Raspberry Pi Pico W into a talking device using an online text-to-speech service. The Pico sends text over WiFi, gets back audio, and plays it through a small speaker using an I2S amplifier. The hardware stays simple and the heavy speech processing happens in the cloud.

I built this as a clean, repeatable setup that actually works on a microcontroller with limited memory. If you follow the steps below, you should end up with a Pico that speaks anything you type into the serial monitor.

No theory dumps here. This is mostly wiring, setup, and getting audio out of the board.

What You’re Building

At a high level, this is what’s happening:

The Pico W connects to WiFi
You type text into the serial monitor
That text is sent to Wit.ai
Wit.ai converts it to speech
Audio is streamed back as MP3
The Pico decodes the MP3 and sends the audio out over I2S
A MAX98357A amplifier drives a speaker

The Pico never tries to generate speech itself. It just moves data around and plays audio.

Supplies

Here’s exactly what I used:

Raspberry Pi Pico W
MAX98357A I2S audio amplifier
Small speaker (4 ohm or 8 ohm both work)
Breadboard
Jumper wires
USB cable

Wiring the Hardware

This is the part that matters most. If the wiring is wrong, you’ll get silence no matter how perfect the code is.

The MAX98357A is an I2S amplifier, which means it uses three signal lines plus power and ground.

Pico W to MAX98357A connections:

GP18 → BCLK
GP19 → LRC
GP20 → DIN
VBUS (5V from USB) → VIN
GND → GND

That’s it. No resistors, no level shifting, no extra parts.
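If you want the pin numbers in one place instead of scattered through a sketch, something like the following works. This is only a sketch in plain C++: the struct and function names are hypothetical and not part of the WitAITTS library, which may configure its pins on its own. The "LRC directly after BCLK" check reflects how I understand the Pico Arduino core's I2S output to expect its clock pins, so treat it as an assumption and check it against your core version.

```cpp
#include <cstdint>

// Hypothetical pin map for the wiring in this guide (not a WitAITTS type).
struct I2SPins {
    std::uint8_t bclk;  // bit clock   -> MAX98357A BCLK
    std::uint8_t lrc;   // word select -> MAX98357A LRC
    std::uint8_t din;   // serial data -> MAX98357A DIN
};

constexpr I2SPins kAmpPins{18, 19, 20};

// Returns true when the data pin differs from both clock pins, no pin
// number exceeds GP28, and LRC directly follows BCLK (assumed requirement).
constexpr bool pinsLookValid(const I2SPins& p) {
    return p.bclk <= 28 && p.lrc <= 28 && p.din <= 28 &&
           p.lrc == p.bclk + 1 &&
           p.din != p.bclk && p.din != p.lrc;
}

// Fails the build early if someone edits the constants into a bad state.
static_assert(pinsLookValid(kAmpPins), "check the I2S wiring constants");
```

The compile-time check costs nothing at run time. It only catches typos in the constants, not a jumper plugged into the wrong row.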

After wiring:

Double check GND connections
Make sure the amplifier’s VIN is actually fed from 5V (the Pico’s VBUS pin), not 3.3V
Make sure the speaker is connected to the amplifier, not directly to the Pico

If you hear popping or distortion later, it’s almost always power or speaker wiring.

Creating a Wit.ai Account

You need an online TTS service, and this project uses Wit.ai.

Step 1: Create an account

Go to https://wit.ai and sign up. Email signup is the easiest. Verify your email and log in.

Step 2: Create a new app

Once inside the dashboard:

Click to create a new app
Give it a name
Pick the language you want speech in

The language choice matters. Pick English unless you know you want something else.

Step 3: Get the Server Access Token

Go to:

Management
Settings
HTTP API

Copy the Server Access Token. This is what the Pico uses to authenticate.

Step 4: Save the token

Keep this token safe. If you regenerate it later, you must reupload the sketch with the new token.

Installing the Arduino Environment

You’ll program the Pico W using the Arduino IDE.

Make sure you already have:

Arduino IDE installed
Raspberry Pi Pico board support added

Once that’s done, install the library.

Installing the WitAITTS library

Open Arduino IDE
Open Library Manager
Search for WitAITTS
Click Install

When it finishes, you’re ready to use the example sketch.

Opening the Example Sketch

Instead of writing code from scratch, use the provided example.

Go to:

File
Examples
WitAITTS
PicoW_Basic

This opens a working sketch that already handles WiFi, HTTPS, audio streaming, and I2S playback.

Editing the Code

You only need to change a few things.

WiFi credentials

Replace these with your actual network details:

#define WIFI_SSID "YourWiFiSSID"
#define WIFI_PASSWORD "YourWiFiPassword"

Wit.ai token

Paste your Server Access Token here:

#define WIT_TOKEN "YOUR_WIT_AI_TOKEN_HERE"

That’s all you must change.

What the Important Code Lines Do

I’m not going to walk through every line. These are the ones that matter.

Create the TTS engine

WitAITTS tts;

This object handles everything: WiFi, HTTPS, decoding audio, and pushing sound to the amplifier.

Start WiFi and authenticate

tts.begin(WIFI_SSID, WIFI_PASSWORD, WIT_TOKEN);

If this fails, nothing else works. Watch the serial output carefully here.

Set the voice

tts.setVoice("wit$Remi");

This selects which Wit.ai voice reads your text. You can experiment with other voice IDs later.

Control speed and pitch

tts.setSpeed(100);
tts.setPitch(100);

Start with these values. Extreme settings make speech hard to understand.
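If speed or pitch will eventually come from a knob, a button, or serial input, one way to stay away from extreme settings is to clamp the value before handing it to setSpeed or setPitch. This is a minimal sketch: the helper name and the 50 to 200 limits are arbitrary assumptions I picked for illustration, not limits documented by the library or by Wit.ai.

```cpp
// Hypothetical helper: keep a percentage-style setting inside a chosen range.
// The default limits are illustrative assumptions, not library limits.
int clampSetting(int value, int lowest = 50, int highest = 200) {
    if (value < lowest)  return lowest;
    if (value > highest) return highest;
    return value;
}
```

In the sketch you would then write something like tts.setSpeed(clampSetting(requestedSpeed)); where requestedSpeed is whatever value your own code produced.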

Speak the text

tts.speak(text);

This sends text to Wit.ai, waits for the audio stream, and plays it immediately.

While audio is playing, the Pico is busy. That’s normal.
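To give a feel for what gets sent, here is a rough sketch of building the JSON body for a speech request. The library does this for you, and I have not checked how it does it internally. The field names ("q", "voice", "speed", "pitch") are my understanding of the Wit.ai HTTP API and should be treated as assumptions, and both function names are hypothetical. The escaping matters because anything you type, including quotes, ends up inside a JSON string.

```cpp
#include <cstdio>
#include <string>

// Escape a string so it can sit inside a JSON string literal.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x",
                                  static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);  // UTF-8 bytes pass through
                }
        }
    }
    return out;
}

// Assumed request body shape; verify the field names against the current
// Wit.ai HTTP API documentation before relying on them.
std::string buildSynthesizeBody(const std::string& text,
                                const std::string& voice,
                                int speed, int pitch) {
    return "{\"q\":\"" + jsonEscape(text) +
           "\",\"voice\":\"" + jsonEscape(voice) +
           "\",\"speed\":" + std::to_string(speed) +
           ",\"pitch\":" + std::to_string(pitch) + "}";
}
```

You don’t need this to make the project work. It is only here to show that the Pico’s share of the job is small: wrap the text, send it, and play what comes back.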

Uploading the Sketch

Before uploading:

Click Verify
Make sure there are no compile errors

Now:

Plug in the Pico W
Select the correct board and port
Click Upload

When the upload finishes, open the Serial Monitor.

Testing the Speech Output

Set the Serial Monitor to:

Correct COM port
Newline enabled
Default baud rate from the sketch

You should see:

WiFi connection messages
IP address
WitAITTS configuration info

Now type a sentence and press Enter.

If everything is working:

The serial monitor will say it’s requesting TTS
Audio starts playing almost immediately
The speaker speaks your text

The first request sometimes takes a second longer. After that, it feels fast.

How Audio Streaming Works

Audio comes in as an MP3 stream. It’s not downloaded all at once.

That gives you a few advantages:

Lower memory usage
Faster perceived response
No big buffers needed
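To show why streaming keeps memory use low, here is a minimal fixed-size ring buffer of the kind commonly placed between a network reader and an audio decoder. It is a generic sketch, not the WitAITTS implementation, and the class name is hypothetical. The capacity is whatever you pick as the template parameter.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Fixed-capacity byte queue: the network side writes, the decoder side reads.
// Memory use stays at N bytes no matter how long the audio stream is.
template <std::size_t N>
class ByteRing {
public:
    // Copies up to len bytes in; returns how many actually fit.
    std::size_t write(const std::uint8_t* data, std::size_t len) {
        std::size_t n = 0;
        while (n < len && count_ < N) {
            buf_[(head_ + count_) % N] = data[n++];
            ++count_;
        }
        return n;
    }

    // Copies up to len bytes out; returns how many were available.
    std::size_t read(std::uint8_t* out, std::size_t len) {
        std::size_t n = 0;
        while (n < len && count_ > 0) {
            out[n++] = buf_[head_];
            head_ = (head_ + 1) % N;
            --count_;
        }
        return n;
    }

    std::size_t size() const { return count_; }

private:
    std::array<std::uint8_t, N> buf_{};
    std::size_t head_ = 0;   // index of the oldest byte
    std::size_t count_ = 0;  // bytes currently stored
};
```

When the buffer is full, write simply accepts fewer bytes, so the reader has to slow down instead of the Pico running out of RAM. When the buffer runs empty because WiFi stalls, playback has nothing to feed the amplifier, which is one reason WiFi stability shows up as audio dropouts.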

Things that affect audio quality:

WiFi stability
Power supply
Speaker quality

If audio cuts out, start by checking power and signal wires, then look at WiFi signal strength.

Common Problems and Fixes

No sound at all

Check these first:

Speaker wired to amplifier, not Pico
Amplifier VIN is 5V
GND is common everywhere
GP18, GP19, GP20 are correct

HTTP 401 error

This means:

Token is wrong
Token was regenerated
Token has extra spaces

Fix it by pasting the token again and reuploading.
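Stray whitespace from copy and paste is easy to miss by eye. Here is a small sketch for catching it. The function names are hypothetical and not part of WitAITTS, and the library presumably builds the Authorization header itself. The second function only shows the standard bearer-token format that a 401 is complaining about.

```cpp
#include <cctype>
#include <string>

// Remove every whitespace character; a pasted token should contain none.
std::string cleanToken(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (!std::isspace(c)) out += static_cast<char>(c);
    }
    return out;
}

// Standard HTTP bearer-token header value for the cleaned token.
std::string bearerHeader(const std::string& rawToken) {
    return "Bearer " + cleanToken(rawToken);
}
```

A quick check at startup is to compare the length of cleanToken(WIT_TOKEN) with the length of WIT_TOKEN and print a warning over serial if they differ.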

Distorted or crackling audio

Usually caused by:

Weak USB power
Bad speaker
Loose jumper wires

Short wires help. Breadboards don’t love high-speed audio signals.

Nothing happens after typing text

Check that:

Serial Monitor is set to newline
WiFi actually connected
Text isn’t empty
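The newline setting matters because a sketch like this typically collects characters until it sees a line ending and only then treats the text as complete. Here is a minimal sketch of that idea in plain C++. The class is hypothetical and is not how the example sketch is necessarily written.

```cpp
#include <string>

// Collects characters until a newline arrives, then hands back the line.
// Carriage returns are dropped so "\r\n" line endings work too.
class LineReader {
public:
    // Feed one character; returns true when a non-empty line is ready.
    bool feed(char c, std::string& line) {
        if (c == '\r') return false;
        if (c != '\n') {
            pending_ += c;
            return false;
        }
        line = trim(pending_);
        pending_.clear();
        return !line.empty();  // blank lines are ignored
    }

private:
    // Strip leading and trailing spaces and tabs.
    static std::string trim(const std::string& s) {
        const char* ws = " \t";
        const std::size_t first = s.find_first_not_of(ws);
        if (first == std::string::npos) return "";
        const std::size_t last = s.find_last_not_of(ws);
        return s.substr(first, last - first + 1);
    }

    std::string pending_;
};
```

In an Arduino loop you would feed it from Serial.read() while Serial.available() is non-zero, and pass each completed line to tts.speak, converted to whatever string type the example sketch uses. With "No line ending" selected in the Serial Monitor the newline never arrives, so nothing is ever spoken.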

Final Thoughts

Once it works, you can experiment:

Change voices
Adjust pitch and speed
Cache common phrases
Add buttons instead of serial input
Trigger speech from sensors

You can also store a few important audio files locally as a fallback when WiFi is down.
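One way to organize cached phrases and local fallbacks is a simple lookup from phrase to file name. This is only a sketch of the bookkeeping: the class is hypothetical, the file names are placeholders, and actually storing and playing the files is outside what it covers.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical table mapping fixed phrases to audio files stored locally.
// Nothing here is part of WitAITTS.
class PhraseCache {
public:
    void add(const std::string& phrase, const std::string& file) {
        files_[phrase] = file;
    }

    // Returns the local file for a phrase, or an empty string when the
    // phrase is not cached and has to be requested from the cloud.
    std::string lookup(const std::string& phrase) const {
        const auto it = files_.find(phrase);
        return it == files_.end() ? std::string() : it->second;
    }

private:
    std::unordered_map<std::string, std::string> files_;
};
```

The idea is to call lookup first and only fall through to tts.speak when it returns an empty string. That also gives you something to play when WiFi is down.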

This setup gives the Pico W a real voice without pushing it beyond what it can handle. All the heavy speech processing stays in the cloud, and the Pico just focuses on control and playback.

If you wire it cleanly and keep your token correct, it’s very reliable. Once you hear it speak for the first time, it’s honestly hard not to start adding it to other projects.

The above is entirely based on: Raspberry Pi Pico Text to Speech using AI