Voice-Enabled ChatGPT Terminal with ESP32 and Google TTS


This project takes the ChatGPT terminal introduced in the November 2023 issue to the next stage, where it now speaks like a VoiceGPT.

In this new design, the function of speaking out the questions and answers has been added by incorporating an I2S sound module such as the MAX98357A and a 4-ohm loudspeaker. The output power is impressive and distortion-free; it must be heard to be believed.

Fig. 1: Voice-Enabled ChatGPT Terminal with ESP32 and Google TTS

Two versions are shown here. Fig. 1 shows the author’s prototype, and the components required for the terminal are listed in the Bill of Materials (Table 1). Table 2 shows the pin connections between the ESP32 board (MOD1) and the 8.89cm (3.5-inch) TFT (MOD2). Table 3 shows the pin connections between the PS2 keyboard and the ESP32 board. Table 4 shows the pin connections between the MAX98357A (MOD3) and the ESP32 board.

Table 1: Bill of Materials
Items | Quantity
ESP32 MCU (MOD1) | 1
PS2 keyboard | 1
8.89cm (3.5-inch) TFT (MOD2) | 1
Wires, PCB, connector | 20
IC 7805: 5V regulator IC (IC1) | 1
I2S sound module MAX98357A (MOD3) | 1
100MFD capacitor (C1, C2) | 2
Rectifier diode 1N4007 (D1) | 1

Table 2: Pin connections between ESP32 board and 8.89cm (3.5-inch) TFT
TFT pin | ESP32 pin | TFT pin | ESP32 pin
LCD_CS | G33 | D02 | G26
LCD_DC | G15 | D03 | G25
LED RST | G32 | D04 | G16/RX2
WR | G4 | D05 | G17/TX2
RD | G2 | D06 | G27
D0 | G12 | D07 | G14
D01 | G13 | VCC/5 Volt | 5V regulator
Gnd | Gnd | |
Note: ESP32 Gnd is to be connected to the 5V regulator ground, and ESP32 Vin to the 5V regulator output.

Table 3: Pin connections between PS2 keyboard and ESP32 board
PS2 keyboard pin | ESP32 pin
DATA pin | G35
CLK pin | G34
5V | 5V pin of voltage regulator 7805 (IC1)
GND | GND

Table 4: Pin connections between MAX98357A (MOD3) and ESP32 board (MOD1)
MAX98357A pin | ESP32 pin
DIN | G21
BCLK | G22
LRC | G23
GND | GND
5V | 5V pin of voltage regulator 7805 (IC1)

Voice-Enabled ChatGPT Terminal – Circuit and Working

Fig. 2 shows the circuit diagram of the talking ChatGPT terminal. It is built around the ESP32 board (MOD1), 8.89cm (3.5-inch) TFT (MOD2), MAX98357A (MOD3), 5V voltage regulator 7805 (IC1), PS2 keyboard, and a few other components.

Fig. 2: Circuit diagram

Connect the components and the display according to the circuit diagram. The voltage regulator used in the circuit allows the device to be powered from a supply in the range of 5V to 9V. However, you can replace the voltage regulator with a 5V regulated power adaptor.

Note that the ‘gain’ pin of the MAX98357A is connected to ground to increase output power (per the MAX98357A datasheet, this selects 12dB gain instead of the 9dB default of an unconnected pin). Removing this connection will slightly reduce the output power.

A PS2 keyboard, like a USB keyboard, needs only four connections to a microcontroller: 5V, ground, data, and clock (or IRQ). Since it is used here as a pure data-in device, input-only pins of the ESP32 such as G34 and G35 can be connected to it.

GPIO pins G34 and G35 are input-only pins; they cannot be used as outputs.

Never use them for the TFT, LEDs, relays, etc., as those need a signal out, unlike the keyboard pins, which only need data in from the keyboard. Of course, you can connect a PS2 keyboard to other GPIO pins as well.
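The rule about input-only pins can be checked mechanically before wiring. The following is a minimal sketch, not the author’s code: the `PinUse` struct and the function names are hypothetical, the pin list is only a subset of Tables 2 to 4, and the range GPIO34 to GPIO39 reflects the classic ESP32, which should be confirmed for the board variant in use.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One signal routed to an ESP32 GPIO (hypothetical helper type).
// needsOutput is true when the ESP32 must drive the line (TFT, I2S amplifier)
// and false when it only reads it (keyboard data and clock).
struct PinUse {
    std::string signal;
    int gpio;
    bool needsOutput;
};

// Assumption: on the classic ESP32, GPIO34 to GPIO39 have no output driver.
bool isInputOnly(int gpio) {
    assert(gpio >= 0);
    return gpio >= 34 && gpio <= 39;
}

// Returns the names of signals wired to a pin that cannot drive them.
std::vector<std::string> findBadAssignments(const std::vector<PinUse>& pins) {
    std::vector<std::string> bad;
    for (const PinUse& p : pins) {
        if (p.needsOutput && isInputOnly(p.gpio)) {
            bad.push_back(p.signal);
        }
    }
    return bad;
}

// A subset of the assignments from Tables 2 to 4, for illustration only.
std::vector<PinUse> articlePins() {
    return {
        {"PS2 DATA", 35, false},
        {"PS2 CLK", 34, false},
        {"I2S DIN", 21, true},
        {"I2S BCLK", 22, true},
        {"I2S LRC", 23, true},
        {"LCD_CS", 33, true},
        {"LCD_DC", 15, true},
    };
}
```

Calling `findBadAssignments(articlePins())` returns an empty list for the wiring in the tables, whereas moving an LED or a TFT line to G34 would be reported.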

Fig. 3 shows the pin details of the PS2 keyboard and mouse cable. Fig. 4 shows the I2S voice module.

Fig. 3: PS2 keyboard and mouse cable pins
Fig. 4: I2S voice module

Making ChatGPT Speak

Making ChatGPT talk is made simple by using Google TTS (text-to-speech) with the ESP32 board.

Here, Google TTS is used because the few free hardware-controlled sound libraries tried in earlier designs, such as ESP8266SAM.h, AudioOutputI2S.h, and AudioGeneratorRTTTL.h, work on the ESP32 but fall short for speech. The sound quality of ESP8266SAM.h is disappointing.

Extended speech is hard to make out, as it sounds like it is speaking through a long pipe. While AudioGeneratorRTTTL.h produces beautiful ringtones, it cannot speak.

Therefore, Google TTS is the only choice for this system.

Fig. 5: OPEN-AI API creation

Only three GPIO pins and an internet connection are required to make the I2S amplifier work with Google TTS; four header files (available on GitHub) are also necessary to drive the amplifier. These header files have been compiled into a zip file for ease of use with this terminal.

Google TTS can speak one million characters per month free on your account. Speaking ‘EFY’ is a 3-character expense, but speaking ‘Electronics For You’ is a 17-character expense (19 if the two spaces are counted)! Also, Google will speak only 200 characters at a time. Beyond 200 characters, a paid account from Google is required.
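The character arithmetic above can be worked through in a few lines. This is a rough sketch using the limits quoted in the article; the constant and function names are hypothetical, and whether spaces count toward the allowance is an assumption to verify against Google’s own documentation.

```cpp
#include <cassert>
#include <string>

// Free monthly allowance and per-request limit as quoted in the article.
const long kFreeCharsPerMonth = 1000000;
const int kMaxCharsPerRequest = 200;

// Counts every byte of the text, spaces included (exact for ASCII text).
long totalChars(const std::string& text) {
    return static_cast<long>(text.size());
}

// Counts only non-space characters, which is how the figure of 17 for
// "Electronics For You" is reached.
long nonSpaceChars(const std::string& text) {
    long n = 0;
    for (char c : text) {
        if (c != ' ') {
            ++n;
        }
    }
    return n;
}

// Number of maximum-length requests that fit in the free monthly allowance.
long fullRequestsPerMonth() {
    assert(kMaxCharsPerRequest > 0);
    return kFreeCharsPerMonth / kMaxCharsPerRequest;
}
```

With these figures, the free allowance covers 5,000 spoken answers of the full 200 characters each per month.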

Simply ask a question, seek an answer, or request a code snippet; the ESP32 helps you write the problem statement on the TFT screen using a PS/2 keyboard connected to it.

The ESP32 then sends the question to ChatGPT, and the output received from ChatGPT is displayed on the same TFT. Simultaneously, the loudspeaker connected to the I2S amplifier (MAX98357A) speaks out the text seen on the screen.
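Sending a typed question means wrapping it in a JSON request body, and keyboard input may contain quotes or backslashes that must be escaped first. The sketch below assumes the chat-style request layout (`model` plus a `messages` array) of the OpenAI API; the author’s code may use a different endpoint or a JSON library, and the function names and model value here are placeholders.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes a string so it can be placed inside a JSON string literal.
std::string jsonEscape(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters use the \u00XX form.
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", c);
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Builds a chat-style request body with a single user message.
// The model name is supplied by the caller; no particular model is assumed.
std::string buildChatRequest(const std::string& model,
                             const std::string& question) {
    assert(!model.empty());
    return "{\"model\":\"" + jsonEscape(model) +
           "\",\"messages\":[{\"role\":\"user\",\"content\":\"" +
           jsonEscape(question) + "\"}]}";
}
```

On the ESP32, the resulting string would be posted over HTTPS together with the secret key; that part depends on the HTTP client used and is not shown here.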

Fig. 6: Voice-Enabled ChatGPT Terminal code

No multimedia computer is required for this work, though a Wi-Fi internet connection and a secret key to access the ChatGPT API are necessary. This secret key is the only point of entry for the openai.com API; no other login/password is required.

You may wonder why an old PS2 keyboard is used instead of a USB keyboard. Although an attempt was made to use a sleek, small USB keyboard, it could not communicate with the ESP32. The same applies to the sleekest Bluetooth keyboard.

The software is developed in the Arduino IDE. In the code, you need to configure the OpenAI API key and the Wi-Fi SSID and password. After configuring the code, upload it to the ESP32 by selecting the correct COM port and ESP32 as the board. Fig. 6 shows a snippet of the source code.
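The three values to configure can be grouped and checked before uploading. This is only an illustrative sketch: the `TerminalConfig` struct, its field names, and the placeholder convention are hypothetical and do not come from the author’s source code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The three settings the article says must be configured (hypothetical names).
struct TerminalConfig {
    std::string wifiSsid;
    std::string wifiPassword;
    std::string openAiKey;
};

// Returns the names of settings that are empty or still hold the placeholder.
std::vector<std::string> missingSettings(const TerminalConfig& cfg,
                                         const std::string& placeholder) {
    assert(!placeholder.empty());
    std::vector<std::string> missing;
    if (cfg.wifiSsid.empty() || cfg.wifiSsid == placeholder) {
        missing.push_back("wifiSsid");
    }
    if (cfg.wifiPassword.empty() || cfg.wifiPassword == placeholder) {
        missing.push_back("wifiPassword");
    }
    if (cfg.openAiKey.empty() || cfg.openAiKey == placeholder) {
        missing.push_back("openAiKey");
    }
    return missing;
}
```

A check like this makes it obvious when a sketch is uploaded with a value left unfilled, which would otherwise show up only as a failed Wi-Fi or API connection.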

Construction and Testing

You may assemble the circuit on a general-purpose PCB, as shown in the author’s prototype. First, upload the source code of the terminal to the ESP32 board. Then, refer to Tables 2 through 4 before assembling. After powering on, wait a short while for the terminal to start.

The delays used inside the loops are quite specific. You may alter them, but starting with the provided values is recommended. Once you understand how the terminal responds, adjustments can be made as needed.

The initial questions, such as ‘Who are you?’, were answered meticulously by ChatGPT, producing a self-introduction on the screen, with the speaker delivering it nicely.

Subsequently, more advanced questions were asked, such as ‘talk about EFY’ and ‘the distance between Earth and the Sun.’ Each time, ChatGPT understood the question clearly and provided meticulous answers. The speaker worked flawlessly, delivering the voice output clearly and loudly.

Fig. 7: Voice-Enabled ChatGPT Terminal using ESP32 and Google TTS

The most advanced stage of questioning involved tasks like writing five sentences about EFY, India, NTPC, Google.com, etc. Throughout all the tests, ChatGPT performed exceptionally well.

However, as these answers exceeded 200 characters, Google TTS refused to speak them. To address this, the answer string was trimmed to 200 characters before being sent for speech. This way, while the complete answer appears on the screen, the terminal speaks only the first 200 characters. The author’s final prototype, along with the keyboard used in testing, is shown in Fig. 7.
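The trimming step can be sketched as a small helper. This is not the author’s implementation: the function name is hypothetical, and backing up to the previous space so that the speech does not end mid-word is an added refinement over a plain 200-character cut. It also assumes ASCII text, since it counts bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns at most `limit` characters of `answer` for the speech request.
// The full answer is left untouched for display on the TFT.
// Assumption: ASCII text; multi-byte UTF-8 characters could be split.
std::string trimForSpeech(const std::string& answer, std::size_t limit = 200) {
    assert(limit > 0);
    if (answer.size() <= limit) {
        return answer;
    }
    std::string cut = answer.substr(0, limit);
    // If the character after the cut is a space, the cut already ends on a
    // word boundary. Otherwise back up to the last space inside the cut.
    if (answer[limit] != ' ') {
        std::size_t lastSpace = cut.find_last_of(' ');
        if (lastSpace != std::string::npos && lastSpace > 0) {
            cut.erase(lastSpace);
        }
    }
    return cut;
}
```

The screen would still receive `answer` in full, while only `trimForSpeech(answer)` is passed to the text-to-speech call.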

Aftermath: The next modification for this terminal will make it fully voice interactive, so it will listen to questions and reply like an obedient robot! This development is for the EFY readers.


Somnath Bera is an electronics enthusiast. He has contributed many articles across the globe as a freelancer.

Published by
Uncomm
