How to record text-to-speech on MacOS X 10.5

I found an interesting article about the history of "intellectual property". My friend didn't have time to read it, but was interested in hearing it. MacOS X has a reasonable text-to-speech engine, so I wanted to create a sound file for him. The process is not difficult, but it's not obvious. It will cost you only a bit of time.

[Edit: Later that year, I found out that you can use the command line program 'say' to do the same with much less effort. Type 'man say' to find out how. The instructions below are still useful for capturing audio from the output mix, though. At least the bits with the *s.]
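
For the 'say' route, here is a minimal sketch. It assumes the article has been saved as a plain text file; the file names and the wrapper function speak_to_file are placeholders of mine, not anything 'say' provides. It uses only the -v (voice), -f (input file), and -o (output file) options; check 'man say' on your system for the rest.

```shell
# Speak a text file into an AIFF sound file using the built-in 'say' command.
# $1: input text file, $2: output .aiff file, $3: voice name (defaults to Alex).
speak_to_file() {
    say -v "${3:-Alex}" -f "$1" -o "$2"
}

# Example (placeholder file names):
#   speak_to_file article.txt article.aiff
```

The result is an AIFF file, which Audacity can open if you still want to trim it or export it to another format.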

1. Install Soundflower

Soundflower opens a conduit between the audio output device and the audio input device. This allows you to pipe the audio output of one application to the input of another. You could also do this using a short cable connecting the headphone jack to the microphone jack, or by simply playing the audio through speakers and recording through a microphone in a quiet location. The software solution keeps the signal digital from end to end, with no conversion to analog and back, and therefore will give the highest fidelity.

Download Soundflower. It's free, open source, and licensed under the GPL. It comes as a .dmg, so to install it, mount the volume and run the .mpkg installer.
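
If you prefer the Terminal to double-clicking, the same two steps can be sketched with the system tools hdiutil and installer. The function name install_from_dmg and both example paths are my assumptions; the actual volume and package names depend on the version you downloaded.

```shell
# Mount a disk image, then run the package installer found on it.
# $1: path to the downloaded .dmg
# $2: path to the .mpkg on the mounted volume
install_from_dmg() {
    hdiutil attach "$1" &&
    sudo installer -pkg "$2" -target /
}

# Example (placeholder paths; check the real names in Finder first):
#   install_from_dmg ~/Downloads/Soundflower.dmg /Volumes/Soundflower/Soundflower.mpkg
```

The installer step asks for an administrator password, just as the graphical installer does.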

2. Install Audacity

Audacity lets you record audio and encode it in various formats. You can probably use GarageBand, but I found this easier (less feature-encumbered). Audacity is also free, open source, and GPL'd. Download the .dmg, mount it, and move Audacity.app into your Applications folder or a subfolder.

3. Configure System Preferences *

System | Speech | Text to Speech

To me, the Alex voice sounds best. Keep in mind that the Speaking Rate is a tradeoff between recording length and intelligibility. I chose a bit faster than normal, because I don't want to feel as though I'm waiting for Alex to finish its sentences.

Select "Speak selected text when the key is pressed" and set a hot key.

Hardware | Sound | Output

Select "Soundflower (2ch)" as the sound output device.

4. Select the text to be spoken

Open the reader or web site, and select your chosen text. You may need to futz with the text to get TTS to interpret it better. For example,
sed 's/[Cc]opyleft/copy-left/g'
(replace "copyleft" with "copy-left") to avoid having "copyleft" spelled out.
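
If the text is long, it may be easier to run the whole thing through a filter than to fix it by hand. This is only a sketch: the function name tts_clean and the file names are placeholders, and the single rule shown is the copyleft one from above, written with a bracket expression so it does not depend on a case-insensitive flag.

```shell
# Read text on standard input, write TTS-friendly text on standard output.
# Add one -e expression per word that the voice mispronounces.
tts_clean() {
    sed -e 's/[Cc]opyleft/copy-left/g'
}

# Examples (placeholder file names):
#   tts_clean < article.txt > article-tts.txt
#   pbpaste | tts_clean | pbcopy    # clean whatever is on the clipboard
```

You can then open the cleaned file, or paste the cleaned clipboard contents into a text editor, and select it there.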

5. Run Audacity *

Set Preferences:
Set Audio I/O | Recording | Device to Soundflower (2ch) and Channels to 1 (Mono).
For voice, you don't need to waste a lot of disk space and bandwidth, so:
Set Quality | Sampling | Default Sample Rate to 8000 Hz and Default Sample Format to 16-bit.
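
To see what those settings save, here is some back-of-the-envelope arithmetic for uncompressed audio. The function name is mine, the 50-minute length is the reading described in this guide, and CD quality (44100 Hz, 16-bit, stereo) is used purely as a comparison point.

```shell
# Bytes per second of uncompressed PCM audio.
# $1: sample rate in Hz, $2: bits per sample, $3: number of channels.
pcm_bytes_per_second() {
    echo $(( $1 * $2 / 8 * $3 ))
}

seconds=$(( 50 * 60 ))   # a 50-minute recording

voice=$(( $(pcm_bytes_per_second 8000 16 1) * seconds ))
cd_quality=$(( $(pcm_bytes_per_second 44100 16 2) * seconds ))

echo "8000 Hz, 16-bit, mono:    $voice bytes"       # 48000000
echo "44100 Hz, 16-bit, stereo: $cd_quality bytes"  # 529200000
```

So the voice settings come to roughly 48 MB before encoding, against roughly 529 MB at CD quality. An 8000 Hz sample rate can only represent frequencies up to 4000 Hz, which is about telephone quality: fine for speech, poor for music.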

Back on the main Audacity interface, click record, switch to your text reader, and press your TTS hot key.

Then wait.

You won't be able to hear the audio, but you'll see the signal in Audacity. For reference, the article mentioned above took 50 minutes. If you find a formula relating word count to length, let me know.
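
In the absence of a real formula, a rough guess is word count divided by speaking rate. The sketch below assumes you have the text in a file and that you know the voice's rate in words per minute; I don't know what rate the Speaking Rate slider corresponds to, so that number is something you would have to measure by timing a short passage. The function name is a placeholder, and the estimate ignores the pauses the voice takes at punctuation.

```shell
# Estimate reading time in whole minutes, rounded up.
# $1: text file, $2: speaking rate in words per minute (an assumed, measured figure).
estimate_minutes() {
    words=$(wc -w < "$1" | tr -d ' ')   # strip the padding some wc versions add
    echo $(( (words + $2 - 1) / $2 ))
}

# Example (placeholder file name and rate):
#   estimate_minutes article.txt 180
```

As a purely made-up illustration, a 9,000-word text at 180 words per minute would come out at 50 minutes.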

6. Trim the file

Once the signal drops off, alerting you to the end of the reading, stop the recording, and delete the extra seconds at the beginning and minutes at the end of the track. Then export the sound file using the patent-unencumbered encoding of your choice.

Here is the result in Ogg Vorbis.

If you choose to edit the sound file further, you may notice that Alex takes a breath before beginning a sentence, and a shorter one when it encounters a comma.

7. Play

Optionally, you can use GarageBand to create a podcast complete with artwork and chapter markers.

8. Support FLOSS

Support your favourite open source software project, because it lets you do such great things with that expensive computer.

9. Improve the guide

If you find an error, need clarification, or use this method to record something neat, let me know. I'm sure you can figure out my email.

Created: 2009-06-13
Last time I changed this date: 2011-04-25