doc/index.md - platform/external/sonic - Git at Google

 # libsonic Home Page

 [Download the latest tar-ball from here](download).

 The source code repository can be cloned using git:

     $ git clone git://github.com/waywardgeek/sonic.git

 The source code for the Android version, sonic-ndk, can be cloned with:

     $ git clone git://github.com/waywardgeek/sonic-ndk.git

 There is a simple test app for android that demos capabilities.  You can
 [install the Android application from here](Sonic-NDK.apk)

 There is a new native Java port, which is very fast!  Checkout Sonic.java and
 Main.java in the latest tar-ball, or get the code from git.

 ## Overview

 Sonic is free software for speeding up or slowing down speech.  While similar to
 other algorithms that came before, Sonic is optimized for speed ups of over 2X.
 There is a simple sonic library in ANSI C, and one in pure Java.  Both are
 designed to easily be integrated into streaming voice applications, like TTS
 back ends.  While a very new project, it is already integrated into:

 - espeak
 - Debian Sid as package libsonic
 - Android Astro Player Nova
 - Android Osplayer
 - Multiple closed source TTS engines

 The primary motivation behind sonic is to enable the blind and visually impaired
 to improve their productivity with free software speech engines, like espeak.
 Sonic can also be used by the sighted.  For example, sonic can improve the
 experience of listening to an audio book on an Android phone.

 Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved.  It is released
 as under the Apache 2.0 license.  Feel free to contact me at
 <waywardgeek@gmail.com>.  One user was concerned about patents.  I believe the
 sonic algorithms do not violate any patents, as most of it is very old, based
 on [PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
 the new part, for greater than 2X speed up, is clearly a capability most
 developers ignore, and would not bother to patent.

 ## Comparison to Other Solutions

 In short, Sonic is better for speech, while WSOLA is better for music.

 A popular alternative is SoundTouch.  SoundTouch uses WSOLA, an algorithm
 optimized for changing the tempo of music.  No WSOLA based program performs well
 for speech (contrary to the inventor's estimate of WSOLA).  Listen to [this
 soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
 it to [this sonic sample](sonic.wav).  Both are sped up by 2X.  WSOLA
 introduces unacceptable levels of distortion, making speech impossible to
 understand at high speed (over 2.5X) by blind speed listeners.

 However, there are decent free software algorithms for speeding up speech.  They
 are all in the TD-PSOLA family.  For speech rates below 2X, sonic uses PICOLA,
 which I find to be the best algorithm available.  A slightly buggy
 implementation of PICOLA is available in the spandsp library.  I find the one in
 RockBox quite good, though it's limited to 2X speed up.  So far as I know, only
 sonic is optimized for speed factors needed by the blind, up to 6X.

 Sonic does all of it's CPU intensive work with integer math, and works well on
 ARM CPUs without FPUs.  It supports multiple channels (stereo), and is also able
 to change the pitch of a voice.  It works well in streaming audio applications,
 and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
 or 8-bit unsigned formats.  The source code is in plain ANSI C.  In short, it's
 production ready.

 ## Using libsonic in your program

 Sonic is still a new library, but is in Debian Sid.  It will take a while
 for it to filter out into all the other distros.  For now, feel free to simply
 add sonic.c and sonic.h to your application (or Sonic.java), but consider
 switching to -lsonic once the library is available on your distro.

 The file [main.c](main.c) is the source code for the sonic command-line application.  It
 is meant to be useful as example code.  Feel free to copy directly from main.c
 into your application, as main.c is in the public domain.  Dependencies listed
 in debian/control like libsndfile are there to compile the sonic command-line
 application.  Libsonic has no external dependencies.

 There are basically two ways to use sonic: batch or stream mode.  The simplest
 is batch mode where you pass an entire sound sample to sonic.  All you do is
 call one function, like this:

     sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

 This will change the speed and pitch of the sound samples pointed to by samples,
 which should be 16-bit signed integers.  Stereo mode is supported, as
 is any arbitrary number of channels.  Samples for each channel should be
 adjacent in the input array.  Because the samples are modified in-place, be sure
 that there is room in the samples array for the speed-changed samples.  In
 general, if you are speeding up, rather than slowing down, it will be safe to
 have no extra padding.  If your sound samples are mono, and you don't want to
 scale volume or playback rate, and if you want normal pitch scaling, then call
 it like this:

     sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

 The other way to use libsonic is in stream mode.  This is more complex, but
 allows sonic to be inserted into a sound stream with fairly low latency.  The
 current maximum latency in sonic is 31 milliseconds, which is enough to process
 two pitch periods of voice as low as 65 Hz.  In general, the latency is equal to
 two pitch periods, which is typically closer to 20 milliseconds.

 To process a sound stream, you must create a sonicStream object, which contains
 all of the state used by sonic.  Sonic should be thread safe, and multiple
 sonicStream objects can be used at the same time.  You create a sonicStream
 object like this:

     sonicStream stream = sonicCreateStream(sampleRate, numChannels);

 When you're done with a sonic stream, you can free it's memory with:

     sonicDestroyStream(stream);

 By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
 no change at all to the sound stream.  Sonic detects this case, and simply
 copies the input to the output to reduce CPU load.  To change the speed, pitch,
 rate, or volume, set the parameters using:

     sonicSetSpeed(stream, speed);
     sonicSetPitch(stream, pitch);
     sonicSetRate(stream, rate);
     sonicSetVolume(stream, volume);

 These four parameters are floating point numbers.  A speed of 2.0 means to
 double speed of speech.  A pitch of 0.95 means to lower the pitch by about 5%,
 and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
 exceed the maximum range of a 16-bit integer.  Speech rate scales how fast
 speech is played.  A 2.0 value will make you sound like a chipmunk talking very
 fast.  A 0.7 value will make you sound like a giant talking slowly.

 By default, pitch is modified by changing the rate, and then using speed
 modification to bring the speed back to normal.  This allows for a wide range of
 pitch changes, but changing the pitch makes the speaker sound larger or smaller,
 too.  If you want to make the person sound like the same person, but talking at
 a higher or lower pitch, then enable the vocal chord emulation mode for pitch
 scaling, using:

     sonicSetChordPitch(stream, 1);

 However, only small changes to pitch should be used in this mode, as it
 introduces significant distortion otherwise.

 After setting the sound parameters, you write to the stream like this:

     sonicWriteShortToStream(stream, samples, numSamples);

 You read the sped up speech samples from sonic like this:

     samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
     if(samplesRead > 0) {
 	/* Do something with the output samples in outBuffer, like send them to
 	 * the sound device. */
     }

 You may change the speed, pitch, rate, and volume parameters at any time, without
 having to flush or create a new sonic stream.

 When your sound stream ends, there may be several milliseconds of sound data in
 the sonic stream's buffers.  To force sonic to process those samples use:

     sonicFlushStream(stream);

 Then, read those samples as above.  That's about all there is to using libsonic.
 There are some more functions as a convenience for the user, like
 sonicGetSpeed.  Other sound data formats are supported: signed char and float.
 If float, the sound data should be between -1.0 and 1.0.  Internally, all sound
 data is converted to 16-bit integers for processing.
	# libsonic Home Page

	[Download the latest tar-ball from here](download).

	The source code repository can be cloned using git:

	$ git clone git://github.com/waywardgeek/sonic.git

	The source code for the Android version, sonic-ndk, can be cloned with:

	$ git clone git://github.com/waywardgeek/sonic-ndk.git

	There is a simple test app for android that demos capabilities. You can
	[install the Android application from here](Sonic-NDK.apk)

	There is a new native Java port, which is very fast! Checkout Sonic.java and
	Main.java in the latest tar-ball, or get the code from git.

	## Overview

	Sonic is free software for speeding up or slowing down speech. While similar to
	other algorithms that came before, Sonic is optimized for speed ups of over 2X.
	There is a simple sonic library in ANSI C, and one in pure Java. Both are
	designed to easily be integrated into streaming voice applications, like TTS
	back ends. While a very new project, it is already integrated into:

	- espeak
	- Debian Sid as package libsonic
	- Android Astro Player Nova
	- Android Osplayer
	- Multiple closed source TTS engines

	The primary motivation behind sonic is to enable the blind and visually impaired
	to improve their productivity with free software speech engines, like espeak.
	Sonic can also be used by the sighted. For example, sonic can improve the
	experience of listening to an audio book on an Android phone.

	Sonic is Copyright 2010, 2011, Bill Cox, all rights reserved. It is released
	as under the Apache 2.0 license. Feel free to contact me at
	<waywardgeek@gmail.com>. One user was concerned about patents. I believe the
	sonic algorithms do not violate any patents, as most of it is very old, based
	on [PICOLA](http://keizai.yokkaichi-u.ac.jp/~ikeda/research/picola.html), and
	the new part, for greater than 2X speed up, is clearly a capability most
	developers ignore, and would not bother to patent.

	## Comparison to Other Solutions

	In short, Sonic is better for speech, while WSOLA is better for music.

	A popular alternative is SoundTouch. SoundTouch uses WSOLA, an algorithm
	optimized for changing the tempo of music. No WSOLA based program performs well
	for speech (contrary to the inventor's estimate of WSOLA). Listen to [this
	soundstretch sample](soundstretch.wav), which uses SoundTouch, and compare
	it to [this sonic sample](sonic.wav). Both are sped up by 2X. WSOLA
	introduces unacceptable levels of distortion, making speech impossible to
	understand at high speed (over 2.5X) by blind speed listeners.

	However, there are decent free software algorithms for speeding up speech. They
	are all in the TD-PSOLA family. For speech rates below 2X, sonic uses PICOLA,
	which I find to be the best algorithm available. A slightly buggy
	implementation of PICOLA is available in the spandsp library. I find the one in
	RockBox quite good, though it's limited to 2X speed up. So far as I know, only
	sonic is optimized for speed factors needed by the blind, up to 6X.

	Sonic does all of it's CPU intensive work with integer math, and works well on
	ARM CPUs without FPUs. It supports multiple channels (stereo), and is also able
	to change the pitch of a voice. It works well in streaming audio applications,
	and can deal with sound streams in 16-bit signed integer, 32-bit floating point,
	or 8-bit unsigned formats. The source code is in plain ANSI C. In short, it's
	production ready.

	## Using libsonic in your program

	Sonic is still a new library, but is in Debian Sid. It will take a while
	for it to filter out into all the other distros. For now, feel free to simply
	add sonic.c and sonic.h to your application (or Sonic.java), but consider
	switching to -lsonic once the library is available on your distro.

	The file [main.c](main.c) is the source code for the sonic command-line application. It
	is meant to be useful as example code. Feel free to copy directly from main.c
	into your application, as main.c is in the public domain. Dependencies listed
	in debian/control like libsndfile are there to compile the sonic command-line
	application. Libsonic has no external dependencies.

	There are basically two ways to use sonic: batch or stream mode. The simplest
	is batch mode where you pass an entire sound sample to sonic. All you do is
	call one function, like this:

	sonicChangeShortSpeed(samples, numSamples, speed, pitch, rate, volume, useChordPitch, sampleRate, numChannels);

	This will change the speed and pitch of the sound samples pointed to by samples,
	which should be 16-bit signed integers. Stereo mode is supported, as
	is any arbitrary number of channels. Samples for each channel should be
	adjacent in the input array. Because the samples are modified in-place, be sure
	that there is room in the samples array for the speed-changed samples. In
	general, if you are speeding up, rather than slowing down, it will be safe to
	have no extra padding. If your sound samples are mono, and you don't want to
	scale volume or playback rate, and if you want normal pitch scaling, then call
	it like this:

	sonicChangeShortSpeed(samples, numSamples, speed, pitch, 1.0f, 1.0f, 0, sampleRate, 1);

	The other way to use libsonic is in stream mode. This is more complex, but
	allows sonic to be inserted into a sound stream with fairly low latency. The
	current maximum latency in sonic is 31 milliseconds, which is enough to process
	two pitch periods of voice as low as 65 Hz. In general, the latency is equal to
	two pitch periods, which is typically closer to 20 milliseconds.

	To process a sound stream, you must create a sonicStream object, which contains
	all of the state used by sonic. Sonic should be thread safe, and multiple
	sonicStream objects can be used at the same time. You create a sonicStream
	object like this:

	sonicStream stream = sonicCreateStream(sampleRate, numChannels);

	When you're done with a sonic stream, you can free it's memory with:

	sonicDestroyStream(stream);

	By default, a sonic stream sets the speed, pitch, rate, and volume to 1.0, which means
	no change at all to the sound stream. Sonic detects this case, and simply
	copies the input to the output to reduce CPU load. To change the speed, pitch,
	rate, or volume, set the parameters using:

	sonicSetSpeed(stream, speed);
	sonicSetPitch(stream, pitch);
	sonicSetRate(stream, rate);
	sonicSetVolume(stream, volume);

	These four parameters are floating point numbers. A speed of 2.0 means to
	double speed of speech. A pitch of 0.95 means to lower the pitch by about 5%,
	and a volume of 1.4 means to multiply the sound samples by 1.4, clipping if we
	exceed the maximum range of a 16-bit integer. Speech rate scales how fast
	speech is played. A 2.0 value will make you sound like a chipmunk talking very
	fast. A 0.7 value will make you sound like a giant talking slowly.

	By default, pitch is modified by changing the rate, and then using speed
	modification to bring the speed back to normal. This allows for a wide range of
	pitch changes, but changing the pitch makes the speaker sound larger or smaller,
	too. If you want to make the person sound like the same person, but talking at
	a higher or lower pitch, then enable the vocal chord emulation mode for pitch
	scaling, using:

	sonicSetChordPitch(stream, 1);

	However, only small changes to pitch should be used in this mode, as it
	introduces significant distortion otherwise.

	After setting the sound parameters, you write to the stream like this:

	sonicWriteShortToStream(stream, samples, numSamples);

	You read the sped up speech samples from sonic like this:

	samplesRead = sonicReadShortFromStream(stream, outBuffer, maxBufferSize);
	if(samplesRead > 0) {
	/* Do something with the output samples in outBuffer, like send them to
	* the sound device. */
	}

	You may change the speed, pitch, rate, and volume parameters at any time, without
	having to flush or create a new sonic stream.

	When your sound stream ends, there may be several milliseconds of sound data in
	the sonic stream's buffers. To force sonic to process those samples use:

	sonicFlushStream(stream);

	Then, read those samples as above. That's about all there is to using libsonic.
	There are some more functions as a convenience for the user, like
	sonicGetSpeed. Other sound data formats are supported: signed char and float.
	If float, the sound data should be between -1.0 and 1.0. Internally, all sound
	data is converted to 16-bit integers for processing.