The Sound class
- class audiomath.Sound(source=None, fs=None, nChannels=2, dtype=None, bits=None, label=None)
Bases: object
Sound is a class for editing and writing sound files. (If your main aim is simply to play back existing sounds loaded from a file, you probably want to start with a Player instead.)
A Sound instance is a wrapper around a numpy array s.y, which contains a floating-point representation of a (possibly multi-channel) sound waveform. Generally the array values are in the range [-1, +1]. The wrapper makes it easy to perform certain common editing and preprocessing operations using Python operators, such as:
Numerical operations:
The + and - operators can be used to superimpose sounds (even if lengths do not match). The * and / operators can be used to scale amplitudes (usually by a scalar numeric factor, but you can also use a list of scaling factors to scale channels separately, or window a signal by multiplying two objects together). The +=, -=, *= and /= operators work as you might expect, modifying a Sound instance's data array in-place.
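As a plain-numpy sketch of the superposition semantics just described (the zero-padding of the shorter operand is an assumption based on "even if lengths do not match"; `superimpose` is a hypothetical helper, not part of audiomath):

```python
import numpy as np

def superimpose(a, b):
    """Add two 1-D waveforms sample-wise, zero-padding the shorter one."""
    n = max(len(a), len(b))
    out = np.zeros(n)
    out[:len(a)] += a
    out[:len(b)] += b
    return out

tone  = np.array([0.5, 0.5, 0.5, 0.5])
click = np.array([1.0, -1.0])
mixed = superimpose(tone, click)   # lengths differ: no error
print(mixed)                       # [ 1.5 -0.5  0.5  0.5]
scaled = mixed * 0.5               # `*` scales amplitude
```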
Concatenation of sound data in time:
The syntax s1 % s2 is the same as Concatenate( s1, s2 ): it returns a new Sound instance containing a new array of samples, in which the samples of s1 and s2 are concatenated in time. Either argument may be a scalar, so s % 0.4 returns a new object with 400 msec of silence appended, and 0.4 % s returns a new object with 400 msec of silence pre-pended. Concatenation can be performed in-place with s %= arg or equivalently using the instance method s.Concatenate( arg1, arg2, ... ): in either case the instance s
gets its internal sample array replaced by a new array.
Creating multichannel objects:
The syntax s1 & s2 is the same as Stack( s1, s2 ): it returns a new Sound instance containing a new array of samples, comprising the channels of s1 and the channels of s2. Either one may be automatically padded with silence at the end as necessary to ensure that the lengths match. Stacking may be performed in-place with s1 &= s2 or equivalently with the instance method s1.Stack( s2, s3, ... ): in either case instance s1
gets its internal sample array replaced by a new array.
Slicing, expressed in units of seconds:
The following syntax returns Sound objects wrapped around slices into the original array:
s[:0.5]              # returns the first half-second of `s`
s[-0.5:]             # returns the last half-second of `s`
s[:, 0]              # returns the first channel of `s`
s[:, -2:]            # returns the last two channels of `s`
s[0.25:0.5, [0,1,3]] # returns a particular time-slice of the chosen channels
Where possible, the resulting Sound instances' arrays are views into the original sound data. Therefore, things like s[2.0:-1.0].AutoScale() or s[1.0:2.0] *= 2 will change the specified segments of the original sound data in s. Note one subtlety, however:
# Does each of these examples modify the selected segment of `s` in-place?
s[0.1:0.2, :] *= 2               # yes
q = s[0.1:0.2, :]; q *= 2        # yes (`q.y` is a view into `s.y`)
s[0.1:0.2, ::2] *= 2             # yes
q = s[0.1:0.2, ::2]; q *= 2      # yes (`q.y` is a view into `s.y`)
s[0.1:0.2, 0] *= 2               # yes (creates a copy, but then uses `__setitem__` on the original)
q = s[0.1:0.2, 0]; q *= 2        # NO (creates a copy, then just modifies the copy in-place)
s[0.1:0.2, [1,3]] *= 2           # yes (creates a copy, but then uses `__setitem__` on the original)
q = s[0.1:0.2, [1,3]]; q *= 2    # NO (creates a copy, then just modifies the copy in-place)
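The `%` and `&` semantics described above can be sketched in plain numpy. Both helper functions below are hypothetical illustrations of the documented behavior (scalar operands become silence; the shorter sound is zero-padded at the end), not audiomath's actual implementation:

```python
import numpy as np

fs = 10  # deliberately low sampling rate, to keep the arrays readable

def concat_with_silence(a, b, fs):
    """Sketch of `s1 % s2`: a scalar operand becomes that many seconds of silence."""
    to_array = lambda x: np.zeros(int(round(x * fs))) if np.isscalar(x) else np.asarray(x)
    return np.concatenate([to_array(a), to_array(b)])

def stack_channels(a, b):
    """Sketch of `s1 & s2`: pad the shorter sound with silence, then stack as columns."""
    a, b = np.atleast_2d(a).T, np.atleast_2d(b).T   # samples-by-channels
    n = max(len(a), len(b))
    pad = lambda x: np.vstack([x, np.zeros((n - len(x), x.shape[1]))])
    return np.hstack([pad(a), pad(b)])

s = np.ones(5)                               # 0.5 seconds at fs=10
padded = concat_with_silence(s, 0.4, fs)     # appends 4 samples of silence
stereo = stack_channels(np.ones(5), np.ones(3))
print(padded.shape, stereo.shape)            # (9,) (5, 2)
```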
A Sound instance may be constructed in any of the following ways:
s = Sound( '/path/to/some/sound_file.wav' )
s = Sound( some_other_Sound_instance )  # creates a shallow copy
s = Sound( y, fs )                      # where `y` is a numpy array
s = Sound( duration_in_seconds, fs )    # creates silence
s = Sound( raw_bytes, fs, nChannels=2 )
- Parameters:
source – a filename, another Sound instance, a numpy array, a scalar numeric value indicating the desired duration of silence in seconds, or a buffer full of raw sound data, as in the examples above.
fs (float) – sampling frequency, in Hz. If source is a filename or another Sound instance, the default value will be inferred from that source. Otherwise the default value is 44100.
nChannels (int) – number of channels. If source is a filename, another Sound instance, or a numpy array, the default value will be inferred from that source. Otherwise the default value is 2.
dtype (str) – Sound data are always represented internally in floating-point. However, the dtype argument specifies the Sound instance's dtype_encoded property, which dictates the format in which the instance imports or exports raw data by default.
bits (int) – This is another way of initializing the instance's dtype_encoded property (see dtype, above), assuming integer encoding. It should be 8, 16, 24 or 32. If dtype is specified, this argument is ignored. Otherwise, the default value is 16.
label (str) – An optional string to be assigned to the label attribute of the Sound instance.
- classmethod AddMethod(func)
Use this as a function decorator, to add a function as a new class method.
- Amplitude(norm=2, threshold=0.0)
Returns a 1-D numpy array containing the estimated amplitude of each channel. With norm=2, that's the root-mean-square amplitude, whereas norm=1 would get you the mean-absolute amplitude.
Note that, depending on the content, the mean may reflect not only how loud the content is when it happens, but also how often it happens. For example, loud speech with a lot of pauses might have a lower RMS than continuous quiet speech. This would make it difficult to equalize the volumes of the two speech signals. To work around this, use the threshold argument. It acts as a noise-gate: in each channel, the average will only include samples whose absolute value reaches or exceeds threshold times that channel's maximum (so the threshold value itself is relative, expressed as a proportion of each channel's maximum; this makes the noise-gate invariant to simple rescaling).
- ApplyWindow(func=<function hanning>, axis=0, **kwargs)
If s is a numpy.ndarray, return a windowed copy of the array. If s is an audiomath.Sound object (for example, if this is being used as a method of that class), then its internal array s.y will be replaced by a windowed copy.
Windowing means multiplication by the specified window function, along the specified time axis. func should take a single positional argument: length in samples. Additional **kwargs, if any, are passed through. Suitable examples include numpy.blackman, numpy.kaiser, and friends.
- AutoScale(max_abs_amp=0.95, center='median')
Remove the DC offset from each channel (see Center()) and then rescale the waveform so that its maximum absolute value (across all channels) is max_abs_amp.
Valid center options include 'median', 'mean', 'none', or numeric values (scalar, or one per channel).
The array self.y is modified in place.
- Bulk(encoded=False)
Return the number of bytes occupied by the sound waveform…
encoded=False: …currently, in memory.
encoded=True: …if it were to be encoded according to the currently-specified dtype_encoded and written to disk in an uncompressed format (excluding the bytes required to store any format header).
- Cat(*args)
Concatenate the instance, in time, with the specified arguments (which may be other Sound instances and/or numpy arrays and/or numeric scalars indicating durations of silent intervals in seconds). Replace the sample array self.y with the result. Similar results can be obtained with the %= operator or the global function Concatenate().
- Center(center='median')
Remove the DC offset from each channel by subtracting its center value.
Valid center options include 'median', 'mean', 'none', or numeric values (scalar, or one per channel).
The array self.y is modified in place.
- Concatenate(*args)
Concatenate the instance, in time, with the specified arguments (which may be other Sound instances and/or numpy arrays and/or numeric scalars indicating durations of silent intervals in seconds). Replace the sample array self.y with the result. Similar results can be obtained with the %= operator or the global function Concatenate().
- Copy(empty=False)
Create a new Sound instance whose array y is a deep copy of the original instance's array. With empty=True the resulting array will be empty, but will have the same number of channels as the original.
Note that most other preprocessing methods return self as their output argument. This makes it easy to choose between modifying an instance in-place and creating a modified copy. For example:
s.Reverse()            # reverse `s` in-place
t = s.Copy().Reverse() # create a reversed copy of `s`
- Cut(start=None, stop=None, units='seconds')
Shorten the instance's internal sample array by taking only samples from start to stop. Either end-point may be None. Either may be a positive number of seconds (measured from the start) or a negative number of seconds (measured from the end).
- Detect(threshold=0.05, center=True, p=0, eachChannel=False, units='seconds')
This method finds time(s) at which the absolute signal amplitude exceeds the specified threshold.
If center is truthy, subtract the median of each channel first.
p denotes the location of interest, as a proportion of the duration of the above-threshold data. So, 0.0 means "the first time the signal exceeds the threshold", 1.0 means "the last time the signal exceeds the threshold", and 0.5 means halfway in between those two time points. p may be a scalar, in which case the method returns one output argument. Alternatively p may be a sequence, in which case a sequence of output arguments is returned, each one corresponding to an element of p.
If eachChannel is truthy, each output will itself be a sequence (one element per channel). If it is untruthy, each output will be a scalar (the signal is considered to have exceeded the threshold when any of its channels exceeds the threshold).
units may be 'seconds', 'milliseconds' or 'samples' and it dictates the units in which the outputs are expressed.
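The thresholding logic can be sketched for a single channel as follows (a plain-numpy illustration of the documented behavior; audiomath's actual implementation may differ in detail):

```python
import numpy as np

def detect(y, fs, threshold=0.05, p=0.0):
    """Return the time (in seconds) at proportion `p` of the span between
    the first and last above-threshold samples of 1-D channel `y`."""
    y = y - np.median(y)                        # `center=True` behavior
    above = np.nonzero(np.abs(y) > threshold)[0]
    first, last = above[0], above[-1]
    return (first + p * (last - first)) / fs

fs = 100.0
y = np.zeros(100); y[20:61] = 0.5               # burst from sample 20 to 60
print(detect(y, fs, p=0.0))                     # 0.2  (onset)
print(detect(y, fs, p=1.0))                     # 0.6  (offset)
print(detect(y, fs, p=0.5))                     # 0.4  (midpoint)
```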
- Duration()
Returns the duration of the sound in seconds.
- Envelope(granularity=0.002, fs=None, includeDC=False, minThickness=0.01, bars=False)
This is both a global function (working on a numpy array s together with a sampling frequency fs) and a Sound method (working on a Sound instance s, in which case no separate fs argument is required).
It returns (timebase, lower, upper), where lower and upper are the lower and upper bounds on the signal amplitude. Amplitude is computed in adjacent non-overlapping bins, each bin being of width granularity (expressed in seconds). lower and upper will be adjusted such that they are always at least minThickness apart. Also, if you supply per-channel values as includeDC, then the bounds will be adjusted such that the specified values are always included. (If you simply specify includeDC=True, then the per-channel median values of s will be used in this way.)
The return values are suitable for plotting as follows:
s = TestSound('12')
t, lower, upper = s.Envelope()
import matplotlib.pyplot as plt
for iChannel in range(s.nChannels):
    plt.fill_between(t, lower[:, iChannel], upper[:, iChannel])
(NB: You do not actually need to plot it by hand in this way, because the Sound.Plot method will do it for you: "envelope mode" is used by default for sounds longer than 60 seconds, and this decision can be overridden in either direction by passing envelope=True or envelope=False.)
By default, t is strictly increasing. But if you set bars=True then values will be repeated in t, lower and upper such that only horizontal and vertical edges will appear in the plots (Manhattan skyline style).
- Fade(risetime=0, falltime=0, hann=False)
If risetime is greater than zero, it denotes the duration (in seconds) over which the sound is to be faded-in at the beginning.
If falltime is greater than zero, it denotes the duration of the corresponding fade-out at the end.
If hann is true, then a raised-cosine function is used for fading instead of a linear ramp.
The array self.y is modified in-place.
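A single-channel sketch of this behavior in plain numpy (illustrative only; `fade` is a hypothetical helper, not audiomath's implementation):

```python
import numpy as np

def fade(y, fs, risetime=0.0, falltime=0.0, hann=False):
    """Apply a fade-in and/or fade-out to 1-D channel `y`, in-place."""
    def ramp(nSamples):
        r = np.linspace(0.0, 1.0, nSamples)
        return 0.5 - 0.5 * np.cos(np.pi * r) if hann else r  # raised-cosine vs linear
    nRise, nFall = int(risetime * fs), int(falltime * fs)
    if nRise: y[:nRise] *= ramp(nRise)
    if nFall: y[-nFall:] *= ramp(nFall)[::-1]
    return y

fs = 1000
y = fade(np.ones(1000), fs, risetime=0.1, falltime=0.1)
print(y[0], y[-1])    # 0.0 0.0  (faded in and out)
print(y[500])         # 1.0  (untouched middle)
```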
- GenerateWaveform(freq_hz=1.0, phase_rad=None, phase_deg=None, amplitude=1.0, dc=0.0, samplingfreq_hz=None, duration_msec=None, duration_samples=None, axis=None, waveform=<ufunc 'cos'>, waveform_domain='auto', **kwargs)
Create a signal (or multiple signals, if the input arguments are arrays) which is a function of time (time being defined along the specified axis).
If this is being used as a method of an audiomath.Sound instance, then the container argument is automatically set to that instance. Otherwise (if used as a global function), the container argument is optional; if supplied, it should be an audiomath.Sound object. With a container, the axis argument is set to 0, and the container object's sampling frequency, number of channels and duration (if non-zero) are used as fallback values in case these are not specified elsewhere. The resulting signal is put into container.y and a reference to the container is returned.
Default phase is 0, but may be changed by either phase_deg or phase_rad (or both, as long as the values are consistent).
Default duration is 1000 msec, but may be changed by either duration_samples or duration_msec (or both, as long as the values are consistent).
If duration_samples is specified and samplingfreq_hz is not, then the sampling frequency is chosen such that the duration is 1 second, so freq_hz can then be interpreted as cycles per signal.
The default waveform function is numpy.cos, which means that amplitude, phase and frequency arguments can be taken straight from the kind of dictionary returned by fft2ap() for an accurate reconstruction. A waveform function is assumed by default to take an input expressed in radians, unless the first argument in its signature is named cycles, samples, seconds or milliseconds, in which case the input argument is adjusted accordingly to achieve the named units. (To specify the units explicitly as one of these options, pass one of these words as the waveform_domain argument.)
In this module, SineWave(), SquareWave(), TriangleWave() and SawtoothWave() are all functions of cycles (i.e. the product of time and frequency), whereas Click() is a function of milliseconds. Any of these can be passed as the waveform argument.
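A heavily simplified sketch of the core computation, assuming cosine phase and a radian-domain waveform function (the real method additionally supports containers, array-valued arguments, and the other options described above; `generate` is a hypothetical helper):

```python
import numpy as np

def generate(freq_hz=1.0, phase_deg=0.0, amplitude=1.0, dc=0.0,
             fs=44100, duration_msec=1000.0, waveform=np.cos):
    """Evaluate `dc + amplitude * waveform(phase)` on a regular time base."""
    t = np.arange(int(fs * duration_msec / 1000.0)) / fs   # time in seconds
    return dc + amplitude * waveform(2 * np.pi * freq_hz * t + np.radians(phase_deg))

y = generate(freq_hz=440.0, fs=44100, duration_msec=500.0)  # half a second of 440 Hz
print(len(y), y[0])   # 22050 1.0  (a cosine starts at its peak when phase is 0)
```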
- IsolateChannels(ind, *moreIndices)
Select particular channels, discarding the others. The following are all equivalent, for selecting the first two channels:
t = s.IsolateChannels(0, 1)        # ordinary 0-based indexing
t = s.IsolateChannels([0, 1])
t = s.IsolateChannels('1', '2')    # when you use strings, you
t = s.IsolateChannels(['1', '2'])  # can index the channels
t = s.IsolateChannels('12')        # using 1-based numbering
Equivalently, you can also use slicing notation, selecting channels via the second dimension:
t = s[:, [0, 1]]
t = s[:, '12']   # again, strings mean 1-based indexing
- MakeHannWindow(plateau_duration=0)
Return a single-channel Sound object of the same duration and sampling frequency as self, containing a Hann or Tukey window: a raised-cosine "fade-in", followed by an optional plateau, followed by a raised-cosine "fade-out".
- MixDownToMono()
Average the sound waveforms across all channels, replacing self.y with the single-channel result.
- ModulateAmplitude(freq_hz=1.0, phase_rad=None, phase_deg=None, amplitude=0.5, dc=0.5, samplingfreq_hz=None, duration_msec=None, duration_samples=None, axis=None, waveform=<ufunc 'sin'>, **kwargs)
If s is a numpy.ndarray, return a modulated copy of the array. If s is an audiomath.Sound object (for example, if this is being used as a method of that class), then its internal array s.y will be replaced by a modulated copy.
Modulation means multiplication by the specified waveform, along the specified time axis.
Default phase is such that amplitude is 0 at time 0, which corresponds to phase_deg=-90 if waveform follows sine phase (remember: by default the modulator is a raised waveform, because dc=0.5 by default). To change phase, specify either phase_rad or phase_deg.
Uses GenerateWaveform().
- NumberOfChannels()
Returns the number of channels.
- NumberOfSamples()
Returns the length of the sound in samples.
- PadEndTo(seconds)
Append silence to the instance's internal array as necessary to ensure that the total duration is at least the specified number of seconds.
- PadStartTo(seconds)
Prepend silence to the instance's internal array as necessary to ensure that the total duration is at least the specified number of seconds.
- Play(*pargs, **kwargs)
This quick-and-dirty method allows you to play a Sound. It creates a Player instance in verbose mode, uses it to play the sound, waits for it to finish (or for the user to press ctrl-C), then destroys the Player again.
You will get a better user experience, and better performance, if you explicitly create a Player instance of your own and work with that.
Arguments are passed through to the Player.Play() method.
- Plot(zeroBased=False, maxDuration=None, envelope='auto', title=True, timeShift=0, timeScale=1, hold=False, finish=True)
Plot the sound waveform for each channel. The third-party Python package matplotlib is required to make this work; you may need to install it yourself.
- Parameters:
zeroBased (bool) – This determines whether the y-axis labels show zero-based channel numbers (Python's normal convention) or one-based channel numbers (the convention followed by almost every other audio software tool). In keeping with audiomath's slicing syntax, zero-based indices are expressed as integers whereas one-based channel indices are expressed as string literals.
maxDuration (float, None) – Long sounds can take a prohibitive amount of time and memory to plot. If maxDuration is not None, this method will plot no more than the first maxDuration seconds of the sound. If this means some of the sound has been omitted, a warning is printed to the console.
envelope (bool, float, 'auto') – With envelope=True, plot the sound in "envelope mode", which is less veridical when zoomed-in but which uses much less time and memory in the graphics back end. With envelope=False, plot the sound waveform as a line. With the default value of envelope='auto', only go into envelope mode when plotting more than 60 seconds' worth of sound. You can also supply a floating-point value, expressed in seconds: this explicitly enforces envelope mode with the specified bin width.
title (bool, str) – With title=True, the instance's label attribute is used as the axes title. With title=False or title=None, the axes title is left unchanged. If a string is supplied explicitly, then that string is used as the axes title.
timeShift (float) – The x-axis (time) begins at this value, expressed in seconds.
timeScale (float) – After time-shifting, the time axis is multiplied by this number. For example, you can specify timeScale=1000 to visualize your sound on a scale of milliseconds instead of seconds.
hold (bool) – With hold=False, the axes are cleared before plotting. With hold=True, the plot is superimposed on top of whatever is already plotted in the current axes.
- Read(source, raw_dtype=None, toolkit='auto', verbose=False)
- Parameters:
source – A filename, or a byte string containing raw audio data. With filenames, files are decoded according to their file extension, unless the raw_dtype argument is explicitly specified, in which case files are assumed to contain raw data without header, regardless of extension.
raw_dtype (str) – If supplied, source is interpreted either as raw audio data, or as the name of a file containing raw audio data without a header. If source is a byte string containing raw audio data, and raw_dtype is unspecified, raw_dtype will default to self.dtype_encoded. Examples might be float32 or even float32*2 (the latter explicitly overrides the current value of self.NumberOfChannels() and interprets the raw data as 2-channel).
toolkit (str) – Must be 'auto' (which is the default), or 'wave', 'audioread' or 'avbin'. Determines which of the back-ends to use for decoding audio. AVbin binaries for common platforms are included, but the AVbin project is no longer maintained, so support may become sparser/buggier over time. It is recommended that you install the optional third-party package audioread, which appears to offer better performance (you may also need to install the ffmpeg utility on some platforms/for some file types).
verbose (bool) – If set to True, report toolkit decision-making and decoding progress to stdout.
- Resample(newfs)
Change the instance's sampling frequency. Replace its internal array with a new array, interpolated at the new sampling frequency newfs (expressed in Hz).
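The idea can be sketched with linear interpolation in plain numpy (note: this is an assumption for illustration; the interpolation scheme audiomath actually uses may differ):

```python
import numpy as np

def resample(y, oldfs, newfs):
    """Interpolate 1-D channel `y` onto a new time base at `newfs` Hz."""
    duration = len(y) / oldfs
    tOld = np.arange(len(y)) / oldfs
    tNew = np.arange(int(round(duration * newfs))) / newfs
    return np.interp(tNew, tOld, y)

y = np.sin(2 * np.pi * 5 * np.arange(1000) / 1000.0)  # 1 second of 5 Hz at fs=1000
z = resample(y, 1000, 500)
print(len(z))   # 500
```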
- Reverse()
Reverse the sound in time. The array self.y is modified in-place.
- Right()
Return a new Sound instance containing a view of alternate channels, starting at the second (unless there is only one channel, in which case return that).
- SamplesToSeconds(samples)
Convert samples to seconds, given the sampling frequency of the instance. samples may be a scalar, or a sequence or array. The result is returned in floating-point.
See also: SecondsToSamples()
- SecondsToSamples(seconds, rounding='round')
Convert seconds to samples given the Sound instance's sampling frequency. seconds may be a scalar, or a sequence or array. The rounding approach may be 'floor', 'round', 'ceil', 'none' or 'int'. The 'int' option rounds in the same way as 'floor' but returns integers; all other options return floating-point numbers.
See also: SamplesToSeconds()
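A plain-numpy sketch of the documented conversion and rounding options (`seconds_to_samples` is a hypothetical helper, not the method itself):

```python
import numpy as np

def seconds_to_samples(seconds, fs, rounding='round'):
    """Convert seconds to samples at sampling frequency `fs`."""
    x = np.asarray(seconds, dtype=float) * fs
    if   rounding == 'round': return np.round(x)
    elif rounding == 'floor': return np.floor(x)
    elif rounding == 'ceil':  return np.ceil(x)
    elif rounding == 'int':   return np.floor(x).astype(int)  # like 'floor', but integer
    return x                                                  # 'none': no rounding

fs = 44100
print(seconds_to_samples(0.5, fs))                # 22050.0
print(seconds_to_samples([0.1, 0.2], fs, 'int'))  # [4410 8820]
```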
- SplitChannels(nChannelsEach=1)
Return a list of new Sound instances, each containing a view into the original data, each limited to nChannelsEach consecutive channels.
- Stack(*args)
Stack the instance, across channels, with the specified arguments (which may be other Sound instances and/or numpy arrays). Replace the sample array self.y with the result. Similar results can be obtained with the &= operator or the global function Stack().
- Trim(threshold=0.05, tailoff=0.2, buildup=0)
Remove samples from the beginning and end of a sound, according to amplitude.
The new waveform will start buildup seconds prior to the first sample on which the absolute amplitude in any channel exceeds threshold. It will end tailoff seconds after the last sample on which threshold is exceeded.
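The trimming rule can be sketched for a single channel as follows (a plain-numpy illustration of the documented behavior; `trim` is a hypothetical helper):

```python
import numpy as np

def trim(y, fs, threshold=0.05, tailoff=0.2, buildup=0.0):
    """Keep from `buildup` s before the first above-threshold sample
    to `tailoff` s after the last one, clipped to the array bounds."""
    above = np.nonzero(np.abs(y) > threshold)[0]
    start = max(0, int(above[0] - buildup * fs))
    stop = min(len(y), int(above[-1] + tailoff * fs) + 1)
    return y[start:stop]

fs = 100
y = np.zeros(300); y[100:150] = 0.5      # 3-second file, burst in the middle
z = trim(y, fs, tailoff=0.2, buildup=0.1)
print(len(z))   # 80  (samples 90 through 169 inclusive)
```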
- copy(empty=False)
Create a new Sound instance whose array y is a deep copy of the original instance's array. With empty=True the resulting array will be empty, but will have the same number of channels as the original.
Note that most other preprocessing methods return self as their output argument. This makes it easy to choose between modifying an instance in-place and creating a modified copy. For example:
s.Reverse()            # reverse `s` in-place
t = s.Copy().Reverse() # create a reversed copy of `s`
- dat2str(data=None, dtype=None)
Converts from a numpy.array to a string. data defaults to the whole of s.y.
The string output contains raw bytes which can be written, for example, to an open audio stream.
- property PitchShift
To use the PitchShift method of the Sound class, you must first explicitly import audiomath.StretchAndShift.
- property TimeStretch
To use the TimeStretch method of the Sound class, you must first explicitly import audiomath.StretchAndShift.
- property bits
Bit depth of each sample in each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property bytes
Number of bytes used to represent each sample of each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property duration
Returns the duration of the sound in seconds.
- property fs
Sampling frequency, in Hz.
- property nChannels
Returns the number of channels.
- property nSamples
Returns the length of the sound in samples.
- property nbits
Bit depth of each sample in each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property nbytes
Number of bytes used to represent each sample of each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property nchan
Returns the number of channels.
- property nsamp
Returns the length of the sound in samples.
- property numberOfChannels
Returns the number of channels.
- property numberOfSamples
Returns the length of the sound in samples.
- property rms
A 1-D numpy array containing the root-mean-square amplitude of each channel, i.e. the same as Amplitude(norm=2, threshold=0).
- property y
The numpy array containing the actual sound sample data.