The Sound class
- class audiomath.Sound(source=None, fs=None, nChannels=2, dtype=None, bits=None, label=None)
Bases: object
Sound is a class for editing and writing sound files. (If your main aim is simply to play back existing sounds loaded from a file, you probably want to start with a Player instead.)
A Sound instance s is a wrapper around a numpy array s.y which contains a floating-point representation of a (possibly multi-channel) sound waveform. Generally the array values are in the range [-1, +1].
The wrapper makes it easy to perform certain common editing and preprocessing operations using Python operators, such as:
Numerical operations:
The + and - operators can be used to superimpose sounds (even if lengths do not match).
The * and / operators can be used to scale amplitudes (usually by a scalar numeric factor, but you can also use a list of scaling factors to scale channels separately, or window a signal by multiplying two objects together).
The +=, -=, *= and /= operators work as you might expect, modifying a Sound instance’s data array in-place.
Concatenation of sound data in time:
The syntax s1 % s2 is the same as Concatenate( s1, s2 ): it returns a new Sound instance containing a new array of samples, in which the samples of s1 and s2 are concatenated in time.
Either argument may be a scalar, so s % 0.4 returns a new object with 400 msec of silence appended, and 0.4 % s returns a new object with 400 msec of silence pre-pended.
Concatenation can be performed in-place with s %= arg or equivalently using the instance method s.Concatenate( arg1, arg2, ... ): in either case the instance s gets its internal sample array replaced by a new array.
Creating multichannel objects:
The syntax s1 & s2 is the same as Stack( s1, s2 ): it returns a new Sound instance containing a new array of samples, comprising the channels of s1 and the channels of s2. Either one may be automatically padded with silence at the end as necessary to ensure that the lengths match.
Stacking may be performed in-place with s1 &= s2 or equivalently with the instance method s1.Stack( s2, s3, ... ): in either case instance s1 gets its internal sample array replaced by a new array.
Slicing, expressed in units of seconds:
The following syntax returns Sound objects wrapped around slices into the original array:
    s[:0.5]               # returns the first half-second of `s`
    s[-0.5:]              # returns the last half-second of `s`
    s[:, 0]               # returns the first channel of `s`
    s[:, -2:]             # returns the last two channels of `s`
    s[0.25:0.5, [0,1,3]]  # returns a particular time-slice of the chosen channels

Where possible, the resulting Sound instances’ arrays are views into the original sound data. Therefore, things like s[2.0:-1.0].AutoScale() or s[1.0:2.0] *= 2 will change the specified segments of the original sound data in s. Note one subtlety, however:

    # Does each of these examples modify the selected segment of `s` in-place?

    s[0.1:0.2, :] *= 2                  # yes
    q = s[0.1:0.2, :]; q *= 2           # yes (`q.y` is a view into `s.y`)

    s[0.1:0.2, ::2] *= 2                # yes
    q = s[0.1:0.2, ::2]; q *= 2         # yes (`q.y` is a view into `s.y`)

    s[0.1:0.2, 0] *= 2                  # yes (creates a copy, but then uses `__setitem__` on the original)
    q = s[0.1:0.2, 0]; q *= 2           # - NO (creates a copy, then just modifies the copy in-place)

    s[0.1:0.2, [1,3]] *= 2              # yes (creates a copy, but then uses `__setitem__` on the original)
    q = s[0.1:0.2, [1,3]]; q *= 2       # - NO (creates a copy, then just modifies the copy in-place)
A Sound instance may be constructed in any of the following ways:

    s = Sound( '/path/to/some/sound_file.wav' )
    s = Sound( some_other_Sound_instance )   # creates a shallow copy
    s = Sound( y, fs )                       # where `y` is a numpy array
    s = Sound( duration_in_seconds, fs )     # creates silence
    s = Sound( raw_bytes, fs, nChannels=2 )
- Parameters:
source – a filename, another Sound instance, a numpy array, a scalar numeric value indicating the desired duration of silence in seconds, or a buffer full of raw sound data, as in the examples above.
fs (float) – sampling frequency, in Hz. If source is a filename or another Sound instance, the default value will be inferred from that source. Otherwise the default value is 44100.
nChannels (int) – number of channels. If source is a filename, another Sound instance, or a numpy array, the default value will be inferred from that source. Otherwise the default value is 2.
dtype (str) – Sound data are always represented internally in floating-point. However, the dtype argument specifies the Sound instance’s dtype_encoded property, which dictates the format in which the instance imports or exports raw data by default.
bits (int) – This is another way of initializing the instance’s dtype_encoded property (see dtype, above), assuming integer encoding. It should be 8, 16, 24 or 32. If dtype is specified, this argument is ignored. Otherwise, the default value is 16.
label (str) – An optional string to be assigned to the label attribute of the Sound instance.
- classmethod AddMethod(func)
Use this as a function decorator, to add a function as a new class method.
- Amplitude(norm=2, threshold=0.0)
Returns a 1-D numpy array containing the estimated amplitude of each channel. With norm=2, that’s the root-mean-square amplitude, whereas norm=1 would get you the mean-absolute amplitude.
Note that, depending on the content, the mean may reflect not only how loud the content is when it happens, but also how often it happens. For example, loud speech with a lot of pauses might have a lower RMS than continuous quiet speech. This would make it difficult to equalize the volumes of the two speech signals. To work around this, use the threshold argument. It acts as a noise-gate: in each channel, the average will only include samples whose absolute value reaches or exceeds threshold times that channel’s maximum (so the threshold value itself is relative, expressed as a proportion of each channel’s maximum, which makes the noise-gate invariant to simple rescaling).
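The effect of the noise-gate can be illustrated with a small pure-Python sketch. This is not audiomath's implementation: the helper name `gated_amplitude` is hypothetical, it handles a single channel given as a non-empty list, and it simply follows the description above.

```python
def gated_amplitude(samples, norm=2, threshold=0.0):
    """Average amplitude of one channel (hypothetical helper, not audiomath code).

    Only samples whose absolute value reaches `threshold` times the channel's
    maximum absolute value are included in the average.
    """
    peak = max(abs(x) for x in samples)
    gate = threshold * peak
    kept = [abs(x) for x in samples if abs(x) >= gate]
    return (sum(v ** norm for v in kept) / len(kept)) ** (1.0 / norm)

# Loud content with long pauses versus quiet continuous content.
loud_with_pauses = [0.8, -0.8, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
quiet_continuous = [0.5, -0.5] * 4

ungated_loud = gated_amplitude(loud_with_pauses)               # pauses drag the RMS down
ungated_quiet = gated_amplitude(quiet_continuous)
gated_loud = gated_amplitude(loud_with_pauses, threshold=0.1)  # pauses are excluded
```

Without the gate, the loud-but-sparse signal has the lower RMS; with a relative threshold of 0.1, the silent samples drop out of the average and the loud signal measures as louder.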
- ApplyWindow(func=<function Hann>, axis=0, **kwargs)
If s is a numpy.ndarray, return a windowed copy of the array. If s is an audiomath.Sound object (for example, if this is being used as a method of that class), then its internal array s.y will be replaced by a windowed copy.
Windowing means multiplication by the specified window function, along the specified time axis.
func should take a single positional argument: length in samples. Additional **kwargs, if any, are passed through. Suitable examples include numpy.blackman, numpy.kaiser, and friends.
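As a rough illustration of the calling convention described above (a window function that receives the length in samples, plus optional keyword arguments), here is a pure-Python sketch. The names `hann` and `apply_window` are hypothetical stand-ins, and the symmetric Hann formula is an assumption that matches `numpy.hanning` rather than a statement about audiomath's own `Hann` function.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (same formula as numpy.hanning)."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(samples, func=hann, **kwargs):
    """Return a windowed copy of a single-channel list of samples.

    `func` is called with the length in samples; extra keyword arguments
    are passed straight through to it.
    """
    window = func(len(samples), **kwargs)
    return [x * w for x, w in zip(samples, window)]

windowed = apply_window([1.0] * 5)
```

Here `windowed` rises from zero at the first sample to full amplitude in the middle and falls back to zero at the last sample.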
- AutoScale(max_abs_amp=0.95, center='median')
Remove the DC offset from each channel (see Center()) and then rescale the waveform so that its maximum absolute value (across all channels) is max_abs_amp.
Valid center options include 'median', 'mean', 'none', or numeric values (scalar, or one per channel).
The array self.y is modified in place.
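The two steps (centering each channel, then applying one common gain) can be sketched in plain Python. This is an illustration under stated assumptions, not audiomath's code: `auto_scale` is a hypothetical helper, channels are given as separate lists, it returns new lists rather than modifying in place, and only a scalar numeric `center` is supported.

```python
import statistics

def auto_scale(channels, max_abs_amp=0.95, center='median'):
    """Center each channel, then rescale all channels by one common gain."""
    centered = []
    for ch in channels:
        if center == 'median':
            offset = statistics.median(ch)
        elif center == 'mean':
            offset = statistics.mean(ch)
        elif center == 'none':
            offset = 0.0
        else:
            offset = float(center)  # scalar numeric value only, in this sketch
        centered.append([x - offset for x in ch])
    # The peak is taken across all channels, so inter-channel balance is kept.
    peak = max(abs(x) for ch in centered for x in ch)
    gain = max_abs_amp / peak if peak > 0 else 1.0
    return [[x * gain for x in ch] for ch in centered]

scaled = auto_scale([[0.1, 0.2, 0.3], [1.0, 1.0, 3.0]])
overall_peak = max(abs(x) for ch in scaled for x in ch)
```

Because a single gain is used for every channel, the quieter first channel stays quieter than the second after scaling.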
- Bulk(encoded=False)
Return the number of bytes occupied by the sound waveform…
encoded=False: …currently, in memory.
encoded=True: …if it were to be encoded according to the currently-specified dtype_encoded and written to disk in an uncompressed format (excluding the bytes required to store any format header).
- Cat(*args)
Concatenate the instance, in time, with the specified arguments (which may be other Sound instances and/or numpy arrays and/or numeric scalars indicating durations of silent intervals in seconds). Replace the sample array self.y with the result. Similar results can be obtained with the %= operator or the global function Concatenate().
- Center(center='median')
Remove the DC offset from each channel by subtracting its center value.
Valid center options include 'median', 'mean', 'none', or numeric values (scalar, or one per channel).
The array self.y is modified in place.
- Concatenate(*args)
Concatenate the instance, in time, with the specified arguments (which may be other Sound instances and/or numpy arrays and/or numeric scalars indicating durations of silent intervals in seconds). Replace the sample array self.y with the result. Similar results can be obtained with the %= operator or the global function Concatenate().
- Copy(empty=False)
Create a new Sound instance whose array y is a deep copy of the original instance’s array. With empty=True the resulting array will be empty, but will have the same number of channels as the original.
Note that most other preprocessing methods return self as their output argument. This makes it easy to choose between modifying an instance in-place and creating a modified copy. For example:

    s.Reverse()             # reverse `s` in-place
    t = s.Copy().Reverse()  # create a reversed copy of `s`
- Cut(start=None, stop=None, units='seconds')
Shorten the instance’s internal sample array by taking only samples from start to stop. Either end-point may be None. Either may be a positive number of seconds (measured from the start) or a negative number of seconds (measured from the end).
- Detect(threshold=0.05, center=True, p=0, eachChannel=False, units='seconds')
This method finds time(s) at which the absolute signal amplitude exceeds the specified threshold.
If center is truthy, subtract the median of each channel first.
p denotes the location of interest, as a proportion of the duration of the above-threshold data. So, 0.0 means “the first time the signal exceeds the threshold”, 1.0 means “the last time the signal exceeds the threshold”, and 0.5 means halfway in between those two time points. p may be a scalar, in which case the method returns one output argument. Alternatively p may be a sequence, in which case a sequence of output arguments is returned, each one corresponding to an element of p.
If eachChannel is truthy, each output will itself be a sequence (one element per channel). If it is falsy, each output will be a scalar (the signal is considered to have exceeded the threshold when any of its channels exceeds the threshold).
units may be 'seconds', 'milliseconds' or 'samples' and it dictates the units in which the outputs are expressed.
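The meaning of p can be made concrete with a single-channel sketch in plain Python. The helper `detect` below is hypothetical and simplified: it skips the centering step, always reports seconds, and returns None when nothing exceeds the threshold (that last behaviour is a choice made for this sketch, not something the description above specifies).

```python
def detect(samples, fs, threshold=0.05, p=0.0):
    """Time (in seconds) located a proportion `p` of the way through the
    above-threshold portion of a single-channel signal."""
    above = [i for i, x in enumerate(samples) if abs(x) > threshold]
    if not above:
        return None  # sketch-specific choice for "never exceeded"
    first, last = above[0], above[-1]
    return (first + p * (last - first)) / fs

signal = [0.0, 0.0, 0.5, 0.1, 0.5, 0.0, 0.0, 0.0]
onset = detect(signal, fs=4, p=0.0)     # first time the threshold is exceeded
offset = detect(signal, fs=4, p=1.0)    # last time the threshold is exceeded
midpoint = detect(signal, fs=4, p=0.5)  # halfway between the two
```

With a sampling frequency of 4 Hz, the above-threshold data span samples 2 to 4, so the three values come out as 0.5, 1.0 and 0.75 seconds.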
- Duration()
Returns the duration of the sound in seconds.
- Envelope(granularity=0.002, fs=None, includeDC=False, minThickness=0.01, bars=False)
This is both a global function (working on a numpy array s together with a sampling frequency fs) and a Sound method (working on a Sound instance s, in which case no separate fs argument is required).
It returns (timebase, lower, upper), where lower and upper are the lower and upper bounds on the signal amplitude. Amplitude is computed in adjacent non-overlapping bins, each bin being of width granularity (expressed in seconds). lower and upper will be adjusted such that they are always at least minThickness apart. Also, if you supply per-channel values as includeDC, then the bounds will be adjusted such that the specified values are always included. (If you simply specify includeDC=True, then the per-channel median values of s will be used in this way.)
The return values are suitable for plotting as follows:

    s = TestSound('12')
    t, lower, upper = s.Envelope()
    import matplotlib.pyplot as plt
    for iChannel in range(s.nChannels):
        plt.fill_between(t, lower[:, iChannel], upper[:, iChannel])

(NB: You do not actually need to plot it by hand in this way, because the Sound.Plot method will do it for you: “envelope mode” is used by default for sounds longer than 60 seconds, and this decision can be overridden in either direction by passing envelope=True or envelope=False.)
By default, t is strictly increasing. But if you set bars=True then values will be repeated in t, lower and upper such that only horizontal and vertical edges will appear in the plots (Manhattan skyline style).
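The binning idea can be sketched for a single channel in plain Python. The helper `envelope` below is hypothetical and leaves out includeDC and bars. Two details are assumptions of the sketch rather than documented behaviour: bounds that are too close together are widened symmetrically, and each bin is stamped with its start time.

```python
def envelope(samples, fs, granularity=0.002, min_thickness=0.01):
    """Per-bin lower and upper bounds of a single-channel signal."""
    bin_len = max(1, int(round(granularity * fs)))
    timebase, lower, upper = [], [], []
    for start in range(0, len(samples), bin_len):
        chunk = samples[start:start + bin_len]
        lo, hi = min(chunk), max(chunk)
        shortfall = min_thickness - (hi - lo)
        if shortfall > 0:
            # Assumption: widen symmetrically until bounds are min_thickness apart.
            lo -= shortfall / 2.0
            hi += shortfall / 2.0
        timebase.append(start / fs)  # assumption: bin labelled by its start time
        lower.append(lo)
        upper.append(hi)
    return timebase, lower, upper

t, lo, hi = envelope([0.0, 0.5, -0.3, 0.2, 0.1, 0.1], fs=1000)
```

At 1000 Hz a granularity of 0.002 seconds gives two samples per bin, so six samples collapse to three (time, lower, upper) triples; the flat last bin is widened to the minimum thickness.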
- Fade(risetime=0, falltime=0, hann=False)
If risetime is greater than zero, it denotes the duration (in seconds) over which the sound is to be faded-in at the beginning.
If falltime is greater than zero, it denotes the duration of the corresponding fade-out at the end.
If hann is true, then a raised-cosine function is used for fading instead of a linear ramp.
The array self.y is modified in-place.
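The difference between the linear and raised-cosine ramps can be sketched as follows. The helpers `fade_in_gains` and `fade` are hypothetical, operate on a single-channel list, and return a copy instead of modifying in place; the exact endpoint convention (gain starts at 0 and reaches 1 on the sample just after the ramp) is an assumption of this sketch.

```python
import math

def fade_in_gains(n, hann=False):
    """n gain values rising from 0 towards 1 (linear, or raised-cosine)."""
    if hann:
        return [0.5 - 0.5 * math.cos(math.pi * i / n) for i in range(n)]
    return [i / n for i in range(n)]

def fade(samples, fs, risetime=0.0, falltime=0.0, hann=False):
    """Return a faded copy of a single-channel list of samples."""
    out = list(samples)
    n_rise = min(len(out), int(round(risetime * fs)))
    n_fall = min(len(out), int(round(falltime * fs)))
    for i, gain in enumerate(fade_in_gains(n_rise, hann)):
        out[i] *= gain
    # The fade-out is the fade-in ramp mirrored in time.
    for i, gain in enumerate(fade_in_gains(n_fall, hann)):
        out[-1 - i] *= gain
    return out

faded = fade([1.0] * 8, fs=4, risetime=1.0, falltime=0.5)
```

With fs=4, a one-second rise covers four samples and a half-second fall covers two, leaving the two samples in between at full amplitude.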
- GenerateWaveform(freq_hz=1.0, phase_rad=None, phase_deg=None, amplitude=1.0, dc=0.0, samplingfreq_hz=None, duration_msec=None, duration_samples=None, axis=None, waveform=<ufunc 'cos'>, waveform_domain='auto', **kwargs)
Create a signal (or multiple signals, if the input arguments are arrays) which is a function of time (time being defined along the specified axis).
If this is being used as a method of an audiomath.Sound instance, then the container argument is automatically set to that instance. Otherwise (if used as a global function), the container argument is optional; if supplied, it should be an audiomath.Sound object. With a container, the axis argument is set to 0, and the container object’s sampling frequency, number of channels and duration (if non-zero) are used as fallback values in case these are not specified elsewhere. The resulting signal is put into container.y and a reference to the container is returned.
Default phase is 0, but may be changed by either phase_deg or phase_rad (or both, as long as the values are consistent).
Default duration is 1000 msec, but may be changed by either duration_samples or duration_msec (or both, as long as the values are consistent).
If duration_samples is specified and samplingfreq_hz is not, then the sampling frequency is chosen such that the duration is 1 second, so then freq_hz can be interpreted as cycles per signal.
The default waveform function is numpy.cos which means that amplitude, phase and frequency arguments can be taken straight from the kind of dictionary returned by fft2ap() for an accurate reconstruction. A waveform function is assumed by default to take an input expressed in radians, unless the first argument in its signature is named cycles, samples, seconds or milliseconds, in which case the input argument is adjusted accordingly to achieve the named units. (To specify the units explicitly as one of these options, pass one of these words as the waveform_domain argument.)
In this module, SineWave(), SquareWave(), TriangleWave() and SawtoothWave() are all functions of cycles (i.e. the product of time and frequency), whereas Click() is a function of milliseconds. Any of these can be passed as the waveform argument.
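The interplay between duration, sampling frequency and waveform domain can be sketched in plain Python. The function `generate` is a hypothetical, much-reduced stand-in (scalar arguments only, an explicit `domain` argument limited to 'radians' or 'cycles', no container), and `sine_wave` is a hypothetical function of cycles; raising an error when neither sampling frequency nor sample count is given is a choice made for this sketch.

```python
import math

def sine_wave(cycles):
    """A waveform expressed as a function of cycles rather than radians."""
    return math.sin(2.0 * math.pi * cycles)

def generate(freq_hz=1.0, phase_deg=0.0, amplitude=1.0, dc=0.0,
             samplingfreq_hz=None, duration_msec=None, duration_samples=None,
             waveform=math.cos, domain='radians'):
    if samplingfreq_hz is None:
        if duration_samples is None:
            raise ValueError('need samplingfreq_hz or duration_samples')
        # Only a sample count was given: pick fs so the signal lasts 1 second,
        # which lets freq_hz be read as "cycles per signal".
        samplingfreq_hz = float(duration_samples)
    if duration_samples is None:
        if duration_msec is None:
            duration_msec = 1000.0  # default duration
        duration_samples = int(round(samplingfreq_hz * duration_msec / 1000.0))
    signal = []
    for i in range(duration_samples):
        cycles = freq_hz * i / samplingfreq_hz + phase_deg / 360.0
        arg = cycles if domain == 'cycles' else 2.0 * math.pi * cycles
        signal.append(dc + amplitude * waveform(arg))
    return signal

two_cycles = generate(freq_hz=2.0, duration_samples=8)  # 2 cycles in 8 samples
one_sine = generate(freq_hz=1.0, samplingfreq_hz=4.0,
                    waveform=sine_wave, domain='cycles')
```

The first call relies on the "cycles per signal" interpretation; the second passes a cycles-domain waveform, so the phase argument is handed over without the conversion to radians.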
- IsolateChannels(ind, *moreIndices)
Select particular channels, discarding the others. The following are all equivalent, for selecting the first two channels:
    t = s.IsolateChannels(0, 1)          # ordinary 0-based indexing
    t = s.IsolateChannels([0, 1])
    t = s.IsolateChannels('1', '2')      # when you use strings, you
    t = s.IsolateChannels(['1', '2'])    # can index the channels
    t = s.IsolateChannels('12')          # using 1-based numbering
Equivalently, you can also use slicing notation, selecting channels via the second dimension:
    t = s[:, [0, 1]]
    t = s[:, '12']      # again, strings mean 1-based indexing
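The integer-versus-string convention can be sketched as a small normalization step. The helper `channel_indices` is hypothetical and assumes that each character of a string names one channel (so it only covers channels '1' to '9'); how audiomath treats higher channel numbers in strings is not covered here.

```python
def channel_indices(*specs):
    """Normalize channel specifiers to a list of 0-based integer indices.

    Integers are taken as 0-based indices; strings are read one character
    at a time as 1-based channel numbers.
    """
    if len(specs) == 1 and isinstance(specs[0], (list, tuple)):
        specs = tuple(specs[0])
    indices = []
    for spec in specs:
        if isinstance(spec, str):
            indices.extend(int(ch) - 1 for ch in spec)
        else:
            indices.append(int(spec))
    return indices

forms = [
    channel_indices(0, 1),
    channel_indices([0, 1]),
    channel_indices('1', '2'),
    channel_indices(['1', '2']),
    channel_indices('12'),
]
```

All five entries of `forms` come out as the same list of 0-based indices, mirroring the five equivalent calls listed above.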
- MakeHannWindow(plateau_duration=0)
Return a single-channel Sound object of the same duration and sampling frequency as self, containing a Hann or Tukey window, i.e. a raised-cosine “fade-in”, followed by an optional plateau, followed by a raised-cosine “fade-out”.
- MixDownToMono()
Average the sound waveforms across all channels, replacing self.y with the single-channel result.
- ModulateAmplitude(freq_hz=1.0, phase_rad=None, phase_deg=None, amplitude=0.5, dc=0.5, samplingfreq_hz=None, duration_msec=None, duration_samples=None, axis=None, waveform=<ufunc 'sin'>, **kwargs)
If s is a numpy.ndarray, return a modulated copy of the array. If s is an audiomath.Sound object (for example, if this is being used as a method of that class), then its internal array s.y will be replaced by a modulated copy.
Modulation means multiplication by the specified waveform, along the specified time axis.
Default phase is such that amplitude is 0 at time 0, which corresponds to phase_deg=-90 if waveform follows sine phase (remember: by default the modulator is a raised waveform, because dc=0.5 by default). To change phase, specify either phase_rad or phase_deg.
Uses GenerateWaveform().
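The default-phase remark can be checked with a little arithmetic: a raised sine with dc=0.5, amplitude=0.5 and a phase of -90 degrees evaluates to 0.5 + 0.5·sin(-π/2) = 0 at time zero. A minimal sketch, in which `modulator` is a hypothetical stand-alone function rather than part of audiomath:

```python
import math

def modulator(t, freq_hz=1.0, phase_deg=-90.0, amplitude=0.5, dc=0.5):
    """Raised-sine modulator value at time t (in seconds)."""
    phase_rad = math.radians(phase_deg)
    return dc + amplitude * math.sin(2.0 * math.pi * freq_hz * t + phase_rad)

def modulate(samples, fs, **kwargs):
    """Multiply a single-channel list of samples by the modulator."""
    return [x * modulator(i / fs, **kwargs) for i, x in enumerate(samples)]

at_start = modulator(0.0)      # zero gain at time 0
at_half = modulator(0.5)       # full gain half a cycle later
modulated = modulate([1.0] * 4, fs=4)
```

With the defaults the gain swings between 0 and 1, starting from silence, which is why the modulated signal begins at zero amplitude.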
- NumberOfChannels()
Returns the number of channels.
- NumberOfSamples()
Returns the length of the sound in samples.
- PadEndTo(seconds)
Append silence to the instance’s internal array as necessary to ensure that the total duration is at least the specified number of seconds.
- PadStartTo(seconds)
Prepend silence to the instance’s internal array as necessary to ensure that the total duration is at least the specified number of seconds.
- Play(*pargs, **kwargs)
This quick-and-dirty method allows you to play a Sound. It creates a Player instance in verbose mode, uses it to play the sound, waits for it to finish (or for the user to press ctrl-C), then destroys the Player again.
You will get a better user experience, and better performance, if you explicitly create a Player instance of your own and work with that.
Arguments are passed through to the Player.Play() method.
- Plot(zeroBased=False, maxDuration=None, envelope='auto', title=True, timeShift=0, timeScale=1, hold=False, finish=True, axes=None)
Plot the sound waveform for each channel. The third-party Python package matplotlib is required to make this work; you may need to install this yourself.
- Parameters:
zeroBased (bool) – This determines whether the y-axis labels show zero-based channel numbers (Python’s normal convention) or one-based channel numbers (the convention followed by almost every other audio software tool). In keeping with audiomath’s slicing syntax, zero-based indices are expressed as integers whereas one-based channel indices are expressed as string literals.
maxDuration (float, None) – Long sounds can take a prohibitive amount of time and memory to plot. If maxDuration is not None, this method will plot no more than the first maxDuration seconds of the sound. If this means some of the sound has been omitted, a warning is printed to the console.
envelope (bool, float, ‘auto’) – With envelope=True, plot the sound in “envelope mode”, which is less veridical when zoomed-in but which uses much less time and memory in the graphics back end. With envelope=False, plot the sound waveform as a line. With the default value of envelope='auto', only go into envelope mode when plotting more than 60 seconds’ worth of sound. You can also supply a floating-point value, expressed in seconds: this explicitly enforces envelope mode with the specified bin width.
title (bool, str) – With title=True, the instance’s label attribute is used as the axes title. With title=False or title=None, the axes title is left unchanged. If a string is supplied explicitly, then that string is used as the axes title.
timeShift (float) – The x-axis (time) begins at this value, expressed in seconds.
timeScale (float) – After time-shifting, the time axis is multiplied by this number. For example, you can specify timeScale=1000 to visualize your sound on a scale of milliseconds instead of seconds.
hold (bool) – With hold=False, the axes are cleared before plotting. With hold=True, the plot is superimposed on top of whatever is already plotted in the current axes.
- Read(source, raw_dtype=None, toolkit='auto', verbose=False)
- Parameters:
source – A filename, or a byte string containing raw audio data. With filenames, files are decoded according to their file extension, unless the raw_dtype argument is explicitly specified, in which case files are assumed to contain raw data without header, regardless of extension.
raw_dtype (str) – If supplied, source is interpreted either as raw audio data, or as the name of a file containing raw audio data without a header. If source is a byte string containing raw audio data, and raw_dtype is unspecified, raw_dtype will default to self.dtype_encoded. Examples might be float32 or even float32*2; the latter explicitly overrides the current value of self.NumberOfChannels() and interprets the raw data as 2-channel.
toolkit (str) – Must be 'auto' (which is the default), or 'wave', 'audioread' or 'avbin'. Determines which of the back-ends to use for decoding audio. AVbin binaries for common platforms are included, but the AVbin project is no longer maintained, so support may become sparser/buggier over time. It is recommended that you install the optional third-party package audioread which appears to offer better performance (you may also need to install the ffmpeg utility on some platforms/for some file types).
verbose (bool) – If set to True, report toolkit decision-making and decoding progress to stdout.
- Resample(newfs)
Change the instance’s sampling frequency. Replace its internal array with a new array, interpolated at the new sampling frequency newfs (expressed in Hz).
- Reverse()
Reverse the sound in time. The array self.y is modified in-place.
- Right()
Return a new Sound instance containing a view of alternate channels, starting at the second (unless there is only one channel, in which case return that).
- SamplesToSeconds(samples)
Convert samples to seconds given the sampling frequency of the instance. samples may be a scalar, or a sequence or array. The result is returned in floating-point.
See also: SecondsToSamples()
- SecondsToSamples(seconds, rounding='round')
Convert seconds to samples given the Sound instance’s sampling frequency. seconds may be a scalar, or a sequence or array. The rounding approach may be 'floor', 'round', 'ceil', 'none' or 'int'. The 'int' option rounds in the same way as 'floor' but returns integers; all other options return floating-point numbers.
See also: SamplesToSeconds()
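The rounding options can be compared side by side with a scalar-only sketch. The two functions below are hypothetical stand-ins that take the sampling frequency explicitly; they follow the description above (only 'int' returns an integer), and the use of Python's built-in round(), which sends exact halves to the nearest even number, is an assumption of this sketch.

```python
import math

def seconds_to_samples(seconds, fs, rounding='round'):
    """Convert a scalar duration in seconds to a number of samples."""
    samples = seconds * fs
    if rounding == 'floor':
        return float(math.floor(samples))
    if rounding == 'ceil':
        return float(math.ceil(samples))
    if rounding == 'round':
        return float(round(samples))
    if rounding == 'int':
        return int(math.floor(samples))  # floors like 'floor', returns an int
    if rounding == 'none':
        return float(samples)
    raise ValueError('unknown rounding option: %r' % (rounding,))

def samples_to_seconds(samples, fs):
    """Convert a scalar number of samples to seconds (floating-point)."""
    return samples / float(fs)

# 0.25 seconds at 1001 Hz is 250.25 samples, which shows each option clearly.
results = {mode: seconds_to_samples(0.25, 1001, mode)
           for mode in ('floor', 'round', 'ceil', 'none', 'int')}
```

The fractional sample count is chosen deliberately so that 'floor', 'ceil' and 'none' give three different answers.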
- SplitChannels(nChannelsEach=1)
Return a list of new Sound instances, each containing a view into the original data, each limited to nChannelsEach consecutive channels.
- Stack(*args)
Stack the instance, across channels, with the specified arguments (which may be other Sound instances and/or numpy arrays). Replace the sample array self.y with the result. Similar results can be obtained with the &= operator or the global function Stack().
- Trim(threshold=0.05, tailoff=0.2, buildup=0)
Remove samples from the beginning and end of a sound, according to amplitude.
The new waveform will start buildup seconds prior to the first sample on which the absolute amplitude in any channel exceeds threshold. It will end tailoff seconds after the last sample on which threshold is exceeded.
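The trimming rule can be sketched for a single channel in plain Python. The helper `trim` is hypothetical and returns a new list; how the boundaries are rounded to whole samples, and returning an empty list when nothing exceeds the threshold, are choices made for this sketch rather than documented behaviour.

```python
def trim(samples, fs, threshold=0.05, tailoff=0.2, buildup=0.0):
    """Keep the above-threshold portion of a single-channel signal, plus
    `buildup` seconds before it and `tailoff` seconds after it."""
    above = [i for i, x in enumerate(samples) if abs(x) > threshold]
    if not above:
        return []  # sketch-specific choice when the threshold is never exceeded
    start = max(0, above[0] - int(round(buildup * fs)))
    stop = min(len(samples), above[-1] + 1 + int(round(tailoff * fs)))
    return samples[start:stop]

# Half a second of silence, two loud samples, then one second of silence (fs = 10 Hz).
signal = [0.0] * 5 + [1.0, 1.0] + [0.0] * 10
trimmed = trim(signal, fs=10, tailoff=0.2, buildup=0.1)
```

At 10 Hz, the 0.1-second buildup keeps one silent sample before the loud portion and the 0.2-second tailoff keeps two after it.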
- copy(empty=False)
Create a new Sound instance whose array y is a deep copy of the original instance’s array. With empty=True the resulting array will be empty, but will have the same number of channels as the original.
Note that most other preprocessing methods return self as their output argument. This makes it easy to choose between modifying an instance in-place and creating a modified copy. For example:

    s.Reverse()             # reverse `s` in-place
    t = s.Copy().Reverse()  # create a reversed copy of `s`
- dat2str(data=None, dtype=None)
Converts from a numpy.array to a string. data defaults to the whole of s.y.
The string output contains raw bytes which can be written, for example, to an open audio stream.
- property PitchShift
To use the PitchShift method of the Sound class, you must first explicitly import audiomath.StretchAndShift.
- property TimeStretch
To use the TimeStretch method of the Sound class, you must first explicitly import audiomath.StretchAndShift.
- property bits
Bit depth of each sample in each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property bytes
Number of bytes used to represent each sample of each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property duration
Returns the duration of the sound in seconds.
- property fs
Sampling frequency, in Hz.
- property nChannels
Returns the number of channels.
- property nSamples
Returns the length of the sound in samples.
- property nbits
Bit depth of each sample in each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property nbytes
Number of bytes used to represent each sample of each channel, when encoded (not necessarily as represented in memory for manipulation and visualization).
- property nchan
Returns the number of channels.
- property nsamp
Returns the length of the sound in samples.
- property numberOfChannels
Returns the number of channels.
- property numberOfSamples
Returns the length of the sound in samples.
- property rms
A 1-D numpy array containing root-mean-square amplitude for each channel, i.e. the same as Amplitude(norm=2, threshold=0).
- property y
numpy array containing the actual sound sample data.