Formant Filter / Vowel circuits or how to do it?

Started by Mr. Lime, October 22, 2019, 10:11:20 AM


Mr. Lime

Thinking about formant filters, there don't seem to be many analog circuits around.
Sure, there are anti-wah pedals and various talking pedals, but none of them really addresses a wide range of different frequency shifts.

R.G.'s Sing-Wah has some clever tricks, but the controls seem to be very limited.
I would love to have "pig squeals" like those heard on goregrind, accessible via a guitar effect.

What's the best/simplest approach to achieve formant filtering? Any guesses?

First I thought about parallel phasers with six stages each, giving three shifting notches and a phase-inverted LFO, but the tone shaping is limited to the resonance control.

Second I thought about parallel state-variable filters, where we have much better control over the frequency sweep, again with anti-parallel LFO sweeping. Parasit Studio's Sentient Machine is an example.

But that's not nearly satisfying, as there's only one notch, so there might need to be more than one SVF in series.
I'm thinking of something like parametric EQs: at least two in series, with another two in parallel. The frequency pots would be controlled by vactrols, like a common Mutron filter, but with anti-parallel LFO sweeping or even two LFOs.

BYOC's Parametric EQ looks a little like this:
http://byocelectronics.com/parametricEQschem.pdf
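To make the idea concrete, here's a quick digital mock-up of the parallel-SVF sweep I have in mind, using a Chamberlin state-variable topology. The sweep ranges, Q, and LFO rate are just guesses, not measured formant values:

```python
import math

FS = 48000.0  # assumed sample rate

class SVF:
    """Chamberlin state-variable filter; we take the bandpass output."""
    def __init__(self):
        self.low = 0.0
        self.band = 0.0

    def bandpass(self, x, f_hz, q):
        f = 2.0 * math.sin(math.pi * f_hz / FS)  # tuning coefficient
        self.low += f * self.band
        high = x - self.low - self.band / q
        self.band += f * high
        return self.band

def formant_sweep(samples, lfo_hz=0.5, q=8.0):
    """Two bandpass peaks in parallel, swept in opposite directions
    by one LFO (anti-parallel sweeping, as described above)."""
    bp1, bp2 = SVF(), SVF()
    out = []
    for n, x in enumerate(samples):
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * lfo_hz * n / FS))
        f1 = 400.0 + 600.0 * lfo    # first peak sweeps 400 -> 1000 Hz
        f2 = 2400.0 - 1200.0 * lfo  # second sweeps 2400 -> 1200 Hz
        out.append(bp1.bandpass(x, f1, q) + bp2.bandpass(x, f2, q))
    return out
```

Two peaks moving against each other is the minimum that gives the "talking" impression; more sections could be stacked the same way.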

Am I barking up the right tree, or what are your suggestions on this topic?

Thanks for help

Mark Hammer

You should read up a little on speech production. I don't mean that as an admonishment; rather, it may provide some ideas.

Spectrograms of speech sounds show formants as essentially bursts of acoustic energy within a given band, and moving in a given way.  The "movement" is both in terms of the amplitude envelope and where the energy-band is situated.

As dual-band "anti-wahs" illustrate, formants don't all move in the same direction at the same time.

Kipper4

Ma throats as dry as an overcooked kipper.


Smoke me a Kipper. I'll be back for breakfast.

Grey Paper.
http://www.aronnelson.com/DIYFiles/up/

Mr. Lime

Thanks for the quick response, Mark.


I see your point concerning the movement, and I understand that this may be the tricky part of an analog circuit that should behave like a studio plugin.
On the other hand, it might not be necessary to have two-dimensional movement. Watching this video, for example, lots of sounds can be achieved just by sweeping peaks from left to right while cutoff frequency and resonance serve as tone controls.
I'm not sure and I might be wrong, but I would rather focus on having different peaks from bandpass and notch amplitudes.


Starts at ~ 1:50:


Taking a closer look at the Sing-Wah, R.G. mentioned that parametric controls could (theoretically) replace the inductors.
So we get closer to having controls for frequency, boost/cut and resonance, although he found the simulations not too promising.
On the other hand, there are problems with cap switching and inductors as well.


Here's the full article:
http://www.geofex.com/article_folders/sing-wah/sing-wah.htm
Thanks for help

Mark Hammer

Vocoders attempt to mimic voice sounds by simply modulating the amplitude of fixed bands.  So if one's voice has peaks at, say, 350, 540, and 900 Hz, those bands would be emphasized in the instrument signal, to the neglect of all other frequencies.  Because vocoders simply track the relative energy of the various individual bands, and not so much the movement of speech energy across the spectrum, more realistic synthesis of speech sounds relies on having more bands for detection/modulation.  For instance, the old PAiA/Craig Anderton vocoder used only 8 bands.  It was okay, but not strikingly voice-like.  More realistic voice qualities would need 12 or more bands.

You will note in the first plug-in shown on a scope-pic that not only do the peaks move around, but their width changes too.  Interestingly, that seems to be a property of some wahs.  They are not simply "bandpass filters" providing the same selectivity no matter where the sweep is situated at the moment, but tend to change Q with sweep.  That may well be responsible for their vocal-like qualities, inasmuch as the frequency content is modified the way formants are.
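That fixed-band behaviour is easy to sketch: bandpass both signals, envelope-follow the voice's bands, and use those envelopes to scale the same bands of the instrument. A rough illustration, with band centers, Q, and envelope speed all invented for the example:

```python
import math

FS = 48000.0

class Bandpass:
    """Chamberlin state-variable filter, bandpass output."""
    def __init__(self, f_hz, q=4.0):
        self.f = 2.0 * math.sin(math.pi * f_hz / FS)
        self.q = q
        self.low = 0.0
        self.band = 0.0

    def process(self, x):
        self.low += self.f * self.band
        high = x - self.low - self.band / self.q
        self.band += self.f * high
        return self.band

def vocode(voice, carrier, centers=(350.0, 540.0, 900.0, 1800.0)):
    """Impose the voice's per-band envelopes on the same bands of the carrier."""
    v_filt = [Bandpass(f) for f in centers]
    c_filt = [Bandpass(f) for f in centers]
    env = [0.0] * len(centers)
    smooth = 1.0 - math.exp(-2.0 * math.pi * 30.0 / FS)  # ~30 Hz follower
    out = []
    for v, c in zip(voice, carrier):
        y = 0.0
        for i in range(len(centers)):
            vb = v_filt[i].process(v)
            env[i] += smooth * (abs(vb) - env[i])  # rectify + smooth
            y += c_filt[i].process(c) * env[i]     # modulate carrier band
        out.append(y)
    return out
```

Four bands, as here, gives only a crude effect; as noted above, a dozen or more bands are needed before it starts to sound convincingly voice-like.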

Transmogrifox

Quote from: Mark Hammer on October 22, 2019, 04:02:26 PM
... not only do the peaks move around, but their width changes too.  Interestingly, that seems to be a property of some wahs. 

Yes, and I would hypothesize this property is one of the key differences between "vocal" sounding wahs and wahs that some think sterile.

You might need to consider using a microcontroller to make the controls more flexible.

Check out Jurgen Haible's stuff. He was always doing crazy things like this and has some good filter circuits that may be worth looking at.
trans·mog·ri·fy
tr.v. trans·mog·ri·fied, trans·mog·ri·fy·ing, trans·mog·ri·fies To change into a different shape or form, especially one that is fantastic or bizarre.

PRR

Modern hearing-aids which attempt to emphasize speech over noise have grown to "128 channels", in an attempt to handle each formant(*) individually.

Also because DSP channels have become really cheap, and to offer a higher-priced model "better" than the 16-channel jobs. OTOH there is much evidence that, until computers and their trainers get smarter, a 3-channel aid does very well against many-channel processing.

(* Dang, another word I have to teach my spell-chucker.)
  • SUPPORTER

Kipper4

An unverified schematic from last year.
I don't think I ever breadboarded it.




Kipper4

And another. I feel sure I tested this one, though I can't remember if I liked it.




ElectricDruid

I second Mark's suggestion to look into actual speech production and vowel formant frequencies. It's interesting stuff and you'll come out with different ideas.

I did my own research into this some time ago. I was trying to write a digital "vowel oscillator" algorithm. The idea was that it would have two controls, one for frequency, and another for vowel shape. I hoped to get something that could "sing" vowel sounds.

I discovered that vowels are distinguished by the number and frequency of their peaks. These vary slightly between speakers, and between men and women and children (who have voices pitched differently) but that the overall pattern is similar enough for us to identify the vowel. At a minimum, three peaks are enough to distinguish the vowels, but four or five makes the job much easier.

One key part is that the formants *don't* change much with changing pitch. So if you sing a low "Aaa" and a high "Aaa" the formants are pretty much the same (since they're both you singing it) - after all, that's what identifies it as you singing an A. A side-effect of this is that if the sung pitch goes too high, it can go *above* some important formant peaks, which "fall off the bottom". This is why soprano opera singers are hard to understand: there's no longer enough energy in the lower ranges where the important formants sit, so we can't tell which vowel it is. Apparently people who write operas (librettists) are aware of this and either make sure that nothing of vital importance to the plot is sung in the upper registers, or make sure that it's repeated by someone else lower down! So the soprano goes "Oh my god! She's killed him!" and then the tenor goes "No! I don't believe it! She can't have killed him." just to make sure you get it.

Checking my old code, I see a reference in the comments:

Formant data is taken from "Acoustic Characteristics of American English Vowels", Hillenbrand, Getty, Clark, and Wheeler, 1995.

I hope it helps.
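In case it's useful, here are a few rounded textbook formant values and the kind of vowel-glide interpolation the oscillator used. These are the classic Peterson & Barney (1952) style averages quoted from memory, not the Hillenbrand data set itself:

```python
# Approximate average formant frequencies (Hz) for an adult male voice.
# Rounded, illustrative values; see Hillenbrand et al. 1995 or
# Peterson & Barney 1952 for measured data.
FORMANTS = {
    "ee": (270.0, 2290.0, 3010.0),  # as in "beet"
    "eh": (530.0, 1840.0, 2480.0),  # as in "bet"
    "ah": (730.0, 1090.0, 2440.0),  # as in "father"
    "oo": (300.0, 870.0, 2240.0),   # as in "boot"
}

def interpolate_vowel(a, b, t):
    """Glide between two vowels: t = 0.0 gives vowel a, t = 1.0 gives vowel b."""
    fa, fb = FORMANTS[a], FORMANTS[b]
    return tuple(fa[i] + t * (fb[i] - fa[i]) for i in range(3))
```

Sweeping t continuously between two table entries is what gives the "vowel shape" control its diphthong-like movement.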


PRR

> the formants *don't* change much with changing pitch.

Pitch is in the vocal cords. Formants are the shape/size of the mouth cavity. Different things.

> This is why soprano opera singers are hard to understand.

You don't have to understand the opera soprano. It's an instrument, not a telegram.

> or make sure that it's repeated by someone else lower down! So the soprano goes "Oh my god! She's killed him!" and then the tenor goes "No! I don't believe it! She can't have killed him."

And yeah, that too. Well spotted.

Mezzo: We must be going.
Soprano: Oh yes we're going!
Bari: We are going, oh yes we are going...
Chorus: Going now! Going now!!
Soprano: We really hate to leave you but we really must be going
Bari: Yes we are going, going now
Tenor: Fetch my hat, fetch my gloves (servant enters with hat&gloves)
Sopr: Give our best to all your loves!  (Goodbye hugs all around)
Alto: Going!
Bass: Going!
Sopr: Going Now! Now! Now!
Host: (taps hourglass) Vaya con Dios, dammit, already! Get going NOW!
three Mary Fords: Vaya con Dios
ALL: Going now! Going Now!
Bari: Going!
Alto: ........ Going!
Tenor: .................Going!
Sopr:  ..........................Going!
ALL: Going going going!!!! Now! Now! Now!
Host: (sotto voice) real soon now.
(Orchestral flourish, everybody dances in circles.....) Going now! Going now!

Rixen

I saw a modulated filter somewhere using PWM to control analogue switches for a switched resistor filter, and it made quite convincing yi-yi-yi-yi-yi sounds from a guitar input. (Hope this makes sense). It may have been on this forum.

EDIT: found it, Parasit studio's Sentient Machine: https://www.parasitstudio.se/sentientmachine.html
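The switched-resistor trick works because a resistor chopped on and off at duty cycle d looks, on average, like a larger resistance R/d, so the PWM duty directly sets the RC cutoff. A rough ideal-switch model (component values invented):

```python
import math

def effective_resistance(r_ohms, duty):
    """A resistor switched on for a fraction `duty` of each PWM period
    conducts, on average, like r/duty (ideal-switch approximation)."""
    return r_ohms / duty

def rc_cutoff_hz(r_ohms, c_farads):
    """Cutoff frequency of a simple RC lowpass stage."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

# Sweeping duty from 0.1 to 0.9 on a 10k/10n stage moves the cutoff
# from roughly 159 Hz up to roughly 1.4 kHz.
```

In practice the PWM clock has to sit well above audio so the chopping itself filters out, but the duty-to-cutoff relationship is this simple.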

Mark Hammer

Quote from: ElectricDruid on October 24, 2019, 01:42:25 PM
I second Mark's suggestion to look into actual speech production and vowel formant frequencies. It's interesting stuff and you'll come out with different ideas.

...
One needs to keep in mind that vowels are recognizable as such, and "the same", whether spoken by a 5-year-old, a 95-year-old, an Australian, an Italian, a Russian, a man, a woman, a Buddhist monk, or a person with a raspy voice.  It's not the absolute frequency content, but the relative content, and the pattern, including how the relationships between bursts are maintained.  That's the miracle of speech recognition in humans.  We're attuned to, and track, patterns that let us recognize words in spite of accent, pitch, rasp/fry, and other inconsistencies across human speakers.

Making a guitar mimic speech sounds, especially vowels, is somewhat easier, because there is no accent to contend with, and the vowel to be mimicked is a "pure" sound, rather than a vowel as spoken by a Tuvan throat singer.

Rob Strand

Quote
One needs to keep in mind that vowels are recognizable as such, and "the same", whether spoken by a 5 year-old, a 95 year-old, an Australian, an Italian, a Russian, a man, a woman, a buddhist monk, a person with a raspy voice.
...
That's the miracle of speech recognition in humans.
Indeed.  I remember reading articles on speech recognition describing how AT&T had difficulty rolling it out due to the different ethnic groups in the US.

Really nice article, written by a well-known DSP guru from AT&T/Bell Labs:
https://www.cs.brandeis.edu/~cs136a/CS136a_docs/Speech_Recognition_History.pdf
Send:     . .- .-. - .... / - --- / --. --- .-. -
According to the water analogy of electricity, transistor leakage is caused by holes.

Mark Hammer

Thanks for that, Rob.
I'll return the favour by noting a decent book on the evolution of synthetic voice in music, wartime communications, etc., including vocoding, talk boxes, and such: "How to Wreck a Nice Beach" ( https://www.researchgate.net/publication/283979041_How_to_Wreck_a_Nice_Beach_The_Vocoder_from_World_War_II_to_Hip-Hop_the_Machine_Speaks_Dave_Tompkins ).  It's a little chaotic, in terms of writing style, but covers a great deal of the last 100 years, and the interconnections between the many ways in which such technology was developed, marketed, and incorporated into popular culture.  A wealth of musical suggestions as well.  A lot of artists I had simply never heard of before.

Rob Strand

Quote
It's a little chaotic, in terms of writing style, but covers a great deal of the last 100 years, and the interconnections between the many ways in which such technology was developed, marketed, and incorporated into popular culture.  A wealth of musical suggestions as well.
Amazing that the vocoder has its origins in Bell Labs.  I suppose if you listened to scrambled speech as part of your job, it only takes one bright spark to make use of it elsewhere.  As a child, even single-sideband CB radio sounded cool to me.
Quote
A lot of artists I had simply never heard of before.
I felt equally out of touch reading the first page and a half.

R.G.

Bell Labs built on a body of knowledge from the even earlier work done in making vocal tract mechanical models. I'd have to look up the dates, but in the ?1890s? there were talking machines which produced very reasonable speech when operated by trained women moving levers and such to move the model tracts' parts.

Two formant filters are reputed to be enough for vowel mimicry. Some variants of vowelizers/diphthongizers use a third, but rarely. OTA state-variable filters are a nice analog way to mechanize this, but that leaves open the question of how to drive the filters to the intended frequencies. I haven't built one of these in over a decade. Today, I'd use a uC with a table for the control voltages, PWM to make the control voltages, and some kind of "gliding" control to move the sound smoothly between intended vowel approximations.

You could do the same in a DSP, of course.
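The uC scheme sketched above comes down to a table, a glide, and a duty-cycle mapping. A minimal illustration, with every number invented:

```python
# Sketch of the table-plus-glide control scheme: target frequencies per
# vowel, an exponential glide toward the current target, and a PWM duty
# derived from each control value. All numbers are illustrative only.

VOWEL_TABLE = {            # target frequencies (Hz) for two formant filters
    "A": (730.0, 1090.0),
    "E": (530.0, 1840.0),
    "O": (570.0, 840.0),
}

def glide_step(current, target, rate=0.01):
    """One control-loop tick: move each value a fraction `rate` toward target."""
    return [c + rate * (t - c) for c, t in zip(current, target)]

def to_pwm_duty(f_hz, f_min=200.0, f_max=2500.0):
    """Map a filter frequency to a 0..1 PWM duty cycle (linear, illustrative)."""
    d = (f_hz - f_min) / (f_max - f_min)
    return min(max(d, 0.0), 1.0)
```

On real hardware the duty values would feed PWM outputs that are RC-filtered into the control voltages for the OTA filters; the `rate` constant sets how slurred the vowel-to-vowel transitions sound.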
R.G.

In response to the questions in the forum - PCB Layout for Musical Effects is available from The Book Patch. Search "PCB Layout" and it ought to appear.

PRR

While you could connect vocoders with scramblers, Bell's interest was probably:

* Helping mutes talk (mirror of helping deaf hear, which Bell worked on).

* Putting coded speech on bad telegraph lines to carry more conver$ations with less copper.

Vocoders never replaced direct speech for general telephony. Inexperienced people do not understand simple vocoders.

A few vocoder mute-aids were prototyped but never caught on. Throat-buzzers (Electrolarynx) can replace lost vocal cords so throat/lips can shape a "speech" which is not pretty but becomes intelligible.

When a vocoder is complex enough for special uses (Type-n-Talk), there's probably some better way to do it (storing recorded snips in ROM).

Rob Strand

Quote
Bell Labs built on a body of knowledge from the even earlier work done in making vocal tract mechanical models. I'd have to look up the dates, but in the ?1890s? there were talking machines which produced very reasonable speech when operated by trained women moving levers and such to move the model tracts' parts.
Page 4 of the article I posted goes back to 1773 and 1791, then some more in the mid-1800s, but it's not clear at what point it could be considered speech.

Mark Hammer

Anyone have any insights into the Korg Miku pedal and its approach to mimicking speech?  I'm not aiming to clone a Miku; I'm just interested in the approach they took to the use of Vocaloid or something like it.  Perhaps simply slowing down a sample would reveal a little.