Formant Filter / Vowel circuits or how to do it?

Started by Mr. Lime, October 22, 2019, 10:11:20 AM


Digital Larry

#40
Not analog, but I got some interesting results in DSP with what I think were two resonant filters in series.  It's been a long time since I did this, so I can't remember the details.  They were really just resonant SVFs, so it should be fairly easy to adapt to analog.

By "easy" of course I mean theoretically possible.  This is probably:

An envelope follower going into two independent gain/offset blocks, so that the direction, center, and sweep of the filters are independently controllable.  The output of each of those goes to the center-frequency control of one of the SVFs.  Q is fixed.  I was really trying to get a wah/anti-wah thing going, but when I stumbled on this I laughed so hard that I just left it that way.  I think it gets some pretty good guttural sounds on the low strings.
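A minimal Python sketch of what a patch like that might look like, assuming Chamberlin-style SVFs; all the frequency ranges, Q, and envelope times below are made-up illustrative values, not the actual settings from the original patch:

```python
import numpy as np

def envelope_follower(x, sr, attack_ms=5.0, release_ms=50.0):
    # One-pole follower on the rectified signal: fast attack, slow release.
    a = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    r = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = np.zeros_like(x)
    level = 0.0
    for i, v in enumerate(np.abs(x)):
        coef = a if v > level else r
        level = coef * level + (1.0 - coef) * v
        env[i] = level
    return env

def svf_bandpass(x, fc, sr, q=8.0):
    # Chamberlin state-variable filter, band-pass output, with a
    # per-sample (time-varying) centre frequency fc[i].
    low = band = 0.0
    out = np.zeros_like(x)
    # Clamp fc well below Nyquist for stability of this SVF topology.
    f = 2.0 * np.sin(np.pi * np.minimum(fc, sr / 6.0) / sr)
    for i, v in enumerate(x):
        high = v - low - band / q
        band += f[i] * high
        low += f[i] * band
        out[i] = band
    return out

def vowelish(x, sr):
    env = envelope_follower(x, sr)
    # Independent gain/offset per filter: one sweeps up with level
    # (wah), the other sweeps down (anti-wah).
    fc1 = 400.0 + 1200.0 * env
    fc2 = 1800.0 - 900.0 * env
    return svf_bandpass(svf_bandpass(x, fc1, sr), fc2, sr)
```

Two band-passes whose centres move in opposite directions is what gives the "guttural" quality: the spectral peaks converge and diverge with playing dynamics, a bit like formants do.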

https://www.soundclick.com/artist/default.cfm?bandID=1373300&content=songs

Check the sound clips for "Tuvan Throat Singer" and "Munchkin Choir".
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

PRR

#41
> distinguish between processing intended to produce relatively static, or at least adjustable, timbres that are reminiscent of vocal sounds, and processing intended to create the illusion of human speech.

The naked Voder made vowels. It took a year of training before its operators (the "girls") got good at "speech".

The music-filter analogy: you can put in an "oo" filter easily, but to "sing" "yes we're going really going going now" takes tons more work/practice/control. (Experience your mouth already has, hence talk-boxes.)

> the illusion of human speech.

As opposed to? Unhuman speech? Superhuman speech? The only "hit records" with non-human speech are Collins et al.'s whale songs (which are Greek to me) and some meaningless oinks in "Piggies" and a few others.

Mark Hammer

When I say "illusion of human speech", I mean that it passes a sort of Turing test, and sounds to us like a person speaking, even if it sounds like a poor recording of speech (e.g., the way an old Edison wax cylinder might).  When we hear Vocaloid or Auto-Tune or Stephen Hawking, it sounds synthetically generated to us; more on the machine side of "the uncanny valley" than on the human side.  We can certainly make out the words, which I guess was the principal objective of all those industrial efforts to squeeze the most intelligibility out of the least bandwidth, but sometimes at the cost of not sounding human.  So, for me at least, the challenge is identifying those aspects that can bring things a little closer to the human side of the uncanny valley.  If it can't be done, I'll accept that.  But until we can't tell if it's live or Memorex, I think we ought to keep trying.  :icon_biggrin:

Sooner Boomer

Quote from: PRR on October 26, 2019, 04:19:03 PM
When a vocoder is complex enough for special uses (Type-n-Talk), there's probably some better way to do it (storing recorded snips in ROM).

Texas Instruments' Speak-and-Spell family used linear predictive coding, with the coded phonemes (the sounds that make up speech) stored in ROM.
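The idea behind LPC is that a short frame of speech can be reduced to a handful of filter coefficients plus a simple excitation, which is why it fit in 1970s ROM. Here's a toy sketch of the autocorrelation method with Levinson-Durbin analysis and impulse-train resynthesis; this is the textbook algorithm, not the actual TI TMS51xx chip's format, and the frame sizes, order, and pitch period are made-up values:

```python
import numpy as np

def lpc_analyze(frame, order=10):
    # Autocorrelation method + Levinson-Durbin recursion.
    # Returns predictor coefficients a[0..order] (a[0] == 1)
    # and the residual (prediction-error) energy.
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err           # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_synthesize(a, gain, pitch_period, n):
    # Excite the all-pole filter 1/A(z) with an impulse train
    # (a crude model of voiced speech).
    out = np.zeros(n)
    for i in range(n):
        x = gain if i % pitch_period == 0 else 0.0
        for j in range(1, len(a)):
            if i >= j:
                x -= a[j] * out[i - j]
        out[i] = x
    return out
```

The all-pole filter is, in effect, an automatically fitted formant filter, which is what ties this back to the thread topic: LPC analysis of a vowel hands you the resonances you'd otherwise tune by hand.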
Dan of  ̶9̶  only 5 Toes
I'm not getting older, I'm getting "vintage"

ElectricDruid

Quote from: Mark Hammer on October 30, 2019, 09:13:20 AM
When we hear Vocaloid or Auto-Tune or Stephen Hawking, it sounds synthetically generated to us; more on the machine side of "the uncanny valley" than on the human side.

The irony of that is that Auto-Tune *isn't* synthetically generated - it's actual human speech, processed; more like a vocoder than a Speak-and-Spell.
This implies there's some quality of genuine speech that can be lost (or removed) from actual speech, leaving something that is intelligible but alien. Perhaps if we could figure out what that is, we could add it to other sounds and make something that sounds human but unintelligible? I think I've probably heard sounds in that category over the years, and not just while travelling in foreign countries!

Mark Hammer

In some ways, Auto-Tune illustrates how complex and multidimensional the "humanification" of synthetic speech is.  What makes Auto-Tune, or at least the way that many musicians have used it, sound nonhuman is the suddenness of its shifts in formants and pitch.

Here's an analogy that might be useful to spark thinking: how do dogs recognize something as another animal?  They spend most of their time around people.  So how does a German Shepherd see a dumb little Yorkie or Dachshund and recognize it as another dog or at least another living thing?  There will be traits that are shared by "living things" that are uncharacteristic of objects (and vice versa).  It's partly their physical characteristics, but also the nature of the movement.  Some movement is perceptibly and uniquely "biological".

Digital Larry

A few years back I took an online course in ChucK, a somewhat bizarre programming language for sound creation.  One of the assignments was to create a synthetic conversation between two beings, one small and one large.  Here's my version:

https://www.kadenze.com/courses/physics-based-sound-synthesis-for-games-and-interactive-systems-iv/gallery/file-submission-20-6-synthe-tique-dialogue-fantastique-a-couple-o-critters-sittin-around-talkin/gary-worsham-d7cd8750-fed7-423?browsing_scope=course

I recommend this class if you're at all interested in physical modeling synthesis.  It's not analog though, so sorry for the OT.

DL
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

Mark Hammer

Prelinguistic infants (we'll say between 9 and 15 months, for argument's sake) will babble away, practicing the twists and turns of the phonemes of the language they're being raised in.  When one ignores the absence of discernible "meaning" in what they're babbling, and simply examines the duration of their utterances, how and where pauses are inserted, and the prosody (i.e., how the pitch moves over the course of the utterance), it is largely indistinguishable from the speech of adult talkers.  By 10-12 months, most infants have mastered what sounds like adult speech, even though they don't necessarily have any actual words to plug into it.

I remember well when our eldest was maybe 15-16 months old, and we were at a Burger King.  He sauntered over to the play area, where he spied some kids who must have been around 7-8.  He opened his mouth, said "Hi", and then poured forth a stream of absolute gibberish that nevertheless had the superficial/structural qualities of speech.  The kids all looked at me and asked "What did he say?", assuming from those structural properties that there was some underlying communicative intent and not simply gibberish.  The sample that Larry linked to provides an excellent illustration of that.  Not a single word in there, but it sure sounds like a conversation.

That's why I suggest that if one wants to mimic a "talking" guitar, it helps to understand what leads us to perceive sounds as more speech-like and not simply mechanical sounds from objects around us.  It doesn't necessarily have to involve sophisticated technology.  For instance, Jeff Beck's use of the humble wah in his rendition of the Willie Dixon tune "I Ain't Superstitious" sounds like a conversation with Rod Stewart.

highwater

Quote from: Mark Hammer on October 31, 2019, 09:04:05 AM
Prelinguistic infants (we'll say between 9 and 15 months for arguments' sake) will babble away, practicing the twists and turns of phonemes of the language they're being raised in. ... Not a single word in there, but it sure sounds like a conversation.

Along the same lines:


Similarly: "Wenn ist das Nunstück git und Slotermeyer? Ja! Beiherhund das Oder die Flipperwaldt gersput!"
"I had an unfortunate combination of a very high-end medium-size system, with a "low price" phono preamp (external; this was the decade when phono was obsolete)."
- PRR