talkbox without talking possible?

Started by 11-90-an, June 02, 2020, 06:35:50 AM

Digital Larry

Anyone ever inhale while playing a jaw harp?  Don't tell me you never inhaled.
Digital Larry
Want to quickly design your own effects patches for the Spin FV-1 DSP chip?
https://github.com/HolyCityAudio/SpinCAD-Designer

ashcat_lt

Sorry, this is months old, but it occurs to me that the OP might get close with a vocoder.  That's a pretty complex circuit for a pedal, though it could be done digitally.  Then again, a ring modulator can do things that are surprisingly similar with the right inputs, and we've got a few of those out there.

pinkjimiphoton

Quote from: 11-90-an on June 02, 2020, 09:00:23 AM
Quote from: antonis on June 02, 2020, 08:51:16 AM
What about farting in tune..??

If i can record some farts and play it in this circuit... perhaps.... ;) That would be interesting tone ;) ;) ;)

i gotta fuzzbox that sounds like farts in a blender if ya need the schematic, let me know.... lol
  • SUPPORTER
"When the power of love overcomes the love of power the world will know peace."
Slava Ukraini!
"try whacking the bejesus outta it and see if it works again"....
~Jack Darr

11-90-an

Quote from: pinkjimiphoton on November 13, 2020, 01:29:33 PM
Quote from: 11-90-an on June 02, 2020, 09:00:23 AM
Quote from: antonis on June 02, 2020, 08:51:16 AM
What about farting in tune..??

If i can record some farts and play it in this circuit... perhaps.... ;) That would be interesting tone ;) ;) ;)

i gotta fuzzbox that sounds like farts in a blender if ya need the schematic, let me know.... lol

Ooh yes.. i've been looking for a fuzz to build... Schem? Yessiree would be very appreciated.
flip flop flip flop flip

pinkjimiphoton

here, try this... kinda like pitched static lol

nope. post image is down. try later if i remember

Mark Hammer

Let's take a step or two back. 

There are essentially two basic approaches to mimicking speech sounds with a pedal.  One is aiming for speech-like sounds on demand.  So this would include vocoding, envelope-controlled "diphthongizers" and anti-wah arrangements, treadle-controlled talking pedals, and maybe even the Korg Miku. 

The other approach, which hasn't really been explored in this thread, is random speech-like sounds.  That is, swept filters that are not necessarily corresponding to what the player aims for at this moment, but which often line up in a way that mimics formant combinations humans use when they talk.   In other words, it kinda sounds like talking, though not necessarily the talking you were aiming for.  Essentially what I'm talking about is tweaked bandpass filters (appropriate to 1st, 2nd, and 3rd formants) being swept independently by their own LFOs; the LFOs themselves having a sweep width, and maybe even speed, corresponding to the typical range of the given formant.
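A rough digital sketch of that idea, for anyone who wants to hear it before building anything: three band-pass filters (Chamberlin state-variable type), each swept by its own sine LFO over its own range. The function name, the sweep ranges, the LFO rates and the Q are all illustrative guesses, not measured formant data, and a real build would more likely use analog filters.

```python
import math

def lfo_swept_formants(signal, sample_rate=44100.0,
                       formants=((300.0, 800.0, 0.7),
                                 (900.0, 2300.0, 1.1),
                                 (2300.0, 3200.0, 1.7)),
                       q=8.0):
    """Sum of band-pass filters, each swept by an independent sine LFO.

    formants holds (low_hz, high_hz, lfo_rate_hz) per filter.
    All of these numbers are assumptions chosen for illustration.
    """
    damping = 1.0 / q
    states = [[0.0, 0.0] for _ in formants]  # [low, band] per filter
    out = []
    for n, x in enumerate(signal):
        t = n / sample_rate
        y = 0.0
        for (lo, hi, rate), st in zip(formants, states):
            # LFO scaled to 0..1, then mapped logarithmically onto lo..hi
            sweep = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate * t))
            fc = lo * (hi / lo) ** sweep
            f = 2.0 * math.sin(math.pi * fc / sample_rate)
            # One step of a Chamberlin state-variable filter
            low, band = st
            low += f * band
            high = x - low - damping * band
            band += f * high
            st[0], st[1] = low, band
            # Scale by damping so the peak gain stays near unity
            y += damping * band
        out.append(y / len(formants))
    return out
```

Because the three LFO rates are unrelated, the filter peaks only occasionally land in a vowel-like combination, which is the "random speech-like" behaviour described above. Feeding it a fuzzed or sawtooth-like signal gives the filters more harmonics to work with.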

pinkjimiphoton

the ludwig phase II, madbean's honeydripper, dino's talkalyzer, ronan's talking pedal clone will all get ya in the ballpark pretty well.

i like the idea of random lfo sweeps of different formants. i think some of the digitech and zoom stuff kinda did that. it's vocal-ish, but not anything particularly speech-like.

when dino came to visit years ago, i remember him making the dang ludwig seem to say a couple words. ;) it was really cool.

i think if ya follow mark's approach tho, you may get a useable idea. i'd think maybe make the r/c networks where the first three formants occur so that you can tweak the bandwidth or frequency range of each filter would probably be important for best effect.
if you could make a couple filters cross like a bassballs does, and maybe control a third formant with a wah pedal or similar may be an interesting way to get a lot of mileage out of a fairly simple concept.

i built tiny dazzler's formant wah thing years ago, i can't recall the name of it, but it did the formant thing really well... mouthmeister i think he called it.

or maybe it would be better to place the third formant under lfo control with random waveforms like in a sample and hold circuit, and "wah" the other two in a wah/antiwah kinda setup by treadle.

the bitch comes down to which formant is under what kind of control, i'd think. some vocal formants are more speech like than others, some are hardly noticeable.

making popcorn to see who comes up with the next genius idea here. remember, fuzz will make it stand out a bit more. i always found fairly brutal fuzzes like in the ludwig helped generate more consonant noises from your pick bashing against stuff, which is as important <to me, anyways> as the vowel sounds themselves if ya wanna make it sound like its talking.

but imho the easiest way to accomplish this is still with a classic talkbox setup of a full range driver, an a/b box and some tubing.

Mark Hammer

You touch on a reasonable compromise, Sir James.

Consider something like a Bassballs (where the bandpass filters don't cross over, as you imply, but rise and fall together) but with 3 bandpass filters.  The lowest one is foot-controlled, via a simple resistance to ground, while the upper two are LFO swept.

Now that I think of it, I had mentioned using a guitar-mounted-and-pinky-finger-modulated photocell control in the past.  A person could work the lowest formant from the guitar, using their pinky finger, while they play.  Once in a while the upper two swept formants would conspire, and line up with the lower formant to make vaguely speech-like sounds.

pinkjimiphoton

i love the way you think, mark ;)

when i say cross over i realize i got it wrong lol.. i tune 'em to get the nastiest gargles i can from them usually. i still have a pcb you donated me years ago in my to-do list. it was working, i did something, ooops. one day...

but to go one further, instead of pinky control, how about we take the super low road for oddity, and do a soul kiss kinda design? instead of a pinky pot, use an ldr in a tube ya stick in your mouth. open the beak, filter goes squeak. then it would be an ELECTRONIC emulation of a talkbox using an actual facial protuberance, while not being a traditional talkbox at all. make that filter have a decent envelope follower and i postulate you may even be able to get a quasi vocoder-ish talking box effect. what better way to generate formants than thru use of the human vocal orifice and associated glands and uvulas and fillings and stuff? make funny phaces, and get funny noises.

maybe i DID do too much lds with spock at berkley...

11-90-an

I also like the way you think, Mark  :icon_mrgreen:

Thing is, this thread only started because I didn't realize that the talkbox sends the guitar signal into your mouth and your mouth acts like a cavity that dampens/attenuates the signal when you open and close your mouth... and the mic picks everything up... before I thought that the tube would pick up your voice and somehow mix it with the signal... :icon_lol:

But yes... that would be a really good idea... 1 controlled via expression pedal, 1 controlled by LFO, and one envelope controlled?  :icon_eek:

Mark Hammer

And I like your thinking, too.  The idea of one modulated, one foot-controlled, and one envelope-controlled, makes sense in a conceptual way.
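To make the three-control idea concrete, here is a small sketch of how the control side might look in software: an envelope follower for the playing dynamics, plus a helper that turns a treadle position, an LFO phase and the envelope into three filter centre frequencies. Which control drives which formant, the frequency ranges, and the attack/release times are all assumptions for illustration; the thread doesn't settle any of them.

```python
import math

def envelope_follower(signal, sample_rate=44100.0,
                      attack_ms=5.0, release_ms=120.0):
    """Rectify the signal and smooth it with separate attack/release times."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for x in signal:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def map_to_hz(control, low_hz, high_hz):
    """Map a 0..1 control value onto low_hz..high_hz on a log scale."""
    c = min(1.0, max(0.0, control))
    return low_hz * (high_hz / low_hz) ** c

def formant_targets(treadle, lfo_phase, envelope):
    """Return three centre frequencies from three control sources.

    Assumed assignment: treadle -> lowest, LFO -> middle, envelope -> highest.
    The ranges are rough guesses, not measured formant data.
    """
    lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * lfo_phase))
    return (map_to_hz(treadle, 300.0, 800.0),
            map_to_hz(lfo, 900.0, 2300.0),
            map_to_hz(envelope, 2300.0, 3200.0))
```

Swapping the assignments is a one-line change in formant_targets, which makes it easy to audition which formant benefits most from which kind of control.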

Imagine you're having a conversation, maybe even an argument, with someone over dinner.  Some formants will vary as you chew your food, some will remain constant and deliberate, and others will change as your emotion peaks here and there.

A number of years back, RG sent me a note with an idea for inclusion of white noise to mimic "breathiness".  In the current context, mimicry of human speech might also be enhanced by inclusion of pink or other noise in some deliberate way.  Perhaps a noise source could provide much of the content of the 3rd formant.
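As a quick way to audition that, a sketch along these lines mixes noise into the input of a single band-pass filter standing in for the 3rd formant. It uses plain white noise for simplicity (pink would need an extra shaping filter), and the function name, centre frequency, Q and the "breath" mix control are all hypothetical choices.

```python
import math
import random

def third_formant_with_breath(signal, breath=0.5, center_hz=2700.0,
                              sample_rate=44100.0, q=6.0, seed=0):
    """Band-pass a blend of the input and white noise.

    breath=0.0 passes only the instrument, breath=1.0 only noise.
    center_hz and q are illustrative guesses for a 3rd-formant band.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same noise
    f = 2.0 * math.sin(math.pi * center_hz / sample_rate)
    damping = 1.0 / q
    low = band = 0.0
    out = []
    for x in signal:
        noise = rng.uniform(-1.0, 1.0)
        mixed = (1.0 - breath) * x + breath * noise
        # One step of a Chamberlin state-variable filter
        low += f * band
        high = mixed - low - damping * band
        band += f * high
        out.append(damping * band)
    return out
```

Driving breath from an envelope follower instead of a fixed value would let the noise come and go with picking, which may sit closer to consonant-like sounds.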

pinkjimiphoton

......................like maybe shitty leaky noisy useless germanium transistors that we've ALL gotten at some point?

Mark Hammer

The received wisdom about transistor noise sources in the synth community is to install a socket and try them out.  I am personally not aware of any simple criterion or spec one can look for or measure to assess noise qualities that would be more accurate or reliable than just plugging the tranny in and listening.

CodeMonk

OK, I just had a totally ridiculous sounding thought.

Bear with me here....
Think of an upside down bowl that is sealed at the top.
Use a rubber or rubber like material that is flexible enough that you can easily distort it, but stiff enough to have memory.
Turn it upside down and cut a hole in it.

Basically, something to simulate the inside of the mouth that you can distort with your foot.

Might be interesting.

Also may help those who tend to drool when they use a talk box (Or am I the only one?).

Mark Hammer

Quote from: Mark Hammer on November 15, 2020, 05:23:58 PM
The received wisdom about transistor noise sources in the synth community is to install a socket and try them out.  I am personally not aware of any simple criterion or spec one can look for or measure to assess noise qualities that would be more accurate or reliable than just plugging the tranny in and listening.
Just to follow up on this, I'm just finishing up one of Kevin Mitchell's Mini Sample & Hold units ( https://www.diystompboxes.com/smfforum/index.php?topic=122371.msg1154322#msg1154322   what an absolutely marvellous circuit, Kevin!  :icon_biggrin: )  The schematic calls for a 2N3904 to use as the noise source.  I tried out a few I had but none of them grabbed me.  Serendipitously, the parts drawer that had the 3904s also had some BC307s, so I figured "what the hell", plugged one into the socket, and it was magic.  More variation and usable range.  I should also note, for the record, that I didn't have any BF245 JFETs, so I used J113s, and they worked like a charm.

Moral of the story: if a noise circuit relies on a transistor as the source, use a socket and try a bunch out.

aron

Mike Matthews also had the light activated wah. You put it in your mouth and by opening and closing your mouth, it would control the cutoff frequency of the wah. It was pretty cool for the time.

iainpunk

what about a vocoder,

you can put in the guitar in one side and a microphone or any XLR input on the other side and it tracks the formants of the voice and uses that as a filter over the input signal. IDK if it works with anything else than human voice, if not, you could plug in a bunch of other sounds like farts and chainsaws
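For reference, a bare-bones channel vocoder can be sketched as below: both signals go through the same bank of band-pass filters, the level of each modulator band is tracked with an envelope follower, and that level sets the gain of the matching carrier band. The band centres, Q and smoothing time are assumptions picked for a short example; real vocoders use many more bands.

```python
import math

def bandpass(signal, center_hz, sample_rate, q=6.0):
    """Band-pass output of a Chamberlin state-variable filter."""
    f = 2.0 * math.sin(math.pi * center_hz / sample_rate)
    damping = 1.0 / q
    low = band = 0.0
    out = []
    for x in signal:
        low += f * band
        high = x - low - damping * band
        band += f * high
        out.append(damping * band)
    return out

def envelope(signal, sample_rate, smooth_ms=10.0):
    """Rectify and smooth with a one-pole low-pass."""
    coeff = math.exp(-1.0 / (sample_rate * smooth_ms / 1000.0))
    env, out = 0.0, []
    for x in signal:
        env = coeff * env + (1.0 - coeff) * abs(x)
        out.append(env)
    return out

def channel_vocoder(carrier, modulator, sample_rate=44100.0,
                    band_centers=(250.0, 500.0, 1000.0, 2000.0, 4000.0)):
    """Impose the modulator's per-band levels on the carrier.

    carrier is the guitar, modulator is the mic or any other source.
    band_centers is an illustrative five-band layout.
    """
    n = min(len(carrier), len(modulator))
    out = [0.0] * n
    for fc in band_centers:
        c_band = bandpass(carrier[:n], fc, sample_rate)
        m_env = envelope(bandpass(modulator[:n], fc, sample_rate), sample_rate)
        for i in range(n):
            out[i] += c_band[i] * m_env[i]
    return out
```

Nothing in this sketch is specific to voice: the modulator is just a list of samples, so any sound with a changing spectrum will steer the bands.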

cheers, Iain
friendly reminder: all holes are positive and have negative weight, despite not being there.

cheers

ElectricDruid

I wonder if the fact that the guitar sound goes *in* through the mouth, rather than coming out from the throat makes a difference?

I did some research into vocal synthesis at one point, and for accurate formant synthesis, you need four or five resonant peaks. Some are more important than others, and it depends on the vowel sound (A, E, I, O, U), so you can often get something "speech-like" with less. As people have mentioned, a couple of bandpass filters is the bare minimum. Maybe the mouth only has a couple of resonances, and the others come from the throat or the vocal cords themselves? Maybe the throat is relevant for a talkbox too, I don't know. You're feeding the sound in at the other end of the system though, for sure.

If you wanted something more sophisticated, you'd have to set up a filter bank with four or five filters, and you'd need to control the frequency, level, and Q of each filter. Then you program in the known vowel resonances, and get the system to interpolate from one to another as you twiddle some control or other. I was working on something like this, done digitally, but analog would be possible with maybe a processor to control it and do the interpolations.
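The interpolation step might look something like the sketch below: a table of per-vowel filter settings (frequency, level, Q) and a function that glides between neighbouring vowels as a single control moves from 0 to 1. Only three filters per vowel are shown to keep it short, and every number in the table is a rough placeholder, not data from the paper linked below; the vowel labels and function name are also made up for the example.

```python
# Placeholder presets: one (frequency_hz, level, q) tuple per filter.
# These are rough illustrative values and should be replaced with real data.
VOWELS = {
    "ee": [(270.0, 1.0, 8.0), (2290.0, 0.5, 10.0), (3010.0, 0.3, 12.0)],
    "ah": [(730.0, 1.0, 8.0), (1090.0, 0.6, 10.0), (2440.0, 0.3, 12.0)],
    "oo": [(300.0, 1.0, 8.0), (870.0, 0.4, 10.0), (2240.0, 0.2, 12.0)],
}

def interpolate_vowel(path, position):
    """Return filter settings for a control position along a vowel path.

    path is a list of at least two vowel names; position runs 0..1
    from the first vowel to the last.
    """
    position = min(1.0, max(0.0, position))
    scaled = position * (len(path) - 1)
    index = min(int(scaled), len(path) - 2)
    frac = scaled - index
    a, b = VOWELS[path[index]], VOWELS[path[index + 1]]
    result = []
    for (fa, la, qa), (fb, lb, qb) in zip(a, b):
        freq = fa * (fb / fa) ** frac      # geometric glide for frequency
        level = la + (lb - la) * frac      # linear glide for level
        q = qa + (qb - qa) * frac          # linear glide for Q
        result.append((freq, level, q))
    return result
```

For example, interpolate_vowel(["ee", "ah", "oo"], pedal) with pedal read from a treadle would sweep through all three vowels across the pedal's travel. Frequency is interpolated geometrically on the assumption that pitch-like quantities sound smoother that way; a linear glide would also work.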

Good data on the actual human resonances here:

https://homepages.wmich.edu/~hillenbr/Papers/HillenbrandGettyClarkWheeler.pdf

iainpunk

if you like sketchy speech synthesis, check out the pink trombone
https://dood.al/pinktrombone/
it's quite a fun app to play with for like 5 minutes, then it just gets annoying

cheers, Iain

Mark Hammer

Quote from: ElectricDruid on November 18, 2020, 08:34:30 AM
I wonder if the fact that the guitar sound goes *in* through the mouth, rather than coming out from the throat makes a difference?

I did some research into vocal synthesis at one point, and for accurate formant synthesis, you need four or five resonant peaks. Some are more important than others, and it depends on the vowel sound (A, E, I, O, U), so you can often get something "speech-like" with less. As people have mentioned, a couple of bandpass filters is the bare minimum. Maybe the mouth only has a couple of resonances, and the others come from the throat or the vocal cords themselves? Maybe the throat is relevant for a talkbox too, I don't know. You're feeding the sound in at the other end of the system though, for sure.

If you wanted something more sophisticated, you'd have to set up a filter bank with four or five filters, and you'd need to control the frequency, level, and Q of each filter. Then you program in the known vowel resonances, and get the system to interpolate from one to another as you twiddle some control or other. I was working on something like this, done digitally, but analog would be possible with maybe a processor to control it and do the interpolations.

Good data on the actual human resonances here:

https://homepages.wmich.edu/~hillenbr/Papers/HillenbrandGettyClarkWheeler.pdf
Thanks for that, Tom.  Useful reference material.  The only caveat I would offer is that it aims to identify the formants required for accurate identification/differentiation of various phoneme combinations.  My sense here is that our collective goal is not so much to achieve accuracy as much as some reasonable verisimilitude.  It's like the difference between a detailed-enough portrait that lets you know who is depicted in this one vs that, and asking what you need to include in a stick-man animation that looks kinda sorta like human movement.

I don't know if you've ever stood and watched the monkeys - virtually any lower primate - at the zoo, but one is struck by how "human" their movements and facial expressions can be, even though you know they would never be mistaken for humans.  I think we're collectively looking for something that, in passing, sounds like it could be speech, even though we know it isn't.