Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲Can text be made to sound more than just its words? (2022) (arxiv.org)

47 points by tobr 97 days ago | 30 comments

andai 84 days ago [-]

Many moons ago I became quite obsessed with analyzing spectrograms on my computer.

I would load up audio files in Audacity and look at them to see how the audio "looked", as a function of how intense each frequency is over time.

You can even set a track to spectrogram while recording which allowed you to see the sound in real time.

Music also tends to be very beautiful in the spectrogram! And birdsong also. Sometimes I would see a bird first, and only afterwards notice it in my field of hearing.

I noticed while analyzing a podcast that I began to recognize common words like "you." I also noticed that I was able to easily distinguish between different people's voices.

I had to wonder if I were deaf, or if I become deaf, I would suddenly have a strong motivation to learn how to read these things. To develop some kind of device which would show them to me 24 hours a day.

I have not done this, but the project has remained in the back of my mind for over a decade.

Does anyone else know more about this? Does such a device exist?

I think that only some linguists learn how to read spectrograms. But it seems like something that might be extremely useful to any hearing impaired person?

Relating to the article, I think one could quickly learn to read them fluently (e.g. as subtitles, perhaps overlaid on real life), and of course you get the tonal information built in for free—that's what a spectrogram is!

AndrewOMartin 84 days ago [-]

You're on the fringe of an area which in academia is called Sensory Substitution. A simplification of which is experiencing one of the five senses using different sense organs than usual. Classic examples of this are video cameras which represent their image as a matrix of vibrations on the subjects skin or as a sound.

kiicia 84 days ago [-]

There was a guy who was able to recognize music just by looking at grooves of vinyl recording https://en.wikipedia.org/wiki/Arthur_Lintgen

m463 84 days ago [-]

I remember being able to recognize one song on vinyl.

It was a (telarc I think?) recording of the 1812 overture.

The grooves were wide where the canons went off, so that the needle could deflect enough to capture the dynamic range. You could see the waveform.

I think of "Surely You're Joking Mr. Feynman" where people could sniff like a bloodhound. Feynman would have people handle books, and he could tell which ones had been handled.

I think there are things that just trying would be successful more than you think.

wincy 84 days ago [-]

I knew a blind guy who did a trial where he could “see” using his tongue. Pretty neat!

https://news.wisc.edu/a-taste-of-vision-device-translates-fr...

shomp 84 days ago [-]

The book Understanding Comics by Scott McCloud is a tremendous study in this area, Scott shows how you can add abstract meanings to words and pictures through illustration.

foofoo12 84 days ago [-]

Very interesting idea. I remember reading that in visual spoken communications, only 20% is the actual words. The rest is tone of voice, body language, context, emphasis, expressions, ... all that stuff.

I don't know if 20% is correct, but I feel it's very close to it. I also think a lot of internet arguments happen as a direct result of miscommunication. Emojis are great, but they get abused to the point that HN filters them out. Perhaps allow readers to toggle if they want to see emojis or not?

Isognoviastoma 84 days ago [-]

Easy to check: try to speak with someone talking foreign language you don't know and estimate what percentage of what they said you understood from tone of voice etc. I would guess it's less than 80%.

foofoo12 84 days ago [-]

That's very easy and very wrong. Let's say you have a 100 page book. Page 1 contains fundamental knowledge that allows you to understand the rest of it. If you skip page 1 then you won't understand the other 99.

How much of the book will you understand if you only read page 1?

kalavan 84 days ago [-]

That then raises the question: what is a unit of communication?

If communication is 20% verbal and 80% nonverbal, and if communication is very nonlinear in understanding (as with your book example), how do we know what 1% of communication is? What does it mean, and how can we tell that the figure is correct, when our main or only way of detecting whether communication succeeded is through understanding or lack thereof?

foofoo12 84 days ago [-]

> when our main or only way of detecting whether communication succeeded is through understanding or lack thereof

That's not even a good test, due to miscommunication. Both parties might think it succeeded, but then much later on you find out the truth (maybe).

ethmarks 84 days ago [-]

But tonal information can be parsed without lexical understanding and vice versa.

Somebody cursing in French can still be interpreted as anger even if you don't understand French, and written profanity can still be interpreted as anger even if you didn't hear it spoken.

Tone and language do complent each other, but neither is a prerequisite for the other like your book analogy would suggest.

foofoo12 84 days ago [-]

> but tonal information can be parsed without lexical understanding

Parsed perhaps, but it's so context sensitive that it's not useful, save for extremities. The same tone of voice can have so many meanings based on what's actually being said and yet another if you add context.

cenamus 84 days ago [-]

Maybe also control for cultural similarity, but I definitely agree

eszed 84 days ago [-]

There's an acting exercise (it's from Joan Littlewood via Clive Barker) where one speaks "gibberish" - making language sounds, but not words - which, almost automatically, once they drop their terror of doing it, opens students up to all of those other avenues of communication. Later, you can switch students back and forth between the script and gibberish, and it becomes plain that if you can't play a scene as clearly (to those in it, not considering the audience) in gibberish as you can with words then you don't fully understand it.

84 days ago [-]

failrate 84 days ago [-]

Comic books already use changes in font, weight, size, of text and the shape of the word balloon to indicate tone and expression.

OisinMoran 84 days ago [-]

Something like this would be great for karaoke! Especially for the long held notes https://x.com/TheOisinMoran/status/1614435041764859907

pimlottc 84 days ago [-]

Another thing to look at would be how games like Rock Band and Guitar Hero show lyrics

mati365 84 days ago [-]

Consider learning Polish. Kurwa sounds exactly as it looks.

58937928709622 84 days ago [-]

może morze rzeka rzeka

beepbooptheory 84 days ago [-]

Reminds me of how the captions were done in Tony Scott's Man on Fire (2004). It's a pretty great movie too.

voxleone 84 days ago [-]

Emojis absolutely have their place here. They can add tone, nuance, and a bit of humanity where plain text can feel flat.

embedding-shape 84 days ago [-]

I feel like emojis is the lazy persons way of adding tone, nuance and humanity, when you don't know how to do so by only writing. Don't want to imply it's wrong, it's valid to be lazy, especially when it comes to improving communication, but I find myself thinking "How can I make sure this comes across as the joke it is?" and after one or two minute I just end up slapping a wink emoji at the end and don't rewrite the text at all, as the lazy person I am.

jonplackett 84 days ago [-]

When you only want to write w a single word back though + and emoji, there’s not a lot of space to add tone!

pnut 84 days ago [-]

An idea compressed down into a single character is elegant and efficient.

ZoomZoomZoom 83 days ago [-]

Single grapheme ;)

realty_geek 84 days ago [-]

I've always wondered about this.

In Akan languages it is not difficult to conceive of how the same word can be written in different ways to convey another dimension.

Anyone who speaks an akan language will understand that each of these words below means good but with a slightly different emphasis.

papa papaaapa papapapapapa

What is the linguistic term for this concept?

pegasus 84 days ago [-]

Apparently, it's called partial reduplication or emphatic doubling.

realty_geek 84 days ago [-]

Thanks, that is helpful.

Chatgpt also explained the concept of ideophones which was helpful:

https://chatgpt.com/share/69187b3e-7948-8001-9fea-2b4412d5a7...

egberts1 83 days ago [-]

Now you are delving into the world of intonation, just like ASL can squeeze nearly 200 meanings out of a single sign or Navaho can utter a consonant too in hundreds of ways that befuddle even the best enemy codebreakers.

Spoke English is also the same.

Just watch a typical George Carlin video on how he stretches out a single word.

Rendered at 16:50:44 GMT+0000 (Coordinated Universal Time) with Vercel.