The Ephemeral Nature of Voice

November 19, 2012

Martin Geddes thinks voice need not be ephemeral. I beg to differ.

In a recent keynote speech, Martin Geddes coined a new term, Hypervoice, and then gave it some substance. I didn’t attend the event, but Martin was kind enough to share the presentation with the world on Slideshare.

My main problem is with slide 62:

Just as we now routinely digitally capture our words and images, we will capture our voices. Voices need no longer be ephemeral.

Capturing voice (and video, for that matter) is quite different from capturing text and images. It doesn’t behave the same. It doesn’t work the same.

We capture images today. We do so when it fits the need and we have a camera with us, often through an application like Instagram that lets us edit the captured image before uploading it, giving it an ambiance that wasn’t there to begin with. We do the editing part “offline” and then we publish it.

Writing this post took a lot of editing (just this sentence was written and rewritten more than once). Text is a process of putting thoughts into words. And then updating it – deleting, adding, trimming, emphasizing – things that are natural to do to text.

Instant messaging – chatting with someone online – is an ephemeral type of interaction. Yes, most systems today do store the chat history, but no, this doesn’t make it any less ephemeral. The interaction is stored, but it isn’t an edited experience. There’s no real URL for that interaction – no hypertext to compare to.

And voice? We interact and communicate with people through voice. It is the natural way of doing things. We do it face to face, where we use body language as well. We do it over a video chat session – talking-heads-style, and we do it over the phone with only our voice.

We could record it all, transcribe it automatically into text, archive it in the cloud and make it searchable. But there’s no editing here either – what you said is simply recorded and then served back the next time it is needed.

There’s a lot more to do with voice – that’s for sure. And archiving it intelligently is probably the first step we should think of. Translation should be there as well. Gleaning insights out of it would be nice. But at the end of the day – the basic experience is ephemeral. We chat in the moment – the uncut version – and then we store it.


  1. Hello Tsahi,

    While reading your post, I could not resist sharing a “scratch my head” thought.

    Basically, think of a Siri or similar tool that can “capture” our voice, with a resulting “in real life consequence”. You know, like setting up an appointment, a reminder to call someone tomorrow, sending a SMS / an email / a tweet (ok, not so real life, but still), or even initiating a purchase for instance (movie, books)?

    What can we think about these “in real life consequences”? Would one qualify these as ephemeral or persistent? Could I refer to them later via a hyperlink (web agenda, emails, tweets, bank or merchant account)? Is it capturing voice?

    I’m tempted to say that while the initial “vocal trigger” experience is indeed ephemeral, the produced output can be seen as persistent. At least: it survives its trigger in both time and scope. And I could have valid hyperlinks too.

    I feel that traditional Telcos would say this is overgeneralizing, and it might very well be. But some others might agree with this persistent voice experience.

    Would you?

    1. This is a really good question.

      I think it is no different than the current use case of calling an insurance agent, who records the call and asks you questions, stating that the recording is used as a kind of guarantee on the price quote you’ll be getting on your health insurance policy.

      If this gets a URL in the end, or just gets archived in some backend system, what does it mean about the ephemeral nature of voice? It is just record/playback. Nothing like the editing done on text pages and hyperlinks. And if you can reach it via URL – is it hypervoice or hyperlink? I’d say it’s hyperlink.
