Speech Dictation Mode for Emacs (lepisma.xyz)
tbran 106 days ago
To run speech-to-text on my laptop, I've been using Justine Tunney's downloadable, single-executable Whisper file (whisperfile).

I use it to transcribe audio, then paste the transcript into an LLM to get notes on it. That helps me decide whether to watch or listen to something, and saves a lot of time.

Her tweet: https://x.com/JustineTunney/status/1825551821857010143

Instructions from Simon Willison: https://simonwillison.net/2024/Aug/19/whisperfile/

Command line options: https://github.com/Mozilla-Ocho/llamafile/issues/544#issueco...

jwr 106 days ago
Amazing work.

I am also impressed by the advances in technology. 20 years ago, I had severe RSI problems and worked on "vx-mode", a package for interfacing XEmacs to Dragon NaturallySpeaking, the best speech-recognition solution available at the time. My goals were similar, although the result was nowhere near what the OP has done. Also, speech recognition tech was nowhere near what we have now: I still remember buying good microphones, worrying about microphone placement relative to mouth, endless training and re-training…

This kind of software can make a huge difference for many people.

Jeff_Brown 106 days ago
I'm really happy about it, but I'm not sure how game-changing it would be for a blind person. It seems to require seeing what's on the page.
jwr 105 days ago
Perhaps not for a blind person, but for anyone with RSI or other hand/wrist impairments, this can make a huge difference. I speak from experience, having used dictation to work around RSI issues.
submeta 106 days ago
Year 2080: AGIs help you transcribe, structure, and lay out your code/text/thoughts. At the same time, HN posts: „New package for Emacs doing xyz“.
raverbashing 106 days ago
And all it requires is an Emacs version bump, some dependency upgrades, some external servers, and changing the default shortcut in a confusing Lisp file to something that doesn't require pressing 8 keys at the same time.
kleiba 106 days ago
Fun fact: even pressing three keys at the same time is rare when using Emacs (although there are some three-key combos I use regularly); most shortcuts consist of consecutive key presses.
fhd2 106 days ago
I sometimes feel like I'm playing the piano :D But the UX is better than you'd think: there's packages that show you what options you have for what key to press next, and the sequences are generally quite logical (e.g. CTRL-x followed by "p" has all the commands related to projects).

Plus you can always just enter the command instead of using the keystroke for it. Again, the default UX for that is a bit weak, but with a few packages it becomes pretty strong.

ashton314 106 days ago
> there's packages that show you what options you have for what key to press next

Rejoice! The excellent which-key package that does this comes bundled with Emacs 30! (Emacs 30 will probably be released soon.)
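
A minimal sketch of enabling it, assuming Emacs 30 with the built-in which-key (on older versions the package has to be installed first); the 0.5-second delay is an arbitrary example value, not a recommended setting:

    ;; Set the delay before enabling the mode, since the mode's idle
    ;; timer is created with the current value when it starts.
    (setopt which-key-idle-delay 0.5)  ; seconds before the key popup appears
    (which-key-mode 1)                 ; show possible next keys after a prefix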

> enter command… default UX is a bit weak

Agreed: the packages Helm, Ivy, and Vertico make this interface much nicer. I use Vertico [1] personally. Though, since Emacs 29, there are some really nice built-in options you can set. I used the following in my Bedrock starter kit [2] to get nicer tab completion: as soon as you hit TAB twice, you'll get bumped into the *Completions* buffer to select something with your cursor.

Here's the relevant config:

    (setopt completion-auto-help 'always)                  ; Open completion always; `lazy' another option
    (setopt completions-max-height 20)                     ; This is arbitrary
    (setopt completions-detailed t)
    (setopt completions-format 'one-column)
    (setopt completions-group t)
    (setopt completion-auto-select 'second-tab)            ; Much more eager
    ;(setopt completion-auto-select t)                     ; See `C-h v completion-auto-select' for more possible values
There are more configuration options, of course, but this is a helpful start.

[1]: https://github.com/minad/vertico
[2]: https://codeberg.org/ashton314/emacs-bedrock

spauldo 106 days ago
which-key made it in? Sweet! I've been saying for years it should be in Emacs and turned on by default.
kleiba 106 days ago
True. I often find myself typing out the command rather than using some obscure key sequence like C-c C-v n (case in point: https://orgmode.org/manual/Key-bindings-and-Useful-Functions...). Since Emacs does tab completion for command names too, I personally find that a better UX than using the "shortcut" (if I can remember it at all).
pxc 106 days ago
I tend to use search for infrequently used stuff and stuff I'm just trying to learn for the first time, then if I find myself using it several times in a session I look up the keybind to start practicing that. If it sticks, it sticks, and if it doesn't... the search functionality is great!
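
Emacs can help with the "look up the keybind" step: when M-x runs a command that has a key binding, it echoes that binding afterwards. This is on by default (shown for 2 seconds); a minimal sketch, assuming Emacs 29+ for setopt, that lengthens the hint to an arbitrary 5 seconds:

    ;; A number means "show the equivalent key binding for this many
    ;; seconds" after a command is run via M-x; t means 2 seconds.
    (setopt suggest-key-bindings 5)

C-h w (where-is) answers the same question on demand for any command.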
eptcyka 105 days ago
> the sequences are generally quite logical (e.g. CTRL-x followed by "p" has all the commands related to projects).

They really are not.

argiopetech 106 days ago
Depends on whether you count Shift. I use C-M-% (query-replace-regexp) fairly regularly, and that's 4.
kleiba 106 days ago
Sure, Shift counts. I suppose I would bind it to a more convenient key if I used query-replace-regexp regularly, but note that I didn't say there weren't any such keybindings, just that they're rare.
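
Such a rebinding is a one-liner. A sketch assuming Emacs 29+ for keymap-global-set; C-c Q is an arbitrary, hypothetical choice from the C-c <letter> space that Emacs conventions reserve for user bindings:

    ;; Also bind query-replace-regexp (normally C-M-%) to a C-c sequence;
    ;; the original C-M-% binding keeps working.
    (keymap-global-set "C-c Q" #'query-replace-regexp)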
b5n 106 days ago
I assume this varies widely across setups.

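    (use-package visual-regexp
      ;; :bind sets up autoloads, so loading is already deferred
      :bind (("C-c r" . vr/replace)
             ("C-c q" . vr/query-replace)))

    ;; vr/isearch-forward/backward live in visual-regexp-steroids, so they
    ;; are bound here; binding them above would autoload the wrong file.
    (use-package visual-regexp-steroids
      :bind (("C-r" . vr/isearch-backward)
             ("C-s" . vr/isearch-forward)))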
wiz21c 106 days ago
Year 2080: "M-x ai: imagine you are a smart Emacs developer, write a configuration file that sets up LSP"

answer:

"I did it. Please note that you're using a Microsoft protocol. Microsoft has a long history of attacking the 4 core freedoms of the Free Software movement which are

The freedom to run the program as you wish, for any purpose (freedom 0). ..."

pxc 106 days ago
This is kinda ideal tbh. I like how, for instance, F-Droid warns users about anti-features and integrations with proprietary web services. Clear messaging about problematic software + freedom to nonetheless choose those problematic options is great.

That said, I don't think this is the way the FSF evaluates software, or that they'd treat an open protocol like this. I could imagine a warning like this about integrating with a proprietary language server in particular, though— and I'd be grateful for it! A locally-run AI assistant that cared about things like that would be super cool.

marci 106 days ago
anthk 106 days ago
That AI would be running under GNU Hurd with Guix. Also, Scheme simplified itself so hard that it created something akin to the Common Lisp standard, unifying all the ice-9 modules and SRFIs into a single package manageable by humans.

Also it rewrote all of the legacy Emacs' Elisp into manageable Emacs Guile (with an uberfast JIT and/or libre Guile microcode from the FSF).

lepisma 106 days ago
Hey, author here. Didn't notice this came up on HN.

I wrote a small follow-up about trying to write and speak at the same time: https://lepisma.xyz/journal/2024/09/13/can-i-output-two-stre...

pama 106 days ago
That's a cool idea. Could the LLM find the right location for the audio stream simply from the buffer context plus the positions of the text and audio cursors when the interaction starts?
lepisma 106 days ago
I think it could work. In my example of writing a docstring, I can see this working out with high probability.
voltaireodactyl 106 days ago
This looks very useful, and beautifully presented. Looking forward to being able to use it with a local model.
Jeff_Brown 106 days ago
I would use this for edits that are hard to do otherwise. Like, instead of typing `M-x align-regexp` and then figuring out what regular expression to type, I would just highlight a passage and say to the LLM "Can you align all the library names in this import statement?"
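
For comparison, this is roughly what the manual route looks like once the regexp is worked out. A sketch of a hypothetical helper (my/align-import-sources is not a built-in command), assuming JavaScript/TypeScript-style lines such as import { x } from "lib", that aligns the from keyword and therefore the library names after it:

    (defun my/align-import-sources (beg end)
      "Align the `from' keyword across the import lines between BEG and END."
      (interactive "r")
      ;; Group 1 is the whitespace before `from'; align-regexp adjusts it so
      ;; every `from' starts in the same column.  \\_< and \\_> are symbol
      ;; boundaries, so identifiers such as `fromEvent' do not match.
      ;; SPACING 1 keeps at least one space; REPEAT nil aligns only the
      ;; first match on each line.
      (align-regexp beg end "\\(\\s-*\\)\\_<from\\_>" 1 1 nil))

Select the import lines and run M-x my/align-import-sources. Called from Lisp, align-regexp uses the regexp as given, which is why the whitespace group is spelled out explicitly.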
BeetleB 106 days ago
I did something similar here:

https://blog.nawaz.org/posts/2023/Dec/cleaning-up-speech-rec...

I now use Whisper with a much expanded prompt and have the flow integrated both in Emacs and my WM.

Prior HN discussion:

https://news.ycombinator.com/item?id=40174921

I've since done hours of transcription with it - often transcribing whole emails. The challenge is that my brain thinks very differently while talking compared to while typing. As a result, my output is very verbose, and is very different from what I would have typed. I haven't figured out how to speak as if I'm typing.

ggm 106 days ago
"Emacs: Upgrade to MELPA"

ELPA installed s/w suite: "I'm sorry Dave, I can't do that"

anthk 106 days ago
More like: Emacs: pull all the libre MELPA repos into a local .el file to be checked on demand. Hide all the proprietary or proprietary-dependent repos.
ants_everywhere 106 days ago
nerd-dictation is a decent offline speech dictation tool for Linux that I've used with Emacs https://github.com/ideasman42/nerd-dictation
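
A minimal sketch of toggling it from Emacs, assuming nerd-dictation is on PATH with a Vosk model in its default model directory. Because the tool simulates key presses into the focused window (xdotool by default), the text lands at point in the selected Emacs window. The command name is hypothetical:

    (defun my/nerd-dictation-toggle ()
      "Start nerd-dictation, or end the session this command started."
      (interactive)
      (let ((proc (get-process "nerd-dictation")))
        (if (and proc (process-live-p proc))
            ;; `nerd-dictation end' asks the running session to finish;
            ;; DESTINATION 0 discards output and returns without waiting.
            (progn
              (call-process "nerd-dictation" nil 0 nil "end")
              (message "Dictation ending"))
          ;; `nerd-dictation begin' keeps listening and types what it hears
          ;; into the focused window; its own output goes to *nerd-dictation*.
          (start-process "nerd-dictation" "*nerd-dictation*" "nerd-dictation" "begin")
          (message "Dictation started"))))

Checking the live process rather than keeping a flag means the toggle stays correct even if the dictation process exits on its own.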
namidark 106 days ago
Has anyone gotten whisper.el/.cpp to work with the microphone permissions in Emacs on OSX?
zvmaz 105 days ago
Would the author mind sharing his Emacs configuration? It's so beautiful!
lvass 105 days ago