Hacker Next
Steering interpretable language models with concept algebra (guidelabs.ai)
giang_at_glai 14 hours ago
Author here.

This post demonstrates “concept algebra” on a language model: injecting, suppressing, and composing human-understandable concepts at inference time, with no retraining and no prompt engineering.
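To make the three operations concrete, here is a minimal sketch that treats each concept as a direction vector in a model's hidden-state space. This is an assumption for illustration, not the post's actual mechanism (the interpretable model in the post may expose concepts differently). The helper names `inject`, `suppress`, and `compose` and the toy concept vectors are hypothetical: injecting adds a scaled unit direction, suppressing projects that direction out, and composing applies a sequence of such edits in order.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical edit format: (kind, concept_name, alpha); alpha is ignored for "suppress".
Op = Tuple[str, str, float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a concept direction to unit length so alpha has a consistent meaning."""
    norm = dot(v, v) ** 0.5
    if norm == 0.0:
        raise ValueError("concept direction must be non-zero")
    return [x / norm for x in v]


def inject(h: Vector, concept: Vector, alpha: float) -> Vector:
    """Push the hidden state h along the concept direction by alpha."""
    c = normalize(concept)
    return [x + alpha * y for x, y in zip(h, c)]


def suppress(h: Vector, concept: Vector) -> Vector:
    """Remove the component of h that lies along the concept direction."""
    c = normalize(concept)
    coef = dot(h, c)
    return [x - coef * y for x, y in zip(h, c)]


def compose(h: Vector, concepts: Dict[str, Vector], ops: List[Op]) -> Vector:
    """Apply inject/suppress edits in order; order matters for overlapping concepts."""
    for kind, name, alpha in ops:
        if kind == "inject":
            h = inject(h, concepts[name], alpha)
        elif kind == "suppress":
            h = suppress(h, concepts[name])
        else:
            raise ValueError(f"unknown op: {kind}")
    return h


if __name__ == "__main__":
    # Toy 3-d "hidden state" and hypothetical axis-aligned concept directions.
    concepts = {"formal": [1.0, 0.0, 0.0], "sad": [0.0, 1.0, 0.0]}
    h = [1.0, 2.0, 3.0]
    steered = compose(h, concepts, [("inject", "formal", 2.0), ("suppress", "sad", 0.0)])
    print(steered)  # [3.0, 0.0, 3.0]
```

Because suppression is a projection rather than a negative injection, applying edits in a different order can give a different result when concept directions overlap; a real system would also need to choose which layer(s) and token positions to edit.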

There’s an interactive demo in the post.

Would love feedback on: (1) what steering tasks you’d benchmark, (2) failure cases you’d want to see, (3) whether this kind of compositional control is useful in real products.

Related: https://news.ycombinator.com/item?id=47131225
