▲Kotlin creator's new language: a formal way to talk to LLMs instead of English (codespeak.dev)

103 points by souvlakee 2 hours ago | 78 comments

the_duke 1 hours ago [-]

This doesn't make too much sense to me.

* This isn't a language, it's some tooling to map specs to code and re-generate

* Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

* Models are evolving rapidly, this month's flavour of Codex/Sonnet/etc would very likely generate different code from last month's

* Text specifications are always under-specified, lossy and tend to gloss over a huge amount of details that the code has to make concrete - this is fine in a small example, but in a larger code base?

* Every non-trivial codebase would be made up of hundreds of specs that interact and influence each other - very hard (and context-heavy) to read all specs that impact functionality and keep it coherent

I do think there are opportunities in this space, but what I'd like to see is:

* write text specifications

* model transforms text into a *formal* specification

* then the formal spec is translated into code which can be verified against the spec

The second and third steps could be merged into one if there were practical/popular languages that also support verification, in the vein of Ada/SPARK.

But you can also get there by generating tests from the formal specification that validate the implementation.
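A minimal sketch of that last idea, assuming the formal specification can be written as an executable postcondition. The `satisfies_sort_spec` predicate, the sorting example, and `buggy_sort` are illustrative assumptions and have nothing to do with CodeSpeak itself; the point is only that the spec, not the generated code, decides what passes.

```python
import random
from collections import Counter


def satisfies_sort_spec(inp: list[int], out: list[int]) -> bool:
    """Hypothetical formal spec for 'sort': output is ordered and is a permutation of the input."""
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_elements = Counter(inp) == Counter(out)
    return ordered and same_elements


def check_against_spec(impl, trials: int = 200, seed: int = 0) -> bool:
    """Generate random inputs and validate the implementation's output against the spec."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        if not satisfies_sort_spec(data, impl(list(data))):
            return False
    return True


def buggy_sort(xs: list[int]) -> list[int]:
    """Stand-in for a plausible but wrong generated implementation: it drops duplicates."""
    return sorted(set(xs))


print(check_against_spec(sorted))      # a correct implementation passes
print(check_against_spec(buggy_sort))  # the duplicate-dropping one is rejected
```

Any regenerated implementation can be run through the same check, regardless of how different its source looks.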

onion2k 38 minutes ago [-]

> Models aren't deterministic - every time you would try to re-apply you'd likely get different output (without feeding the current code into the re-apply and let it just recommend changes)

If the result is always provably correct it doesn't matter whether or not it's different at the code level. People interested in systems like this believe that the outcome of what the code does is infinitely more important than the code itself.

dsr_ 13 minutes ago [-]

Let's rephrase:

Since nobody involved actually cares whether the code works or not, it doesn't matter whether it's a different wrong thing each time.

SpaceNoodled 37 minutes ago [-]

That's a huge "if."

gentooflux 23 minutes ago [-]

I usually invert those to reduce nesting

__loam 15 minutes ago [-]

The code is what the code does.

Copyrightest 29 minutes ago [-]

[dead]

jrm4 29 minutes ago [-]

I would be very comfortable with - re-run 100 times with different seeds. If the outcome is the same every time, you're reliably good to go.
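A rough sketch of what such a check could look like. `generate_impl` is a hypothetical stand-in for "ask the model to regenerate the code with this seed"; here it just picks between two hand-written variants so the example runs on its own. The comparison is on observable behaviour for a set of probe inputs, not on the source text.

```python
import random
from typing import Callable


def generate_impl(seed: int) -> Callable[[int], int]:
    """Hypothetical stand-in for a model call: returns one of several candidate implementations."""
    rng = random.Random(seed)
    variants = [
        lambda n: n * (n + 1) // 2,   # closed-form sum of 0..n
        lambda n: sum(range(n + 1)),  # loop-style sum of 0..n
    ]
    return rng.choice(variants)


def outcomes_agree(runs: int, probes: list[int]) -> bool:
    """Regenerate `runs` times with different seeds and compare behaviour on the probe inputs."""
    results = {
        tuple(generate_impl(seed)(p) for p in probes)
        for seed in range(runs)
    }
    return len(results) == 1


print(outcomes_agree(100, [0, 1, 2, 10, 100]))
```

Agreement across runs only shows that the outputs are consistent on the probes that were tried; it says nothing about whether they are correct.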

DrJokepu 8 minutes ago [-]

> Models aren't deterministic

Is that really true? I haven’t tried to do my own inference since the first Llama models came out years ago, but I am pretty sure it was deterministic: if you fixed the seed and the input was the same, the output of the inference was always exactly the same.

bigwheels 5 minutes ago [-]

LLMs are not deterministic:

1.) There is typically a temperature setting (even when not exposed, all major providers stopped exposing it in the latest frontier models).

2.) Then, even with the temperature set to 0, it will be almost deterministic but you'll still observe small variations due to the limited precision of float numbers.
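Both effects can be illustrated with a toy example. This is a sketch, not any provider's inference code: `sample_token`, `sample_sequence`, and the logits are made up for illustration. With a fixed seed and identical input the sampling is reproducible, while floating-point addition is not associative, so a backend that accumulates values in a different order can end up with slightly different numbers.

```python
import math
import random


def sample_token(logits: list[float], temperature: float, rng: random.Random) -> int:
    """Sample an index from softmax(logits / temperature); temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # subtract the max for numerical stability
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def sample_sequence(seed: int, n: int = 10) -> list[int]:
    """Draw n tokens from fixed toy logits using a seeded random generator."""
    rng = random.Random(seed)
    logits = [2.0, 1.0, 0.5]
    return [sample_token(logits, 1.0, rng) for _ in range(n)]


# Same seed and same input give the same samples.
print(sample_sequence(42) == sample_sequence(42))

# Floating-point addition is not associative: the grouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
```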

davedx 55 minutes ago [-]

My process has organically evolved towards something similar but less strictly defined:

- I bootstrap AGENTS.md with my basic way of working and occasionally one or two project specific pieces

- I then write a DESIGN.md. How detailed or well specified it is varies from project to project: the other day I wrote a very complete DESIGN.md for a time tracking, invoice management and accounting system I wanted for my freelance biz. Because it was quite complete, the agent almost one-shot the whole thing

- I often also write a TECHNICAL-SPEC.md of some kind. Again how detailed varies.

- Finally I link to those two from the AGENTS. I also usually put in AGENTS that the agent should maintain the docs and keep them in sync with newer decisions I make along the way.

This system works well for me, but it's still very ad hoc and definitely doesn't follow any kind of formally defined spec standard. And I don't think it should, really? IMO, technically strict specs should be in your automated tests not your design docs.

jbonatakis 9 minutes ago [-]

I have been building this in my free time and it might be relevant to you: https://github.com/jbonatakis/blackbird

I have the same basic workflow as you outlined, then I feed the docs into blackbird, which generates a structured plan with tasks and subtasks. Then you can have it execute tasks in dependency order, with options to pause for review after each task or run an automated review when all child tasks for a given parent are complete.

It’s definitely still got some rough edges but it has been working pretty well for me.

rebolek 9 minutes ago [-]

AGENTS.md is nice but I still need to remind models that it exists and they should read it and not reinvent the wheel every time.

the_duke 47 minutes ago [-]

I think many have adopted "spec driven development" in the way you describe.

I found it works very well in once-off scenarios, but the specs often drift from the implementation. Even if you let the model update the spec at the end, the next few work items will make parts of it obsolete.

Maybe that's exactly the goal that "codespeak" is trying to solve, but I'm skeptical this will work well without more formal specifications in the mix.

fnord77 12 minutes ago [-]

exactly - a formal language is defined by strict first order logic.

A language model like LLMs is designed to be fuzzy and imprecise (using probability distributions) because that's how real language is.

I think the creator of this language doesn't get language models.

pessimizer 41 minutes ago [-]

I think your objections miss the point. My informal specs to a program are user-focused. I want to dictate what benefits the program will give to the person who is using it, which may include requirements for a transport layer, a philosophy of user interaction, or any number of things. When I know what I want out of a program, I go through the agony of translating that into a spec with database schemas, menu options, specific encryption schemes, etc., then finally I turn that into a formal spec within which whether I use an underscore or a dash somewhere becomes a thing that has to be consistent throughout the document.

You're telling me that I should be doing the agonizing parts in order for the LLM to do the routine part (transforming a description of a program into a formal description of a program.) Your list of things that "make no sense" are exactly the things that I want the LLMs to do. I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

I want to see specs that move away from describing the specific functionality of programs altogether, and more into describing a usefulness or the convenience of a program that doesn't exist. I want to be able to feed the LLM requirements of what I want a program to be able to accomplish, and let the LLM research and implement the how. I only want to have to describe constraints i.e. it must enable me to be able to do A, B, and C, it must prevent X,Y, and Z; I want it to feel free to solve those constraints in the way it sees fit; and when I find myself unsatisfied with the output, I'll deliver it more constraints and ask it to regenerate.

darkwater 31 minutes ago [-]

> I want to be able to run the same spec again and see the LLM add a feature that I never expected (and wasn't in the last version run from the same spec) or modify tactics to accomplish user goals based on changes in technology or availability of new standards/vendors.

Be careful what you wish for. This sounds great in theory but in practice it will probably mean a migration path for the users (UX changes, small details changed, cost dynamics and a large etc.)

hkonte 39 minutes ago [-]

[dead]

lich_king 2 hours ago [-]

We built LLMs so that you can express your ideas in English and no longer need to code.

Also, English is really too verbose and imprecise for coding, so we developed a programming language you can use instead.

Now, this gives me a business idea: are you tired of using CodeSpeak? Just explain your idea to our product in English and we'll generate CodeSpeak for you.

Sharlin 57 minutes ago [-]

I'm sure that this time the language will be simple and English-like enough that execs can use it directly, similarly to COBOL and SQL.

kevin_thibedeau 48 minutes ago [-]

The idea is this would be a kind of IL for natural language queries. Then the main LLM isn't dependent on quirks of English.

souvlakee 1 hours ago [-]

No joke. I'm 100% sure that if it's successful, we will find CC's skill to write specs for CodeSpeak.

lucasoshiro 12 minutes ago [-]

Yeah. It's hard to express and understand nested structures in a natural language yet they are easy in high-level programming languages. E.g. "the dog of first son of my neighbour" vs "me.neighbour.sons[0].dog", "sunny and hot, or rainy but not cold" vs "(sunny && hot) || (rainy && !cold)".

In the past maths were expressed using natural language, the math language exists because natural language isn't clear enough.
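To make the ambiguity concrete, here is a small sketch. The `Dog`, `Son`, `Neighbour`, and `Person` classes and the sample values are invented for illustration. The weather sentence admits at least two groupings, and they disagree for some inputs, whereas the code forces one reading.

```python
from dataclasses import dataclass, field


@dataclass
class Dog:
    name: str


@dataclass
class Son:
    dog: Dog


@dataclass
class Neighbour:
    sons: list[Son] = field(default_factory=list)


@dataclass
class Person:
    neighbour: Neighbour


me = Person(neighbour=Neighbour(sons=[Son(dog=Dog("Rex")), Son(dog=Dog("Fido"))]))
print(me.neighbour.sons[0].dog.name)  # "the dog of the first son of my neighbour"


def reading_a(sunny: bool, hot: bool, rainy: bool, cold: bool) -> bool:
    """(sunny and hot), or (rainy but not cold)."""
    return (sunny and hot) or (rainy and not cold)


def reading_b(sunny: bool, hot: bool, rainy: bool, cold: bool) -> bool:
    """sunny and (hot or rainy), but not cold."""
    return sunny and (hot or rainy) and not cold


# The two readings of the same English sentence disagree on this input.
print(reading_a(True, True, False, True), reading_b(True, True, False, True))
```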

lich_king 9 minutes ago [-]

Did you mean AbstractNeighborDispatcherFactory?

ramon156 9 minutes ago [-]

I'm really glad random HN commenters know it better than someone that built a language that has been used in thousands of products.

awkwardpotato 3 minutes ago [-]

Standard appeal to accomplishment, past success does not guarantee future success... especially on this joke comment

theK 1 hours ago [-]

Damn, I am the product A-GAIN?

amelius 1 hours ago [-]

COBOL?

cratermoon 15 minutes ago [-]

relevant Dijkstra https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...

"In order to make machines significantly easier to use, it has been proposed (to try) to design machines that we could instruct in our native tongues. This would, admittedly, make the machines much more complicated, but, it was argued, by letting the machine carry a larger share of the burden, life would become easier for us. It sounds sensible provided you blame the obligation to use a formal symbolism as the source of your difficulties. But is the argument valid? I doubt."

mosburger 56 minutes ago [-]

sssssh! if this catches on we can keep our jobs! (j/k, mostly)

herrington_d 1 minutes ago [-]

Isn't the case study.... too contrived and trivial? The largest code change is 800 lines so it can readily fit in a model's context.

However, there is no case for more complicated, multi-file changes or architecture stuff.

leksak 55 seconds ago [-]

I think I prefer Tracey https://github.com/bearcove/tracey

lifis 51 minutes ago [-]

As far as I can tell it's not a new language, but rather an alternative workflow for LLM-based development along with a tool that implements it.

The idea, IIUC, seems to be that instead of directly telling an LLM agent how to change the code, you keep markdown "spec" files describing what the code does and then the "codespeak" tool runs a diff on the spec files and tells the agent to make those changes; then you check the code and commit both updated specs and code.
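A minimal sketch of that workflow as described here, assuming nothing about how the actual "codespeak" tool is implemented. The function name `spec_change_prompt`, the prompt wording, and the file name `spec.md` are all hypothetical; only Python's standard `difflib` is used to compute the spec diff that would be handed to an agent.

```python
import difflib


def spec_change_prompt(old_spec: str, new_spec: str, path: str = "spec.md") -> str:
    """Build an agent prompt from the unified diff between two versions of a spec file."""
    diff = difflib.unified_diff(
        old_spec.splitlines(keepends=True),
        new_spec.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    # Hypothetical prompt wording; a real tool would likely add more context.
    return "Update the code to reflect this spec change:\n\n" + "".join(diff)


old = "# Login\n- Users sign in with email.\n"
new = "# Login\n- Users sign in with email.\n- Lock the account after 5 failed attempts.\n"
print(spec_change_prompt(old, new))
```

Committing the spec and the resulting code together is what keeps the prompt history alongside the source.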

It has the advantage that the prompts are all saved along with the source rather than lost, and in a format that lets you also look at the whole current specification.

The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it (and also can't do LLM-driven changes that refer to the actual code), and also that in general it's not guaranteed that the spec actually reflects all important things about the program, so the code does also potentially contain "source" information (for example, maybe you want the background of a GUI to be white and it is so because the LLM happened to choose that, but it's not written in the spec).

The latter can maybe be mitigated by doing multiple generations and checking them all, but that multiplies LLM and verification costs.

Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

abreslav 29 minutes ago [-]

> The limitation seems to be that you can't modify the code yourself if you want the spec to reflect it

Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet. We are looking into ways to "catch up" the specs with whatever changes happen in the code not through CodeSpeak (agents or manual changes or whatever). It's an interesting exercise. In the case of agents, it's very helpful to look at the prompts users gave them (we are experimenting with inspecting the sessions from ~/.claude).

More generally, `codespeak takeover` [1] is a tool to convert code into specs, and we are teaching it to take prompts from agent sessions into account. Seems very helpful, actually.

I think it's a valid use case to start something in vibe coding mode and then switch to CodeSpeak if you want long-term maintainability. From "sprint mode" to "marathon mode", so to speak

[1] https://codespeak.dev/blog/codespeak-takeover-20260223

newsoftheday 9 minutes ago [-]

> Eventually, we'll end up in a world where humans don't need to touch code, but we are not there yet.

Will we though? Wouldn't AI need to reach a stage where it is a tool, like a compiler, which is 100% deterministic?

lifis 18 minutes ago [-]

Also they seem to want to run this as a business, which seems absurd to me since I don't see how they can possibly charge money, and anyway the idea is so simple that it can be reimplemented in less than a week (less than a day for a basic version) and those alternative implementations may turn out to be better.

It also seems to be closed-source, which means that unless they open the source very soon it will very likely be immediately replaced in popularity by an open source version if it turns out to gain traction.

souvlakee 44 minutes ago [-]

As far as I can tell C is not a new language, but rather an alternative workflow for assembly development along with a tool that implements it.

abreslav 36 minutes ago [-]

I second that :)

abreslav 29 minutes ago [-]

> Also it seems that the tool severely limits the configurability of the agentic generation process, although that's just a limitation of the specific tool.

Working on that as well. We need to be a lot more flexible and configurable

montjoy 9 minutes ago [-]

So, instead of making LLMs smarter let’s make everything abstract again? Because everyone wants to learn another tool? Or is this supposed to be something I tell Claude, “Hey make some code to make some code!” I’m struggling to see the benefit of this vs. just telling Claude to save its plan for re-use.

kleiba 1 hours ago [-]

I cannot read light on black. I don't know, maybe it's a condition, or simply just part of getting old. But my eyes physically hurt, and when I look up from reading a light-on-black screen, even when I looked at it only for a short moment, my eyes need seconds to adjust again.

I know dark mode is really popular with the youngens but I regularly have to reach for reader mode for dark web pages, or else I simply cannot stand reading the contents.

Unfortunately, this site does not have an obvious way of reading it black-on-white, short of looking at the HTML source (CTRL+U), which - in fact - I sometimes do.

newsoftheday 4 minutes ago [-]

Same for me, has been my whole life. I complain about it all the time. It's well documented that people can read black on light far better and with less eye strain than light on black; yet there seems to be a whole generation of developers determined to force us all to try and read it. Even the media sites like Netflix, Prime, etc. force it. At least Tubi's is somewhat more readable.

Sometimes a site will include a button or other UI element to choose a light theme but I find it odd that so many sites which are presumed to be designed by technically competent people, completely ignore accessibility concerns.

embedding-shape 56 minutes ago [-]

Do you sit in a bright room? Right now, during the night, I see your comment like this: https://i.imgur.com/c7fmBns.png, but during the day when the room is bright, I also see everything with light themes/background colors, otherwise it is indeed hard to see properly.

kleiba 32 minutes ago [-]

Unfortunately, in my case, it's not a matter of lighting conditions.

alexc05 49 minutes ago [-]

this is really exciting and dovetails really closely with the project I'm working on.

I'm writing a language spec for an LLM runner that has the ability to chain prompts and hooks into workflows.

https://github.com/AlexChesser/ail

I'm writing the tool as proof of the spec. Still very much a pre-alpha phase, but I do have a working POC in that I can specify a series of prompts in my YAML language and execute the chain of commands in a local agent.

One of the "key steps" that I plan on designing is specifically an invocation interceptor. My underlying theory is that we would take whatever random series of prose that our human minds come up with and pass it through a prompt refinement engine:

> Clean up the following prompt in order to convert the user's intent
> into a structured prompt optimized for working with an LLM
> Be sure to follow appropriate modern standards based on current
> prompt engineering research. For example, limit the use of persona
> assignment in order to reduce hallucinations.
> If the user is asking for multiple actions, break the prompt
> into appropriate steps (etc...)

That interceptor would then forward the well structured intent-parsed prompt to the LLM. I could really see a step where we say "take the crap I just said and turn it into CodeSpeak"

What a fantastic tool. I'll definitely do a deep dive into this.

WillAdams 15 minutes ago [-]

This raises a question --- how well do LLMs understand Loglan?

https://www.loglan.org/

Or Lojban?

https://mw.lojban.org/

ppqqrr 16 minutes ago [-]

i’ve been doing this for a while, you create an extra file for every code file, sketch the code as you currently understand it (mostly function signatures and comments to fill in details), ask the LLM to help identify discrepancies. i call it “overcoding”.

i guess you can build a cli toolchain for it, but as a technique it’s a bit early to crystallize into a product imo, i fully expect overcoding to be a standard technique in a few years, it’s the only way i’ve been able to keep up with AI-coded files longer than 1500 lines
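A small sketch of a mechanical first pass for this technique. It swaps the LLM comparison described above for a plain signature check with Python's standard `ast` module, so it only catches missing functions and parameter mismatches; the function names and the sample sketch/code strings are invented for illustration.

```python
import ast


def signatures(source: str) -> dict[str, list[str]]:
    """Map each top-level function name to its positional parameter names."""
    tree = ast.parse(source)
    return {
        node.name: [arg.arg for arg in node.args.args]
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


def discrepancies(sketch: str, code: str) -> list[str]:
    """List functions whose presence or parameters differ between the sketch and the real code."""
    want, have = signatures(sketch), signatures(code)
    issues = []
    for name in sorted(want.keys() | have.keys()):
        if name not in have:
            issues.append(f"{name}: in sketch, missing from code")
        elif name not in want:
            issues.append(f"{name}: in code, missing from sketch")
        elif want[name] != have[name]:
            issues.append(f"{name}: parameters differ {want[name]} vs {have[name]}")
    return issues


sketch = "def load(path): ...\ndef save(path, data): ...\n"
code = (
    "def load(path):\n    return open(path).read()\n\n"
    "def save(path):\n    pass\n\n"
    "def helper():\n    pass\n"
)
for issue in discrepancies(sketch, code):
    print(issue)
```

Anything this check cannot see, such as behaviour described only in comments, would still need the LLM pass or a human read.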

tonipotato 1 hours ago [-]

The problem with formal prompting languages is they assume the bottleneck is ambiguity in the prompt. In my experience building agents, the bottleneck is actually the model's context understanding. Same precise prompt, wildly different results depending on what else is in the context window. Formalizing the prompt doesn't help if the model builds the wrong internal representation of your codebase. That said curious to see where this goes.

slfnflctd 43 minutes ago [-]

Two pieces of advice I keep seeing over & over in these discussions-- 1) start with a fresh/baseline context regularly, and 2) give agents unix-like tools and files which can be interacted with via simple pseudo-English commands such as bash, where they can invoke e.g. "--help" to learn how to use them.

I'm not sure adding a more formal language interface makes sense, as these models are optimized for conversational fluency. It makes more sense to me for them to be given instructions for using more formal interfaces as needed.

CodeCompost 11 minutes ago [-]

Yes I'm also one of those LLM skeptics but actually this looks interesting.

h4ch1 1 hours ago [-]

You can basically condense this entire "language" into a set of markdown rules and use it as a skill in your planning pipeline.

And whatever codespeak offers is like a weird VCS wrapper around this. I can already version and diff my skills, plans properly and following that my LLM generated features should be scoped properly and be worked on in their own branches. This imo will just give rise to a reason for people to make huge 8k-10k line changes in a commit.

mft_ 1 hours ago [-]

Conceptually, this seems a good direction.

The other piece that has always struck me as a huge inefficiency with current usage of LLMs is the hoops they have to jump through to make sense of existing file formats - especially making sense of (or writing) complicated semi-proprietary formats like PDF, DOC(X), PPT(X), etc.

Long-term prediction: for text, we'll move away from these formats and towards alternatives that are designed to be optimal for LLMs to interact with. (This could look like variants of markdown or JSON, but could also be Base64 [0] or something we've not even imagined yet.)

[0] https://dnhkng.github.io/posts/rys/

pessimizer 58 minutes ago [-]

If LLMs can't deal with those legacy file formats, I don't trust them to be able to deal with anything. The idea that LLMs are so sophisticated that we have a need to dumb down inputs in order to interact with them is self-contradictory.

layer8 42 minutes ago [-]

While I agree, the parent also talks about efficiency. If a different format increases efficiency, that could be reason enough to switch to it, even if understanding doesn’t improve and already was good before.

le-mark 52 minutes ago [-]

This concept is assuming a formalized language would make things easier somehow for an LLM. That’s making some big assumptions about the neuroanatomy of LLMs. This [1] from the other day suggests surprising things about how LLMs are internally structured; specifically that encoding and decoding are distinct phases with other stuff in between. Suggesting language once trained isn’t that important.

[1] https://news.ycombinator.com/item?id=47322887

abreslav 25 minutes ago [-]

We are not trying to make things easier for LLMs. LLMs will be fine. CodeSpeak is built for humans, because we benefit from some structure, knowing how to express what we want, etc.

xvedejas 1 hours ago [-]

We already have a language for talking to LLMs: Polish

https://www.zmescience.com/science/news-science/polish-effec...

roxolotl 2 hours ago [-]

This doesn't seem particularly formal. I still remain unconvinced reducing is really going to be valuable. Code obviously is as formal as it gets but as you trend away from that you quickly introduce problems that arise from lack of formality. I could see a world in which we're all just writing tests in the form of something like Gherkin though.

tasuki 43 minutes ago [-]

> I could see a world in which we're all just writing tests in the form of something like Gherkin though.

Yes, and the implementation... no one actually cares about that. This would be a good outcome in my view. What I see is people letting LLMs "fill in the tests", whereas I'd rather tests be the only thing humans write.

xhkkffbf 1 hours ago [-]

While I'm also a bit skeptical, I think some formalism could really simplify everything. The programming world has lots of words that mean close to the same thing (subroutine, method, function, etc. ). Why not choose one and stick to it for interactions with the LLM? It should save plenty of complexity.

yellow_lead 21 minutes ago [-]

So, just a markdown file?

gritzko 2 hours ago [-]

So is it basically Markdown? The landing does not articulate, unfortunately, what the key contribution is.

matthewkayin 1 hours ago [-]

I tried looking through some of the spec samples, and it was not clear what the "language" was or that there was any syntax. It just looks like a terse spec.

oceanwaves 37 minutes ago [-]

In my building and research of Simplex, specs designed for LLM consumption don't need a formalized syntax as much as they just need an enforced structure, ideally paired with a linter. An effective spec for LLMs will bridge the gap between natural language and a formal language. It's about reducing ambiguity of intent because of the weaknesses and inconsistencies of natural language and the human operator.
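As an illustration of "enforced structure paired with a linter", here is a minimal sketch. It does not reflect Simplex's actual format: the required section names in `REQUIRED_SECTIONS` and the Markdown-heading convention are assumptions chosen for the example.

```python
import re

# Hypothetical required sections; a real spec format would define its own.
REQUIRED_SECTIONS = ["Purpose", "Inputs", "Outputs", "Constraints"]


def lint_spec(text: str) -> list[str]:
    """Report required sections that do not appear as Markdown headings in the spec."""
    headings = re.findall(r"^#{1,6}[ \t]+(.+?)[ \t]*$", text, flags=re.MULTILINE)
    return [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in headings]


spec = "# Purpose\nParse dates.\n\n# Inputs\nA string.\n"
for error in lint_spec(spec):
    print(error)
```

The prose inside each section stays free-form; only the skeleton is checked.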

Cpoll 53 minutes ago [-]

> The spec is the source of truth

This feels wrong, as the spec doesn't consistently generate the same output.

But upon reflection, "source of truth" already refers to knowledge and intent, not machine code.

amelius 1 hours ago [-]

I want to see an LLM combined with correctness preserving transforms.

So for example, if you refactor a program, make the LLM do anything but keep the logic of the program intact.

ljlolel 2 hours ago [-]

Getting so close to the idea. We will only have Englishscripts and don’t need code anymore. No compiling. No vibe coding. No coding. https://jperla.com/blog/claude-electron-not-claudevm

pure-orange 1 hours ago [-]

this will have to compile to something tho? So there will always be code

cesarvarela 2 hours ago [-]

Instead of using tabs, it would be much better to show the comparison side by side.

Also, the examples feel forced, as if you use external libraries, you don't have to write your own "Decode RFC 2047"

oytis 59 minutes ago [-]

Then of course we are going to ask LLMs to generate specifications in this new language

Brajeshwar 59 minutes ago [-]

So, back to a programming language, albeit “simplified.”

cestith 26 minutes ago [-]

Is this more like a programming language, or more like a specification system akin to UML?

fallkp 60 minutes ago [-]

"Coming soon: Turning Code into Specs"

There you have it: Code laundering as a service. I guess we have to avoid Kotlin, too.

oceanwaves 57 minutes ago [-]

https://thinkwright.ai/simplex

jajuuka 1 hours ago [-]

We created programming languages to direct programs. Then created LLMs to use English to direct programs. Now we've created programming languages to direct LLMs. What is old is new again!

tamimio 38 minutes ago [-]

As someone who hates writing (and thus coding) this might be a good tool, but how is it different from doing the same in Claude? And I only see Python, what about other languages, are they also production grade?

pjmlp 2 hours ago [-]

I think stuff like Langflow and n8n are more likely to be adopted, alongside with some more formal specifications.

kittikitti 59 minutes ago [-]

The intent of the idea is there, and I agree that there should be more precise syntax instead of colloquial English. However, it's difficult to take CodeSpeak seriously as it looks AI generated and misses key background knowledge.

I'm hoping for a framework that expands upon Behavior Driven Development (BDD) or a similar project-management concept. Here's a promising example that is ripe for an Agentic AI implementation, https://behave.readthedocs.io/en/stable/philosophy/#the-gher...

whalesalad 1 hours ago [-]

https://en.wikipedia.org/wiki/Literate_programming

theoriginaldave 1 hours ago [-]

I for one can't wait to be a confident CodeSpeak programmer /sarc

Does this make it a 6th generation language?
