

Asking LLMs to create my game Shepard's Dog (github.com)

155 points by vnglst 336 days ago | 61 comments

vnglst 336 days ago [-]

Shepherd's Dog is a game I've wanted to create for a long time, but I never got the sheep flocking behaviour just right. The goal of the game is to herd all the sheep into the pen before nightfall. I've asked several models to create this game and I'm particularly impressed with what Claude 3.7 could do with a one-shot prompt.

- You can play the Claude game here (note: doesn't work on Safari for some reason): https://html-preview.github.io/?url=https://raw.githubuserco...

- o3-mini's version is here: https://html-preview.github.io/?url=https://raw.githubuserco...

Results of other models and a leaderboard are here: https://github.com/vnglst/when-ai-fails/blob/main/shepards-d...

Some videos: https://hachyderm.io/@vnglst/114125938185826311

Keyframe 333 days ago [-]

> Shepherd's Dog is a game I've wanted to create for a long time

Not sure if you're aware, but there was a game like that for PlayStation and GBA, called Sheep! https://en.wikipedia.org/wiki/Sheep_(video_game) Here's some gameplay footage (the player here didn't choose a dog to play with for some reason): https://www.youtube.com/watch?v=SP058CHQj20 The premise of the game is the same: you run the sheep to the designated area over obstacles.

vnglst 333 days ago [-]

Ah thanks for this. The game above is lovely and it’s really similar to what I had in mind (I was also thinking of lemmings!). I see in the other comments below that this idea of mine has been created as a game a lot of times already. Seems like I’m not as original as I thought haha

Keyframe 333 days ago [-]

> Seems like I’m not as original as I thought haha

In creative work that's absolutely irrelevant. Don't even think about that. Everything has been done before; it's your take that counts, your vision!

AustinDev 333 days ago [-]

Just tried a 1-shot on Grok3 - Thinking and it couldn't get past the start button. Throws an error: "67:39 Uncaught ReferenceError: startGame is not defined"

Scope issue.

No barking or dog player model but pretty similar in style to Claude's output.

What's interesting to me about playing with AI codegen is that each model has specific and sometimes overlapping output errors. Claude 3.7 really likes to solve errors by returning dummy data as a 'fallback' when doing client or server calls. A little prompting can reduce this but not eliminate it. 'The tests always pass if you return dummy data'

https://jsfiddle.net/aL3ugtj1/

jchw 333 days ago [-]

Here is an attempt using Google Gemini 2.0 Pro Experimental.

https://gist.github.com/jchv/e8869a7cbe2d854a0ec93e946030d90...

It seems like it has some issues, but the result is interesting nonetheless. Just a one-shot like the others, needed a single "Keep going" but otherwise this is the vanilla output from the prompt.

Edit: Looks like you can share an HTML preview of a gist using html-preview.github.io, so here's that. https://html-preview.github.io/?url=https://gist.githubuserc... - It'll go to level 2 if you refresh the page and hit Restart, but I don't think it's possible to clear Level 2. The flock stays too far apart to fit enough sheep in the pen.

n4r9 333 days ago [-]

I just played the Claude attempt and found that the "fence" in level 3 doesn't actually obstruct either the dog or the sheep. Otherwise pretty fun.

swyx 333 days ago [-]

great demos. one-shotting isn't really fair imo, i feel like that might be hard even for a human to do (working without feedback). i'd be curious what deepseek would do with a bit more feedback.

breckenedge 333 days ago [-]

Since you’re releasing the code to GitHub, do you think you’ll eventually run into issues with the training data including prior versions of the game?

tdy_err 333 days ago [-]

The implied scenario being that the memory of its own output would result in the model producing degraded future output? Why is that a given?

mythrwy 333 days ago [-]

Probably the same reason that close relatives marrying each other for generations produces genetic problems.

Etherlord87 333 days ago [-]

Not the same reason at all. In genetics the reason is that you're losing gene variety and eventually recessive genes aren't suppressed anymore. In case of LLM it's just error accumulation.

mythrwy 325 days ago [-]

It's a few days late but "losing gene variety" isn't the cause. What happens is genetic errors compound and are more likely to be expressed. I.e. "error accumulation".

Etherlord87 321 days ago [-]

You're wrong. You clearly have the Internet; I don't understand why you won't just google it and learn about it instead of claiming stuff that is bs.

mythrwy 313 days ago [-]

How about a number of grad level genetics courses? Does that beat your google search? Because that is what I have. And what I am telling you is what happens.

This is really easily searched (as you said).

You might read up on it if interested. Check out why inbreeding can lead to expression of genetic defects. What is the mechanism? (hint: it's not "losing gene diversity" or "suppression").

Etherlord87 312 days ago [-]

Very bad courses then.

https://biology.stackexchange.com/questions/58769/what-are-t...

mythrwy 312 days ago [-]

Without getting into the validity of the source, let's look at what it says:

Here is the first sentence from the top answer:

`You are right. Inbreeding strongly increases overall homozygosity which subjects inbred individuals to diseases caused by rare recessive alleles.`.

Let's see what homozygosity means shall we?

https://www.genome.gov/genetics-glossary/homozygous

`Homozygous, as related to genetics, refers to having inherited the same versions (alleles) of a genomic marker from each biological parent. Thus, an individual who is homozygous for a genomic marker has two identical versions of that marker. By contrast, an individual who is heterozygous for a marker has two different versions of that marker.`

In other words, errors can accumulate and are more likely to be expressed. Not "gene diversity" (this is a topic relating to evolutionary fitness, selection potential etc.), not "suppression". Error accumulation.

Which is the exact analogy I made initially.

Etherlord87 311 days ago [-]

I had this conversation before. I point out how your interpretation is insane and doesn't follow logical reasoning, and you accuse me of gaslighting. I don't want to waste anyone else's time. We could just paste both our initial statements into an AI and ask who is more correct, but I'm sure you would either say AIs (all of them or 99% of them) are wrong, or you would interpret them saying I'm more correct as you being right.

I have no problems being wrong on the Internet. Unfortunately, for some magical reason, in the overwhelming majority of my conversations, I either recognize it within a minute (or one reply when in writing), or never.

mythrwy 311 days ago [-]

Let me give you a simple example maybe you will understand better.

Let's say a person has a recessive faulty gene. The gene doesn't get expressed because there is only one copy (recessive). We can notate this Aa (small "a" being the faulty gene, large "A" being the good copy). The person has two copies because they get one from each parent.

So "Aa" has a partner we can notate as "AA" (two good copies of the gene). AA and Aa have a child. What is the chance the child has the recessive gene? 50%, because of the 4 equally likely allele combinations, 2 include the "a" copy. Can the child have two bad copies (i.e. "aa" where the gene gets expressed)? No, they cannot because there are not two copies available from the parents, only one. At most they get "Aa". 50% chance they get "AA".

Let's say AA and Aa have a bunch of kids, the kids intermarry. Then their kids intermarry. Now what is the chance of an individual having two bad copies (i.e. "aa")? What is the chance they have 1 bad copy (Aa)?

It's just probability calculations, and the expression becomes more probable as there are more copies of the bad gene in the gene pool. I.e. within a population, the errors accumulate, they build up, and there is a larger chance of getting expression of the defect (aa) with continued inbreeding.
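The Punnett-square arithmetic in this example can be checked by enumeration. The following is only a minimal sketch, assuming a single gene with two alleles and that each parent passes on either of its two copies with equal probability; the function name `cross` is made up for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict[str, Fraction]:
    """Return genotype probabilities for a child of two parents.

    Each parent is a two-letter genotype such as "Aa". Assumption: each
    of a parent's two alleles is inherited with probability 1/2.
    """
    counts = Counter(
        "".join(sorted(a + b))  # normalise so "aA" and "Aa" count as one genotype
        for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print(cross("AA", "Aa"))  # one carrier parent
    print(cross("Aa", "Aa"))  # two carrier parents, e.g. siblings
```

Under these assumptions, `cross("AA", "Aa")` gives AA and Aa with probability 1/2 each and never aa, while `cross("Aa", "Aa")` gives aa with probability 1/4, which is the step where the recessive defect can first be expressed.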

This works with desirable genes too which is why we have so many kinds of dogs for instance. We select for it and build up copies of gene expressions we want to see to the point there is a 100% (or close to) chance of expression.

Hopefully you get this now. If not, read up on Mendelian genetics and table calculations; maybe that will help you see.

------------------------

So let me take this back to the original example of LLMs. Suppose there is 1% chance an LLM confidently claims Python library "Foo" exists and does XX when it's not true. This is analogous to a bad copy of the gene. If you train on that output (i.e. "inbreeding"), then use that as a reference (more inbreeding), soon many sources will say "Foo" exists and you'll have a larger chance of getting "Foobarred" information from the LLM.
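The feedback loop described here can be illustrated with a toy calculation. This is a sketch under loudly stated assumptions, not a model of real LLM training: the "model" repeats a false claim as often as it appears in its training data, plus a fixed hallucination rate on everything else, and each round a fixed fraction of the corpus is replaced by model output. The name `false_claim_share` and both parameters are hypothetical.

```python
def false_claim_share(
    generations: int,
    hallucination_rate: float = 0.01,
    synthetic_fraction: float = 0.5,
) -> list[float]:
    """Track the share of documents asserting a false claim across rounds.

    Assumptions (illustrative only): the model reproduces the claim at the
    rate it appears in the corpus, invents it at `hallucination_rate`
    otherwise, and `synthetic_fraction` of the next corpus is model output.
    """
    share = 0.0  # the original corpus contains no false claims
    history = [share]
    for _ in range(generations):
        # Share of model output that asserts the false claim.
        output_share = share + hallucination_rate * (1.0 - share)
        # Next corpus: a mix of the old corpus and fresh model output.
        share = (1.0 - synthetic_fraction) * share + synthetic_fraction * output_share
        history.append(share)
    return history


if __name__ == "__main__":
    for generation, share in enumerate(false_claim_share(5)):
        print(f"generation {generation}: {share:.4f}")
```

In this toy setup the share of "Foo exists" documents rises every round and never falls, because nothing removes a false claim once it has entered the corpus.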


Chaosvex 333 days ago [-]

Read about model collapse. The TL;DR is garbage in, garbage out.

https://en.wikipedia.org/wiki/Model_collapse

frotaur 333 days ago [-]

Seems o3-mini implements the 'boids' algorithm for flocking (likely due to its prevalence online), but I find that here it doesn't really fit.

Indeed in boids each element has a constant (or minimum) velocity, s.t. the sheep never stop 'running'. I find the Claude flocking behaviour looks more natural, for sheep.
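The difference between a minimum speed and a resting flock can be sketched in a few lines. This is a hedged illustration, not the code any of the models produced: it assumes a boids-style update that clamps speed to a range after each step, and the names `clamp_speed`, `coast` and the `friction` factor are invented for the example.

```python
import math


def clamp_speed(vx: float, vy: float, min_speed: float, max_speed: float) -> tuple[float, float]:
    """Rescale a velocity so its magnitude lies in [min_speed, max_speed]."""
    speed = math.hypot(vx, vy)
    if speed == 0.0:
        # No direction to preserve; pick +x arbitrarily if movement is forced.
        return (min_speed, 0.0)
    target = min(max(speed, min_speed), max_speed)
    scale = target / speed
    return vx * scale, vy * scale


def coast(vx: float, vy: float, steps: int, min_speed: float,
          max_speed: float = 2.0, friction: float = 0.8) -> float:
    """Apply friction with no other forces and return the final speed."""
    for _ in range(steps):
        vx, vy = clamp_speed(vx * friction, vy * friction, min_speed, max_speed)
    return math.hypot(vx, vy)


if __name__ == "__main__":
    print(coast(1.0, 0.0, steps=50, min_speed=0.5))  # classic boids: never stops
    print(coast(1.0, 0.0, steps=50, min_speed=0.0))  # sheep-like: settles to rest
```

With a positive `min_speed` the animal keeps moving at that speed forever, while a minimum of zero plus some friction lets an undisturbed sheep slow to a standstill.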

franze 333 days ago [-]

ChatGPT o1 Pro

Demo: https://show.franzai.com/a/clean-parrot-brown (Page will self-destruct after 3 months, feel free to host it somewhere else)

Oneshot Prompt https://chatgpt.com/share/67cff8e6-e218-8009-af5b-d91060eaed...

franze 333 days ago [-]

After some rounds in Cursor using different models

https://show.franzai.com/a/leaf-bug-wasp (LGPT - feel free to fork - Page will vanish in 3 months)

patates 333 days ago [-]

Best attempt so far IMHO. Very hard though!

stevage 333 days ago [-]

Wow, really impressive.

franze 329 days ago [-]

https://sheep.franzai.com/ now a bit more polished

vnglst 333 days ago [-]

Wow this one is great!

HenryBemis 333 days ago [-]

Tip: don't push them into a corner! I got up to lvl 7 without a problem, and then I got them stuck in a corner and that was it :( Poor sheep will spend the night in the cold outside the barn!

shever73 333 days ago [-]

After nearly 40 years, Shep has finally been released!

See the Crash magazine "Unclear User" parody. Page 125 of the August 1985 edition for context. [0]

[0] https://archive.org/details/Crash_No._19_1985-08_Newsfield_G...

srejk 333 days ago [-]

That you remember this from one month before I was born is incredible.

the_arun 333 days ago [-]

All the demo sites are flagged by Microsoft Edge as - "This site has been reported as unsafe". The irony is the demos are hosted on github pages.

NitpickLawyer 333 days ago [-]

uBO lite as well:

> uBO Lite has prevented the following page from loading:

https://html-preview.github.io/?url=https://raw.githubuserco...

The page was blocked because of a matching filter in OpenPhish Domain Blocklist.

tigerlily 333 days ago [-]

Are they doing this now? Oh brother! And here I was thinking WASM would be a good solution to the desktop exe signing problem for my community's roguelike. Instead browser vendors are likely just going to ban the site.

shakna 333 days ago [-]

Probably related to this recent chaos. [0]

[0] https://www.theregister.com/2025/03/10/infosec_in_brief/

ido 333 days ago [-]

Interesting, I'm also using Edge (with all security settings set to maximum) and it works fine for me. Maybe the difference is that I'm using it on Mac?

owenpalmer 333 days ago [-]

The one that Claude created was a legitimately fun game! If it implemented boids similar to o3-mini, it would be even better. Slap some sprites on it and put it on steam!

jofzar 333 days ago [-]

I clicked it and went, oh this is actually fun. It feels very early iPhone days mobile game.

matsemann 333 days ago [-]

On desktop the map is huuuuge and it's not particularly fun waiting for them to slowly move all the way to the opposite corner. It's cool that one can prototype this quickly, but needs some tweaks from play-testing as with all games I guess.

boredhedgehog 333 days ago [-]

Claude actually animated the nightfall, unprompted.

I don't think it's fair to say Mistral didn't implement flocking. The force is just very weak.

EDIT: I guess I confused flocking with herding, fair enough.

h4kor 333 days ago [-]

Quite impressive!

I've build a very similar game for a 3 hour game jam once :D

https://h4kor.itch.io/herding-simulator

matsemann 333 days ago [-]

I once made a boid-thingy, which this also reminds me of. https://matsemann.github.io/boids-workshop/ (and since the parent game is mostly boid behavior with a goal condition, I guess that's why the LLM is so successful in implementing it?)

The link is the final result with lots of controls, but the idea is that it's a tutorial/workshop where you build it step by step yourself, in Norwegian though https://github.com/Matsemann/boids-workshop

oneeyedpigeon 333 days ago [-]

Weird coincidence, but you made the exact same typo the OP made in their prompt. It's "built", not "build" :)

4ndrewl 333 days ago [-]

Ha, I've been creating this on-and-off for a while. Just last night I asked various LLMs to implement a boid-with-predator algorithm and all failed hard.

Instead I spent an hour reading through a description and implementing manually and it at least worked.

But yes, boids is a good start, but it requires some work to make it more natural for mammals, who can have a 0 min speed.

franze 330 days ago [-]

Here's an updated version using ChatGPT o1 Pro, Claude, and Cursor: https://news.ycombinator.com/item?id=43360648

cainxinth 333 days ago [-]

I finally got around to playing Red Dead Redemption 1 recently and was surprised at how much I enjoyed the cattle driving missions.

viccis 333 days ago [-]

Brought me back to the cowherding missions in RDR. Not a fond memory, but still a memory nonetheless.

unwind 333 days ago [-]

Meta: strange typo in title, "Shepard" should be "Shepherd".

boredhedgehog 333 days ago [-]

Perhaps the shepherd's name is Shepard. :-)

vnglst 333 days ago [-]

As I cannot change the title anymore I think that’s my best option now :-)

pesterazor 333 days ago [-]

Yes but... Why don't you see elephants hiding in trees?

oneeyedpigeon 333 days ago [-]

I wish GitHub would do a:

   pre { text-wrap: wrap; }

panglesd 333 days ago [-]

Interesting experiment.

Would love to see a multiplayer version of this game!

avereveard 333 days ago [-]

https://store.steampowered.com/app/2410820/Too_Many_Sheep/ both cooperative and competitive

SrZorro 333 days ago [-]

Also, https://store.steampowered.com/app/3006280/Sheepherds/

> A cozy co-op party game where you and your sheepdog buddies guide colorful flocks through beautiful landscapes [...]

It has a free demo but no release date yet

tobyhinloopen 333 days ago [-]

That's actually a fun game hah


superflow 333 days ago [-]

The political views of flags is an instant stop for me. I wonder what Rabobank thinks of that. Keep politics out of tech, especially if you are in the Dutch market: it's such a small market that you would not want something like that to stop a contract in the future, especially seeing that you used to be a zzp'er.

stijnstijn 333 days ago [-]

> Keep politics out of tech

This is a wild thing to say in 2025.

What 'political views of flags' anyway? I played a few levels and saw no flags, political or otherwise.

rafram 333 days ago [-]

OP has Ukrainian and EU flags on his Mastodon profile, which is linked from his GitHub profile. Apparently that’s not acceptable to GP.

superflow 333 days ago [-]

The reality is, the Netherlands is a right-leaning country; look at the last election results. A lot of people in the Netherlands are very divided about politics (stop the war and side with the US, leave the EU, or give more money to Ukraine and be pro-EU). I hear about this on a daily basis at work. I am quite sick of hearing about it. As a hiring manager at one of the big four banks in the Netherlands, seeing something like the flags is an obvious sign as to what side the OP leans towards. Having enough issues on my plate, and not needing another person to start a debate between team members at work, is enough for me to stop and not hire said person. The OP is or was a zzp'er (freelancer) in the Netherlands. I think it's a little silly to limit your future contracts by bringing politics into tech. What's the purpose of having it there?

rafram 333 days ago [-]

OP is doing nothing wrong, and if you refused to hire them based on their political views, you would be committing a crime: https://www.government.nl/topics/discrimination/prohibition-...

superflow 332 days ago [-]

And yet, I bet he still would not be hired at most of the large companies.

hackburg 330 days ago [-]

[dead]
