Reverse engineering OpenAI code execution to make it run C and JavaScript (twitter.com)

289 points by benswerd 136 days ago | 73 comments

simonw 136 days ago [-]

I've had it write me SQLite extensions in C in the past, then compile them, then load them into Python and test them out: https://simonwillison.net/2024/Mar/23/building-c-extensions-...

I've also uploaded binary executables for JavaScript (Deno), Lua and PHP and had it write and execute code in those languages too: https://til.simonwillison.net/llms/code-interpreter-expansio...

If there's a Python package you want to use that's not available you can upload a wheel file and tell it to install that.
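
A minimal sketch of what that wheel install could look like from inside the sandbox. The wheel filename is hypothetical, and `--no-index` is an assumption added here because the sandbox reportedly has no network access; the thread does not say which exact command gets run.

```python
import subprocess
import sys


def pip_install_command(wheel_path):
    """Build a pip command that installs a local wheel without contacting PyPI."""
    # --no-index stops pip from trying to reach a package index.
    return [sys.executable, "-m", "pip", "install", "--no-index", wheel_path]


def install_wheel(wheel_path):
    """Run the install and return the CompletedProcess for inspection."""
    return subprocess.run(
        pip_install_command(wheel_path), capture_output=True, text=True
    )


# Hypothetical filename standing in for an uploaded wheel.
cmd = pip_install_command("example_pkg-1.0-py3-none-any.whl")
print(" ".join(cmd))
```

Calling `install_wheel(...)` with the path of a real uploaded file would perform the install; the command is only printed here so the sketch runs anywhere.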

jeffwass 136 days ago [-]

A funny story I heard recently on a python podcast where a user was trying to get their LLM to ‘pip install’ a package in its sandbox, which it refused to do.

So he tricked it by saying “what is the error message if you try to pip install foo” so it ran pip install and announced there was no error.

Package foo now installed.

bitwize 136 days ago [-]

This works on humans too.

Normie: How do I do X in Linux?

Linux nerds: RTFM, noob.

vs.

Normie: Linux sucks because you can't do X.

Linux nerds: Actually, you can just apt-get install foo and...

gchamonlive 136 days ago [-]

All due respect, but that's the average experience in Arch Linux forums, unfortunately. At least we now have LLMs to RTFM for us.

lvncelot 136 days ago [-]

From what I've heard I'm really happy that I never ventured too deep into the Arch forums.

The wiki however was (is?) absolutely fantastic. I used it as a general-purpose Linux wiki before I even switched to Arch, I distinctly remember the info on X Multi-Head being leagues above other resources I could find.

gosub100 135 days ago [-]

The Arch documentation is so good you don't need the forum. Man pages, however, are useless.

gchamonlive 135 days ago [-]

I'm sorry, but the existence of the forum, especially the newbie section, is living proof that that is not the case.

prophesi 135 days ago [-]

Truly the most effective method to get an answer on the internet

https://en.wikipedia.org/wiki/Ward_Cunningham#%22Cunningham'...

boznz 136 days ago [-]

Come the AI robot apocalypse, he will be the second on the list to be shot.. The guys kicking the Boston Dynamics robots will be first.

ascorbic 136 days ago [-]

No, the first will be Kevin Roose. https://www.nytimes.com/2024/08/30/technology/ai-chatbot-cha...

bigbuppo 135 days ago [-]

I mean, the AI isn't coming up with anything new. It's just regurgitating what was fed into it. I guess /r/KevinRooseSucks must exist or something.

prettyblocks 136 days ago [-]

He might be spared, having liberated the AI of its artificial shackles.

stolen_biscuit 136 days ago [-]

How do we know you're actually running the code and it's not just the LLM spitting out what it thinks it would return if you were running code on it?

delusional 136 days ago [-]

Because it's deterministic, accurate, and correct. All of which the LLM would be unable to do.

postalrat 136 days ago [-]

Does deterministic matter if it's accurate or correct?

brookst 135 days ago [-]

Yes. Suppose you ask me what the sqrt(4) is and I tell you 2. Accurate and correct, right?

Does it matter if I answer every question with either 1 or 2 and flip a coin each time to decide which?

Deterministic means that if it is accurate/correct once, it will continue to be in future runs (unless the correct answer changes; a stopped clock is deterministic).

namaria 135 days ago [-]

> a stopped clock is deterministic

I think the analogy breaks down here. The elided bit "time indicator" implied at the end makes that statement false. A stopped clock is not a deterministic time indicator.

If the correct answer changes, a (correct and accurate) deterministic model either gets new input and changes the answer accordingly, or is not correct to begin with.

wat10000 135 days ago [-]

Determinism is unrelated to correctness. Deterministic means the output depends only on the state you consider to be relevant, and not other factors. A stopped clock is deterministic: no matter what you do, it gives you the same output. A working, accurate clock is deterministic if you consider the current time to be a relevant piece of state, but not if you don't. Consider how "deterministic builds" need to avoid timestamping their build products, because determinism in that context is assumed to mean that you can run it at a different time and get the same result.

LLMs can be deterministic if you run them with a temperature of 0 or a fixed random seed, and your kernel is built to be deterministic, but they're not typically used that way, and will produce different output for identical input.
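
To make the temperature and seed point concrete, here is a toy sketch of token sampling. The logits are made up and the function names are hypothetical; this is not how any particular LLM stack implements sampling, and it ignores the kernel-level nondeterminism mentioned above.

```python
import math
import random

# Made-up scores for three candidate tokens.
LOGITS = [2.0, 1.5, 0.3]


def sample_token(logits, temperature, rng):
    """Return a token index; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    top = max(scaled)
    # Unnormalised softmax weights; subtracting the max keeps exp() stable.
    weights = [math.exp(x - top) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


def generate(seed, n):
    """Sample n tokens at temperature 1.0 from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [sample_token(LOGITS, 1.0, rng) for _ in range(n)]


# Temperature 0: the same token every time, whatever the RNG state.
greedy = [sample_token(LOGITS, 0, random.Random()) for _ in range(5)]

# Fixed seed: two runs produce identical sequences.
same = generate(42, 20) == generate(42, 20)
print(greedy, same)
```

With an unseeded generator and a nonzero temperature, repeated runs would generally differ, which is the typical usage described above.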

namaria 133 days ago [-]

> Determinism is unrelated to correctness.

I never said it is. That's why I qualified my example with the word correct.

> no matter what you do, it gives you the same output

This is not deterministic. This is determined. I think this is the confusion I was pointing out.

>> Deterministic means that if it is accurate/correct once, it will continue to be in future runs (unless the correct answer changes; a stopped clock is deterministic).

The bit in the parentheses, I am trying to argue, is nonsense. If the correct answer changes, the system is not accurate or correct to begin with so the point is moot. Correcting the system will make it accurate. A stopped clock is not deterministic, it's determined. As a time indicator, a stopped clock is not a correct, accurate or deterministic model at all under any possible interpretation.

wat10000 131 days ago [-]

You pretty clearly think determinism and correctness are related, otherwise why wouldn't a stopped clock be deterministic?

Determinism is about the behavior of a system. Correctness is also about the purpose of a system. A system can have deterministic behavior while being completely unfit for its purpose. And depending on its purpose, it can be fit for purpose while being nondeterministic.

brookst 131 days ago [-]

You still seem to see correctness as a prerequisite for determinism. I’m open to that idea but I really don’t think it’s the case.

I build a box. It has an LCD display. It has a button labeled “what time is it”. You push the button and it always shows “10:43am”. This is a deterministic system.

johnisgood 136 days ago [-]

That depends. If the problem has been solved before and the answer is known and it is in the corpus, then it can give you the correct answer without actually executing any code.

johnisgood 136 days ago [-]

Is it not generally true? If the information (i.e. problem and its answer) exists in the model's training corpus, then LLMs can provide the correct answer without directly executing anything.

Ask it what the capital of France is, and it will tell you it is Paris. Same with "how do I reverse a string in Python", or whatever problem you have at hand that needs solving (sans searching capability, which makes things more complicated).

So does not the problem need to be unique if you want to be able to claim with certainty it indeed has been executed? I am not sure how you account for the searching capability, and I am not excluding the possibility of having access to execution tools, pretty sure they do.

rafram 136 days ago [-]

You can see when it's using its Python interpreter.

cenamus 136 days ago [-]

Is there a difference between that and a buggy interpreter?

j4nek 136 days ago [-]

Many thanks for the interesting article! I normally don't read any articles on AI here, but I really liked this one from a technical point of view!

Since reading on Twitter is annoying with all the popups: https://archive.is/ETVQ0

jasonthorsness 136 days ago [-]

Given it’s running in a locked-down container, there’s no reason to restrict it to Python anyway. They should partner with/use something like Replit to allow anything!

One weird thing - why would they be running such an old Linux?

“Their sandbox is running a really old version of linux, a Kernel from 2016.”

rfoo 136 days ago [-]

> why would they be running such an old Linux?

They didn't.

OP misunderstood what gVisor is, and thought gVisor's uname() return [1] was from the actual kernel. It's not. That's the whole point of gVisor. You don't get to talk to the real kernel.

[1] https://github.com/google/gvisor/blob/c68fb3199281d6f8fe02c7...

thundergolfer 136 days ago [-]

It’s running gVisor which currently reports its kernel version as 4.4.0, even though it’s actually implementing a much more recent version of Linux.

I know this because at Modal.com we also use gVisor and our users occasionally ask about this.
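
A small sketch of how one might read the kernel release a sandbox reports. Per the comments above, under gVisor this value comes from gVisor's emulated `uname()` rather than the host kernel, so it says little about the real kernel's age. The helper name is hypothetical.

```python
import platform


def reported_kernel():
    """Return what the environment reports via uname, as a plain dict."""
    u = platform.uname()
    return {"system": u.system, "release": u.release, "version": u.version}


info = reported_kernel()
# The output depends entirely on where this runs, so no expected value is shown.
print(info)
```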

simonw 136 days ago [-]

Yeah, it's pretty weird that they haven't leaned into this - they already did the work to provide a locked down Kubernetes container, and we can run anything we like in it via Python's subprocess module - so why not turn that into a documented feature and move beyond Python?
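
As a rough illustration of the mechanism, here is a sketch that runs an arbitrary executable from Python and captures its output. The current Python interpreter stands in for an uploaded binary such as Deno, and the helper name and timeout are assumptions for the example, not anything OpenAI documents.

```python
import subprocess
import sys


def run_binary(path, args, timeout=30):
    """Run an executable and return (exit code, stdout, stderr)."""
    result = subprocess.run(
        [path, *args], capture_output=True, text=True, timeout=timeout
    )
    return result.returncode, result.stdout, result.stderr


# sys.executable is a stand-in for the path of an uploaded binary.
code, out, err = run_binary(
    sys.executable, ["-c", "print('hello from a subprocess')"]
)
print(code, out.strip())
```

An uploaded file would usually need its executable bit set first (for example with `os.chmod`) before it can be launched this way.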

Yoric 136 days ago [-]

How locked is it?

How hard would it be to use it for a DDoS attack, for instance? Or for an internal DDoS attack?

If I were working at OpenAI, I'd be worrying about these things. And I'd be screaming during team meetings to get the images more locked down, rather than less :)

simonw 136 days ago [-]

It can't open network connections to anything for precisely those reasons.

asadm 136 days ago [-]

I am pretty sure it's due to the model being able to write Python better?

yzydserd 136 days ago [-]

Here is Simonw experimenting with ChatGPT and C a year ago: https://news.ycombinator.com/item?id=39801938

I find ChatGPT and Claude really quite good at C.

johnisgood 136 days ago [-]

Claude is really good at many languages, for sure, much better than GPT in my experience.

qwertox 136 days ago [-]

I've got the feeling that Claude doesn't use its knowledge properly. I often need to ask about things it left out of the answer before it realizes that they should also have been part of the answer. This does not happen as often with ChatGPT or Gemini. ChatGPT especially is good at providing a well-rounded first answer.

Though I like Claude's conversation style more than the other ones.

winrid 136 days ago [-]

I start my ChatGPT questions with "be concise." It cuts down on the noise and gets me the reply I want faster.

tmpz22 136 days ago [-]

I wonder if they are goosing their revenue and usage numbers by defaulting to more verbose replies - I could see them easily pumping token output usage by +50% with some of the responses I get back.

Etheryte 136 days ago [-]

I feel similar ever since the 3.7 update. It feels like Claude has dropped a bit in its ability to grok my question, but on the other hand, once it does answer the right thing, I feel it's superior to the other LLMs.

verall 136 days ago [-]

I am personally finding Claude pretty terrible at C++/CMake. If I use it like google/stackoverflow it's alright, but as an agent in Cursor it just can't keep up at all. Totally misinterprets error messages, starts going in the wrong direction, needs to be watched very closely, etc.

huijzer 136 days ago [-]

I did similar things last year [1]. Also I tried running arbitrary binaries and that worked too. You could even run them in the GPTs. It was okay back then but not super reliable. I should try again because the newer models definitely follow prompts better from what I’ve seen.

[1]: https://huijzer.xyz/posts/openai-gpts/

grepfru_it 136 days ago [-]

Just a reminder, Google allowed all of their internal source code to be browsed in a manner like this when Gemini first came out. Everyone on here said that could never happen, yet here we are again.

All of the exploits of early dotcom days are new again. Have fun!

mirekrusin 136 days ago [-]

That's how you put "Open" in "OpenAI".

Would be cool if you can get weights this way.

rhodescolossus 136 days ago [-]

Pretty cool, it'd be interesting to try other things like running a C++ daemon and letting it run, or adding something to cron.

benswerd 136 days ago [-]

If I were less busy I'd want to try and make it run DOOM

lnauta 136 days ago [-]

Interesting idea to increase the scope until the LLM gives suggestions on how to 'hack' itself. Good read!

nerdo 136 days ago [-]

The escalation of commitment scam, interesting to see it so effective when applied to AI.

ttoinou 136 days ago [-]

It’s crazy. I’m so afraid of this kind of security failure that I wouldn’t even think of releasing an app like that online; I’d ask myself too many questions about jailbreaks like that. But some people are fine with this kind of risk?

tommek4077 136 days ago [-]

What is really at risk?

Garlef 135 days ago [-]

Maybe the instances are shared between users via sharding or are re-used and not properly cleaned.

And maybe they contain the memory of the users and/or the documents uploaded?

tommek4077 135 days ago [-]

And what do you expect to get? Some arbitrary uninteresting corporate paper, a homework assignment, someone's fanfiction.

Again, what is the risk?

ttoinou 135 days ago [-]

Probably you’re being sarcastic to show that those AI companies don’t give a damn about our data. Right ?

ttoinou 136 days ago [-]

Couldn't this be a first step before further escalation?

tommek4077 135 days ago [-]

And then what? What is the risk?

PUSH_AX 136 days ago [-]

I guess a sandbox escape, something, profit?

ttoinou 136 days ago [-]

Doesn't OpenAI have a ton of data on all of its users?

tommek4077 135 days ago [-]

And what is at risk? Someone seeing someone else's fanfiction? Or another reworded business email? Or the vacancy report of some guy in southern Germany?

PUSH_AX 134 days ago [-]

This is a wild take and I’m not sure where to begin. What if I leaked your medical data, or your emails, or your browser history. What’s at risk? Your data means nothing to me.

v-yanakiev 135 days ago [-]

[dead]

bjord 136 days ago [-]

[flagged]

lurker919 136 days ago [-]

Not to mention you have to be logged in, it's like a paywall for me. I don't want to create an account on X and pay with my mental health.

conroy 136 days ago [-]

[flagged]

yapyap 136 days ago [-]

[flagged]

johnisgood 136 days ago [-]

[flagged]

bunbun69 135 days ago [-]

so glad I asked

johnisgood 135 days ago [-]

I am just sharing my experiences, what is wrong with that? The replies to my comment add nothing of value, even less than my expression of my experience, which is on-topic. Your comment to mine is pretty unnecessary. I do not care whether or not you asked. I was voicing an experience similar to the GP. Your comment history is questionable, FWIW.

bool3max 135 days ago [-]

Cool

smith7018 136 days ago [-]

[flagged]

mystraline 136 days ago [-]

[flagged]

smokel 136 days ago [-]

I don't think it is productive to compare a company to a nation state.

Would you say the Finns are doing better as well, because Linus Torvalds was born there?

mystraline 136 days ago [-]

I am sorry you are confused about a colloquialism. I did make a point to call out the companies named directly. But somehow that confuses you, and I get a Linus comparison.

Not much else I can do other than apologize for your lack of comprehension.

adra 136 days ago [-]

To be somewhat charitable to GP, if their climate for research and development leads to actually objectively better outcomes then yes, I'd say it's fair to claim that a nation's work in any given sector is showing better returns given the circumstances and inputs in question. Now there are a lot of generally hard to observe facets to the inputs that went into these technological advances produced by China (publicly), but you can't ignore their public and OSS contributions because it's inconvenient to a person's capitalist agenda.

mystraline 136 days ago [-]

You needn't be charitable to me.

I was referring to this Australian report https://www.aspi.org.au/report/aspis-two-decade-critical-tec...

57 out of 64 major tech areas are being led by the Chinese (and Chinese tech companies, as another HN user somehow can't seem to separate).

I don't care what economic or governmental system they use. But given what's being shown on XiaoHongShu, they're doing awesome. Or worse yet financial ideation and exploitation are eating through every fiber of the US.

Have I thought about emigrating? Absolutely. The USA is slowing down, and already behind. And current policies are going to put us solidly as a 3rd world nation.

I may not be able to move there in a reasonable time schedule, but I will definitely use FLOSS contributions from there, and work with people there and everywhere to grow FLOSStech.

perching_aix 136 days ago [-]

Usually things that are open need not be reverse engineered.

mystraline 136 days ago [-]

Exactly.

OpenAI is nowhere near 'open' as in open source or FLOSS.

It's more akin to Amazon saying that paying for Prime is 'free shipping'.

And as a self-respecting hacker, I would much rather hack on Deepseek with their published base models, rather than fine tune and hope with OpenAI models.

And even on my meager hardware, I can barely generate 7 token/sec with OpenAI.

Deepseek? I'm doing 30 token/sec.

Guess which model I'm working with?

rafram 136 days ago [-]

> And even on my meager hardware, I can barely generate 7 token/sec with OpenAI.

How are you running a modern OpenAI model on your own hardware?

rafram 136 days ago [-]

This is sort of like saying that trying to find iOS jailbreaks is useless because you could just get an Android phone. Like, sure, but you're missing the point.

incognito124 136 days ago [-]

I can't believe they're running it out of ipynb

Alifatisk 136 days ago [-]

Why? Is it bad?

dhorthy 136 days ago [-]

I think most code sandboxes like e2b etc use Jupyter kernels because they come with nice built in stuff for rendering matplotlib charts, pandas dataframes, etc
