The Generative Burrito Test (generativist.com)
kbenson 11 hours ago [-]
Oh wow, I've been hearing about Nano Banana Pro in random stuff lately, but as a layman the difference is stark. It's the only one that actually looks like a partially eaten burrito at all to me. The others all look like staged marketing fake food, if I'm being generous (only a few actually approach that, most just look wrong).
kemayo 11 hours ago [-]
Hunyuan V3 is the only other one that plausibly has a bite taken out of it. The weirdness of the fillings being decoratively sprinkled on top does rather count against it, though.
andai 10 hours ago [-]
Hide the evidence!
recursivecaveat 7 hours ago [-]
I don't know if it's the abundance of stock photos in the dataset or the training itself, but the 'hypertuned' default look of AI photos drives me crazy. Things are super smooth, the colors pop wildly, the depth of field is really shallow, everything is overly posed, details are far too sharp, etc. It vaguely reminds me of the weird, skin-crawling filter levels used by people like MrBeast.

I think it is the fine-tuning, because you can find AI photos that look more like real ones. I guess people prefer obviously fake-looking 'picturesque' photos to more realistic ones? Maybe it's just that the money is in selling to people generating marketing materials? NB is clearly the only model here that permits a half-eaten burrito to actually appear bitten.

beefnugs 6 hours ago [-]
This is what we deserve for not burning to the ground every company with fake food ads as soon as it started.
Workaccount2 8 hours ago [-]
Someone on Reddit made a "real or Nano Banana Pro image" website for people to test whether they could spot generated images. The running average was 50% accuracy.

It looks like they've taken the page down now, though...

BoorishBears 10 hours ago [-]
This shows some gaps in the "same prompt to every model" approach to benchmarking models.

I get that it helps ensure you're testing the model's capabilities rather than the prompt, but most models are post-trained on very different prompting formats.

I use Seedream in production, so I was a little suspicious of the gap: I passed ByteDance's official prompting guide, OP's prompt, and your feedback to Claude Opus 4.5 and got this prompt to create a new image:

> A partially eaten chicken burrito with a bite taken out, revealing the fillings inside: shredded cheese, sour cream, guacamole, shredded lettuce, salsa, and pinto beans all visible in the cross-section of the burrito. Flour tortilla with grill marks. Taken with a cheap Android phone camera under harsh cafeteria lighting. Compostable paper plate, plastic fork, messy table. Casual unedited snapshot, slightly overexposed, flat colors.

Then I generated with n=4 and the 'standard' prompt expansion setting for Seedream 4.0 Text To Image:

https://imgur.com/a/lxKyvlm

They're still not perfect (it's not adhering to the fillings being inside, for example), but they're massively better than OP's result.

This shows that a) random chance plays a big part, so you want more than one sample, and b) you don't have to "cheat" by spending massive amounts of time hand-iterating on a single prompt to get a better result.
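
For what it's worth, the mechanical setup for this kind of test is tiny. A minimal sketch using fal's Python client, assuming a Seedream text-to-image endpoint; the model slug and argument names here are my guesses for illustration, not anything confirmed in this thread:

    # Sketch: send one tuned prompt, sample n=4 so a single bad roll
    # doesn't decide the comparison. Endpoint slug and argument names
    # below are assumptions, not a confirmed API surface.
    import fal_client

    PROMPT = "A partially eaten chicken burrito with a bite taken out, ..."  # the tuned prompt above

    result = fal_client.subscribe(
        "fal-ai/bytedance/seedream/v4/text-to-image",  # hypothetical endpoint id
        arguments={"prompt": PROMPT, "num_images": 4},
    )
    for image in result["images"]:
        print(image["url"])  # look at all four before judging the model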

vunderba 10 hours ago [-]
100%. Tuning prompt variations for each model and allowing a minimum number of re-rolls is why it takes a while to publish results for the newest models on my GenAI comparison site.

Including a "total rolls" count is a very valuable metric, since it helps indicate how steerable the model is.

pathdependent 10 hours ago [-]
Not adhering to the prompt guide is definitely a valid, strong criticism. Resampling, I think, less so for the demo, just because fewer people look at k samples per model; taking literally the first one injects the fewest of my own biases.
BoorishBears 9 hours ago [-]
I actually think it's OK to inject your own bias here: if you're deploying these models in production, then you probably test on your own domain rather than half-eaten burritos lol

But individual users usually iterate and pick, so just sharing a blurb about your preference is probably enough if you choose 1 of n.

Aloisius 10 hours ago [-]
The NBP one looks like mock food to me - the unwrapped burrito sitting on a single piece of intact tinfoil, a table where the grain goes all wonky, an almost pastry-looking tortilla, hyperrealistic beans, and something wrong with the focal plane.

It's just not as plasticky and oversaturated as the others.

zenoprax 10 hours ago [-]
Hyperrealistic beans? The focal plane? You are reaching really hard here.

The table grain is the only thing that gives it away - if it weren't for that, no one without advance warning would notice that it's not real.

PostOnce 9 hours ago [-]
I am a huge AI skeptic, check my comment history.

I agree with you. The Nano Banana Pro burrito is almost perfect, the wood grain direction/perspective is the only questionable element.

Almost no one would ID that as being AI.

Aloisius 9 hours ago [-]
Yeah, hyperrealistic beans. They don't look real at all. The inside of an actual burrito is messy after you bite into it (and usually before). That burrito has a couple of nearly dry, yet for some reason speckled, beans that look more like they're floating on top of the burrito rather than actually in it.

And yeah, the focal plane is wonky. If you try to draw a box around what's in focus, you end up with something that makes no sense given where the "camera" is - as if the focal plane ran at a diagonal - so the salsa is all in perfect focus, but one of the beans, which appears to be the exact same distance away, is subtly out of focus.

I mean, it's not bad, but it doesn't actually look like a real burrito either. That said, I'm not sure how much I'd notice at a casual glance.

zenoprax 6 hours ago [-]
If you're approaching it from a "semantic pixel-peeping" perspective then yes, I understand what you mean. It's a pretty clean bite... but it's important to remember the context in which most images will be assessed.

Earlier this week I did some A/B testing with AV1 and HEVC video encoding. At similar bit rates there was a difference, but I had to know what to look for and rapidly cycle between the same frame from both files, and even then... barely. The difference disappeared when I hit play, and that's after knowing what to look for.

For anyone curious: if you are targeting 5-10 Mbps from a Blu-ray source, AV1 will end up slightly smaller (5-10%) with slightly better retention of film grain in darker areas. Target 10 Mbps with a generous buffer (25 MB) and a max bit rate (25 Mbps) and you'll get really efficient bit rates in dark scenes while building up a reserve of bandwidth for confetti-like situations. The future is bright for hardware video encoding/decoding with royalty-free codecs. Conclusion: prefer AV1 for 5-10 Mbps, but it's no big deal if it's not an option.
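
If anyone wants to try those settings, here's a rough sketch of the encode driven from Python. The rate-control flags are standard ffmpeg options; the filenames are placeholders, and it assumes your ffmpeg build includes the SVT-AV1 encoder:

    # Rough sketch of the encode described above (ffmpeg + SVT-AV1).
    # Numbers mirror the targets in the comment; filenames are placeholders.
    import subprocess

    subprocess.run([
        "ffmpeg", "-i", "bluray_source.mkv",
        "-c:v", "libsvtav1",
        "-b:v", "10M",        # ~10 Mbps average target
        "-maxrate", "25M",    # ceiling for confetti-like scenes
        "-bufsize", "25M",    # generous rate-control buffer to bank bits from dark scenes
        "-c:a", "copy",       # leave audio untouched
        "av1_out.mkv",
    ], check=True)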

flir 8 hours ago [-]
Re: focus. It looks like a collage - like the burrito has been pasted in. The Nano Banana 1 image doesn't have that problem.
iambateman 8 hours ago [-]
That’s what I came here to say! Oh my goodness it’s a huge difference.

The “partially eaten” part of the prompt is interesting… everyone knows what a half-eaten burrito looks like, but clearly the computers struggle.

blinding-streak 9 hours ago [-]
Very impressive, Nano Banana Pro has this wrapped up. The other ones look like has-beans.
minimaxir 10 hours ago [-]
One of my tests for new image generation models is professional food photography, particularly in cases where the food has constraints, such as "a peanut butter and jelly sandwich in the shape of a Rubik’s cube" (blog post from 2022 for DALL-E 2: https://minimaxir.com/2022/07/food-photography-ai/ )

For some reason ever since DALL-E 2, all food models seem to generate obviously fake food and/or misinterpret the fun constraints...until Nano Banana. Now I can generate fractal Sierpiński triangle peanut butter and jelly sandwiches.

vunderba 10 hours ago [-]
Nano-Banana does an (inter)stellar job with food-based prompts.

https://mordenstar.com/portfolio/wontauns

morkalork 8 hours ago [-]
Makes you wonder if they're using all the restaurant review photos on Google Maps to train.
BoorishBears 10 hours ago [-]
I tried having Claude generate a prompt for Seedream and got this: https://imgur.com/a/6xX5TDE

I can kind of see what you mean, in that it went for realism in the aesthetics but not the object... but that last one would probably fool me if I were scrolling.

minimaxir 10 hours ago [-]
Those are better than usual: I've gotten generations from earlier models that are just a normal colorful Rubik's cube between two slices of bread.
JumpCrisscross 11 hours ago [-]
An interesting American culinary divide is between Scottsdale and Phoenix homemade burritos, the former being close to the Midwest variety, the latter to a Sonoran style.

Even ignoring the Heinz bean outliers, these are all decidedly Scottsdale. With one exception. All hail Nano Banana.

throwup238 10 hours ago [-]
They all just look like generic Mission burritos to me (leaning towards fast-food menu photos), except some include lettuce and some have blisters, Sonoran style. Only Nano Banana really looks like something I'd get at El Farolito or an LA food truck.
throwup238 10 hours ago [-]
Just convert a tamale extruder to take in raw tortilla dough and bake the whole thing at once to cook the tortilla around the fillings.
jasonthorsness 10 hours ago [-]
This progress bodes well for my chances of visualizing an invention I have been working on: a perpetual burrito-extruding machine.
ruined 10 hours ago [-]
let me know when you're in preseed
skocznymroczny 11 hours ago [-]
That SD 1.5 picture doesn't look like base SD 1.5. It's way too good; perhaps it was some kind of fine-tune like RealisticVision?
pathdependent 11 hours ago [-]
Hrm, yeah, you're right. The page on fal used to produce it was linked with the image, but maybe I made a mistake and sloppily saved the wrong one. I'll have to reroll to check.
drob518 11 hours ago [-]
The burrito benchmark is poised to become an industry standard.
_joel 10 hours ago [-]
Ricing a bit more performance out.
autoexec 7 hours ago [-]
I don't eat a lot of burritos, and when I do they aren't bean burritos, so I'm honestly wondering: do they commonly have whole beans in them? I'd expect that if they do, they aren't often so clean and shiny looking; what I expected is more of a mushy/refried-bean look.

Do people get burritos with beans in them more or less as pictured? Aesthetically, compared to what I had in mind, it seems like it'd look pretty appealing if you were someone who loved beans, but again, I'm really in no position to judge these images based on bean appearance.

AlotOfReading 7 hours ago [-]
It's entirely possible to have clean, whole beans in a burrito. It's unusual in commercial kitchens because whole beans are kept warm in the cooking broth until service to avoid drying out. The preparer will scoop them out with a slotted spoon to drain, but usually they're in too much of a hurry to drain them fully. Product photo shoots don't rush this step, because a soggy burrito doesn't look good on camera, and they also undercook things so the ingredients don't mush. AI tools have ingested a lot more product photos than real burritos.
stickfigure 7 hours ago [-]
Many taquerias offer a choice of beans. If you ask for "pinto" (like the prompt does), they will look like this, yes.
N_Lens 11 hours ago [-]
Nano b̶a̶n̶a̶n̶a̶ burrito
elzbardico 11 hours ago [-]
Only nano banana looks somewhat partially-eaten.
totetsu 10 hours ago [-]
With LLMs there is a secondary training step to turn a foundation model into a chatbot. Is there something similar going on with these image generation models that is making them all tend towards pretty clean images and stopping them from making half-eaten food, even if they have the capability?
minimaxir 10 hours ago [-]
In terms of prompt adherence, there are two issues with most image generation models, neither of which apply to Nano Banana:

1. The text encoders are primitive (e.g. CLIP) and have difficulty with nuance, such as "partially eaten", and model training can only partially overcome that. It's the same issue as the now-obsolete "half-filled wine glass" test.

2. Most models are diffusion-based, which means they denoise the entire image simultaneously. If the model fails to account for the nuance in the first few passes, it can't go back and fix it.

I believe some image generation AIs were RLHFed like chatbot LLMs, but more to improve aesthetics than prompt adherence.
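
To make point 2 concrete, here's a toy sampling loop, purely schematic and not any production model's actual sampler: every step updates the whole latent at once, so a detail missed in the early, composition-setting steps never gets a dedicated later pass to fix it.

    # Toy illustration of point 2: each diffusion step denoises the
    # ENTIRE latent in one shot, so composition locked in early stays
    # locked in. Schematic only; real samplers use proper noise schedules.
    import torch

    def sample(denoiser, steps=50, shape=(1, 4, 64, 64)):
        x = torch.randn(shape)            # start from pure noise
        for t in reversed(range(steps)):
            eps = denoiser(x, t)          # predicted noise for the whole latent
            x = x - eps / steps           # crude global update; no per-region redo
        return x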

foobarbecue 5 hours ago [-]
I love the cheese asterisk at the bottom right of the Flux Schnell image.
visioninmyblood 11 hours ago [-]
Would be great to see video results for this as well. I generated some with other models; Nano Banana Pro seems the best so far.
willio58 10 hours ago [-]
I like how a couple of these basically show the model is confused between pinto beans and baked beans.
digitcatphd 10 hours ago [-]
I find it a bit surprising GenAI has made it this far without this benchmark
basket_horse 10 hours ago [-]
Is no one going to mention fast Lightning's sploogerito?
koakuma-chan 11 hours ago [-]
Impressive partially eaten burrito by NB Pro
adammarples 11 hours ago [-]
Nano banana is incredible. What is their secret sauce?
jfim 11 hours ago [-]
A training corpus that includes the images from Google Image Search probably helps a lot.
jwojtek 10 hours ago [-]
they are all very good...
corpMaverick 10 hours ago [-]
I am disappointed there were no donkeys in any image.
namegulf 11 hours ago [-]
This is spooking our appetite
ilaksh 11 hours ago [-]
I'm so easily influenced. I came very close to immediately ordering Mexican food on DoorDash.