For context, I was attempting to put a cup of hot chocolate into the hands of an anime character.
amy214 2 days ago [-]
I had the same experience on their site, even following the given example! "Show me a horse" -> a brown horse appears. "Make it black" -> now it has black fur. "Make it white" -> I'm sorry, racist content is not permitted, you are a bad person
summerlight 2 days ago [-]
Yeah, I'm tired of all those nanny things. Why is Google so obsessed with filtering? All of their reputational damage has come from unnecessary filtering and manipulation, not from unsafe output. I'm pretty sure they've lost 20-30% of potential users due to this crazy level of censorship, and it looks like they don't care.
acters 15 hours ago [-]
Local hosting and community support will always offer more freedom than the rather poor filtering that big tech loves to do. Why not offer an option to pay for copyrighted content? Seriously, if they can identify that something is copyrighted, then they can let people pay for it, but no, they would rather intentionally filter all of it out and end up with a useless product that will eventually be phased out.
nmfisher 3 days ago [-]
I had exactly the same issue; it seems practically useless for humanoid illustrations/characters.
megadata 3 days ago [-]
I think its definition of a "person" is way too broad. I've had similar problems; it's pretty shitty.
Pixelwave generates much better horse-people, in the event I wanted that. Admittedly it doesn't have an edit function; unfortunately the illustrations I want to edit all have people in them.
Ever since OpenAI showed (but did not release) this type of multimodal output with 4o, I have been waiting for this to be available to the general public.
It seems like really combining visuals at the level of generation capability means language understanding is fully grounded in a richer world model.
I am hoping for a step up in real-world common-sense intelligence areas like those covered by SimpleBench. Although they are static images, so there might still be room for improvement as far as physics understanding goes.
Also, if they can get it to the point of being really accurate (probably with larger models), this unlocks whole industries in terms of being able to do useful work.
londons_explore 3 days ago [-]
If it can do diagrams, charts, etc, with any kind of accuracy, it would have far more impact.
E.g. "I suggest moving the boiler from point A to B on the below map of the factory to reduce piping costs and heat loss"
Curious that they say "Use Gemini 2.0 Flash to tell a story and it will illustrate it with pictures, keeping the characters and settings consistent throughout", yet their example is not consistent at all between the two pictures.
jcuenod 3 days ago [-]
I was really hoping that there would be more character consistency, given the fact they mention it in the blog. It also doesn't seem to reliably follow styles like "watercolor illustration" or "line and wash".
https://usercontent.irccloud-cdn.com/file/DkQ5SSdT/image.png
Realistic photos work better, though it still doesn't beat Flux.1-dev: https://usercontent.irccloud-cdn.com/file/ZsouXNpn/image.png
https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...
Previous try with some interesting introspection:
https://drive.google.com/file/d/1SCBbpDo1dAJBAz7bFABk4yBZBuz..., https://aistudio.google.com/app/prompts?state=%7B%22ids%22:%...