Next.js App Router + React Server Components Demo

new
past
show
ask
show
jobs
submit

▲OmniHuman-1: Human Animation Models (omnihuman-lab.github.io)

186 points by fofoz 146 days ago | 31 comments

vessenes 146 days ago [-]

These look.. great, by and large. Hands are super natural, coherency is really high. Showing off piano chord blocking is a huge flex.

I’d like to play with this! No code, but bytedance often releases models, so I’m hopeful. It’s significantly better than vasa, and looks likely to be an iteration of that architecture.

liuliu 146 days ago [-]

ByteDance didn't release their text-to-video model, which is the base of this work, so I would think unlikely.

echelon 146 days ago [-]

Tencent is releasing a ton of stuff though!

https://aivideo.hunyuan.tencent.com/

Github is overflowing with Tencent, Alibaba, and Ant Group models. Typically licensed as Apache 2, and replete with pretrained weights and fine tuning scripts.

liuliu 146 days ago [-]

The training process in OmniHuman-1 seems to be straightforward to replicate once Tencent releases their image-to-video model too.

echelon 146 days ago [-]

T2V is already I2V if you're enterprising enough to open up the model and play with the latents. The I2V modality is almost just a trick.

liuliu 146 days ago [-]

Yes, the Llava model can encode image, and you can encode image into 3D vae space. Without fine-tune the model though, you are not going to have fidelity to original (if only use Llava's SigLIP to encode), or end up with image with limited motion (3D vae encoded latents as the first frame then doing vid2vid).

tkgally 145 days ago [-]

The hands do look good, but the spacing of the black and white keys on the piano keyboard is off. Similar problem with the strings and fretboard on the guitar. Such glitches won’t be noticed by as many people as weird hands are, though.

gavinguang 144 days ago [-]

https://omnihuman1.org/ try it

iandanforth 146 days ago [-]

Many of these have tells, but this one fully crossed the uncanny valley for me. https://www.youtube.com/watch?v=1NU8NzvAxEg&t=16s

Good to know that I need to now assume performances are AI generated even if it's not obvious that they are!

smusamashah 146 days ago [-]

What's the tell in this one? https://omnihuman-lab.github.io/video/hands2.mp4 or https://omnihuman-lab.github.io/video/hands1.mp4

mrob 146 days ago [-]

First video: Disappearing and appearing shirt buttons. Disappearing, appearing, and shapeshifting rings. Ear appears to be bluescreened despite the rest of the person appearing to be in front of a real background. Belt buckle slides unnaturally.

Second video: Shadows reveal inconsistent lighting direction. Disappearing and appearing studs on the watch strap. It also has bizarre clothing design with buttons on a non-opening shirt and what seems to be a printed fake weaving pattern that doesn't actually correspond to real weaving, but this could theoretically be made in reality.

lm28469 146 days ago [-]

With the waxy hair and pulsating microphone ?

aylmao 146 days ago [-]

To be fair, the hair looks quite similar to the original: https://www.youtube.com/watch?v=39_OmBO9jVg

marci 146 days ago [-]

On a phone, just scrolling?

smusamashah 146 days ago [-]

What are the tells in most of these videos? I can't point at any in many of them. Hands, teeth, lip sync, body and should movement all look correct. Specially the TED talk like presentation examples near bottom.

thomastjeffery 146 days ago [-]

Try watching them without audio.

They are all yelling. Even the girl with the cat. Too much energy. Too much expression. Too much pause. The pacing is all the same.

smusamashah 146 days ago [-]

This looks better than EMO (also closed source by Alibaba group https://humanaigc.github.io/emote-portrait-alive/). See the rap example on their page. They apparently have EMO2 now which doesn't look as believable to me.

EMO covers head + shoulders while this OmniHuman-1 is covering full body and its looking even better. I would have easily mistaken these for real (specially while doom scrolling) if I was not looking for AI glitches.

UPDATE: Googling animate bytedance site:github.io returns many in the same domain (all proprietry). Found a few good ones.

- https://byteaigc.github.io/X-Portrait2/ Very expressive lifelike portrait animations

- https://byteaigc.github.io/x-portrait/ (previous version of the same, has source https://github.com/bytedance/X-Portrait)

- https://loopyavatar.github.io/ (portrait animations, looks good)

- https://cyberhost.github.io/

- https://grisoon.github.io/INFP/

- https://grisoon.github.io/PersonaTalk/

- https://headgap.github.io/

- https://kebii.github.io/MikuDance/ anime animations

ggerules 146 days ago [-]

This is very good attempt with people playing musical instruments.

But, there are some subtle timing tells, that this is AI generated. Take a look at the singer playing the piano. Timing of the hands with the singer is slightly off. The same goes with the singer and the guitar. I'm not a guitar player or piano player, but I do play a lot of different musical instruments at a high level, and the timing looks off, slightly ahead or behind the actual piece of audio of the piece of music.

stubish 145 days ago [-]

They have cracked lip sync. AI can now realistically show singers strumming away out of time on guitars that are not plugged in or drummers playing the incorrect fills.

mkagenius 146 days ago [-]

> Timing of the hands with the singer is slightly off.

Sure, only way is up though. I haven't seen this level realism in SORA or the google one. Plus, its synced with audio.

latexr 145 days ago [-]

> Ethics Concerns

> The images and audios used in these demos are from public sources or generated by models, and are solely used to demonstrate the capabilities of this research work. If there are any concerns, please contact us (jianwen.alan@gmail.com) and we will delete it in time.

Ethical concerns with this technology have nothing to do with videos on a demo page, and everything to do with what can be generated later.

I don’t know if they have a profound lack of understanding of the ethical implications or are purposefully trying to pretend, but neither is good.

kiwiguy1 146 days ago [-]

I run youtube channels with almost 2 billion views and this actually concerns me. I would love to try this in my productions!!

lamnguyenx 145 days ago [-]

NVIDIA Demo of Audio2Face is such a joke, compared to this one.

egnehots 146 days ago [-]

this could be used as an incredible low bitrate codec for some streaming use cases. (video conferencing/podcasts on <3G for ex, just use some keyframes + the audio).

mpalmer 145 days ago [-]

...I feel slapped by progress. Rarely does such an impressive demo leave me feeling less inspired and hopeful about the future.

emsign 146 days ago [-]

It looks funny.

golol 146 days ago [-]

Modern operating systems should include by default a very simple private/public key system to sign arbitrary files. I think it should not be very complicated? We badly need this in the age of AI.

echelon 146 days ago [-]

That's too much effort and the use cases are what exactly? Helping the prosecution or defense in lawsuits?

People are going to get so used to AI content that it won't really matter. Culture is plastic. This will be the new norm.

Capturing photons to send signals is the new butter churning.

Ajedi32 146 days ago [-]

How would that help?

ssalka 146 days ago [-]

Auto-watermarking of AI generated content, I would imagine

Ajedi32 146 days ago [-]

What does that have to do with signing arbitrary files?

Rendered at 10:50:10 GMT+0000 (Coordinated Universal Time) with Vercel.