Content delivery costs a lot for streaming services. After content is produced, it is basically the only remaining cost, so it's not surprising that they go to extreme measures to reduce bitrate.
That’s why, presumably, Netflix came up with the algorithm for removing camera grain and adding synthetically generated noise on the client[0], and why YouTube shorts were recently in the news for using extreme denoising[1]. Noise is random and therefore difficult to compress while preserving its pleasing appearance, so they really like the idea of serving everything denoised as much as possible. (The catch, of course, is that removing noise from live camera footage generally implies compromising the very fine details captured by the camera as a side effect.)
1. camera manufacturers and film crews both do their best to produce a noise-free image
2. in post-production, they add fake noise to the image so it looks more "cinematic"
3. to compress better, streaming services try to remove the noise
4. to hide the insane compression and make it look even slightly natural, the decoder/player adds the noise back
Anyone else finding this a bit...insane?
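The round trip in steps 3-4 can be sketched in a few lines (a toy model: a box blur stands in for the denoiser and seeded Gaussian noise for the grain; AV1's real film grain synthesis transmits fitted grain parameters in the bitstream instead):

```python
import numpy as np

def denoise(frame, k=3):
    """Toy 'denoiser': a k-by-k box blur. Real pipelines use far more
    careful filters, precisely because blurring also destroys fine detail."""
    pad = k // 2
    padded = np.pad(frame, pad, mode="edge")
    out = np.zeros_like(frame, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + frame.shape[0], dx:dx + frame.shape[1]]
    return out / (k * k)

def add_synthetic_grain(frame, strength, seed):
    """Toy client-side grain: seeded Gaussian noise, so only the seed and
    strength need to be transmitted, not the noise itself."""
    rng = np.random.default_rng(seed)
    return np.clip(frame + rng.normal(0.0, strength, frame.shape), 0, 255)

# Server side: strip the camera noise before encoding...
noisy = np.clip(128 + np.random.default_rng(0).normal(0, 10, (64, 64)), 0, 255)
clean = denoise(noisy)
# ...client side: paint statistically similar grain back on after decoding.
displayed = add_synthetic_grain(clean, strength=10.0, seed=42)
```

Because the grain is regenerated from a seed on the client, it costs essentially zero bits to transmit, which is exactly why streaming services like it.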
trenchpilgrim 2 days ago [-]
> camera manufacturers and film crews both do their best to produce a noise-free image
This is not correct; camera manufacturers and filmmakers engineer _aesthetically pleasing_ noise (randomized grain appears smoother to the human eye than clean uniform pixels). The rest is still as silly as it sounds.
franga2000 2 days ago [-]
Considering how much many camera brands boast their super low noise sensors, I'd still say a very common goal is to have as little noise as possible and then let the director/dop/colorist add grain to their liking. Even something like ARRI's in-camera switchable grain profiles requires a low-noise sensor to begin with.
But yes, there are definitely also many DPs that like their grain baked-in and camera companies that design cameras for that kind of use.
strogonoff 1 day ago [-]
In any case, luma noise is not at all a massive issue, and it is a mistake to say that crews do their best to produce a noise-free image. They do their best to produce the image they want to see, and some amount of luma noise is not a deal-breaker. There are routinely higher priorities that take precedence over using the lowest ISO possible. There can also be financial considerations, if you don’t have enough lights.
It is only an issue in content delivery.
benjiro 2 days ago [-]
> randomized grains appear smoother to the human eye than clean uniform pixels
Does this explain why i dislike 4K content on a 4K TV? Some series and movies look too realistic, which in turn gives me an amateur film feeling (like somebody made a movie with a smartphone).
ZeroGravitas 2 days ago [-]
This sounds like what is called the soap opera effect:
Which is generally associated with excess denoising rather than with excess grain.
dataflow 2 days ago [-]
Are you sure? That seems to be about motion interpolation. Not even about smooth motion, but about it being interpolated. Here the concern seems to be about how the individual, still images look, not anything about the motion between them.
ZeroGravitas 2 days ago [-]
> some series and movies look too realistic, what in turn gives me a amateur film feeling
This comment that I replied to is almost a textbook description of the soap opera effect.
The interpolation adds more FPS, which is traditionally a marker of film vs TV production.
ycombinatrix 2 days ago [-]
Avatar 2 was particularly egregious with the poor interpolation
weaksauce 2 days ago [-]
i think i've seen this effect on tv shot on cameras that were lower fps than the tv outputs. looks fake and bad and interpolated because it is.
mjevans 2 days ago [-]
Yes: just stop doing step 2 the way they're doing it, and if they _must_ have noise, modify the parameters for step 4 directly instead.
strogonoff 2 days ago [-]
> 1. camera manufacturers and film crews both do their best to produce a noise-free image 2. in post-production, they add fake noise to the image so it looks more "cinematic"
This is patently wrong. The rest builds up on this false premise.
franga2000 2 days ago [-]
1.1: Google some low-light performance reviews of cinema cameras - you'll see that ISO noise is decreasing with every generation and that some cameras (like from Sony) have that as a selling feature.
1.2: Google "how to shoot a night scene" or something like that. You'll find most advice goes something like "don't crank the ISO up; add artificial lighting to brighten shadows instead". When given a choice, you'll also find cinematographers use cameras with particularly good low-light performance for dark scenes (that's why dark scenes in Planet Earth were shot on the Sony A7: despite the "unprofessional" form factor, it simply had the best high-ISO performance at the time).
2: Google "film grain effect". You'll find a bunch of colorists explaining why film grain is different from ISO noise and why and how you should add it artificially to your films.
strogonoff 2 days ago [-]
I am reasonably confident that I know this subject area better than one could learn from three Google searches. “ISO noise” is not even a real term; you are talking about digital sensor noise, which can further be luminance noise or colour noise. Opinions about luma noise being “unaesthetic” are highly subjective, depending a lot on stylistic choices and the specific sensor, and some would say luma noise on professional sensors has not been ugly for more than a decade. Generally, the main reason I don’t think your comments should be taken seriously is that you are talking only about a subset of photography, digital photography, while failing to acknowledge that works are still shot on film in 2025 and will be for years to come.
franga2000 2 days ago [-]
"ISO noise" is a thing I've heard on a few different film sets, and it's a convenient shorthand for "the noise that becomes really apparent only when you crank up the gain". Now that I'm thinking about it, there's a good chance this is more of a thing where I'm from, since our native language doesn't have different words for grain and noise, so we have to differentiate between them with a suffix, which we then incorrectly also use in English. I guess on a film set with mostly native English speakers, "noise" and "grain" are clear enough.
Next, no shit aesthetics are subjective, I never said this is the one objective truth. I said this is a thing that many people believe, as evidenced by the plethora of resources talking about the difference between noise and grain, why tasteful grain is better than a completely clean image and how to add it in post.
And finally, come on, it's obvious to everyone in this thread that I'm referring to digital, which is also not "just a subset" it's by far the biggest subset.
So idk what your point is. Most things are shot digitally. Most camera companies try to reduce sensor noise. Most camera departments try to stick to the optimal ISO for their sensor, both for dynamic range and for noise reasons, adjusting exposure with other means. In my experience, most people don't like the look of sensor/gain/iso/whatever noise. Many cinematographers and directors like the look of film grain and so they often ask for it to be added in post.
Besides the many/most/some qualifiers possibly not matching with how you perceive this (which is normal, we're all different, watch different content, work in different circles...), where exactly am I wrong?
strogonoff 1 day ago [-]
I can assure you that “ISO noise” is not a real term. I would take the word of whoever uses it with a grain of salt, movie set or not. Words have meanings.
> it's obvious to everyone in this thread that I'm referring to digital
It was blindingly obvious that you meant digital. That’s why I pointed this out. Without mentioning that it is only a concern with digital photography, your points become factually incorrect on more than one level; because the thread wasn’t talking specifically about digital photography, some of your points about noise don’t apply even if they were correct—which they aren’t, by your own admission that photography is subjective. Producing a noise-free image is not the highest priority for film crews (for camera manufacturers it is, but that’s because it means more flexibility in different conditions; it does not mean film crews will always prioritize whatever settings give them the lowest noise, there are plenty of higher priorities), and in some cases they choose to produce an image with some noise despite the capability to avoid it.
Sorry, with your googling suggestions it just reads like a newbie’s take on subject matter.
amy_petrik 2 days ago [-]
>Generally, the main reason I don’t think your comments should be taken seriously is because you are basically talking only about a subset of photography, digital photography, while failing to acknowledge that
TLDR - because although you made valid points on a specific area, you made no acknowledgement to my own favorite specific area, thusly I shall publicly declare your valid points shall not be taken seriously for others to read
strogonoff 1 day ago [-]
No need to put words in my mouth. First, the points were made in context of photography as a whole, and in that context, without specifying that they only apply in digital, they are false. Second, even in digital they are false. That’s all.
Yes, I was not strictly correct, it is a feature of AV1, but Netflix played an active role in its development, in rolling out the first implementation, and in AV1 codec development overall.
astrange 2 days ago [-]
Prior codecs all had film grain synthesis too (at least back to H264), but nobody used it. Partly because it was obviously artificial and partly because it was too expensive to render, since you had to copy each frame to apply effects to it.
tverbeure 2 days ago [-]
AFAIK, FGS support is the exception instead of the rule.
h.264 only had FGS as an afterthought, introduced years after the spec was ratified. No wonder it wasn’t widely adopted.
VP9, h.265 and h.266 don’t have FGS.
astrange 2 days ago [-]
Did they remove it from the specs? Wouldn't surprise me. But since it's just extra metadata and doesn't affect encoding you could still use the old spec.
tverbeure 2 days ago [-]
I don’t think it was ever part of the spec. It would surprise me if it was, because once a spec has been finalized and voted on, changing it is complicated. Getting agreement the first time is already difficult enough.
dan-robertson 2 days ago [-]
It feels to me like there are two different things going on:
1. Video codecs like the denoise / compress / synthetic-grain approach because their purpose is to get the perceptually closest video to the original in a given number of bits. I think we should be happy to spend the bits on more perceptually useful information. Certainly I am happy with this.
2. Streaming services want to send as few bytes as they can get away with. So improvements like #1 tend to be spent on decreasing bytes while holding perceived quality constant rather than increasing perceived quality while holding bitrate constant.
I think one should focus on #2 and not be distracted by #1 which I think is largely orthogonal.
astrange 2 days ago [-]
For #1 the problem with keeping grain in the compressed video is that it doesn't follow the motion of the scene so it makes it much more expensive to code future frames.
strogonoff 2 days ago [-]
I disagree, because 1) complete denoising is simply impossible while preserving fine detail and 2) noise is a serious artistic choice—just like anamorphic flare, lens FOV with any distortion artifacts, chromatic aberration, etc. Even if it is synthetic film grain that is added in post, that has been somebody’s artful decision; removing it and simulating noise on the client butchers the work.
codedokode 2 days ago [-]
There might be also copyright owners requirements, e.g. contract that limits the quality of material.
kwanbix 2 days ago [-]
I recall Netflix saying that streaming cost was nothing compared to all other costs.
strogonoff 2 days ago [-]
I am curious to see their breakdown. It seems very counter-intuitive of them to invest so much into reducing bitrate if the cost of delivery is negligible. R&D and codec design efforts cost money and running more optimized codecs and aggressive denoising cost compute.
matt-attack 1 day ago [-]
Improved compression also saves on storage costs. We would have to hear them say the same about storage.
EasyMark 1 day ago [-]
It probably is, but the bean counters don't want to hear that: they want to cut everything (ads, low-quality compression, etc.) to the point where it's just barely above the limit consumers will accept before they throw in the towel and cancel their membership.
charcircuit 2 days ago [-]
>Content delivery costs a lot for streaming services.
The hard disk space to store an episode of a show costs about $0.01. With peering agreements, the bandwidth of sending the show to a user is free.
pixl97 2 days ago [-]
>With peering agreements bandwidth of sending the show to a user is free.
I'm not sure why you think this, but it's one of the oddest things I've seen today.
The more streams you can send from a single server the lower your costs are.
charcircuit 2 days ago [-]
Sure, but buying a server is not buying bandwidth. The point of my post is to counter the narrative that streaming video is very expensive.
cogman10 2 days ago [-]
> You especially notice the compression on gradients and in dark movie scenes.
That's not a correctly calibrated TV. The contrast is tuned WAY up. People do that to see what's going on in the dark, but you aren't meant to really be able to see those colors. That's why it's a big dark blob. It's supposed to be barely visible on a well calibrated display.
A lot of video codecs will erase details in dark scenes because those details aren't supposed to be visible. Now, I will say that streaming services are tuning that too aggressively. But I'll also say that a lot of people have miscalibrated displays. People simply like to be able to make out every detail in the dark. Those two things come in conflict with one another causing the effect you see above.
amiga386 2 days ago [-]
> but you aren't meant to really be able to see those colors
Someone needs to tell filmmakers. They shoot dark scenes because they can - https://www.youtube.com/watch?v=Qehsk_-Bjq4 - and it ends up looking like shit after compression that assumes normal lighting levels.
toofy 2 days ago [-]
> Someone needs to tell filmmakers. They shoot dark scenes because they can…
i disagree completely. i watch a movie for the filmmakers story, i don’t watch movies to marvel at compression algorithms.
it would be ridiculous to watch movies shot with only bright scenes because streaming service accountants won’t stop abusing compression to save some pennies.
> …ends up looking like shit after compression that assumes normal lighting levels.
it’s entirely normal to have dark scenes in movies. streaming services are failing if they’re using compression algorithms untuned to do dark scenes when soooo many movies and series are absolutely full of night shots.
simongr3dal 2 days ago [-]
I feel like there are probably cinematic film-making tricks you can use to imply a very dark scene without serving #111 pixels all over the screen.
cogman10 2 days ago [-]
As I said, I think the streaming services have settings that are too aggressive there. But that doesn't change the fact that a lot of people have their contrast settings over-tuned.
It should be noted, as well, that this generally isn't a "not enough bits" problem. There are literally codec settings to tune which decide when to start smearing the darkness. On a few codecs (such as AV1) those values are pretty badly set by default, and I suspect streaming services aren't far off from those defaults. The codec settings instead prioritize putting bits into the lit parts of a scene rather than sparing a few for the darkness like you might like.
astrange 2 days ago [-]
Video codecs aren't tuned for any particular TV calibration. They probably should be because it is easier to spot single bit differences in dark scenes, because the relative error is so high.
The issue is just that we don't code video with nearly enough bits. It's actually less than 8-bit since it only uses 16-235.
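The limited-range arithmetic checks out (16-235 "studio swing" for 8-bit luma):

```python
import math

# 8-bit "limited range" video reserves codes 0-15 and 236-255
# (for blacker-than-black / whiter-than-white headroom and legacy
# signalling), leaving 220 usable luma levels out of 256.
usable_levels = 235 - 16 + 1
effective_bits = math.log2(usable_levels)
print(usable_levels, round(effective_bits, 2))  # 220 levels ≈ 7.78 bits
```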
codedokode 2 days ago [-]
If an eye is able to distinguish all 256 shades on a correctly calibrated display, then the content should be preserved.
ksec 2 days ago [-]
>Will streaming services ever stop over-compressing their content?
Before COVID, Netflix was using at least 8 Mbps for 1080p content. With x264 / Beamr that is pretty good, and even better with HEVC. Then COVID hit, and every streaming service, not just Netflix, had an excuse to lower quality due to increased demand on limited bandwidth. Everything went downhill from there. Customers got used to the lower quality, and I don't believe they will ever bring it back up. Now it is only something like 3-5 Mbps, according to a previous test posted on HN.
And while it is easy for HEVC / AV1 / AV2 to achieve 50%+ real-world bitrate savings compared to H.264 in the 0.5-4 Mbps range, once you go past that the savings shrink rapidly, to the point where the good old x264 encoder may perform better at much higher bitrates.
goeiedaggoeie 2 days ago [-]
Netflix also has a huge incentive to not use h265 and h264, licensing cost.
ls612 2 days ago [-]
Most 1080p WEB-DLs are in the 6-8 Mbps range still, based on a quick glance.
ksec 2 days ago [-]
Nice. There was a previous post on HN showing that the 5 or 6 samples the author had were in the 3-5 Mbps range.
ls612 2 days ago [-]
Some lower bitrate shows are animated cartoons where far less bitrate is really needed. I’m sure you could find some awful compression botches if you really looked on a public tracker though.
Boss0565 2 days ago [-]
Seems like piracy is the way for you
nani8ot 2 days ago [-]
Sadly many shows aren't released on Blu-ray anymore, so even piracy won't deliver better quality.
rokweom 2 days ago [-]
Piracy enables you to do things like debanding on playback, or more advanced video filtering to remove other compression issues.
ValentineC 2 days ago [-]
I believe many sites still prefer Amazon webrips because their content is encoded at a higher bitrate than Netflix.
goeiedaggoeie 2 days ago [-]
darn pirates should run the content through a super resolution model!
GeekyBear 2 days ago [-]
Not all video streaming services choose to use the same extremely low average video bit rate used by Netflix on some of their 4k shows.
Netflix has shown they're the mattress-company equivalent of streaming services.
You will be made to feel the springs on the cheapest plan/mattress, and it's on purpose so you'll pay them more for something that costs them almost nothing.
TheJoeMan 2 days ago [-]
I’m still so surprised Disney+ degrades their content/streaming service so much. Of all the main services I’ve tried (Netflix, Prime, Hulu, HBO) Disney+ has some of the worst over-compression, lip-sync, and remembering-which-episode-is-next issues for me. Takes away from the “magic”.
thevagrant 2 days ago [-]
Check your settings. I experienced the same until I altered Apple TV settings that fixed Disney+. If I recall, the setting was Match content or Match dynamic range (not near tv right now to confirm exact name)
swiftcoder 2 days ago [-]
Netflix now does this on their lowest paid tier as well. I had to upgrade to the 4K tier just to get somewhat-OK 1080p playback...
ksec 2 days ago [-]
This is interesting because Disney+ when they started out were using much higher bitrate, 2nd only to Apple+.
pkulak 2 days ago [-]
Are you sure about the black-area blocking? A long time ago, when I was younger and had time for this kind of tomfoolery, I noticed this exact issue in my Blu-ray backups. I figured I needed to up the bitrate, so I started testing, upping the bitrate over and over. Finally, I played the Blu-ray itself and it was still there. This was an old-school, dual-layer, 100GB disc of one of the Harry Potter movies. Still saw the blocking in very dark gradients.
adgjlsfhk1 2 days ago [-]
the downside of 8 bits per channel is that you really don't have enough levels to get a smooth gradient over dark colors.
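A quick illustration with toy numbers: stretch a subtle near-black gradient across a 1080p-wide frame, and 8-bit quantization leaves only a handful of distinct codes, each one a wide, visible band (10-bit would give four times as many steps):

```python
import numpy as np

width = 1920
# A subtle dark gradient spanning the whole width of the frame,
# covering just ten luma code values above black (16 is black in
# limited-range 8-bit video).
gradient = np.linspace(16, 26, width)
quantized = np.round(gradient).astype(np.uint8)
codes = np.unique(quantized).size
band_width = width // codes
print(codes, band_width)  # 11 distinct codes, so bands ~174 px wide
```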
pkulak 2 days ago [-]
Yup. I think that’s exactly it.
yalogin 2 days ago [-]
I don't quite follow why compression would cause this. Feels more like a side effect of adaptive HTTPS streaming protocol where it would automatically adjust based on your connection speed, and so aligns with any jitter on the wire. It could also be an issue with the software implementation because they need to constantly switch between streams based on bandwidth.
pkulak 2 days ago [-]
> side effect of adaptive HTTPS streaming
Adaptive streaming isn't really adaptive anymore. If you have any kind of modern broadband, the most adaptive it will be is starting off in one of the lower bitrates for the first 6 seconds before jumping to the top, where it will stay for the duration of the stream. A lot of clients don't even bother with that anymore; they look at the manifest, find the highest stream, and just start there.
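That start-up behaviour can be modelled in a few lines (a hypothetical `pick_rendition` helper, not any real player's API):

```python
def pick_rendition(manifest, throughput_bps=None):
    """Toy ABR start-up logic: with no throughput estimate, do what many
    modern clients do and simply start at the highest rendition."""
    renditions = sorted(manifest, key=lambda r: r["bandwidth"])
    if throughput_bps is None:
        return renditions[-1]
    # Otherwise pick the best rendition the measured throughput affords.
    affordable = [r for r in renditions if r["bandwidth"] <= throughput_bps]
    return affordable[-1] if affordable else renditions[0]

manifest = [
    {"name": "1080p", "bandwidth": 8_000_000},
    {"name": "720p", "bandwidth": 4_000_000},
    {"name": "480p", "bandwidth": 1_500_000},
]
print(pick_rendition(manifest)["name"])             # 1080p
print(pick_rendition(manifest, 2_000_000)["name"])  # 480p
```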
axiolite 2 days ago [-]
> You especially notice the compression on gradients and in dark movie scenes.
That can happen at even the highest bitrates if "HDR" is not enabled in the video codec.
Whoa. That is the best thing i watched on YouTube in a long time. Thank you.
nerdsniper 2 days ago [-]
I pirate blu-ray rips. Pirates are very fastidious about maintaining visual quality in their encodings. I often see them arguing over artifacts that I absolutely cannot see with my eyes.
Funny that they're marketing the supposed advantages of higher bitrates using pictures with altered contrast and saturation, lol. I would expect the target audience to be somewhat fluent in the actual benefits? Then again, I wouldn't expect somebody like Scorsese to be a video compression nerd.
Also the whole "you can hear more with lossless audio" is just straight up a lie.
dogma1138 2 days ago [-]
This has been more or less proven to be a complete scam, the quality isn’t any better than Blu-ray and in many cases worse.
"Not any better than Blu-ray" is the same as saying "much better than streaming."
kodt 2 days ago [-]
I think there are a few examples where the bitrate is higher than a native rip however.
muststopmyths 2 days ago [-]
Fascinating.
Pricing, if I am reading the site correctly: $7k-ish for a server (+$ for local disks, one assumes), $2-5k per client. So you download the movie locally to your server and play it on clients scattered throughout your mansion/property.
Not out of the world for people who drop 10s of thousands on home theater.
I wonder if that's what the Elysium types use in their NZ bunkers.
No true self-respecting, self-described techie (Scotsman) would use it instead of building their own of course.
tedivm 2 days ago [-]
For the less affluent, you can set up a Jellyfin media server and rip your own Blu-rays with MakeMKV.
tliltocatl 2 days ago [-]
Economically speaking, it doesn't make any sense for them to spend more on bandwidth and storage if they can get away with not spending more.
2 days ago [-]
RamRodification 2 days ago [-]
It's a little surprising to me that there generally aren't more subscription tiers where you can pay more for higher quality. Seems like free money, from people like you (maybe) and me.
Avamander 2 days ago [-]
You can already pay for 4K or "enhanced bitrate" but it's still relatively low bitrate and what's worse, this service quality is not guaranteed. I've had Apple TV+ downgrade to 1080p and lower on a wired gigabit connection so many times.
Neywiny 2 days ago [-]
And on top of that a lot of streaming services don't go above 1080p on desktop, and even getting them to that point is a mess of DRM. I sometimes wonder if this is the YouTube powerhouse casting a bad shadow. As LTT says, don't try to compete with YouTube. They serve so much video bandwidth it's impossible to attempt. So all these kinda startup streaming services can't do 4k. Too much bandwidth.
mdasen 2 days ago [-]
I'm not surprised they don't offer an even higher tier. When you're pricing things, you often need to use proxies - like 1080p and 4K. It'd be hard to offer 3 pricing tiers: 1080p, 4K, 4K but actually good 4K that we don't compress to hell. That third tier makes it seem like you're being a bit fraudulent with the second tier. You're essentially admitting that you've created a fake-4K tier to take people's money without delivering them the product they think they're buying. At some point, a class-action lawsuit would use that as a sort of admission that you knew you weren't giving customers what they were paying for and that it was being done intentionally, both of which matter a lot.
Right now, Netflix can say stuff like "we think the 4K video we're serving is just as good." If they offer a real-4K tier, it's hard to make that argument.
redox99 2 days ago [-]
YouTube does 1080p premium without much problem.
andrepd 2 days ago [-]
Ironically, piracy gives you yet again a better service. Thanks QxR.
stavros 2 days ago [-]
Well, you'll be happy to learn that AV2 delivers 30% better quality for the same bitrate!
tuna74 2 days ago [-]
Isn't Sony Bravia Core supposed to be UHD Blu-ray quality?
cypherg 2 days ago [-]
and this is why I don't look down on those who choose to pirate bluray/4k content
pwarner 2 days ago [-]
4K Blu-ray is the top quality.
fuzzfactor 2 days ago [-]
>the best picture quality I’ve ever seen was over 20 years ago using simple digital rabbit ears.
The biggest jump in quality was when everything was still analog over the air, but getting ready for the digital transition.
Then digital over the air bumped it up a notch.
You could really see this happen on a big CRT monitor with the "All-in-Wonder" television receiver PCI graphics adapter card.
You plugged in your outdoor antenna or indoor rabbit ears to the back of the PC, then tuned in the channels using software.
These were made by ATI before it was acquired by AMD; the TV tuner was in a Faraday cage right on the same PCB as the early GPU.
The raw analog signal was upscaled to your adapter's resolution setting before going to the CRT so you had pseudo better resolution than a good TV like a Trinitron. You really could see more details and the CRT was smooth as butter.
As the TV broadcasters' entire equipment chain was replaced (camera lenses, digital sensors, signal processing), they eventually had everything in place and working as designed. You could notice these incremental upgrades until a complete digital chain was established. It was really jaw-dropping. This was well in advance of the deadline for digital deployment, so the over-the-air signal was still coming in analog the same old way.
Eventually the broadcast signal switched to digital and the analog lights went out; the All-in-Wonder also wasn't ideal with the kind of cheap converter that analog TVs could get by with.
But it was still better than most digital TVs for a few years, then it took years more before you could see the ball in live sports as well as on a CRT anyway.
Now that's about all you've got for full digital resolution, live broadcasts from your local stations, especially live sports from a strong interference-free station over an antenna. You can switch between the antenna and cable and tell the difference when they're both not overly compressed.
The only thing was, digital engineers "forgot" that TV was based on radio (who knew?) so for the vast majority of "listeners" on the fringe reception areas who could get clear audio but usually not a clear picture if any, too bad for you. You're gonna need a bigger antenna, good enough to have gotten you a clear picture during the analog days. Otherwise your "clean" digital audio may silently appear on the screen as video, "hidden" within the sparse blocks of scattered random digital noise. When anything does appear at all.
michaelsshaw 2 days ago [-]
As a little experiment, I'd like you to set up your own little streaming service on a server and see how much bandwidth it uses, even for just a few users. It adds up extremely quickly, and the actual usage is quite surprising.
At the higher prices, I'd have to agree with you. If you pay for the best you should get the best.
andsoitis 2 days ago [-]
> Honestly, the best picture quality I’ve ever seen was over 20 years ago using simple digital rabbit ears.
That I find super hard to believe!
axiolite 2 days ago [-]
Why? ATSC is 19Mbps. A single 1080i video using that whole bitrate will look quite good.
andsoitis 2 days ago [-]
Many confounding factors. That’s just one dimension of image quality. Others include things like the panel quality, production quality.
adgjlsfhk1 2 days ago [-]
that's 19 Mbps including error correction. Only ~10 Mbps after, and that's using MPEG-2, which is probably roughly equivalent to 6-7 Mbps AV1.
axiolite 2 days ago [-]
"A terrestrial (over-the-air) transmission carries 19.39 megabits of data per second (a fluctuating bandwidth of about 18.3 Mbit/s left after overhead such as error correction, program guide, closed captioning, etc.),"
It could be a single channel, but usually you have many in the multiplex. I don't know how it works in the US, but for DVB-T(2) that's how it is.
axiolite 2 days ago [-]
Circa 2009 when analog TV was first shut-off in the US, each DTV station usually only had one channel, or perhaps a second basic one like a static weather radar on-screen. Some did have 3 or 4 sub-channels early on, but it was uncommon.
Circa 2019, after the FCC "repack" / "incentive auction" (to free-up TV channels for cellular LTE use) it became very common for each RF channel to carry 4+ channels. But to be fair, many broadcasters did purchase new, improved MPEG-2 encoders at that time, which do perform better with a lower bit-rate, so quality didn't degrade by a lot.
thunderfork 2 days ago [-]
[dead]
mouthwooed 2 days ago [-]
[dead]
jonplackett 2 days ago [-]
It’s pretty amazing people are still finding ways to make video smaller.
Is this just people being clever or is it also more processing power being thrown at the problem when decoding / encoding?
amiga386 2 days ago [-]
Yes, both: the format itself is changed to allow more cleverness and to let more processing power be applied.
For example, changes from one frame to the next are encoded in rectangular areas called "superblocks" (similar to a https://en.wikipedia.org/wiki/Macroblock). You can "move" the blocks (warp them), define their change in terms of other parts of the same frame (intra-frame prediction) or by referencing previous frames (inter-frame prediction), and so on... but you have to do it within a block, as that's the basic element of the encoding.
The more tightly you can define blocks around the areas that are actually changing from frame to frame, the better. Also, it takes data to describe where these blocks are, so there are special limitations on how blocks are defined, to minimise how many bits are needed to describe them.
AV2 now lets you define blocks differently, which makes it easier to fit them around the areas of the frame that are changing. It has also doubled the size of the largest block, so if you have some really big movement on screen, it takes fewer blocks to encode that.
That's just one change, the headline improvement comes from all the different changes, but this is an important one.
There is new cleverness in the encoders, but they need to be given the tools to express that cleverness -- new agreement about what types of transforms, predictions, etc. are allowed and can be encoded in the bitstream.
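The inter-frame prediction described above can be sketched as naive full-search block matching (a toy version: one fixed block size, integer-pixel motion, and a SAD cost; real encoders use many block sizes, sub-pixel motion vectors, and far faster searches):

```python
import numpy as np

def best_motion_vector(ref, cur, by, bx, bs=8, search=4):
    """Exhaustively search the reference frame for the offset that best
    predicts the current block, by sum of absolute differences (SAD)."""
    block = cur[by:by + bs, bx:bx + bs].astype(int)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if 0 <= y <= ref.shape[0] - bs and 0 <= x <= ref.shape[1] - bs:
                sad = np.abs(ref[y:y + bs, x:x + bs].astype(int) - block).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

# A reference frame whose content shifts 2 px to the right in the next frame:
ref = np.random.default_rng(1).integers(0, 256, (32, 32), dtype=np.uint8)
cur = np.roll(ref, 2, axis=1)
mv, sad = best_motion_vector(ref, cur, by=8, bx=8)
print(mv, sad)  # (0, -2) 0: a perfect match found 2 px to the left
```

The encoder then only has to transmit the motion vector plus the (here zero) residual, instead of the raw block.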
In general with movement through scenes it would seem that rectangular update windows seem like a poor match.
Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? I can move a camera through a room and a lot of the texture is just projective-transformed.
amiga386 2 days ago [-]
> I can move a camera through a room
That's what AV1 calls global motion and warped motion. Motion deltas (translation/rotation/scaling) can be applied to the whole frame, and blocks can be sheared vertically/horizontally as well as moved.
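As a rough sketch of the global-motion idea (nothing here reflects AV1's actual warp parameterization), a single transform can replace an identical motion vector repeated across every block:

```python
# Toy version of global motion: one transform (here scale + translation;
# AV1's model also handles rotation and shear) maps every pixel of the
# current frame back into the reference frame.

def global_motion_predict(ref, a, b, tx, ty, w, h):
    """Predict a w x h frame by sampling ref at (a*x + tx, b*y + ty)."""
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sx = min(max(round(a * x + tx), 0), w - 1)  # clamp at the border
            sy = min(max(round(b * y + ty), 0), h - 1)
            row.append(ref[sy][sx])
        out.append(row)
    return out

# A 2-pixel camera pan to the right is just (a=1, b=1, tx=2, ty=0):
ref = [[x + 10 * y for x in range(6)] for y in range(6)]
panned = global_motion_predict(ref, 1, 1, 2, 0, 6, 6)
```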
DoctorOetker 1 days ago [-]
I wasn't very clear when I reread. What I meant was remapping textured triangles (or remapping Bézier surfaces).
Consider a scene with a couple of cars moving on a background: one can imagine a number of vertices around the contour of each car, and when reusing the previous car it makes no sense to force the shape of blocks. The smaller the seams between shapes (reusing previous frames as textures), the fewer pixels need to be reconstituted de novo. The more accurate the remapping (x_prev, y_prev) -> (x, y), the lower the error signal that needs to be reconstructed.
Also, the majority of new contour vertex locations can be reused as the old contour locations in the next frame's decoding. Then only changes in contour vertices over time need to be encoded, like when a new shape enters the scene or a previously static object starts moving. So there is a lot of room for compression.
dddgghhbbfblk 2 days ago [-]
>Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? I can move a camera through a room and a lot of the texture is just projectively transformed.
I mean, that's more or less how it works already. But you still need a unit of granularity for the remapping. So the frame will store eg this block moves by this shift, this block by that shift etc.
DoctorOetker 1 days ago [-]
> But you still need a unit of granularity for the remapping. So the frame will store eg this block moves by this shift, this block by that shift etc.
This is exactly what I question. Why should there be block shaped units of granularity? defining a UV-textured 3D mesh that moves and carries previous decoded pixel values should have much less seams, with a textured mesh instead of blocks the only de novo pixel values would be the seams between reusable parts of the mesh, for example when an object rotates and reveals a newly visible part of its surface.
dajonker 2 days ago [-]
I believe patents play a big role here as well. Anything new must be careful not to (accidentally) violate any active patent, so there might be some tricks that can't currently be used for AV1/AV2.
cogman10 2 days ago [-]
I think patents are quickly becoming less of a problem. A lot of the foundational encoding techniques have exited patent protection. H.264 and everything before it is patent free now.
It's true you could still accidentally violate a patent but that minefield is clearing out as those patents simply have to become more esoteric in nature.
PunchyHamster 2 days ago [-]
...till someone decides to patent one of the new techniques used
cogman10 2 days ago [-]
You can't patent something that's in use. Prior art is a defense to a patent claim/lawsuit.
But that's not my main point. My main point is that we are going down a fitting path with codecs which makes it hard to come up with general patents that someone might stumble over. That makes patents developed by the MPEG group far less likely to apply to AOM. A lot of those more generally applicable patents, like the DCT for example, have expired.
mikeyouse 2 days ago [-]
There are numerous patent trolls in this space with active litigation against many of the participants in the consortium that brought us AV1. The EU was also threatening to investigate (likely to protect the royalty revenues of European companies).
bee_rider 2 days ago [-]
It has always seemed very weird to me that compression algorithms were patentable.
1) it harms interoperability
2) I thought math wasn’t patentable?
zweifuss 2 days ago [-]
A bit of both. Also, the modern Codecs have slightly different tradeoffs (image quality (PSNR, SSIM), computational complexity (CPU vs DSP vs Memory), storage requirements, bit rate) and therefore there isn't one that is best for every use case.
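For reference, PSNR, one of the quality metrics mentioned above, is just log-scaled inverse mean squared error against the original; a minimal sketch:

```python
import math

# PSNR: log-scaled inverse of the mean squared error between original
# and decoded samples (8-bit peak value = 255).

def psnr(orig, decoded, peak=255):
    mse = sum((o - d) ** 2 for o, d in zip(orig, decoded)) / len(orig)
    if mse == 0:
        return float("inf")  # identical signals
    return 10 * math.log10(peak ** 2 / mse)

# An off-by-one error on every pixel gives MSE = 1, about 48.13 dB:
frame = list(range(256))
decoded = [p + 1 for p in frame]
quality = psnr(frame, decoded)
```

SSIM is more involved (it compares local luminance, contrast, and structure statistics rather than raw pixel error), which is part of why no single metric settles the tradeoff discussion.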
cornholio 2 days ago [-]
I wonder when we will see generative AI codecs in production. The concept seems simple enough: the encoder knows the exact model the decoder will use to generate the final image starting from a handful of pixels, and optimizes towards the lowest bitrate and minimum subjective quality loss, for example by letting the decoder generate a random human face in the crowd, or giving it more data in that area to steer it towards the face of the team mascot, as the case may be.
At the absolute compression limit, it's no longer video, but a machine description of the scene conceptually equivalent to a textual script.
ozgrakkurt 2 days ago [-]
There was Nvidia video upsampling or whatever it is called. It was putting age spots on every face when the source was blurry, and it used too many resources, as far as I can remember.
justsid 1 days ago [-]
And then that script gets processed on hundreds of GPUs in the cloud and the video gets streamed to the client. Wait.
tverbeure 2 days ago [-]
I don’t know the details of AV2, but going from h.265 to h.266, the number of angles for angular prediction doubled, they added a tool to predict chroma from luma, added the ability to do pixel block copies and a bunch of other techniques… And that’s just for intra predictions. They also added tons of new inter prediction techniques.
All of this requires a significant amount of extra logic gates/silicon area for hardware decoders, but the bit rate reduction is worth it.
For CPU decoders, the additional computational load is not so bad.
The real additional cost is for encoding because there’s more prediction tools to choose from for optimal compression. That’s why Google only does AV1 encoding for videos that are very popular: it doesn’t make sense to do it on videos that are seen by few.
Gigachad 2 days ago [-]
Iirc Facebook did the selective encoding too. And it would predict which videos would be popular so even the first streams would get the AV1 version.
toast0 2 days ago [-]
New video codecs typically offer more options for how to represent the current frame in terms of other frames. That typically means more processing for the encoder, because it can check all the similarities to see what works best; there's also harder math for arithmetic coding of the picture data. It will be more work for the decoder if it needs to keep more reference images, and especially if it needs to do harder transformations, or if arithmetic decoding gets harder.
Clever matters a lot more for encoding. If you can determine good ways to figure out the motion information without trying them all, that gets you faster encoding speed. Decoding doesn't tend to have as much room for cleverness; the stream says to calculate the output from specific data, so you need to do that.
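The encoder-side cleverness described above can be sketched as follows. This is a toy comparison, assuming a tiny search window and a simple SAD cost: full search evaluates every candidate motion vector, while a greedy descent (a simplified cousin of the diamond search used in real encoders) checks far fewer candidates.

```python
# Toy motion search: exhaustive vs. greedy, using sum of absolute
# differences (SAD) as the matching cost.

def sad(block, ref, bx, by):
    """Sum of absolute differences at candidate offset (bx, by)."""
    return sum(abs(block[r][c] - ref[by + r][bx + c])
               for r in range(len(block)) for c in range(len(block[0])))

def full_search(block, ref, max_off):
    """Exhaustive: evaluate every offset in the search window."""
    candidates = [(dx, dy) for dy in range(max_off + 1)
                  for dx in range(max_off + 1)]
    return min(candidates, key=lambda mv: sad(block, ref, mv[0], mv[1]))

def greedy_search(block, ref, max_off):
    """Heuristic: repeatedly step to the cheapest neighbouring offset."""
    x = y = 0
    while True:
        steps = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1))
                 if 0 <= x + dx <= max_off and 0 <= y + dy <= max_off]
        best = min(steps, key=lambda mv: sad(block, ref, mv[0], mv[1]))
        if sad(block, ref, best[0], best[1]) >= sad(block, ref, x, y):
            return (x, y)  # local minimum reached
        x, y = best

# A gradient-like frame where the true motion is (3, 2):
ref = [[x + 10 * y for x in range(8)] for y in range(8)]
block = [[ref[2 + r][3 + c] for c in range(2)] for r in range(2)]
```

On smooth cost surfaces like this one, both find the same vector, but the greedy version touches only a handful of candidates; real encoders layer many such heuristics on top of each other.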
olowe 2 days ago [-]
It’s more money and more user’s compute being thrown at the problem to get the streaming service’s CDN bill down.
occz 2 days ago [-]
While funny, that's not really what I would call accurate. Users get reduced data consumption, potentially higher quality if the bandwidth now allows a higher resolution to be streamed, and possibly lower disk usage should they decide to save videos for offline viewing.
Better codecs are an overall win for everyone involved.
dsnr 2 days ago [-]
> Better codecs are an overall win for everyone involved.
I don’t remember ever watching a movie and wishing for a better codec, in the last 10 years
ubercow13 2 days ago [-]
I do because the quality of av1 on youtube is often significantly better than vp9 and especially h264, even though the filesize is usually lower than both. And the quality of the video at 1080p when only the worse formats are available is noticeably bad.
toast0 2 days ago [-]
I can send you some of my DVDs that look like trash now. Of course, that's less of a codec problem and more of a bandwidth/encoder/mastering problem; plenty of DVDs look fine (if a little undetailed) on a larger screen.
I do wish ATSC1 would adopt a newer codec (and maybe they will); most of the broadcasters cram too many subchannels into their 20mbps and a better codec would help for a while. ATSC3 has a better video codec and more efficient physical encoding, but it also has DRM and a new proprietary audio codec, so it's not helpful for me.
izacus 2 days ago [-]
How is that relevant to anything?
calcifer 2 days ago [-]
> Users get reduced data consumption, potentially higher quality selection if the bandwidth now allows for a higher resolution to be streamed
They also get increased power usage, lesser battery life, higher energy bills, and potentially earlier device failures.
> Better codecs are an overall win for everyone involved.
Right.
whatevaa 2 days ago [-]
If you experience early failure your device is badly engineered.
Mobile/power constrained devices don't use software decoding; that's just a path to a miserable experience. Hardware decoding is basically required.
Meanwhile my desktop can SW decode 4k youtube with 3% reported cpu usage.
testdelacc1 2 days ago [-]
> power usage, lesser battery life, higher energy bills
I like how you padded this list by repeating the same thing thrice. Like, increased power usage is obviously going to lead to higher energy bills.
And it’s especially weird because it’s not true? The current SOTA codec AV1 is at a sweet spot for both compression and energy demand (https://arxiv.org/html/2402.09001v1). Consumers are not worse off!
badpun 2 days ago [-]
Not to mention making your device obsolete. My 12 year old laptop already can't decode some of the videos on Pirate Bay in real time, because the codec is too demanding.
bee_rider 2 days ago [-]
Of course, we're living in the future where Moore's law has seriously slowed down. But, as a product of the 90s, this is kind of a wild thing to see. I can't imagine, in the year 2000, being disappointed that content wouldn't play on a 386 or something.
But, I mean, your expectation is not that unreasonable, computers were quite good by 2013. It is just an eye-opening framing.
condiment 2 days ago [-]
Modern video codecs are what broke the telco monopoly on content and gave us streaming services in the first place. If the cdn bill is make or break, the service isn’t going to last.
And there's no transfer of effort to the user. Compute complexity of video codecs is asymmetric: the decode is several orders of magnitude cheaper to compute than the encode. And in every case, the principal barrier to codec adoption has been hardware acceleration. Pretty much every device on earth has a hardware-accelerated h264 decoder.
joshstrange 2 days ago [-]
For those of us who back up media, this can be very appealing as well. I don’t disagree that what you said is a major driving force, but better formats have benefited me and my storage requirements multiple times in the past.
whycome 2 days ago [-]
Soon we will just have local AI processors that make stuff up between scenes but adhere to a "close enough" guideline, where all narratively critical elements are maintained but other things (e.g. landscapes or trees) are generated locally. Movies will practically be long cutscenes with photorealistic graphics.
0xCMP 2 days ago [-]
I'm sure models which replace characters in realtime will also become popular. I would imagine some company thinking it would be cool if the main character looked slightly more like whatever main audience it's being shown to and it's done on their playback devices (so, of course, it can be customized or turned off).
I find the idea fun, kinda like using snapchat filters on characters, but in practice I'm sure it'll be used to cut corners and prevent the actual creative vision from being shown which saddens me.
bee_rider 2 days ago [-]
At that point we aren’t even all watching the same movies. Which could be interesting. But very different—I mean, even stuff like talking with your friends about a movie you saw will change drastically. Maybe a service could be centered around sharing your movie prompts so have a shared movie experience to talk to your friends about.
CooCooCaCha 2 days ago [-]
Entertainment is becoming increasingly customizable and personalized. It’ll get to the point, like you said, that we’re not watching the same movie, playing the same game, etc.
It feels like we're losing something, a shared experience, in favor of an increasingly narcissistic attitude that everything needs to be shapeable to individual preferences instead of accepting things as they are.
bee_rider 2 days ago [-]
I dunno. Entertainment is sort of inherently selfish, right? It is an unproductive thing we engage in to make us happy.
I’d be somewhat interested in something like a git that generates movies, that my friends can push to.
Extremely widespread mass-media fiction broadcasts are sort of an aberration of the last 75 years or so. I mean, you'd have works in ancient times, like the Odyssey, that were shared across a culture. But that was still a story customized by each teller, and those sorts of stories were rare. Canon was mainly a concern of religions.
It’s just for fun, we give it far too much weight nowadays.
ksec 2 days ago [-]
Let's hope they get more things right 2nd time around. AOM will do Live Session on 20th of October: The Future of Innovation is Open [1].
Maybe more data and numbers, including encoding complexity increase, decoding complexity, a hardware decoder roadmap, compliance and test kits, future profiles, and involvement and improvement in both the AVIF format and the AV2 image codec. Better than JPEG XL? Is the ~30% BD-rate gain measured against the current best AV1 encoder, or against AV1 1.0 as the anchor? Live encoding improvements?
30% over AV1 is crazy. It doesn't feel that long since AV1 was released, but that was in 2019.
lxgr 2 days ago [-]
Yet my first devices with hardware support for it only arrived last year. The downside of "rapid" iteration on video codecs is that content always needs to be stored in multiple formats (or, alternatively, battery life on the client suffers from software playback, which is the route e.g. YouTube seems to prefer).
amiga386 2 days ago [-]
Hopefully that improves. The guy giving the presentation on AV2 made clear there was "rigorous scrutiny for hardware decoding complexity", and they were advised by Realtek and AMD on this.
So it seems like they checked that all their ideas could be implemented efficiently in hardware as they went along, with advice from real hardware producers.
Hopefully AV2-capable hardware will appear much quicker than AV1-capable hardware did.
lxgr 2 days ago [-]
Oh, I don't doubt that it'll be hardware implementable, but it's a shame that current hardware is usually mostly out of luck with new codecs. (Sometimes parts can be reused in more programmable/compartmentalized decoding pipelines, but I haven't seen that often.)
quaintdev 2 days ago [-]
That's the reason I prefer firesticks over TVs.
pdimitar 2 days ago [-]
I have to wonder whether PCIe devices that do hardware encoding / decoding might be the more viable path going forward?
Wait, I just discovered GPUs, nevermind. [giggles]
Still, the ability to do specialized work should probably be offloaded to specialized but pluggable hardware. I wonder what the economics of this would be...
RicoElectrico 2 days ago [-]
Maybe not only reference software, but also reference RTL should be provided? Yes, this is more work, but should speed up adoption immensely.
IshKebab 2 days ago [-]
There's no point having reference RTL. The point of reference software is to demonstrate the correct behaviour for people implementing production grade libraries and RTL. Having an RTL version of that wouldn't add anything - it should have identical behaviour.
Providing a production grade verified RTL implementation would obviously be useful but also entire companies exist to do that and they charge a lot of money for it.
Neywiny 2 days ago [-]
Could help people on the hobby or lower budget FPGA side. H.264/5/etc never really made it
tverbeure 2 days ago [-]
There is absolutely no way an FPGA would make sense. The requirements for AV1 and H265 far exceed the hardware resources of lower budget FPGAs.
For the same process, FPGA logic density is about 40x lower than ASIC, and lower budget FPGAs use older processes.
A h265 or AV1 decoder requires millions of logic gates (and DRAM memory bandwidth.) Only high-end FPGAs provide that.
Neywiny 2 days ago [-]
There's mention that decode could get a lot easier. Here's an H.264 core that runs on older Lattice chips and only takes 56k LUTs: https://www.latticesemi.com/products/designsoftwareandip/int... . Microchip's PolarFires have a soft H.264 core as well, taking under 20k. If AV2 really will be easier to implement in hardware, it might work out. Here's another example, H.264 decode in an Artix-7 that can do 1080p60: https://www.cast-inc.com/compression/avc-hevc-video-compress... . So with all due respect, what in the world are you talking about?
tverbeure 1 days ago [-]
I didn't mention h264 for a reason. It's a codec that was developed 25 years ago.
The complexity of video decoders has been going up exponentially and AV2 is no exception. Throwing more tools (and thus resources) at it is the only way to increase compression ratio.
Take AV1. It has CTBs that are 128x128 pixels. For intra prediction, you need to keep track of 256 neighboring pixels above the current CTB and 128 to the left. And you need to do this for YUV. For 420, that means you need to keep track of (256+128 + 2x(128+64)) = 768 pixels. At 8 bits per component, that's 8x768=6144 flip-flops. That's just for neighboring pixel tracking, which is only a tiny fraction of what you need to do, a few % of the total resources.
These neighbor tracking flip-flops are followed by a gigantic multiplexer, which is incredibly inefficient on FPGAs and it devours LUTs and routing resources.
A Lattice ECP5-85 has 85K LUTs. The FFs alone consume 8% of the FPGA. The multiplexer probably another conservative 20%. You haven't even started to calculate anything and your FPGA is already almost 30% full.
FWIW, for h264, the equivalent of that 128x128 pixel CTB is 16x16 pixel MB. Instead of 768 neighboring pixels, you only need 16+32+2*(8+16)=96 pixels. See the difference? AV2 retains the 128x128 CTB size of AV1 and if it adds something like MRL of h.266, the number of neighbors will more than double.
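The neighbor counts in the comment above can be reproduced with a quick back-of-the-envelope script, under the same assumptions the comment states: 2N above-neighbors (the extra N covering above-right pixels used by angular prediction), N left-neighbors, and two half-resolution 4:2:0 chroma planes.

```python
# Double-checking the neighbour-pixel arithmetic for intra prediction.

def neighbor_pixels(n):
    luma = 2 * n + n                       # above + above-right, plus left column
    chroma = 2 * (2 * (n // 2) + n // 2)   # two planes at half resolution
    return luma + chroma

av1_sb = neighbor_pixels(128)   # AV1 128x128 superblock: 256+128 + 2*(128+64)
h264_mb = neighbor_pixels(16)   # H.264 16x16 macroblock: 16+32 + 2*(8+16)
flops = 8 * av1_sb              # at 8 bits per sample
```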
H264 is child's play compared later codecs. It only has a handful of angular prediction modes, it has barely any pre-angular filtering, it has no chroma from luma prediction, it only has a weak deblocking filter and no loop filtering. It only has one DCT mode. The coding tree is trivial too. Its entropy decoder and syntax processing is low in complexity compared to later codecs. It doesn't have intra-block copy. Etc. etc.
Working on a hardware video decoder is my day job. I know exactly what I'm talking about, and, with all due respect, you clearly do not.
Neywiny 1 days ago [-]
Hmmm, so you're ignoring the crux of my argument because it's convenient for you (h264 is comfortably small, AV1 is maybe too big, so something between them might work). So anything related to why AV1 won't fit is beside the point. They know that and are improving on it.
Your argument about your large number of flops is odd. You would only store data that way if you needed everything in the same cycle. You say there's a multiplexer after that. Data storage + multiplexer is just memory. You could use a BRAM or LUTRAM, which would cut down on that dramatically, depending on whether later processing (which you haven't defined) actually needs it all at once. And even then, that's for AV1, which isn't AV2 and may change.
tverbeure 1 days ago [-]
I’m ignoring h264 because it’s irrelevant in a discussion about AV2, for the reasons that I already brought up in my earlier reply. It’s like having a discussion about a Zen CPU and bringing up the 8088 architecture.
Let’s cut to the chase. AV2 will not be smaller than AV1 at all. The linked article doesn’t say that. The slides don’t say that either.
The only thing that could make somebody think that it’s smaller is the claim that all tools have been validated for hardware efficiency. The goal of this process is to make sure that none of the new tools make the HW unreasonably explode in size, not to make the codec smaller than before, because everyone knows that this is impossible if you want to increase compression ratio.
Let’s look at 2 of those new tools. MRLS: this adds multiple reference lines, just like I expected there would be. Boom! Much more complexity for neighbor handling. I also see more directions (more angles.) That also adds HW. The article mentions improved chroma from luma. Not unexpected because h266 already has that, and AV2 needs to compete against that. AV1 has a basic 2x2 block filter. I expect AV2 to have a more complex FIR filter, which makes things significantly harder for a HW implementation.
You are delusional if you think AV2 will be smaller than AV1.
The reason I brought up neighbor handling is because it’s so easy to estimate its resource requirements from first principles, not because it’s a huge part of a decoder. But if neighbors alone already make a smaller FPGA nearly impossible, it should be obvious that the whole decoder is ridiculous.
So… as for storing neighbors in RAM: if I’d bring this up at work, they’d probably send me home to take mental health break or something.
Neighbor processing lives right inside the critical latency loop. Every clock cycle that you add in that loop impacts performance. You need to update these neighbors after predicting every coding unit. Oh, and the article mentions that the CTB size (“super block” in AV2 parlance) has been increased from 128x128 to 256x256. Good luck area reducing that. :-)
IshKebab 1 days ago [-]
I think when they talk about AV2 being more hardware friendly they mean compared to AV1 not H.264.
Neywiny 1 days ago [-]
Yeah so if H264 fits comfortably, AV1 is maybe too big, then being better than AV1 could mean it's possible
sagarm 2 days ago [-]
Can a "lower budget" FPGA really outperform a consumer-grade CPU for this?
And what hobbyist is sending off decoding chips to be fabbed? If this exists, it sounds interesting if incredibly impractical.
tverbeure 2 days ago [-]
It’s not possible on any but the largest $$$ FPGAs… and even then we often need to partition over multiple FPGAs to make it fit. And it will only run at a fraction of the target clock speed.
mckirk 2 days ago [-]
It'd be really cool if we had 'upgradable codec FPGAs' in our machines that you could just flash with the newest codec... but that'd probably be noticeably more expensive, and also not really in the interest of the manufacturers, who want to have reasons to sell new chips.
daeken 2 days ago [-]
Back in ~2004, I worked on a project to define a codec virtual machine, with the goal of each file being able to reference the standard it was encoded against, along with a link to a reference decoder built for that VM. My thought was that you could compile that codec for the system you were running on and decode in software, or if a sufficient DSP or FPGA was available, target that.
While it worked, I don't think it ever left my machine. Never moved past software decoding -- I was a broke teen with no access to non-standard hardware. But the idea has stuck with me and feels more relevant than ever, with the proliferation of codecs we're seeing now.
It has the Sufficiently Smart Compiler problem baked in, but I tried to define things to be SIMD-native from the start (which could be split however it needed to be for the hardware) and I suspect it could work. Somehow.
axiolite 2 days ago [-]
> FPGAs' in our machines that you could just use flash to the newest codec
They're called GPUs... They're ASICs rather than FPGAs, but it's easy to update the driver software to handle new video codecs. The difficulty is motivating GPU manufacturers to do so... They'd rather sell you a new one with newer codec support as a feature.
lxgr 8 hours ago [-]
They’re programmable parallel compute units, and as such pretty much the conceptual opposite of ASICs.
The main point of having ASICs for video codecs these days is efficiency, not being able to real-time decode a stream at all (as even many embedded CPUs can do that at this point).
toast0 2 days ago [-]
A lot of the GPUs have fixed function hardware to accelerate parts of encode/decode. If the new codec is compatible, sure.
But often a new codec requires decoders to know how to work with new things that the fixed function hardware likely can't do.
Encoding might actually be different. If your encoder hardware can only do fixed block sizes, and can only detect some types of motion, a driver change might be able to package it up as the new codec. Probably not a lot of benefit, other than ticking a box... but might be useful sometimes. Especially if you say offload motion detection, but the new codec needs different arithmetic encoding, you'd need to use cpu (or general purpose gpu) to do the arithmetic encoding and presumably get a size saving over the old codec.
astrange 2 days ago [-]
GPU video decoding doesn't use the graphics cores. It just happens to be on the GPU and is not upgradeable.
ZeroGravitas 2 days ago [-]
The main delay the last time was corporations being dicks about IP but the two main culprits have got on board this time.
bapak 2 days ago [-]
Unless they create a codec that GPUs are naturally good at, we will inevitably always be a couple of hardware cycles behind.
astrange 2 days ago [-]
You can't do this because GPUs are parallel and decompression cannot be parallel. If there's any parallelism it means it's not compressed as much as it could be.
illiac786 2 days ago [-]
It's insane to the point that I am very skeptical.
If true, that would be amazing.
AaronAPU 2 days ago [-]
Codec implementation and optimization was probably my favorite type of work. It would be fun to dive deep into AV2 in those areas but no time!
mrbluecoat 2 days ago [-]
Hooray, finally a codec name that doesn't look like AVI
HelloUsername 2 days ago [-]
> finally a codec name that doesn't look like AVI
Isn't AVI a container format and not a codec?
mafuy 2 days ago [-]
That's not important for the sentence, as written.
seydor 2 days ago [-]
All this high speed fiber for nothing...
SG- 2 days ago [-]
it's almost like a majority of the world is still consuming data and video on mobile networks.
lxgr 2 days ago [-]
Exactly. I'm glad I'm still in the return period for my 1TB microSD card!
AlienRobot 2 days ago [-]
More media -> More speed demand -> More speed -> More media -> More speed demand...
brnt 2 days ago [-]
Saturation is only a goal if it's a bottleneck. It's good to remove bottlenecks.
ttoinou 2 days ago [-]
well we might start streaming 8K content ! Or maybe this could be used for 16K VR videos
stop50 2 days ago [-]
Or the next <insert shooter with ridiculous size here>
throwaway48476 2 days ago [-]
It's a shame European energy efficiency rules made 8K non-viable. It's a great resolution.
lxgr 2 days ago [-]
Meh, I'll take 1080p at a higher bitrate and in a wider color gamut over the pixel soup many VOD services serve and have the audacity to still call "UHD" any day.
illiac786 2 days ago [-]
Framerate and HDR are more important.
I don't understand why 60fps never became ubiquitous; a pan scene at 30fps is horrible, it's almost stroboscopic to me.
akie 2 days ago [-]
Yeah damn those Europeans for outlawing energy inefficient things
uyzstvqs 2 days ago [-]
Concurrent maximum efficiency + maximum availability is the way to go. This principle also applies to compute power and energy markets.
2OEH8eoCRo0 2 days ago [-]
Who does this benefit? Sounds like this stuff mainly benefits streaming providers and not users. We get to go through the whole rigamarole again where hardware is made obsolete because it doesn't support acceleration.
jug 2 days ago [-]
How does it not help users to lower your mobile bandwidth use? This is especially useful in the era of TikTok, Snapchat, and YouTube.
latexr 2 days ago [-]
I always thought the name AV1 was partly a play on/homage to AVI (Audio Video Interleave), but AV2 breaks that. Even if it's meant to be embedded into other container formats such as MP4, there are files with the .av1 extension and there is a video/AV1 MIME type (and possibly a UTI?). Does this mean we now need to duplicate all that to .av2 and video/AV2? What about the AVIF file format?
uyzstvqs 2 days ago [-]
Files with the .av1 extension are for raw AV1 data. For AV2 this should become .av2, yes. That's by design, as they're two different incompatible formats. Typically you use a container like Matroska (.mkv, video/x-matroska), WebM or MP4 which contains your video stream with a type code specifying the codec (av01, av02).
AVIF is also a container format, and I believe should be adaptable to AV2, even if the name stands for "AV1 image format". It could simply just be renamed to AOMedia Video Image Format for correctness.
ttoinou 2 days ago [-]
Do you mean the file extension should only reflect the file format and not the codecs it has inside ?
Maybe that’s what we did in the past and it was a bad idea. It’d be useful to know if you can read the file by looking only at its extension
amiga386 2 days ago [-]
File extension shouldn't matter at all, because data should have associated metadata (e.g. HTTP content-type, CSS image-set, HTML <video><source type=""/></video>)
> It’d be useful to know if you can read the file by looking only at its extension
That would be madness, and there's already a workaround - the filename itself.
For most people, all that matters is an MKV file is a video file, and your configured player for this format is VLC. Only in a small number of cases does it matter about an "inner" format, or choice of parameter - e.g. for videos, what video codec or audio codec is in use, what the bitrate is, what the frame dimensions are.
For where it _matters_, people write "inner" file formats in the filename, e.g. "Gone With The Wind (1939) 1080p BluRay x265 HEVC FLAC GOONiES.mkv", to let prospective downloaders choose what to download from many competing encodings of exactly the same media, on websites where a filename is the _only_ place to write that metadata (if it were a website not standardised around making files available and searching only by filenames, it could just write it in the link description and filename wouldn't matter at all)
Most people don't care, for example, that their Word document is A4 landscape, so much that they need to know _in the filename_.
lxgr 2 days ago [-]
> Do you mean the file extension should only reflect the file format and not the codecs it has inside ?
That's pretty much always been the case. File extensions are just not expressive enough to capture all the nuances of audio and video codecs. MIME types are a bit better.
Audio is a bit of an exception with the popularity of MP3 (which is both a codec and a relatively minimal container format for it).
o11c 2 days ago [-]
If only people would actually bother to write the codecs in the MIME type in the first place ...
lxgr 2 days ago [-]
Easier said than done, when there's no commonly accepted standard to store a file's MIME type as metadata, and you don't want to load all of ffmpeg into every webserver or file browser just so it can expose the proper granular one.
o11c 2 days ago [-]
I mean, there is the xattr `user.mime_type`.
galad87 2 days ago [-]
That wouldn't scale well, something like .av1opusflacwebvtt?
ttoinou 2 days ago [-]
Video codec is the most important though !
axiolite 2 days ago [-]
Are you deaf? I can't see how anyone else would be happy downloading a file they can watch fine but can't hear.
ziml77 2 days ago [-]
Not to mention that the codec barely tells you anything useful on its own. There are many parameters that affect the quality and file size. And heck it's not even guaranteed that you will be able to watch a video just given its codec. Like an old computer might be able to decode H.264 but only be able to do it in real-time at 480p due to processing requirements for going higher than that.
ttoinou 2 days ago [-]
The goal is not to make something perfect, it's to make something better. An AV1 / AV2 extension is better than a MOV / MKV extension.
ttoinou 2 days ago [-]
Audio patents are expiring, and there shouldn't be any difficulty decoding most audio codecs in real time. If you can decode the video in real time, there's a very good chance you can decode the audio in real time too.
Gigachad 2 days ago [-]
Is anyone else getting a cloudflare blocked on this page?
IshKebab 2 days ago [-]
We must be reaching the limit at which video codecs can only achieve better quality by synthesizing details. That's already pretty prevalent in still images - phone cameras do it, and there are lots of AI resizing algorithms that do it.
It doesn't look like AV2 does any of that yet though fortunately (except film grain synthesis but I think that's fine).
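The point of film grain synthesis is that the grain itself never needs to be transmitted, only a seed plus strength parameters, so the decoder can regenerate statistically similar noise for free. A toy illustration of the idea (not AV1's actual FGS, which uses an autoregressive grain model):

```python
import random

def add_synthetic_grain(denoised, seed, strength=4):
    """Toy stand-in for film grain synthesis: the bitstream carries only
    (seed, strength); the decoder regenerates identical 'grain' locally."""
    rng = random.Random(seed)
    return [max(0, min(255, p + rng.randint(-strength, strength)))
            for p in denoised]

frame = [128] * 16                       # a flat, denoised 'frame'
a = add_synthetic_grain(frame, seed=42)
b = add_synthetic_grain(frame, seed=42)
assert a == b                            # bit-exact given the same seed
```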
nasretdinov 2 days ago [-]
Arguably that's already happening with film grain — you have to extrapolate _what the original probably was_, encode it because it's smaller, then add the noise back to be more faithful to the original despite your image being better.
I imagine e.g. a picture of an 8x8 circle actually takes more bits to encode than a mathematical description of the same circle
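Back-of-envelope for that intuition (the bit counts are illustrative, not any codec's real syntax):

```python
import math

def raster_bits(n):
    # 1-bit bitmap of an n x n frame containing the circle
    return n * n

def parametric_bits(n):
    # "circle at (cx, cy) with radius r": three values,
    # each needing ceil(log2(n)) bits on an n x n grid
    return 3 * math.ceil(math.log2(n))

assert raster_bits(8) == 64 and parametric_bits(8) == 9
# The gap explodes with resolution; the description stays ~constant.
assert raster_bits(1024) == 1024 * 1024 and parametric_bits(1024) == 30
```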
rasz 2 days ago [-]
>I imagine e.g. a picture of an 8x8 circle actually takes more bits to encode than a mathematical description of the same circle
I wonder if there are codecs with provisions for storing common shapes. Text comes to mind - I imagine that with a bank of the 10 most popular fonts, encoding just the difference between the source and text + distortion could save quite a lot of data on text-heavy material. Add circles, lines, basic face shapes.
_kb 2 days ago [-]
Outside of AV1/2 (and linear media in general) that's already well and truly developed tech. Nvidia DLSS, AMD FSR and Intel XeSS all provide spatial/temporal super sampling to process lower fidelity base renders [0].
There also seems to be a fair bit of attention on that problem space from the real-time comms vendors with Cisco [1], Microsoft [2] and Google [3] already leaning on model based audio codecs. With the advantages that provides both around packet loss mitigation and shifting costs to end user (aka free) compute and away from central infra I can't see that not extending to the video channel too.
Please no. This is what JBIG2 does for images, and it's a nightmare in my view: you can't trust that the result isn't something totally different from the original [1]
When are they going to make a video codec based on gaussian splatting?
7e 2 days ago [-]
Oh, is more HEVC stuff finally going off patent? They’re the leaders.
illiac786 1 days ago [-]
It would be great. As to being the leaders, even without the patented stuff, AV1 is still more efficient.
jeden 2 days ago [-]
I'm waiting for a new codec invented with #AI
occz 2 days ago [-]
You'll be waiting for a long time then, probably. Making codecs is actually a hard problem, the type of thing AI completely falls over on when tasked with it.
lxgr 2 days ago [-]
Compression is actually a very good use case for neural networks (i.e. don't have an LLM develop a codec, but rather train a neural network to do the compression itself).
Considering AI is good at predicting things and that’s largely what compression does, I could see machine learning techniques being useful as a part of a codec though (which is a completely different thing from asking ChatGPT to write you a codec)
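The prediction-is-compression link is easy to demonstrate even without any ML: predict each sample from the previous one and entropy-code the residuals, and a smooth signal shrinks far more. A minimal sketch using zlib as a stand-in entropy coder:

```python
import math
import zlib

# A smooth 'signal' (think a brightness ramp across a frame).
signal = bytes(int(128 + 100 * math.sin(i / 50)) for i in range(4096))

# Trivial predictor: each sample is predicted by its predecessor;
# only the (mostly tiny) residuals need to be entropy-coded.
residuals = bytes((signal[i] - signal[i - 1]) % 256
                  for i in range(1, len(signal)))

raw_size = len(zlib.compress(signal, 9))
pred_size = len(zlib.compress(residuals, 9))
assert pred_size < raw_size  # better prediction -> fewer bits
```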
lyu07282 2 days ago [-]
Yeah, in the future we might use some sort of learned spatial+temporal representation to compress video, same for audio. It's easier to imagine for audio: instead of storing the audio samples, we store text + some feature vectors, and a model uses those to "render" the audio samples.
whycome 2 days ago [-]
It’s not absurd to think that you could send a model of your voice to a receiving party and then have your audio call just essentially be encoded text that gets thrown through the voice generator on the local machine.
AI video could mean that essential elements are preserved (actors?) but other elements are generated locally. Hell, digital doubles for actors could also mean only their movements are transmitted. Essentially just sending the mo-cap data. The future is gonna be weird
JimDabell 2 days ago [-]
Yeah, I brought that up here and got some interesting responses:
> It would be interesting to see how far you could get using deepfakes as a method for video call compression.
> Train a model locally ahead of time and upload it to a server, then whenever you have a call scheduled the model is downloaded in advance by the other participants.
> Now, instead of having to send video data, you only have to send a representation of the facial movements so that the recipients can render it on their end. When the tech is a little further along, it should be possible to get good quality video using only a fraction of the bandwidth.
In the future, our phone contacts will store name, address, phone number, voice model. (The messed up part will be that the user doesn’t necessarily send their model, but the model could be crafted from previous calls)
You could probably also transmit a low res grayscale version of the video to “map” any local reproduction to. Kinda like how a low resolution image could be reasonably reproduced if an artist knew who the subject was.
I have a top-of-the-line 4K TV and gigabit internet, yet the compression artifacts make everything look like putty.
Honestly, the best picture quality I’ve ever seen was over 20 years ago using simple digital rabbit ears.
You especially notice the compression on gradients and in dark movie scenes.
And yes — my TV is fully calibrated, and I’m paying for the highest-bandwidth streaming tier.
Not my tv, but a visual example: https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
[0] https://news.ycombinator.com/item?id=44456779
[1] https://news.ycombinator.com/item?id=45022184
It is only an issue in content delivery.
Does this explain why I dislike 4K content on a 4K TV? Some series and movies look too realistic, which in turn gives me an amateur-film feeling (like somebody shot the movie with a smartphone).
https://en.wikipedia.org/wiki/Soap_opera_effect
Which is generally associated with excess denoising rather than with excess grain.
This comment that I replied to is almost a textbook description of the soap opera effect.
The interpolation adds more FPS, which is traditionally a marker of film vs TV production.
This is patently wrong. The rest builds up on this false premise.
1 & 2: Google "how to shoot a night scene" or something like that. You'll find most advice goes something along the lines of "don't crank the ISO up, add artificial lighting to brighten shadows instead". When given a choice, you'll also find cinematographers use cameras with particularly good low-light performance for dark scenes (that's why dark scenes in Planet Earth were shot on the Sony A7 - despite the "unprofessional" form factor, it had simply the best high-ISO performance at the time).
2: Google "film grain effect". You'll find a bunch of colorists explaining why film grain is different from ISO noise and why and how you should add it artificially to your films.
Next, no shit aesthetics are subjective, I never said this is the one objective truth. I said this is a thing that many people believe, as evidenced by the plethora of resources talking about the difference between noise and grain, why tasteful grain is better than a completely clean image and how to add it in post.
And finally, come on, it's obvious to everyone in this thread that I'm referring to digital, which is also not "just a subset", it's by far the biggest subset.
So idk what your point is. Most things are shot digitally. Most camera companies try to reduce sensor noise. Most camera departments try to stick to the optimal ISO for their sensor, both for dynamic range and for noise reasons, adjusting exposure with other means. In my experience, most people don't like the look of sensor/gain/iso/whatever noise. Many cinematographers and directors like the look of film grain and so they often ask for it to be added in post.
Besides the many/most/some qualifiers possibly not matching with how you perceive this (which is normal, we're all different, watch different content, work in different circles...), where exactly am I wrong?
> it's obvious to everyone in this thread that I'm referring to digital
It was blindingly obvious that you meant digital. That's why I pointed this out. Without mentioning that it is only a concern with digital photography, your points become factually incorrect on more than one level: because the thread wasn't talking specifically about digital photography, some of your points about noise don't apply even if they were correct, which they aren't, by your own admission that photography is subjective. Producing a noise-free image is not the highest priority for film crews (for camera manufacturers it is, but that's because it means more flexibility in different conditions); it does not mean film crews will always prioritize whatever settings give them the lowest noise, there are plenty of higher priorities, and in some cases they choose to produce an image with some noise despite the capability to avoid it.
Sorry, with your googling suggestions it just reads like a newbie’s take on subject matter.
TLDR - because although you made valid points on a specific area, you made no acknowledgement to my own favorite specific area, thusly I shall publicly declare your valid points shall not be taken seriously for others to read
h.264 only had FGS as an afterthought, introduced years after the spec was ratified. No wonder it wasn’t widely adopted.
VP9, h.265 and h.266 don’t have FGS.
1. Video codecs like the denoise, compress, synthetic grain approach because their purpose is to get the perceptually-closest video to the original with a given number of bits. I think we should be happy to spend the bits on more perceptually useful information. Certainly I am happy with this.
2. Streaming services want to send as few bytes as they can get away with. So improvements like #1 tend to be spent on decreasing bytes while holding perceived quality constant rather than increasing perceived quality while holding bitrate constant.
I think one should focus on #2 and not be distracted by #1 which I think is largely orthogonal.
The hard disk space to store an episode of a show is $0.01. With peering agreements, the bandwidth of sending the show to a user is free.
I'm not sure why you think this, but it's one of the oddest things I've seen today.
The more streams you can send from a single server the lower your costs are.
That's not a correctly calibrated TV. The contrast is tuned WAY up. People do that to see what's going on in the dark, but you aren't meant to really be able to see those colors. That's why it's a big dark blob. It's supposed to be barely visible on a well calibrated display.
A lot of video codecs will erase details in dark scenes because those details aren't supposed to be visible. Now, I will say that streaming services are tuning that too aggressively. But I'll also say that a lot of people have miscalibrated displays. People simply like to be able to make out every detail in the dark. Those two things come in conflict with one another causing the effect you see above.
Someone needs to tell filmmakers. They shoot dark scenes because they can - https://www.youtube.com/watch?v=Qehsk_-Bjq4 - and it ends up looking like shit after compression that assumes normal lighting levels.
i disagree completely. i watch a movie for the filmmaker's story, i don't watch movies to marvel at compression algorithms.
it would be ridiculous to watch movies shot with only bright scenes because streaming service accountants won’t stop abusing compression to save some pennies.
> …ends up looking like shit after compression that assumes normal lighting levels.
it’s entirely normal to have dark scenes in movies. streaming services are failing if they’re using compression algorithms untuned to do dark scenes when soooo many movies and series are absolutely full of night shots.
It should be noted, as well, that this generally isn't a "not enough bits" problem. There are literally codec settings to tune which decide when to start smearing the darkness. On a few codecs (such as VP1) those values are pretty badly set by default. I suspect streaming services aren't far off from those defaults. The codec settings are instead prioritizing putting bits into the lit parts of a scene rather than sparing a few for the darkness like you might like.
The issue is just that we don't code video with nearly enough bits. It's actually less than 8-bit since it only uses 16-235.
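Quick arithmetic on that limited-range point (16-235 "studio swing" luma):

```python
import math

# 'Limited range' video uses luma codes 16..235 out of a 0..255 byte.
usable_codes = 235 - 16 + 1            # 220 distinct luma levels
effective_bits = math.log2(usable_codes)
assert usable_codes == 220
assert 7.7 < effective_bits < 7.8      # ~7.78 bits, a bit under full 8-bit
```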
Before COVID, Netflix was using at least 8 Mbps for 1080p content. With x264 / Beamr that is pretty good, and even better with HEVC. Then COVID hit, and every streaming service, not just Netflix, had an excuse to lower quality due to increased demand with limited bandwidth. Everything went downhill since then. Customers got used to lower quality; I don't believe they'll ever bring it back up. Now it is only something like 3-5 Mbps, according to a previous test posted on HN.
And while it is easy for HEVC / AV1 / AV2 to get 50%+ real-world bitrate savings compared to H.264 in the 0.5 - 4 Mbps range, once you go past that the savings begin to shrink rapidly, to the point that the good old x264 encoder may perform better at much higher bitrates.
Kate - Netflix - 11.15 Mbps
Andor - Disney - 15.03 Mbps
Jack Ryan - Amazon - 15.02 Mbps
The Last of Us - Max - 19.96 Mbps
For All Mankind - Apple - 25.12 Mbps
https://hd-report.com/streaming-bitrates-of-popular-movies-s...
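For a feel of what those bitrates cost to deliver, a quick Mbps-to-GB-per-hour conversion (assuming decimal megabits and gigabytes):

```python
def gb_per_hour(mbps):
    # Mbps -> gigabytes per hour: bits/s * 3600 s / 8 bits-per-byte / 1e9
    return mbps * 1e6 * 3600 / 8 / 1e9

rates = {"Kate (Netflix)": 11.15, "For All Mankind (Apple)": 25.12}
sizes = {k: round(gb_per_hour(v), 2) for k, v in rates.items()}
# Netflix's 11.15 Mbps is ~5 GB/hour; Apple's 25.12 Mbps is ~11.3 GB/hour.
assert abs(gb_per_hour(11.15) - 5.0175) < 1e-9
assert abs(gb_per_hour(25.12) - 11.304) < 1e-9
```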
You will be made to feel the springs on the cheapest plan/mattress, and it's on purpose so you'll pay them more for something that costs them almost nothing.
Adaptive streaming isn't really adaptive anymore. If you have any kind of modern broadband, the most adaptive it will be is starting off in one of the lower bitrates for the first 6 seconds before jumping to the top, where it will stay for the duration of the stream. A lot of clients don't even bother with that anymore; they look at the manifest, find the highest stream, and just start there.
That can happen at even the highest bitrates if "HDR" is not enabled in the video codec.
Related video: https://www.youtube.com/watch?v=h9j89L8eQQk
Also the whole "you can hear more with lossless audio" is just straight up a lie.
The “best” quality of streaming you have is Sony Core https://en.wikipedia.org/wiki/Sony_Pictures_Core but it has a rather limited library.
Pricing, if I am reading the site correctly: $7k-ish for a server (+$ for local disks, one assumes), $2-5k per client. So you download the movie locally to your server and play it on clients scattered throughout your mansion/property.
Not out of the world for people who drop 10s of thousands on home theater.
I wonder if that's what the Elysium types use in their NZ bunkers.
No true self-respecting, self-described techie (Scotsman) would use it instead of building their own of course.
Right now, Netflix can say stuff like "we think the 4K video we're serving is just as good." If they offer a real-4K tier, it's hard to make that argument.
The biggest jump in quality was when everything was still analog over the air, but getting ready for the digital transition.
Then digital over the air bumped it up a notch.
You could really see this happen on a big CRT monitor with the "All-in-Wonder" television receiver PCI graphics adapter card.
You plugged in your outdoor antenna or indoor rabbit ears to the back of the PC, then tuned in the channels using software.
These were made by ATI before being acquired by AMD, the TV tuner was in a faraday cage right on the same PCB as the early GPU.
The raw analog signal was upscaled to your adapter's resolution setting before going to the CRT so you had pseudo better resolution than a good TV like a Trinitron. You really could see more details and the CRT was smooth as butter.
As the TV broadcaster's entire equipment chain was replaced, like camera lenses, digital sensors and signal processing they eventually had everything in place and working. You could notice these incremental upgrades until a complete digital chain was established as designed. It was really jaw-dropping. This was well in advance of the deadline for digital deployment, so the signal over-the-air was still coming in analog the same old way.
Eventually the broadcast signal switched to digital and the analog lights went out, plus the All-in-Wonder was not ideal with a cheap converter like analog TV's could get by with.
But it was still better than most digital TVs for a few years, then it took years more before you could see the ball in live sports as well as on a CRT anyway.
Now that's about all you've got for full digital resolution, live broadcasts from your local stations, especially live sports from a strong interference-free station over an antenna. You can switch between the antenna and cable and tell the difference when they're both not overly compressed.
The only thing was, digital engineers "forgot" that TV was based on radio (who knew?), so for the vast majority of "listeners" in fringe reception areas who could get clear audio but usually not a clear picture, if any, too bad for you. You're gonna need a bigger antenna, good enough to have gotten you a clear picture during the analog days. Otherwise your "clean" digital audio may silently appear on the screen as video, "hidden" within the sparse blocks of scattered random digital noise. When anything does appear at all.
At the higher prices, I'd have to agree with you. If you pay for the best you should get the best.
That I find super hard to believe!
https://en.wikipedia.org/wiki/ATSC_standards#MPEG-2
It could be a single channel, but usually you have many in the multiplex. I don't know how it works in the US, but for DVB-T(2) that's how it is.
Circa 2019, after the FCC "repack" / "incentive auction" (to free-up TV channels for cellular LTE use) it became very common for each RF channel to carry 4+ channels. But to be fair, many broadcasters did purchase new, improved MPEG-2 encoders at that time, which do perform better with a lower bit-rate, so quality didn't degrade by a lot.
Is this just people being clever or is it also more processing power being thrown at the problem when decoding / encoding?
For example, changes from one frame to the next are encoded in rectangular areas called "superblocks" (similar to a https://en.wikipedia.org/wiki/Macroblock). You can "move" the blocks (warp them), define their change in terms of other parts of the same frame (intra-frame prediction) or by referencing previous frames (inter-frame prediction), and so on... but you have to do it within a block, as that's the basic element of the encoding.
The more tightly you can define blocks around the areas that are actually changing from frame to frame, the better. Also, it takes data to describe where these blocks are, so there are special limitations on how blocks are defined, to minimise how many bits are needed to describe them.
AV2 now lets you define blocks differently, which makes it easier to fit them around the areas of the frame that are changing. It has also doubled the size of the largest block, so if you have some really big movement on screen, it takes fewer blocks to encode that.
That's just one change, the headline improvement comes from all the different changes, but this is an important one.
There is new cleverness in the encoders, but they need to be given the tools to express that cleverness -- new agreement about what types of transforms, predictions, etc. are allowed and can be encoded in the bitstream.
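For intuition, here's roughly what inter-frame block prediction boils down to, as a brute-force toy (real encoders use much smarter search, sub-pixel motion, and rate-distortion costs, none of which is sketched here):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def block(frame, y, x, size):
    return [row[x:x + size] for row in frame[y:y + size]]

def motion_search(prev, cur, y, x, size, radius):
    """Brute-force search around (y, x) for the best match in prev."""
    target = block(cur, y, x, size)
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            py, px = y + dy, x + dx
            if 0 <= py <= len(prev) - size and 0 <= px <= len(prev[0]) - size:
                cost = sad(block(prev, py, px, size), target)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best  # (residual cost, motion vector)

# A bright 2x2 'object' moves one pixel right between frames.
prev = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
prev[3][3] = prev[3][4] = prev[4][3] = prev[4][4] = 200
cur[3][4] = cur[3][5] = cur[4][4] = cur[4][5] = 200
cost, dy, dx = motion_search(prev, cur, 2, 3, 4, 2)
# The encoder would transmit the vector (dy, dx) plus a (here zero) residual.
assert (cost, dy, dx) == (0, 0, -1)
```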
https://youtu.be/Se8E_SUlU3w?t=242
Is there a reason codecs don't use the previous frame(s) as stored textures and remap them onto the screen? If I move a camera through a room, a lot of the texture is just projectively transformed.
That's what AV1 calls global motion and warped motion. Motion deltas (translation/rotation/scaling) can be applied to the whole frame, and blocks can be sheared vertically/horizontally as well as moved.
Consider a scene with a couple of cars moving on a background: one can imagine a number of vertices around the contour of each car, reusing the previous frame's car pixels, so it makes no sense to force the shape of blocks. The smaller the seams between shapes (reusing previous frames as textures), the fewer pixels need to be reconstituted de novo. The more accurate the remapping (x_prev, y_prev) -> (x, y), the lower the error signal that needs to be reconstructed.
Also, the majority of new contour vertex locations can be reused as the old contour locations in the next frame's decoding. Then only changes in contour vertices over time need to be encoded, like when a new shape enters the scene or a previously static object starts moving. So there is a lot of room for compression.
I mean, that's more or less how it works already. But you still need a unit of granularity for the remapping. So the frame will store eg this block moves by this shift, this block by that shift etc.
This is exactly what I question. Why should there be block-shaped units of granularity? Defining a UV-textured 3D mesh that moves and carries previous decoded pixel values should produce far fewer seams; with a textured mesh instead of blocks, the only de novo pixel values would be at the seams between reusable parts of the mesh, for example when an object rotates and reveals a newly visible part of its surface.
It's true you could still accidentally violate a patent but that minefield is clearing out as those patents simply have to become more esoteric in nature.
But that's not my main point. My main point is that we are going down a fitting path with codecs which makes it hard to come up with general patents that someone might stumble over. That makes patents developed by the MPEG group far less likely to apply to AOM. A lot of those more generally applicable patents, like the DCT for example, have expired.
1) it harms interoperability
2) I thought math wasn’t patentable?
At the absolute compression limit, it's no longer video, but a machine description of the scene conceptually equivalent to a textual script.
All of this requires a significant amount of extra logic gates/silicon area for hardware decoders, but the bit rate reduction is worth it.
For CPU decoders, the additional computational load is not so bad.
The real additional cost is for encoding because there’s more prediction tools to choose from for optimal compression. That’s why Google only does AV1 encoding for videos that are very popular: it doesn’t make sense to do it on videos that are seen by few.
Clever matters a lot more for encoding. If you can determine good ways to figure out the motion information without trying them all, that gets you faster encoding speed. Decoding doesn't tend to have as much room for cleverness; the stream says to calculate the output from specific data, so you need to do that.
Better codecs are an overall win for everyone involved.
I don’t remember ever watching a movie and wishing for a better codec, in the last 10 years
I do wish ATSC 1.0 would adopt a newer codec (and maybe they will); most of the broadcasters cram too many subchannels into their 20 Mbps, and a better codec would help for a while. ATSC 3.0 has a better video codec and more efficient physical encoding, but it also has DRM and a new proprietary audio codec, so it's not helpful for me.
They also get increased power usage, lesser battery life, higher energy bills, and potentially earlier device failures.
> Better codecs are an overall win for everyone involved.
Right.
Mobile/power-constrained devices don't use software decoding; that's just a path to a miserable experience. Hardware decoding is basically required.
Meanwhile my desktop can SW decode 4k youtube with 3% reported cpu usage.
I like how you padded this list by repeating the same thing thrice. Like, increased power usage is obviously going to lead to higher energy bills.
And it’s especially weird because it’s not true? The current SOTA codec AV1 is at a sweet spot for both compression and energy demand (https://arxiv.org/html/2402.09001v1). Consumers are not worse off!
But, I mean, your expectation is not that unreasonable, computers were quite good by 2013. It is just an eye-opening framing.
And there's no transfer of effort to the user. The compute complexity of video codecs is asymmetric: the decode is several orders of magnitude cheaper to compute than the encode. And in every case, the principal barrier to codec adoption has been hardware acceleration. Pretty much every device on earth has a hardware-accelerated h264 decoder.
I find the idea fun, kinda like using snapchat filters on characters, but in practice I'm sure it'll be used to cut corners and prevent the actual creative vision from being shown which saddens me.
It feels like we’re losing something, a shared experience, in favor of an increasingly narcissistic attitude that everything needs to shapeable to individual preferences instead of accepting things as they are.
I’d be somewhat interested in something like a git that generates movies, that my friends can push to.
Extremely widespread mass media fiction broadcast are sort of an aberration of the last 75 years or so. I mean, you’d have works in ancient times—the Odyssey—that are shared across a culture. But, that was still a story customize by each teller, and those sorts of stories were rare. Canon was mainly a concern of religions.
It’s just for fun, we give it far too much weight nowadays.
Maybe more data and numbers: encoding complexity increase, decoding complexity, hardware decoder roadmap, compliance and test kits, future profiles, involvement and improvement in both AVIF the format and the AV2 image codec. Better than JPEG XL? Are the ~30% BD-rate savings against the current best AV1 encoder, or against AV1 1.0 as the anchor point? Live encoding improvements?
[1] https://aomedia.org/events/live-session-the-future-of-innova...
So it seems like they checked that all their ideas could be implemented efficiently in hardware as they went along, with advice from real hardware producers.
Hopefully AV2-capable hardware will appear much quicker than AV1-capable hardware did.
Wait, I just discovered GPUs, nevermind. [giggles]
Still, the ability to do specialized work should probably be offloaded to specialized but pluggable hardware. I wonder what the economics of this would be...
Providing a production grade verified RTL implementation would obviously be useful but also entire companies exist to do that and they charge a lot of money for it.
A h265 or AV1 decoder requires millions of logic gates (and DRAM memory bandwidth.) Only high-end FPGAs provide that.
The complexity of video decoders has been going up exponentially and AV2 is no exception. Throwing more tools (and thus resources) at it is the only way to increase compression ratio.
Take AV1. It has CTBs that are 128x128 pixels. For intra prediction, you need to keep track of 256 neighboring pixels above the current CTB and 128 to the left. And you need to do this for YUV. For 420, that means you need to keep track of (256+128 + 2x(128+64)) = 768 pixels. At 8 bits per component, that's 8x768=6144 flip-flops. That's just for neighboring pixel tracking, which is only a tiny fraction of what you need to do, a few % of the total resources.
These neighbor tracking flip-flops are followed by a gigantic multiplexer, which is incredibly inefficient on FPGAs and it devours LUTs and routing resources.
A Lattice ECP5-85 has 85K LUTs. The FFs alone consume 8% of the FPGA. The multiplier probably another conservative 20%. You haven't even started to calculate anything and your FPGA is already almost 30% full.
FWIW, for h264, the equivalent of that 128x128 pixel CTB is 16x16 pixel MB. Instead of 768 neighboring pixels, you only need 16+32+2*(8+16)=96 pixels. See the difference? AV2 retains the 128x128 CTB size of AV1 and if it adds something like MRL of h.266, the number of neighbors will more than double.
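Sanity-checking the parent's neighbor-pixel arithmetic (the counting scheme follows the parent comment, not any spec text):

```python
def neighbor_pixels(ctb, chroma_shift=1):
    """Neighbor pixels tracked for intra prediction, per the parent's counting:
    2*ctb above + ctb left for luma, plus two chroma planes at half
    size for 4:2:0 (chroma_shift=1)."""
    luma = 2 * ctb + ctb
    c = ctb >> chroma_shift
    chroma = 2 * (2 * c + c)
    return luma + chroma

# AV1: 128x128 superblock -> 768 neighbor pixels -> 6144 flip-flops at 8 bpp.
assert neighbor_pixels(128) == 768
assert 8 * neighbor_pixels(128) == 6144
# h.264: 16x16 macroblock -> 96 pixels, an 8x smaller problem.
assert neighbor_pixels(16) == 96
```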
H264 is child's play compared later codecs. It only has a handful of angular prediction modes, it has barely any pre-angular filtering, it has no chroma from luma prediction, it only has a weak deblocking filter and no loop filtering. It only has one DCT mode. The coding tree is trivial too. Its entropy decoder and syntax processing is low in complexity compared to later codecs. It doesn't have intra-block copy. Etc. etc.
Working on a hardware video decoder is my day job. I know exactly what I'm talking about, and, with all due respect, you clearly do not.
Your argument about your large number of flops is odd. You would only store data that way if you needed everything on the same cycle. You say there's a multiplexer after that. Data storage + multiplexer is just memory. You could use a BRAM or LUTRAM, which would cut down on that dramatically, unless there's a need for it based on later processing, which you haven't defined. And even then, that's for AV1, which isn't AV2 and may change.
Let’s cut to the chase. AV2 will not be smaller than AV1 at all. The linked article doesn’t say that. The slides don’t say that either.
The only thing that could make somebody think that it’s smaller is the claim that all tools have been validated for hardware efficiency. The goal of this process is to make sure that none of the new tools make the HW unreasonably explode in size, not to make the codec smaller than before, because everyone knows that this is impossible if you want to increase compression ratio.
Let’s look at 2 of those new tools. MRLS: this adds multiple reference lines, just like I expected there would be. Boom! Much more complexity for neighbor handling. I also see more directions (more angles.) That also adds HW. The article mentions improved chroma from luma. Not unexpected because h266 already has that, and AV2 needs to compete against that. AV1 has a basic 2x2 block filter. I expect AV2 to have a more complex FIR filter, which makes things significantly harder for a HW implementation.
You are delusional if you think AV2 will be smaller than AV1.
The reason I brought up neighbor handling is because it’s so easy to estimate its resource requirements from first principles, not because it’s a huge part of a decoder. But if neighbors alone already make a smaller FPGA nearly impossible, it should be obvious that the whole decoder is ridiculous.
So… as for storing neighbors in RAM: if I brought this up at work, they'd probably send me home to take a mental health break or something.
Neighbor processing lives right inside the critical latency loop. Every clock cycle that you add in that loop impacts performance. You need to update these neighbors after predicting every coding unit. Oh, and the article mentions that the CTB size (“super block” in AV2 parlance) has been increased from 128x128 to 256x256. Good luck area reducing that. :-)
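To put crude numbers on the 128x128 vs 256x256 point (the per-unit context width below is an assumed figure, purely for illustration):

```python
# Crude estimate of per-superblock neighbor storage when the superblock
# grows from 128x128 (AV1) to 256x256 (AV2, per the article). The context
# width per 4x4 unit is an assumption for illustration only.

def neighbor_bits(sb, context_bits=24):
    # Top row + left column of neighbor context, tracked per 4x4 unit.
    units = 2 * (sb // 4)
    return units * context_bits

for sb in (128, 256):
    print(sb, neighbor_bits(sb))
# 128 -> 1536 bits of context; 256 -> 3072. The storage itself merely
# doubles, but everything that reads and updates it (muxes, update
# logic in the critical loop) grows with it.
```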
And what hobbyist is sending off decoding chips to be fabbed? If this exists, it sounds interesting if incredibly impractical.
While it worked, I don't think it ever left my machine. Never moved past software decoding -- I was a broke teen with no access to non-standard hardware. But the idea has stuck with me and feels more relevant than ever, with the proliferation of codecs we're seeing now.
It has the Sufficiently Smart Compiler problem baked in, but I tried to define things to be SIMD-native from the start (which could be split however it needed to be for the hardware) and I suspect it could work. Somehow.
They're called GPUs... They're ASICs rather than FPGAs, but it's easy to update the driver software to handle new video codecs. The difficulty is motivating GPU manufacturers to do so... They'd rather sell you a new one with newer codec support as a feature.
The main point of having ASICs for video codecs these days is efficiency, not being able to real-time decode a stream at all (as even many embedded CPUs can do that at this point).
But often a new codec requires decoders to know how to work with new things that the fixed function hardware likely can't do.
Encoding might actually be different. If your encoder hardware can only do fixed block sizes and can only detect some types of motion, a driver change might be able to package its output up as the new codec. Probably not a lot of benefit, other than ticking a box... but it might be useful sometimes. Especially if you, say, offload motion estimation but the new codec needs different arithmetic coding: you'd need the CPU (or a general-purpose GPU) to do the arithmetic coding, and you'd presumably still get a size saving over the old codec.
If true, that would be amazing.
Isn't AVI a container format and not a codec?
I don’t understand why 60fps never became ubiquitous; a pan scene at 30fps is horrible, it’s almost stroboscopic to me.
AVIF is also a container format, and I believe it should be adaptable to AV2, even if the name stands for "AV1 Image File Format". It could simply be renamed to AOMedia Video Image Format for correctness.
Maybe that’s what we did in the past, and it was a bad idea. It’d be useful to know if you can read the file by looking only at its extension.
> It’d be useful to know if you can read the file by looking only at its extension
That would be madness, and there's already a workaround - the filename itself.
For most people, all that matters is an MKV file is a video file, and your configured player for this format is VLC. Only in a small number of cases does it matter about an "inner" format, or choice of parameter - e.g. for videos, what video codec or audio codec is in use, what the bitrate is, what the frame dimensions are.
For where it _matters_, people write the "inner" file formats in the filename, e.g. "Gone With The Wind (1939) 1080p BluRay x265 HEVC FLAC GOONiES.mkv", to let prospective downloaders choose between many competing encodings of exactly the same media, on websites where the filename is the _only_ place to put that metadata. (If a website weren't standardised around making files available and searching only by filename, it could just put it in the link description and the filename wouldn't matter at all.)
Most people don't care, for example, that their Word document is A4 landscape, so much that they need to know _in the filename_.
That's pretty much always been the case. File extensions are just not expressive enough to capture all the nuances of audio and video codecs. MIME types are a bit better.
Audio is a bit of an exception with the popularity of MP3 (which is both a codec and a relatively minimal container format for it).
It doesn't look like AV2 does any of that yet though fortunately (except film grain synthesis but I think that's fine).
I imagine e.g. a picture of an 8x8 circle actually takes more bits to encode than a mathematical description of the same circle
I wonder if there are codecs with provisions for storing common shapes. Text comes to mind: I imagine having a bank of the 10 most popular fonts and encoding just the difference between the source and text + distortion could save quite a lot of data on text-heavy material. Add circles, lines, basic face shapes.
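A toy version of the parent's circle example, with completely made-up encodings:

```python
# Toy illustration: a raster 8x8 circle vs. a parametric description of
# the same circle. Both "encodings" below are invented for illustration.

# Raster: one bit per pixel of an 8x8 binary mask.
raster_bits = 8 * 8  # 64 bits

# Parametric: center x, center y, radius, each as a 3-bit value (0..7).
parametric_bits = 3 * 3  # 9 bits

print(raster_bits, parametric_bits)  # 64 vs 9

# The catch, same as with a font bank: the decoder must already know what
# a "circle" (or a given glyph) is, and real content rarely matches the
# bank exactly, so a residual still has to be coded on top.
```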
There also seems to be a fair bit of attention on that problem space from the real-time comms vendors with Cisco [1], Microsoft [2] and Google [3] already leaning on model based audio codecs. With the advantages that provides both around packet loss mitigation and shifting costs to end user (aka free) compute and away from central infra I can't see that not extending to the video channel too.
[0]: https://mtisoftware.com/understanding-ai-upscaling-how-dlss-...
[1]: https://www.webex.com/gp/webex-ai-codec.html
[2]: https://techcommunity.microsoft.com/blog/microsoftteamsblog/...
[3]: https://research.google/blog/lyra-a-new-very-low-bitrate-cod...
[1]https://www.dkriesel.com/en/blog/2013/0802_xerox-workcentres...
Not quite yet, as shown by H.267. But at some point the computational requirements would outweigh the bandwidth savings and it would no longer make sense.
[1]: https://bellard.org/nncp/
It works amazingly well with text compression, for example: https://bellard.org/nncp/
AI video could mean that essential elements are preserved (actors?) but other elements are generated locally. Hell, digital doubles for actors could also mean only their movements are transmitted. Essentially just sending the mo-cap data. The future is gonna be weird
> It would be interesting to see how far you could get using deepfakes as a method for video call compression.
> Train a model locally ahead of time and upload it to a server, then whenever you have a call scheduled the model is downloaded in advance by the other participants.
> Now, instead of having to send video data, you only have to send a representation of the facial movements so that the recipients can render it on their end. When the tech is a little further along, it should be possible to get good quality video using only a fraction of the bandwidth.
— https://news.ycombinator.com/item?id=22907718
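A quick back-of-envelope on the quoted idea, with assumed figures throughout:

```python
# Sending facial landmarks instead of pixels. All figures here are
# assumptions for illustration only.

landmarks = 68         # a common face-landmark count (e.g. dlib's 68-point set)
bits_per_coord = 12    # x and y each, quantized (assumed)
fps = 30

landmark_bps = landmarks * 2 * bits_per_coord * fps
video_bps = 1_000_000  # ~1 Mbps, a modest video-call stream (assumed)

print(landmark_bps)               # 48960 bits/s, before any entropy coding
print(video_bps // landmark_bps)  # roughly 20x below even modest video
```

Even with generous quantization the landmark stream is a rounding error next to pixels; the hard part is making the rendered face look like the person.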
Specifically for voice, this was mentioned:
> A Real-Time Wideband Neural Vocoder at 1.6 Kb/S Using LPCNet
— https://news.ycombinator.com/item?id=19520194
You could probably also transmit a low res grayscale version of the video to “map” any local reproduction to. Kinda like how a low resolution image could be reasonably reproduced if an artist knew who the subject was.
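A minimal sketch of that "map to a low-res grayscale guide" idea, assuming per-block mean matching (1D and integer-only for brevity):

```python
# The receiver has a locally rendered guess and nudges each block so its
# mean brightness matches the transmitted low-res luma guide.
# Pure illustration of the concept, not any real codec tool.

def match_to_guide(local, guide, block=4):
    # guide holds one mean value per block of the higher-res local signal
    out = []
    for i, g in enumerate(guide):
        blk = local[i*block:(i+1)*block]
        shift = g - sum(blk) // block       # how far off our guess is
        out.extend(v + shift for v in blk)  # correct the whole block
    return out

local = [90, 92, 94, 96, 200, 202, 204, 206]  # receiver's reconstruction
guide = [100, 150]                            # transmitted low-res means
print(match_to_guide(local, guide))
# -> [97, 99, 101, 103, 147, 149, 151, 153]
```

Fine texture comes from the local reproduction; the cheap-to-send guide just keeps the overall tones honest, much like the artist analogy above.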