Title has a misleading domain name (gwern.net). The link is to a PhD thesis titled "Scaling Laws for Deep Learning" by Jonathan Rosenfeld. Not sure why it wasn't linked more directly:
https://arxiv.org/abs/2108.07686
https://arxiv.org/pdf/2108.07686#page=85
From that, you can gather that the two main papers that form the core of Rosenfeld's thesis are these:
https://openreview.net/pdf?id=ryenvpEKDr
https://proceedings.mlr.press/v139/rosenfeld21a.html
(if you prefer to read the gist in fewer pages.)
Gwern, have you considered hosting your archived docs on a different subdomain (e.g., doc.gwern.net) to make it clearer that they are not something you have authored yourself? Not sure what the best subdomain would be though.
gwern 2 days ago [-]
I don't think that would make it any clearer. Why would 'doc.gwern.net' be more obviously just a random document than 'gwern.net/doc/www/'?
Regardless, I am puzzled how OP got this URL in the first place. He wasn't supposed to; he was supposed to get the canonical Arxiv PDF link, because this is one of the cache mirrors/local archives†, rather than a regular hosted document. We block everything in /doc/www/ in robots.txt & HTTP no-archive/crawl/mirror/etc headers, and we use JS to swap out the local URL for the original URL whenever the reader clicks, mouses over, or otherwise interacts with a link to the URL in a web page (and that is the only place they should be publicly listed or accessible). If OP read it on gwern.net by seeing a link to it, and he wanted to copy the URL elsewhere, he should have just gotten the canonical "https://arxiv.org/pdf/2108.07686#page=85"... But somehow he didn't.
OP, do you remember how exactly you grabbed this URL? Is this an old link from before our URL swapping was implemented, or did you deliberately work around it, or did you find some place we forgot to swap, or what?
(If anyone is wondering why I mirror Arxiv PDFs like this in the first place: it's for the PDF preview feature in the popups. Because Arxiv blocks itself from being loaded in iframes, we need local mirrors for PDF preview to work at all; local mirrors save a new domain lookup and speed up the PDF preview a lot, because we compress the PDF more thoroughly and Arxiv servers are always overloaded; and because readers can potentially pop up many Arxiv PDFs easily, it saves Arxiv a lot of bandwidth and avoids burdening their servers further, so it's just the responsible thing to do.)
† https://gwern.net/archiving#preemptive-local-archiving
Not OP, but the HN crowd here often browses without JS.
Quickly testing a no-JS session, I do see your archive URLs instead of arxiv ones.
gwern 1 day ago [-]
Yes, without the swapping JS, you wouldn't get the canonical URL. But browsing Gwern.net these days without JS is pretty painful. And in this particular case, there is only one place on Gwern.net where the link exists such that you could see it without JS; for the other 5 or 6 links, you could only get there via JS, and thus the swapping should've happened. So it is not a safe assumption that OP simply browsed with NoScript.
sleepingreset 23 hours ago [-]
Hi Gwern, I'm honestly not sure. I have some Firefox extension that skips trackers and other redirects. I have like 100 Firefox extensions, actually. I'm not sure how most of them work or what they do exactly; I just trust that they make my browser more "secure", and I tend to download things at random -- especially if I see ads or want certain features in my client (e.g. a browser that auto-rejects cookies).
Happy to try and help you figure this out, but when I revisit this specific hyperlink I'm still getting the gwern.net URL and not the arxiv one.
gwern 5 hours ago [-]
Hm. So you're getting the raw URL, but you don't have NoScript / block JS specifically? Can you check in an incognito window, and if it still happens, ablate all your extensions in a fresh profile (https://support.mozilla.org/en-US/kb/profile-manager-create-...), which is something you might want to keep handy if you really have 100 extensions running and no idea what they all do...
littlestymaar 2 days ago [-]
> Why would 'doc.gwern.net' be more obviously just a random document than 'gwern.net/doc/www/'?
HN only shows the domain next to the title. So when browsing the front page, we only see gwern.net as the source of the doc and initially assume it's your own work.
jolmg 2 days ago [-]
I don't think HN shows third-level domains, so the point is moot. There may be exceptions for web services that lend out subdomains, like Github[1], but doc.gwern.net would probably still show as gwern.net[2]. If you're willing to look at the URL in the browser statusbar or addressbar, then the URL path makes it very clear that the actual source is arxiv.org.
That brings up the second question though, which is why someone would assume that doc.gwern.net links to a document not by Gwern.
[1] Example: gliimly.github.io -> gliimly.github.io https://news.ycombinator.com/item?id=42148808
[2] Example: www.researchgate.net -> researchgate.net https://news.ycombinator.com/item?id=42181345
The [2] was not a convincing example, because www sounds like something that'd get special treatment, but then I found this one:
tech.marksblogg.com -> marksblogg.com (https://news.ycombinator.com/item?id=42182519)
which proves you right. TIL.
telotortium 2 days ago [-]
That's why I'm trying to think of a better subdomain.
- archive.gwern.net?
- static.gwern.net?
- thirdparty.gwern.net?
- localarchive.gwern.net?
svantana 2 days ago [-]
I think the basic premise of this paper is wrong. Very few natural signals are bandlimited - if images were, there would be no need to store them in high resolution; you could just upsample. Natural spectra tend to be pink (decaying ~3dB/octave), which can be explained by the fractal nature of our world (zoom in on details and you find more detail).
Of course, that says that our eyes (& more generally our sensory organs) are bandlimited, which is what lossy signal compression algorithms exploit (similar to how MP3 throws away acoustic signals we can't hear, or how even "lossless" audio is still only recorded at 44 kHz). And indeed any sensor has this problem, and it's a physical limitation (e.g. there's only so much resolving power an optical sensor of a certain size can have for an object a certain distance away, which is why we can't see microscopic things - this is a limit from the physics of optics).
It says nothing about the underlying signal in nature. But of course we're building LLMs to interact with humans rather than to learn about signals in the true natural world that we might miss.
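(Just to make the "~3dB/octave" figure concrete, here is a minimal numpy sketch of my own, on synthetic data rather than real images: it shapes white noise into 1/f "pink" noise and measures the per-octave power drop. The band edges and sizes are arbitrary choices.)
    import numpy as np

    rng = np.random.default_rng(0)
    n = 2 ** 18
    white = rng.standard_normal(n)

    # Shape white noise so its power spectrum falls off as 1/f ("pink").
    freqs = np.fft.rfftfreq(n, d=1.0)            # normalized frequency, 0..0.5
    spectrum = np.fft.rfft(white)
    spectrum[1:] /= np.sqrt(freqs[1:])           # 1/f power <=> 1/sqrt(f) amplitude
    spectrum[0] = 0.0
    pink = np.fft.irfft(spectrum, n)

    # Compare the average power in two bands an octave apart.
    psd = np.abs(np.fft.rfft(pink)) ** 2
    def band_power(lo, hi):
        return psd[(freqs >= lo) & (freqs < hi)].mean()
    drop_db = 10 * np.log10(band_power(0.02, 0.04) / band_power(0.04, 0.08))
    print(f"power drop per octave: {drop_db:.1f} dB")   # ~3 dB for 1/f noise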
wbl 2 days ago [-]
Any optical system will have a finite resolution.
astrange 2 days ago [-]
That applies to individual samples. The eye gets around this by saccading (rapid movements) to get multiple samples. Also, you interact with your environment rather than passively sampling it, so if you want to look closer at something you can just do that.
Images aren't truly bandlimited because they contain sharp edges; if they were bandlimited you'd be happy to see an image upscaled with a Gaussian kernel, but instead it's obviously super blurry.
When we see an edge in a smaller image we "know" it's actually infinitely sharp. Another way to say this is that a single image of two people is fundamentally two "things", but we treat it as one unified "thing" mathematically. If all images came with segmentation data then we could do something smarter.
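(A small numpy illustration of the "sharp edges aren't bandlimited" point, purely my own toy example: a hard step has spectral content at arbitrarily high frequencies, and cutting those off gives blur and Gibbs ringing rather than the original edge.)
    import numpy as np

    n = 1024
    x = np.zeros(n)
    x[n // 2:] = 1.0                         # a hard edge

    spec = np.fft.rfft(x)
    mags = np.abs(spec)
    # Odd harmonics of a step fall off only like 1/k -- there is no frequency
    # above which the content stops, i.e. the edge is not bandlimited.
    print("|X[k]| at k = 1, 11, 101:", mags[1], mags[11], mags[101])

    # Keep only the lowest 32 frequencies ("bandlimit" it) and reconstruct.
    lowpassed = spec.copy()
    lowpassed[32:] = 0.0
    x_blurred = np.fft.irfft(lowpassed, n)
    print("overshoot after bandlimiting:", x_blurred.max() - 1.0)   # Gibbs ringing, ~9%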
pvillano 2 days ago [-]
"In optics, any optical instrument or system – a microscope, telescope, or camera – has a principal limit to its resolution due to the physics of diffraction." This might be what wbl is referring to.
What would the implementation of a band limited LLM look like?
gwbas1c 3 days ago [-]
> In particular, this minimal frequency is twice the bandwidth of the function.
Careful, this is misleading.
If the peaks of the frequency align with your samples, you'll get the full bandwidth.
If the 0-crossings align with your samples, you'll miss the frequency.
This is why people swear by things like HD audio and SACD/DSD, even though "you can't hear over 20kHz".
luma 2 days ago [-]
You've misunderstood something about Nyquist. A sample rate of, say, 44KHz, will capture ALL information below 22KHz and recreate it perfectly.
There are of course implementation details to consider, for example you probably want to have a steep filter so you don't wind up with aliasing artifacts from content above 22KHz. However it's important to understand: Nyquist isn't an approximation. If your signal is below one half the sample rate, it will be recreated with no signal lost.
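(If you want to convince yourself of this numerically, here is a minimal numpy sketch of my own using Whittaker-Shannon (sinc) reconstruction; the tone frequency, phase, and lengths are arbitrary choices, and the small residual error comes purely from using a finite number of samples.)
    import numpy as np

    fs = 44_100.0                      # sample rate, Hz
    f0 = 17_500.0                      # tone below fs/2
    n = 4096
    t_samples = np.arange(n) / fs
    samples = np.sin(2 * np.pi * f0 * t_samples + 0.3)

    # Reconstruct at points *between* the samples, away from the edges, where the
    # finite sinc sum approximates the ideal infinite one well.
    t_query = (np.arange(1000, 3000) + 0.37) / fs
    weights = np.sinc((t_query[:, None] - t_samples[None, :]) * fs)
    reconstructed = weights @ samples
    truth = np.sin(2 * np.pi * f0 * t_query + 0.3)
    # Tiny relative to the unit amplitude, and it shrinks as more samples
    # surround the query points.
    print("max reconstruction error:", np.abs(reconstructed - truth).max())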
GlenTheMachine 2 days ago [-]
Nyquist is a mathematical statement. As such, it has two commonly overlooked requirements:
- the signal being sampled has to be stationary
- you have an infinite number of samples
In that case, a sampling frequency of 2N+epsilon (where N is the signal's bandwidth) will perfectly reproduce the signal. Otherwise there can be issues.
alanbernstein 2 days ago [-]
I don't recall seeing Nyquist described with those requirements before. I think it is evident that in the real world, there are many practical signals which do not exactly meet those requirements, but which still yield nearly-exact reproduction.
I wonder, what are some examples of signals that fail to reproduce after sampling in a way that is "nearly Nyquist"?
GlenTheMachine 2 days ago [-]
If you look at the Wikipedia entry on the Nyquist Sampling Theorem, you should note that the summations to reconstruct the original signal go from negative infinity to positive infinity. In other words, that sum requires an infinite number of samples.
There are many signals of practical interest that can be approximately reconstructed with a finite truncation of the series. Note, however, that any signal that has only a finite length, e.g. has a uniformly zero amplitude after some time t_final, does not have a finite bandwidth, and cannot be exactly reconstructed by any sampling scheme. This is the case whenever you stop sampling a signal, i.e. it is always the case whenever you step outside the mathematical abstraction and start running real code on a real computer. So any signal reconstructed from samples is always approximate, except for some relatively trivial special cases.
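(A rough numeric illustration of that last point, my own sketch with arbitrary numbers: gating a pure 1 kHz tone on and off, so that it has finite length, spreads energy across the whole spectrum, so the gated signal is no longer bandlimited.)
    import numpy as np

    fs = 48_000
    t = np.arange(fs) / fs                        # one second
    tone = np.sin(2 * np.pi * 1000.0 * t)         # steady 1 kHz tone
    gated = tone.copy()
    gated[: fs // 4] = 0.0                        # silent first quarter second
    gated[3 * fs // 4:] = 0.0                     # silent last quarter second

    freqs = np.fft.rfftfreq(fs, 1 / fs)
    def energy_fraction_above(x, f_cut):
        psd = np.abs(np.fft.rfft(x)) ** 2
        return psd[freqs > f_cut].sum() / psd.sum()

    # Steady tone: essentially zero energy above 2 kHz. Gated tone: a small but
    # clearly nonzero fraction, spread across all frequencies.
    print("steady tone:", energy_fraction_above(tone, 2000.0))
    print("gated tone :", energy_fraction_above(gated, 2000.0))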
drdeca 1 day ago [-]
Hm, yes, a function cannot have bounded support in both the time domain and the frequency domain…
What if you take a function that has bounded support in the time domain, and then turn it into a periodic function? Might the resulting function have bounded support in the frequency domain even though the original function did not?
I suppose doing this would force the Fourier transform to have discrete support? But under what conditions would it have bounded support?…
I guess technically a low-pass filter applied to a signal with finite support in the time domain would result in a function which has infinite support in the time domain.
I suppose sinc(f t + c) doesn’t have bounded support, and it is unsurprising that a non-trivial linear combination of finitely many terms of this form would also not have finite support.
Still, such a linear combination could decay rather quickly, I imagine. (Idk if asymptotically faster than (1/t), but (1/(f t)) is still pretty fast I think, for large f.)
Soon enough the decay should be enough that the amplitude should be smaller than the smallest that the speaker hardware is capable of producing, I suppose.
GlenTheMachine 1 day ago [-]
“What if you take a function that has bounded support in the time domain, and then turn it into a periodic function?”
When you perform a finite sample reconstruction, this is essentially the unstated approximation you’re making.
ImageXav 2 days ago [-]
I think it is you who have misunderstood the Nyquist-Shannon theorem. Aliasing and noise are real concerns. Tim Wescott explains it very well [0] (Figures 3, 10 and 11). If your signal is below one half the sample rate but the noise isn't, you'll lose information about the signal. If your signal phase is shifted wrt. the sampling, you'll lose information. If your sampling period isn't representative, you'll lose information. These are not implementation details.
[0] https://www.wescottdesign.com/articles/Sampling/sampling.pdf
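(A toy numpy sketch of the aliasing concern, with made-up numbers of my own: content at 25 kHz sampled at 44.1 kHz without an anti-alias filter shows up, indistinguishably, at 44.1 - 25 = 19.1 kHz, inside the audible band.)
    import numpy as np

    fs = 44_100.0
    n = 44_100                                       # one second of samples
    t = np.arange(n) / fs
    ultrasonic = np.sin(2 * np.pi * 25_000.0 * t)    # "noise" above fs/2

    freqs = np.fft.rfftfreq(n, 1 / fs)
    peak_hz = freqs[np.argmax(np.abs(np.fft.rfft(ultrasonic)))]
    print("apparent frequency after sampling:", peak_hz, "Hz")   # 19100.0 Hz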
> You've misunderstood something about Nyquist. A sample rate of, say, 44KHz, will capture ALL information below 22KHz and recreate it perfectly.
Let's do a thought experiment. Imagine a digital image where the pixels are the exact minimum size that you can see.
If a line is exactly 1-pixel-wide, it'll display perfectly when it aligns perfectly with the pixels.
But, if the 1-pixel-wide image doesn't align with the pixels, what happens?
You can see this in practice when you have a large screen TV and watch lower-resolution video. Smooth gradients look fine, but narrow lines have artifacts. E.g., I recently saw a 1024p movie in the theater and saw pixels occasionally.
The same thing happens in sound, but because a lot of us have trouble hearing high frequencies, we don't miss it as much.
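(The thought experiment is easy to simulate; this is my own toy sketch using simple area/box sampling, with arbitrary sizes: a one-pixel-wide line aligned with the grid stays a crisp full-intensity line, while the same line shifted by half a pixel smears into two half-intensity pixels.)
    import numpy as np

    def render_line(offset, width=1.0, n_pixels=8, oversample=1000):
        # Fraction of each pixel covered by a line spanning [offset, offset + width].
        xs = (np.arange(n_pixels * oversample) + 0.5) / oversample
        inside = (xs >= offset) & (xs < offset + width)
        return inside.reshape(n_pixels, oversample).mean(axis=1)

    print("aligned:     ", render_line(offset=3.0))   # one pixel at full intensity
    print("half-shifted:", render_line(offset=3.5))   # two pixels at half intensity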
StrangeDoctor 2 days ago [-]
I was just about to post something similar. If I had to guess,
>If the 0-crossings align with your samples, you'll miss the frequency.
This is where the issue is. That can't happen once the sampling rate is more than double the frequency.
kevin_thibedeau 2 days ago [-]
It can only happen with a source exactly at N/2 and correlated with your sampling clock. That doesn't happen in the real world for audio.
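(A quick numeric check of this exchange, with illustrative values only: the samples only go to zero when the tone sits exactly at half the sample rate and is phase-locked to the sampling clock; nudge the frequency slightly below that and the samples carry the signal again.)
    import numpy as np

    fs = 48_000.0
    k = np.arange(4800)                                  # 0.1 s of sample indices

    at_nyquist = np.sin(2 * np.pi * (fs / 2) * k / fs)   # zero-crossings land on the samples
    just_below = np.sin(2 * np.pi * 23_990.0 * k / fs)

    print("peak |sample|, tone at fs/2    :", np.abs(at_nyquist).max())  # ~0 (lost, up to rounding)
    print("peak |sample|, tone at 23.99 k :", np.abs(just_below).max())  # ~1 (captured)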
mlyle 2 days ago [-]
Anything close to N/2 is going to have varying magnitude that requires filtering and likely oversampling to remove.
How close to the Nyquist bandwidth you can get depends upon the quality of your filtering.
44.1KHz is a reasonable compromise for a 20KHz passband. 48KHz is arguably better now that bits are cheap-- get a sliver more than 20KHz and be less demanding on your filter. Garbage has to be way up above 28KHz before it starts to fold over into the audible region, too.
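(The fold-over arithmetic behind that last sentence, as a tiny sketch of my own, taking "audible" to mean up to 20 kHz: content at f aliases to |fs - f| once it exceeds fs/2, so at 48 kHz it has to be above 48 - 20 = 28 kHz before it lands back inside the audible band.)
    def alias_frequency(f_hz, fs_hz):
        """Apparent frequency after sampling a tone at f_hz with rate fs_hz."""
        f = f_hz % fs_hz
        return min(f, fs_hz - f)

    for f in (25_000.0, 28_000.0, 30_000.0):
        print(f, "Hz sampled at 48 kHz appears at", alias_frequency(f, 48_000.0), "Hz")
    # 25 kHz -> 23 kHz (still ultrasonic), 28 kHz -> 20 kHz, 30 kHz -> 18 kHz (audible)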
Sesse__ 2 days ago [-]
> Garbage has to be way up above 28KHz before it starts to fold over into the audible region, too.
You brick-wall everything at 20 kHz (with an analogue filter) before you sample it; that's part of the CD standard, and generally also what all other digital CD-quality audio assumes. This ensures there simply is no 28 kHz garbage to fold. The stuff between 20 and 28 in your reconstructed signal then is a known-silent guard band, where your filter is free to do whatever it wants—which in turn means that you can design it only for maximum flatness (and ideally, zero phase) below 20 kHz and maximum dampening above 28 kHz (where you will be seeing the start of your signal's mirror image after digital-to-audio conversion), not worrying about the 20–28 kHz region.
mlyle 2 days ago [-]
Your comment is self-contradictory-- what is it, a brick wall (impossible) analog filter or a more gentle rolloff as things fold over?
What you really do, these days, is you sample at a higher frequency; you can have an exceptionally gentle analog filter (which will help you make it linear, too). E.g. if you sample at 96KHz, you just need to roll to near zero by 75KHz. Then you can digitally downsample/decimate to 44.1KHz or 48KHz.
Also note for CD audio, it's more like 24KHz where you get worried, not 28KHz.
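(A minimal scipy sketch of the oversample-then-decimate idea, my own illustration with arbitrary signal and filter choices: capture at 96 kHz, then let a sharp digital filter do the hard work on the way down to 48 kHz.)
    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out = 96_000, 48_000
    t = np.arange(fs_in) / fs_in                     # one second at 96 kHz
    # An in-band tone we want to keep, plus out-of-band junk the digital
    # anti-alias filter should remove before decimation.
    captured = np.sin(2 * np.pi * 10_000 * t) + 0.5 * np.sin(2 * np.pi * 30_000 * t)

    decimated = resample_poly(captured, up=1, down=fs_in // fs_out)   # 96 kHz -> 48 kHz

    freqs = np.fft.rfftfreq(decimated.size, 1 / fs_out)
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(decimated)) + 1e-12)
    def level(f_hz):
        return spectrum_db[np.argmin(np.abs(freqs - f_hz))]
    # Without the filter, the 30 kHz junk would alias to 48 - 30 = 18 kHz.
    print("wanted 10 kHz tone vs aliased junk at 18 kHz:",
          level(10_000) - level(18_000), "dB")        # tens of dB of suppression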
Sesse__ 2 days ago [-]
You're mixing up the two filters. The pre-sample filter (before ADC) is defined to be a brickwall (of course impossible in practice, so in reality, it will have to start going off a bit before 20 kHz); the reconstruction filter (after DAC) has a roll-off.
mlyle 2 days ago [-]
I've not been talking about the reconstruction filter at all during any step of this discussion. Reading your comment more carefully, it seems you were trying to.
I'm saying that if you oversample, it's easier to get appropriate rejection from your pre-sampling filter and it's easier to make it appropriately flat as well.
E.g. sample at 384KHz; you need to reject stuff over 360KHz. You probably have negligible energy up there to begin with. A 3rd-order filter with -3dB at 30KHz might get the job done.
It's pretty easy to make this flat in phase and amplitude up to 20KHz, and things like capacitor nonlinearity are much less of a concern.
In turn, filtering down to 20KHz and rejecting from 22050 and up is easy in the digital domain. 512 taps gets me a filter flat to 0.15dB up to 20KHz and >63dB rejection over 22KHz.
My point was, this is a little better at 48KHz, because we can choose to e.g. pass 21KHz and have a wider guard band (14% vs 10%). With 384 taps, numbers are more like flat to 0.1dB and -67dB, benefitting both from the wider guard band and 48KHz being a factor of 384KHz.
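(For anyone who wants to reproduce that kind of check, here is a minimal scipy sketch of my own using an equal-weight Parks-McClellan design; it won't match the 0.15dB / >63dB figures above, which presumably come from trading more passband ripple for stopband rejection via weighting, but it shows how such numbers are measured.)
    import numpy as np
    from scipy.signal import remez, freqz

    fs = 384_000.0                                   # the filter runs at the oversampled rate
    # Linear-phase lowpass: pass up to 20 kHz, reject from 22.05 kHz up.
    taps = remez(513, [0, 20_000, 22_050, fs / 2], [1, 0], fs=fs)

    w, h = freqz(taps, worN=1 << 16, fs=fs)
    mag_db = 20 * np.log10(np.abs(h) + 1e-12)
    passband = mag_db[w <= 20_000]
    print("passband ripple below 20 kHz:", passband.max() - passband.min(), "dB")
    print("worst rejection above 22.05 kHz:", mag_db[w >= 22_050].max(), "dB")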
Sesse__ 2 days ago [-]
Sure, you can implement the pre-sampling filter as a multistage filter, of which some of the stages are digital, if you wish. (I don't know where you get “rejecting from 22050 and up” from, though. For the pre-sample filter, you should reject from 20000 and up, and for the reconstruction filter, you should either reject from 24100 and up or 28000 and up, depending on whether you ended up sampling in 44.1 or 48.) But I don't think your argument makes much sense; if you're already in a domain where you have enough resources to sample at 384 kHz and run a 384-tap FIR filter over it, then surely you're high-end enough that you can't say “nah, who cares about the most common sample rate out there”.
mlyle 2 days ago [-]
When sampling:
You should pass all below 20KHz, as flat as possible. You definitely should stop 24.1KHz and up. How bad 22.05KHz to 24.1KHz is, is debatable.
> then surely you're high-end enough that you can't say “nah, who cares about the most common sample rate out there”.
I didn't say "don't support 44.1KHz" -- I'm saying there's good reasons to prefer 48KHz.
All else being equal (same number of filter taps, etc.)-- a slightly higher sample rate offers a lot more performance, because you can get a bit more frequency response and a much flatter passband.
alfiedotwtf 2 days ago [-]
The Moog Minimoog filter goes beyond 20kHz. So even if you gently roll off at 20kHz, you're going to miss overtones etc.
Sesse__ 2 days ago [-]
You can't hear ultrasound. For audio, the only question about 20+ kHz frequencies is how to get rid of them in the cheapest+best way.
marcosdumay 2 days ago [-]
Yep, that's why people do things like 44kHz sampling instead of 40kHz.
Sesse__ 2 days ago [-]
No, 44 kHz is because you want to reconstruct the (20 kHz) bandlimited signal and it's (much) easier to realize such a filter if you have a bit of a transition band.
01HNNWZ0MV43FF 2 days ago [-]
How bad is it around the frequencies I can hear as a 30-something?
pvillano 2 days ago [-]
Wasn't there a paper on band limiting generative CNNs that fixed texture pinning? Basically by blurring the results of the kernel with neighbors, you get rid of all this aliasing?