Why the deep learning boom caught almost everyone by surprise (understandingai.org)
aithrowawaycomm 13 minutes ago [-]
I think there is a slight disconnect here between making AI systems which are smart and AI systems which are useful. It’s a very old fallacy in AI: pretending tools which assist human intelligence by solving human problems must themselves be intelligent.

The utility of big datasets was indeed surprising, but the skepticism came from recognizing that the scaling paradigm must be a dead end: vertebrates across the board require several orders of magnitude less data to learn new things. Methods to give ANNs “common sense” are essentially identical to the old LISP expert systems: hard-wiring the answers to specific common-sense questions in either code or training data, even though fish and lizards can rapidly make common-sense deductions about manmade objects they couldn’t possibly have seen in their evolutionary histories. Even spiders have generalization abilities seemingly absent in transformers: they spin webs inside human homes, despite their unnatural geometry.

Again, it is surprising that the ImageNet stuff worked as well as it did. Deep learning is undoubtedly a useful way to build applications, just as Lisp was. But I think we are about as close to AGI as we were in the 80s, since we have made zero progress on common sense: in the 80s we knew big data could only poorly emulate common sense, and that’s still where we are today.

DeathArrow 3 hours ago [-]
I think neural nets are just a subset of machine learning techniques.

I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

I'm not saying that transformers, LLMs, deep learning and the other great things that have happened in the neural network space aren't very valuable; they are.

But I think in the future we should also study other options which might be better suited than neural networks for some classes of problems.

Can a very large and expensive LLM do sentiment analysis or classification? Yes, it can. But so can simple SVMs and KNN, and sometimes they do it even better.
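For illustration, a minimal scikit-learn sketch of the kind of classical baseline meant here (a hypothetical example with toy data; in practice you would plug in a real labeled sentiment corpus):

  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.pipeline import make_pipeline
  from sklearn.svm import LinearSVC

  # Toy labeled data; a real sentiment corpus would be used in practice.
  texts = ["great product, loved it", "terrible, waste of money",
           "works fine and arrived quickly", "awful support, never again"]
  labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

  # TF-IDF features plus a linear SVM: the cheap, classic text-classification baseline.
  clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
  clf.fit(texts, labels)
  print(clf.predict(["pretty good overall", "absolutely horrible"]))

Swapping LinearSVC for KNeighborsClassifier gives the KNN variant of the same baseline.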

I saw some YouTube coders doing calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.

jasode 1 hours ago [-]
>I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

But that's backwards from how new techniques and progress are made. What actually happens is somebody (maybe a student at a university) has an insight or new idea for an algorithm that costs near $0 to implement as a proof of concept. Then everybody else notices the improvement, and the extra millions/billions get directed toward it.

New ideas -- which didn't cost much at the start -- ATTRACT the follow-on billions in investment.

This timeline of tech progress in computer science is the opposite of other disciplines such as materials science or the bio-medical fields. Trying to discover the next super-alloy or cancer drug requires expensive experiments. Manipulating atoms & molecules requires very expensive specialized equipment. In contrast, computer science experiments can be cheap. You just need a clever insight.

An example of that was the 2012 AlexNet image recognition algorithm that blew all the other approaches out of the water. Alex Krizhevsky had a new insight on a convolutional neural network to run on CUDA. He bought 2 NVIDIA cards (GTX 580 3GB GPUs) from Amazon. It didn't require NASA levels of investment at the start to implement his idea. Once everybody else noticed his superior results, the billions began pouring in to iterate/refine on CNNs.

Both the "attention mechanism" and the refinement of "transformer architecture" were also cheap to prove out at a very small scale. In 2014, Jakob Uszkoreit thought about an "attention mechanism" instead of RNN and LSTM for machine translation. It didn't cost billions to come up with that idea. Yes, ChatGPT-the-product cost billions but the "attention mechanism algorithm" did not.

>into SVMs, random forests, KNN, etc.

If anyone has found an unknown insight into SVM, KNN, etc. that everybody else in the industry has overlooked, they can do cheap experiments to prove it. E.g. the entire Wikipedia text download is currently only ~25GB. Run the new SVM classification idea on that corpus. Very low-cost experiments in computer science algorithms can still be done in the proverbial "home garage".

dr_dshiv 8 minutes ago [-]
The best tool for the job is, I’d argue, the one that does the job most reliably for the least amount of money. When you consider how little expertise or data you need to use OpenAI's offerings, I’d be surprised if sentiment analysis using classical ML methods is actually better (unless you are an expert and have a good dataset).
empiko 2 hours ago [-]
Deep learning is easy to adapt to various domains, use cases, and training criteria. Other approaches do not have the flexibility of combining arbitrary layers and subnetworks and then training them with arbitrary loss functions. The depth in deep learning is also pretty important, as it allows the model to create hierarchical representations of the inputs.
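As a rough illustration of that flexibility, a minimal PyTorch-style sketch (the subnetworks and the loss weighting here are arbitrary choices for demonstration, not from any particular system):

  import torch
  import torch.nn as nn

  # Two small subnetworks composed into one model and trained end to end.
  class TwoPart(nn.Module):
      def __init__(self):
          super().__init__()
          self.encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
          self.head = nn.Linear(32, 1)

      def forward(self, x):
          return self.head(self.encoder(x))

  model = TwoPart()
  opt = torch.optim.Adam(model.parameters(), lr=1e-3)
  x, y = torch.randn(8, 16), torch.randn(8, 1)

  pred = model(x)
  # Any differentiable loss works; this L2/L1 mix is an arbitrary example.
  loss = 0.7 * nn.functional.mse_loss(pred, y) + 0.3 * nn.functional.l1_loss(pred, y)
  loss.backward()
  opt.step()

Swapping in different layers, extra subnetworks, or a completely different loss does not change the training loop at all, which is the flexibility being described.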
f1shy 23 minutes ago [-]
But it is very hard to validate for important or critical applications.
edude03 51 minutes ago [-]
Transformers were made for machine translation - someone had the insight that when going from one language to another, the context mattered, such that the tokens that came before would bias which ones came after. It just so happened that transformers were more performant on other tasks too, and at the time you could demonstrate the improvement at a small scale.
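The "earlier tokens bias later ones" part can be sketched in a few lines of numpy as toy single-head causal attention (illustrative only; the original translation setup also used an encoder-decoder arrangement):

  import numpy as np

  def causal_attention(queries, keys, values):
      # Each position attends only to itself and earlier positions.
      d = queries.shape[-1]
      scores = queries @ keys.T / np.sqrt(d)      # (T, T) similarities
      mask = np.triu(np.ones_like(scores), k=1)   # 1s strictly above the diagonal
      scores = np.where(mask == 1, -1e9, scores)  # block attention to future tokens
      weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
      weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
      return weights @ values                     # context-weighted mix of values

  x = np.random.randn(5, 8)          # 5 tokens, 8-dimensional embeddings
  out = causal_attention(x, x, x)    # toy self-attention over the sequence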
f1shy 27 minutes ago [-]
> neural nets are just a subset of machine learning techniques.

Fact by definition

mentalgear 2 hours ago [-]
KANs (Kolmogorov-Arnold Networks) are one example of a promising exploration pathway to real AGI, with the advantage of full explainability.
astrange 2 hours ago [-]
"Explainable" is a strong word.

As a simple example, if you ask a question and part of the answer is directly quoted from a book from memory, that text is not computed/reasoned by the AI and so doesn't have an "explanation".

But I also suspect that any AGI would necessarily produce answers it can't explain. That's called intuition.

diffeomorphism 1 hours ago [-]
Why? If I ask you what the height of the Empire State Building is, then a reference is a great, explainable answer.
Meloniko 2 hours ago [-]
And based on what, though, do you think that?

I think neural networks are fundamental, and we will focus and experiment a lot more on architectures, layers, and the other parts involved, but emergent features arise through size.

trhway 2 hours ago [-]
>I wonder what would have happened if we poured the same amount of money, talent and hardware into SVMs, random forests, KNN, etc.

people did that to horses. No car resulted from it, just slightly better horses.

>I saw some YouTube coders doing calls to OpenAI's o1 model for some very simple classification tasks. That isn't the best tool for the job.

This "not best tool" is just there for the coders to call while the "simple SVMs and KNN" would require coding and training by those coders for the specific task they have at hand.

guappa 1 hours ago [-]
[citation needed]
ldjkfkdsjnv 35 minutes ago [-]
This is such a terrible opinion; I'm so tired of reading the LLM deniers.
2sk21 3 hours ago [-]
I'm surprised that the article doesn't mention that one of the key factors that enabled deep learning was the use of ReLU as the activation function in the early 2010s. ReLU behaves a lot better than the logistic sigmoid that we used until then.
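A toy numpy comparison of the two activations' gradients shows the usual explanation: the sigmoid's gradient vanishes away from the origin, while ReLU passes a gradient of 1 through for any positive input (illustrative numbers only):

  import numpy as np

  def sigmoid(x):
      return 1.0 / (1.0 + np.exp(-x))

  x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])

  # Sigmoid's derivative shrinks toward 0 for large |x| (vanishing gradients);
  # ReLU's derivative is exactly 1 for any positive input.
  sig_grad = sigmoid(x) * (1.0 - sigmoid(x))
  relu_grad = (x > 0).astype(float)

  print("sigmoid grad:", np.round(sig_grad, 4))  # tiny at |x| = 6
  print("relu grad:   ", relu_grad)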
sanxiyn 2 hours ago [-]
Geoffrey Hinton (now a Nobel Prize winner!) himself did a summary. I think it is the single best summary on this topic.

  Our labeled datasets were thousands of times too small.
  Our computers were millions of times too slow.
  We initialized the weights in a stupid way.
  We used the wrong type of non-linearity.
imjonse 2 hours ago [-]
That is a pithier formulation of the widely accepted summary of "more data + more compute + algo improvements"
sanxiyn 2 hours ago [-]
No, it isn't. It emphasizes the importance of Glorot initialization and ReLU.
macrolime 1 hours ago [-]
I took some AI courses around the same time as the author, and I remember the professors were actually big proponents of neural nets, but they believed the missing piece was some new genius learning algorithm rather than just scaling up with more data.
rramadass 36 minutes ago [-]
> rather than just scaling up with more data.

That was the key takeaway for me from this article. I didn't know of Fei-Fei Li's ImageNet contribution, which actually gave all the other researchers the essential data to train with. Her intuition that more data would probably make the accuracy of existing algorithms better is, I think, very much underappreciated.

Key excerpt:

So when she got to Princeton, Li decided to go much bigger. She became obsessed with an estimate by vision scientist Irving Biederman that the average person recognizes roughly 30,000 different kinds of objects. Li started to wonder if it would be possible to build a truly comprehensive image dataset—one that included every kind of object people commonly encounter in the physical world.

teknover 5 hours ago [-]
“Nvidia invented the GPU in 1999” is wrong on many fronts.

Arguably the November 1996 launch of 3dfx kickstarted GPU interest and OpenGL.

After reading that, it’s hard to take the author seriously on the rest of the claims.

Someone 4 hours ago [-]
I would not call it “invent”, but it seems Nvidia defined the term GPU. See https://www.britannica.com/technology/graphics-processing-un... and https://en.wikipedia.org/wiki/GeForce_256#Architecture:

“GeForce 256 was marketed as "the world's first 'GPU', or Graphics Processing Unit", a term Nvidia defined at the time as "a single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines that is capable of processing a minimum of 10 million polygons per second"”

They may have been the first to market with a product that fit that definition.

kragen 30 minutes ago [-]
That sounds like marketing wank.

I don't think you can get a speedup by running neural networks on the GeForce 256, and the features listed there aren't really relevant (or arguably even present) in today's GPUs. As I recall, people were trying to figure out how to use GPUs to get faster processing in their Beowulfs in the late 90s and early 21st century, but it wasn't until about 02005 that anyone could actually get a speedup. The PlayStation 3's "Cell" was a little more flexible.

ahofmann 5 hours ago [-]
Wow, that is harsh. The quoted claim is in the middle of a very long article. The background of the author seems to be more on the scientific side than the technical side. So throw out everything because the author got one (not very important) date wrong?
RicoElectrico 3 hours ago [-]
Revisionist marketing should not be given a free pass.
twelve40 2 hours ago [-]
Yet it's almost the norm these days. I'm sick of hearing that Steve Jobs invented smartphones when I personally was using a device with web and streaming music years before that.
kragen 28 minutes ago [-]
You don't remember when Bill Gates and AOL invented the internet, Apple invented the GUI, and Tim Berners-Lee invented hypertext?
kragen 36 minutes ago [-]
Arguably the November 01981 launch of Silicon Graphics kickstarted GPU interest and OpenGL. You can read Jim Clark's 01982 paper about the Geometry Engine in https://web.archive.org/web/20170513193926/http://excelsior..... His first key point in the paper was that the chip had a "general instruction set", although what he meant by that was quite different from today's GPUs. IRIS GL started morphing into OpenGL in 01992, and certainly when I went to SIGGRAPH 93 it was full of hardware-accelerated 3-D graphics drawn with OpenGL on Silicon Graphics hardware. But graphics coprocessors date back to the 60s; Evans & Sutherland was founded in 01968.

I mean, I certainly don't think NVIDIA invented the GPU—that's a clear error in an otherwise pretty decent article—but it was a pretty gradual process.

rramadass 4 hours ago [-]
After actually having read the article, I can say that your comment is unnecessarily negative and clueless.

The article is a very good historical one, showing how 3 important things came together to make the current progress possible, viz.:

1) Geoffrey Hinton's back-propagation algorithm for deep neural networks

2) Nvidia's GPU hardware used via CUDA for AI/ML and

3) Fei-Fei Li's huge ImageNet database to train the algorithm on the hardware. This team actually used "Amazon Mechanical Turk" (AMT) to label the massive dataset of 14 million images.

Excerpts:

“Pre-ImageNet, people did not believe in data,” Li said in a September interview at the Computer History Museum. “Everyone was working on completely different paradigms in AI with a tiny bit of data.”

“That moment was pretty symbolic to the world of AI because three fundamental elements of modern AI converged for the first time,” Li said in a September interview at the Computer History Museum. “The first element was neural networks. The second element was big data, using ImageNet. And the third element was GPU computing.”

santoshalper 4 hours ago [-]
Possibly technically correct, but utterly irrelevant. The 3dfx chips accelerated parts of the 3d graphics pipeline and were not general-purpose programmable computers the way a modern GPU is (and thus would be useless for deep learning or any other kind of AI).

If you are going to count 3dfx as a proper GPU and not just a geometry and lighting accelerator, then you might as well go back further and count things like the SGI Reality Engine. Either way, 3dfx wasn't really first to anything meaningful.

FeepingCreature 2 hours ago [-]
But the first NVidia GPUs didn't have general-purpose compute either. Google informs me that the first GPU with user-programmable shaders was the GeForce 3 in 2001.
KevinMS 3 hours ago [-]
Can confirm. I was playing Unreal on my dual Voodoo2 SLI rig back in 1998.
vdvsvwvwvwvwv 4 hours ago [-]
Lesson: ignore detractors. Especially if their argument is "don't be a tall poppy".
psd1 4 hours ago [-]
Also: look for fields that have stagnated, where progress is enabled by apparently-unrelated innovations elsewhere
xanderlewis 4 hours ago [-]
Unfortunately, they’re usually right. We just don’t hear about all the time wasted.
blitzar 3 hours ago [-]
On several occasions I have heard "they said it couldn't be done" - only to discover that yes, it was technically correct; however, "they" was one random person who had no clue, while anyone with any domain knowledge said it was reasonable.
friendzis 3 hours ago [-]
Usually when I hear "they said it couldn't be done", it is used as a triumphant downplaying of legitimate critique. If you dig deeper, that "couldn't be done" is usually relative to some constraints or performance characteristics, which the "done" thing still does not meet, but the goalposts have already been moved.
vdvsvwvwvwvwv 1 hours ago [-]
What if the time wasted is part of the search? The hive wins but a bee may not. (Capitalism means some bees win too)
jakeNaround 4 hours ago [-]
[dead]
madaxe_again 4 hours ago [-]
I can’t be the only one who has watched this all unfold with a sense of inevitability, surely.

When the first serious CUDA based ML demos started appearing a decade or so ago, it was, at least to me, pretty clear that this would lead to AGI in 10-15 years - and here we are. It was the same sort of feeling as when I first saw the WWW aged 11, and knew that this was going to eat the world - and here we are.

The thing that flummoxes me is how many, now that we are so obviously on this self-reinforcing cycle, are still insistent that AI will amount to nothing.

I am reminded of how the internet was just a fad - although this is going to have an even greater impact on how we live, and our economies.

xen0 4 hours ago [-]
What makes you think AGI is either here or imminent?

For me the current systems still clearly fall short of that goal.

madaxe_again 2 hours ago [-]
They do fall short, but progress in this field is not linear. This is the bit that I struggle to comprehend - that which was literally infeasible only a few years ago is now mocked and derided.

It’s like jet engines and cheap intercontinental travel becoming an inevitability once the rubicon of powered flight is crossed - and everyone bitching about the peanuts while they cruise at inconceivable speed through the atmosphere.

diffeomorphism 1 hours ago [-]
Just like supersonic travel between Europe and America becoming commonplace was inevitable. Oh, wait.

Optimism is good, blind optimism isn't.

oersted 3 hours ago [-]
What do you think is next?
madaxe_again 2 hours ago [-]
An unravelling, as myriad possibilities become actualities. The advances in innumerable fields that ML will unlock will have enormous impacts.

Again, I cannot understand for the life of me how people cannot see this.

alexander2002 6 minutes ago [-]
I had a hypothesis once, and it is probably 1000% wrong, but I will state it here: once computers can talk to other computers over a network in a human-friendly way (abstraction by LLM), and these entities can completely control the interfaces that we humans can easily use, and use them effectively and multi-modally, then I think there is a slight chance "I" might believe there is AGI, or at least some indications of it.
BriggyDwiggs42 4 hours ago [-]
Downvoters are responding to a perceived arrogance. What does AGI mean to you?
nineteen999 3 hours ago [-]
Could be arrogance, or could be the delusion.
BriggyDwiggs42 28 minutes ago [-]
Indeed, it sure could be arrogance.
madaxe_again 2 hours ago [-]
Why is it a delusion, in your opinion?
andai 1 hours ago [-]
It's a delusion on the part of the downvoters.
arcmechanica 7 hours ago [-]
It was basically useful to average people and wasn't just some way to steal and resell data or dump ad after ad on us. A lot of dark patterns really ruin services.