NHacker Next
Google Removed 749M Anna's Archive URLs from Its Search Results (torrentfreak.com)
agluszak 10 hours ago [-]
Anna's archive has already fulfilled G's needs (training Gemini) so now it's time to pretend it never existed ;)
nine_k 7 hours ago [-]
Did Anna's Archive also organize much of the world's information and made it universally accessible, for some time?
seydor 58 minutes ago [-]
Actually yes. And we're talking about high-quality information, not random comments.
GuinansEyebrows 6 hours ago [-]
They’re… yes. Yes, that’s exactly what they have done and continue to do. Are you familiar with it?
ternus 5 hours ago [-]
That phrase is Google's mission statement.
moffkalast 2 hours ago [-]
I thought their mission statement was "Don't be evil", until they shortened it for practicality to just "Be evil". It's certainly how they've been behaving in recent years.
netsharc 2 hours ago [-]
It's now "Don't be evil*"

* Subject to terms and conditions; lack of evil may not be available in all regions.

satvikpendem 2 hours ago [-]
Motto, not mission statement.
snypher 6 hours ago [-]
I think the comment is saying Google was also doing that.
user_7832 4 hours ago [-]
Anna's archive doesn't engage in privacy-eroding antitrust/monopolistic activities (yet), so there's that I suppose...
dwattttt 4 hours ago [-]
They're doing one site less now
uvaursi 8 hours ago [-]
This should remain as the top comment.
arjie 3 hours ago [-]
It's not delisted. Anna's Archive is huge. The fact that Google participates in an entirely voluntary transparency log that gives you this information should illustrate to you where they stand on the issue of their needing to be compliant to the DMCA. It isn't clear to me why online communities constantly invent fan fiction of evil enemies when organizations merely comply with a reasonable interpretation of the law of the land they are incorporated in.
wiseowise 3 hours ago [-]
Apparently corpo doesn’t hesitate to remove it when it benefits consumer, because “we just follow the law, citizen!” But when it benefits corpo it takes decades of suing and multi-billion fines to make a change.

Totally not evil, just business, comrade, amirite?

pftburger 3 hours ago [-]
100%. Here in Germany it's invisibly deleted, and the process is handled by a private company.
mptest 3 hours ago [-]
no one, and i mean no one, has to invent the history of evil corporations doing evil things. Climate change? Cigarettes?, shit let's go modern. CZ? SBF?

If it's not clear to you, may I suggest with the utmost respect that you read The Age of Surveillance Capitalism by Zuboff (a successor to Manufacturing Consent, in my humble opinion).

I guess my question is where do you get the confidence or belief these companies are doing anything BUT evil? how many of americas biggest companies' workers need food aid from the govt? look up what % of army grunts are food insecure. in the heart of empire.

Where on earth do you get this faith in companies from?

idiotsecant 1 hours ago [-]
Publicly traded corporations are machines whose only lawful purpose is to make money. They are legally obligated to be sociopathic systems. They aren't evil like an axe murderer, they're evil like a gasoline fire. They may be useful when properly controlled, but they're certainly never worth defending in the way you seem to feel the need to
deaux 1 hours ago [-]
>Publicly traded corporations are machines whose only lawful purpose is to make money.

Hey, so this isn't the case at all, publicly traded companies are under no lawful obligation to focus only on making money. Fiduciary duty does not mean this in any way. It's a common misconception whose perpetuation is harmful. Let's stop doing it.

someperson 10 hours ago [-]
Feels weird to say, but I have found Yandex, of all places, an excellent search engine for content that gets taken down by DMCA requests.

Eg if you want to watch a movie that's not on Netflix using a web stream the search results are far better.

Feels like Google circa 2005.

chneu 9 hours ago [-]
I've been playing around with a variety of search engines such as Kagi, Startpage, Ecosia, DDG.

All of them are better than google in finding relevant results. Lol

Google is way too "personalized".

TranquilMarmot 4 hours ago [-]
I switched to Kagi a while back and ended up buying their annual subscription for unlimited searches. It's such a breath of fresh air, like a search engine from an alternate universe where Google just focused on search instead of adtech.
mtillman 8 hours ago [-]
Google hides the most relevant results on the 3rd page. It was confirmed in trial disclosures a few months ago. Their concern isn’t public search.
mtillman 3 hours ago [-]
Edit: after the 3rd page

Source: https://www.courtlistener.com/docket/18552824/1436/united-st...

For fun what Gemini says: “The notion that Google explicitly admitted to "deprioritizing good results to sell more ads" is a common interpretation of these documents and expert testimony.”

thaumasiotes 10 minutes ago [-]
> Source: https://www.courtlistener.com/docket/18552824/1436/united-st...

That's a 230-page pdf. Do you have a more specific citation?

smcin 5 hours ago [-]
Doesn't https://www.google.com/search?q=your+search+query&ei=...&sta... give you page 3? Or at least, try jittering it a bit and compare to frontpage results.

> &start= parameter. This parameter controls which result number the page starts with. Google displays 10 results per page by default. For page 1, start=0 For page 2, start=10 For page 3, start=20
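
A minimal sketch of the offset arithmetic quoted above, assuming 10 results per page and that only the `q` and `start` parameters matter (the helper name `google_results_url` is made up for illustration, and Google may ignore or change these parameters at any time):

```python
from urllib.parse import urlencode


def google_results_url(query: str, page: int, per_page: int = 10) -> str:
    """Build a search URL for a 1-indexed results page.

    Hypothetical helper: it only encodes the quoted rule
    (page 1 -> start=0, page 2 -> start=10, page 3 -> start=20).
    """
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page  # zero-based index of the first result
    params = urlencode({"q": query, "start": start})
    return "https://www.google.com/search?" + params


if __name__ == "__main__":
    # Print the URLs for the first three pages of one query.
    for page in (1, 2, 3):
        print(google_results_url("your search query", page))
```

Comparing the URLs for page 1 and page 3 of the same query is one way to try the "jittering" suggested above.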

superkuh 6 hours ago [-]
Google only ever returns a maximum of <400 results. If you actually click through at 100/page, you'll only get 3.something pages of results, despite what it says at the top about the number of results. Those results are not accessible.

Bing only returns 900. Kagi only 200. Deep search and surfing is pretty much gone on all major search "engines".

locknitpicker 4 hours ago [-]
> Google only ever returns a maximum of <400 results.

That's perfectly fine. If I'm going to use a search engine, I'm not willing to sift through hundreds of potentially relevant results. I hope I find what I'm searching for in the first page, or at best in the first 3 pages or so.

What's not cool about Google is that it now hits you with AI slop of dubious quality right at the top, followed by a page of sponsored results, followed by some potentially useful results, followed by an entire ocean of spam traps, clone sites, and really shady results with exotic never-seen-before TLDs that leave you wondering whether clicking on a link will get you into a hostile database. That's what's not cool about Google: you can't use it to search the web anymore.

nine_k 7 hours ago [-]
Seems to not be empirically true.
qiqitori 9 hours ago [-]
You can turn off personalization. (Operating under the assumption that most people search for facts, I personally don't see why one would ever want personalized results.)
p1necone 9 hours ago [-]
Location based personalization is pretty useful - if I search for 'Bob's Discount Linguine' I want the one in my neighborhood.

Lots of niche things (like programming) also reuse common english words to mean specific things - if I search e.g. 'locking' it's nice to get results related to asynchronous programming instead of locksmiths because google knows I regularly search for programming related terminology.

Of course it's questionable whether google does a good job at any of this, but I absolutely see the value.

edgineer 6 hours ago [-]
Personalization would be good if it meant recognizing that I dislike blogspam, SEO'd pages, advertisements, and assuming my location.
goku12 6 hours ago [-]
I often find myself searching for information that's not from my locality. This sort of 'location personalization' frustrates such efforts so much that I rarely 'google' these days. What's the point of having access to the internet if that access is going to be restricted like this without consent? If they want to make my search experience more relevant, they should provide me an option to limit my search, rather than callously assume my intentions.

It's much more egregious on the Android play store. Many apps like banking, transportation and online shopping apps are geolocked for installation, sometimes even without the developers' request or knowledge. What if I'm flying over there in two days, or just want to help someone who's already there? And even when I'm there, I have to prove my presence by supplying the local credit card details! Nothing else is enough - not GPS, not cell tower IDs, not the IP ranges or whatever else.

This is just outrageous because I can't even get a device that I paid for, to work for me. This is just sheer arrogance at this point - a wanton abuse of their co-monopoly privileges. However, I'm not under any delusions that they're here to improve my digital experience. These corporations profit by restricting their "users'" experience on an otherwise fully open internet.

skydhash 8 hours ago [-]
I just add another keyword to narrow the search result. I don’t think I’ve ever wanted results based on anything other than the query.
qiqitori 5 hours ago [-]
Search results are still location-specific even if you disable personalization.
throwaway-0001 6 hours ago [-]
Can you show me what results you see for “locking”? I see dancing move in all profiles I have.
smcin 5 hours ago [-]
Wow you're right. Locking dance moves and videos.

Were you expecting to see padlocks or doorlocks or what?

throwaway-0001 4 hours ago [-]
I expected it to be “personalized”. I’m definitely more into programming than dancing. I see 0 personalization tbh. And I tried a few different people’s phones.
smcin 1 hours ago [-]
Oh I see, locking in the programming sense, yes. Either not every search term is personalized for your context, or else this particular search is being applied to some other demographic. But that's weird because "locking" doesn't also show door, windows, filing cabinets.

Anyway if you search for "programming locking" you get relevant results.

Google didn't used to do this. Anyone got a rough idea when this started?

Ariarule 9 hours ago [-]
I won't bother defending Google-style personalization as it exists for their search results, but since collisions in terminology across fields are common, it's not that hard to see how actual, thoughtful personalization could be useful. Someone searching for "Kafka" is going to want very different results based on whether they're thinking of software or literature. Opinions may also differ over the usefulness of sources, even for people ultimately interested primarily in facts; I find Kagi-style personalization (make your own domain list) very useful, but across Kagi's userbase Reddit is simultaneously one of the most lowered, most raised, and most pinned domains: https://kagi.com/stats?stat=leaderboard
p1necone 9 hours ago [-]
Anecdotally I find myself appending 'reddit' to search terms very frequently. It's effectively shorthand for "I want to read about people's direct experience with this thing", and reddit is huge and well crawled by search engines. It's astroturfed to hell, especially around political topics, but I feel like it's easy to tell when discussions about random products are authentic.
dboreham 9 hours ago [-]
> "Kafka" is going to want very different results based on whether they're thinking of software or literature.

Speak for yourself. I've worked in several "Kafka-esque" software organizations.

skulk 9 hours ago [-]
> I personally don't see why one would ever want personalized results.

The same short combination of words can mean very different things to different people. My favorite example of this is "C string" because when I was a kid learning C I was introduced to a whole new class of lingerie because Google didn't really personalize results back then. Now when I search "C string" Google knows exactly what I mean.

smcin 5 hours ago [-]
Some people search for shopping, or business details, in which case personalization can improve (or disimprove) result relevance based on knowing where you currently are, what day and time it is, what you tend to order etc. etc.

And some people search for songs/images/videos/books/articles.

egorfine 47 minutes ago [-]
As a Ukrainian I cannot feel anything but hatred towards the propaganda machine Yandex has become.

As an engineer I cannot feel anything but respect to the multi-decade research legacy of the company and their incredible search engine.

dzonga 8 hours ago [-]
yep Yandex all days when I wanna wear an eye patch and pirate the seas.
smcin 5 hours ago [-]
Hmm, Yandex Ad Network is allowed to monetize western e-commerce sites; they divested their Russian assets by 2024.
negativelambda 10 hours ago [-]
I just tested, indeed very good results!
bad_username 7 hours ago [-]
[flagged]
someperson 6 hours ago [-]
For what it's worth, this is my first pro-Yandex comment after 17 years on Hacker News.

It's a major tech company service based in Russia, so presumably controlled by the government of Russia.

But the results it produces for a query like "watch (obscure movie) online stream" are far better than what Google or Bing produces. If you need to check a scene of a specific episode of an obscure TV show, it's the fastest method (but happy to hear alternatives).

Also, the websites it links to aren't operated by the government of Russia.

devsda 5 hours ago [-]
Where I am, both yandex and Google are services from a foreign land.

I can't say about Yandex because I haven't used it much, but I have used Google and its services enough to know that it may appear neutral but its services do reflect politics of its origin country. For an outsider, I doubt Yandex is going to be any different than Google in those matters.

noosphr 6 hours ago [-]
Genuine question: what can go wrong?
t-3 4 hours ago [-]
The damn commies will destroy our film industry and blackmail pirates into revealing classified information! Or maybe nothing.
socrateswasone 7 hours ago [-]
Oooh scary, watch out for the Russian Boogeyman!
ForgetItJake 5 hours ago [-]
> Ah yes, using a Russian service, what could go wrong.

Nothing if you know what you're doing.

> Weekly Yandex astroturfers strike again.

People doing things you don't like doesn't mean they don't exist.

jimjimwii 1 hours ago [-]
I am not exaggerating when I say I completely stopped using Google for searches that Google might take offence to. Serial numbers, business phone numbers, and of course books and papers all go through real search engines. Currently, those are Yandex as my main go-to, with Brave as a backup.

I couldn't care less what google does because i don't use it.

aunty_helen 11 hours ago [-]
Google does search now? I mean, it's great to see but I'm not sure how this is going to challenge the convenience of my chosen brand of chatbot being able to find the same info without being scammed by 100 seo optimised junk sites.
n1xis10t 11 hours ago [-]
I have heard that chatbots aren’t affected by spam as much as Google when you ask them to search, is that true?
aunty_helen 5 hours ago [-]
As much, yet. There’s still time and the OpenAI roadmap seems to promise ‘26 as the year.
JKCalhoun 11 hours ago [-]
Not sure. I understand they used to do search though.

(Love the username, BTW.)

n1xis10t 11 hours ago [-]
Yeah they’re pretty terrible now. Reminds me, this is an interesting article about search engines getting worse and failing, but the author didn’t get into the spam aspect iirc: https://archive.org/details/search-timeline
zzo38computer 6 hours ago [-]
Is there a good search engine which does not execute any JavaScripts on files that it scans? (This is not the same as excluding web pages that use JavaScripts (I have seen some search engines that do this); I still want to be able to search for them, but I do not want the search queries (or the summaries of the results) to include anything that is only displayed due to JavaScripts.)
pessimizer 9 hours ago [-]
No matter what my chosen brand of chatbot is, it can't help but hallucinate between 25% and 90% of the links it offers me. If it's not it's just proxying a google search for you itself.
user_7832 3 hours ago [-]
That honestly sounds like you're using your bot (accidentally) in offline mode. Try a simple search on perplexity first and see if you get valid links, then try chatgpt/ai studio with internet search on.
WheatMillington 8 hours ago [-]
Weird, I get pretty great results. Maybe I had hallucination rates like that 2 years ago, but not today.
DANmode 7 hours ago [-]
Browser based iOS usage of ChatGPT, by chance?
throwaway-0001 6 hours ago [-]
Which model are you using, exactly?
add-sub-mul-div 11 hours ago [-]
1. Your chatbot doesn't have its own internet scale search index.

2. You're being given information that may or may not be coming in part from junk sites. All you've done is give up the agency to look at sources and decide for yourself which ones are legitimate.

n1xis10t 11 hours ago [-]
As for point one, is that true? I thought ChatGPT and Perplexity had their own indexes.
aunty_helen 5 hours ago [-]
I’m quite happy trading off the agency of wading through trash to an LLM. In fact, I would say that’s something they’re pretty good at.
what 4 hours ago [-]
It’s just regurgitating the same trash to you though.
nullbyte808 7 hours ago [-]
Man, I need to get around to downloading the z-archive torrents before Anna's Archive is taken down. If I eliminate large PDFs and non-English books, I think I can fit it on two 32 TB drives with BTRFS z-std compression at the max setting. https://annas-archive.org/torrents
mmooss 4 hours ago [-]
> eliminate large PDFs

How large? Isn't that going to result in an arbitrary filter of books? In other domains, large PDFs are due to PDF production errors, such as using color or needlessly high resolution, and not so much due to the volume of content - at least for text.

cookiengineer 4 hours ago [-]
Let me know of those efforts, I wanna have an English/German/French backup of the archive, too. But as you said HDDs and filesystems are the problem, really.

Maybe I'll have to build a torrent splitter or something, because the UIs of all torrent clients are just not built for that.

ggm 11 hours ago [-]
I'm not sure I've ever relied on google to tell me what a site like this had, when the site itself is fully indexed, as this one is. Freetext search over the metastate of title, author, format, date (when available) -seems to work.
npteljes 3 minutes ago [-]
Web searches like Google are great when searching for not exact terms, like synonyms for example. I have never encountered a website that has a search capability like that. Google finds the song "Million voices" by Otto Knows, from the search query "a a a a ah ah ah ah dance song".
n1xis10t 11 hours ago [-]
They don’t have full text search of document contents though do they? I know Google wouldn’t have this for AA pages either, just curious
ggm 11 hours ago [-]
Good point. So there is definitely a social utility in search over text which google does have, for the trove it scanned, hands and cats-pawprints and all.
n1xis10t 11 hours ago [-]
I’m pretty sure Google indexing pages from Anna’s archive would only get metadata, because AA doesn’t have the full text of the books on those pages. I think to get the full text you have to download the torrents, and I don’t think Google was doing that.
ggm 10 hours ago [-]
No, that's more Meta's trick, and they were "only doing it for the articles", not the pictures. I think. I dunno.
bigiain 9 hours ago [-]
They were doing it for the videos too, but only for "personal use"...

https://www.wired.com/story/meta-claims-downloaded-porn-at-c...

Razengan 6 minutes ago [-]
Wait so did Gemini train on Wikipedia etc.?

Isn't it a conflict of interest or something if their AI results prevent people from clicking on the websites Google's AI trained on?

storus 10 hours ago [-]
Google's march to irrelevance continues with full steam.
DaSHacka 9 hours ago [-]
They got a long way ahead of them then, considering they're still something like 97% of all search queries.
esafak 8 hours ago [-]
Actually ~90%, but that does not include AI search (chatgpt et al).

https://www.klatch.co.uk/search-engine-market-share

drnick1 10 hours ago [-]
Good thing that Google hasn't been a part of my life for a while now. I use DuckDuckGo for search.
aucisson_masque 1 hours ago [-]
Duckduckgo is bing, bing is Microsoft. I don't see how Microsoft is better than google at censorship.
NooneAtAll3 8 hours ago [-]
I've seen DDG censor stuff that was still on google
ilt 9 hours ago [-]
And still it’s the top result in Google if one searches for Anna’s archive. How is it that that search result hasn’t been removed?
incompatible 8 hours ago [-]
Presumably, the home page doesn't contain any copyright violations. This is only DMCA stuff targeting individual links.
tonyhart7 1 hours ago [-]
Well, if publishers send DMCA requests to Google, then I don't know what people are getting mad about.

It's still piracy at the end of the day, and publishers have the right to license etc. People who are mad about this maybe don't have to deal with it as a business.

musicale 8 hours ago [-]
Google has already removed URLs from the first page of "search" results.
pessimizer 9 hours ago [-]
I was surprised that those pages showed up in book title searches at all. Makes sense to get rid of them, you don't want a search for a book to be topped by a link to pirate the book. The top-level domains still come up, and people who know they want to pirate a book can still find the site.
chris_wot 9 hours ago [-]
Google search keeps getting less useful every day.
0xedd 5 hours ago [-]
[dead]
toomuchtodo 11 hours ago [-]
Are they in ChatGPT and other LLM providers? No need for Google.
mmooss 4 hours ago [-]
That's a good question: When LLM providers receive DMCA takedowns, how easily can they implement them? Use a post-LLM filter?
toomuchtodo 4 hours ago [-]
I was more suggesting that I want my LLM provider to launder the IP so it avoids copyright law. The LLM provider is a fancy search engine where copyright does not apply to the results.
mmooss 4 hours ago [-]
Do LLMs filter piracy requests? For example, how will it respond to 'find me a free copy of the Lord of the Rings movies' or more explicitly 'find me a pirated copy ...'?