This sucks. Cache basically guaranteed that whatever Google thought was on the page could actually be found.
These days Google will offer a result where the little blurb (which is actually a truncated mini cache) shows a part of the information I'm looking for, but the page itself does not. Removing cache means the data is just within reach, but you can't get to it anymore.
Internet Archive is a hassle and not as reliable as an actual copy of the page where the blurb was extracted from.
jfoster 122 days ago [-]
Google used to have a policy of sites being required to show Google the same page that they serve users. It seems that has been eroded.
I'm not sure how that serves Google's interests, except perhaps that it keeps them out of legal hot water vs the news industry?
Ozzie_osman 122 days ago [-]
It's called cloaking and it's still looked down upon from an SEO perspective. That said, there's a huge gray area. Purposefully cloaking a result to trick a search engine would get penalized. "Updating" a page with newer content periodically is harder to assess.
lelandfe 122 days ago [-]
There's also "dynamic rendering," in which you serve Google/crawlers "similar" content to, in theory, avoid JS-related SEO issues. However, it can just be a way to do what the parent commenter dislikes: render a mini-blurb unfound on the actual page.
Shoot, even a meta description qualifies for that - thankfully Google uses them less and less.
frde_me 122 days ago [-]
Google will reliably index dynamic sites rendered using JS. And other search engines do the same. There's really no good reason to do this if you want to be indexed on search engines.
lelandfe 122 days ago [-]
Agreed. Yet whether it should be done is different than whether it is done. Google was recommending it in 2018, and degraded it to a "workaround" just two years ago. Sites still do it, SaaS products still tout its benefits, and Google does not penalize sites for it. GP's gripe about SERP blurb text missing from the actual page is still very much extant and blessed by Google.
izacus 122 days ago [-]
People on HN have repeatedly stated that Google is "stealing their content" from their websites, so it seems like this is a natural extension of that widespread opinion.
Isn't this the web we want? One where big corporations don't steal our websites? Right?
dspillett 122 days ago [-]
> People on HN have repeatedly stated
It is definitely an area where there is no single "people of HN" - opinion varies widely and is often more nuanced than a binary for/against⁰ matter. From an end user PoV it is (was) a very useful feature, one that kept me using Google by default¹, and I think that many like me used it as a backup when content was down at source.
The key problem with giving access to cached copies like this is when it effectively becomes the default view, holding users in the search provider's garden instead of the content provider being acknowledged, never mind visited, with the search service making money from that through adverts and related stalking.
I have sympathy for honest sites when their content is used this way, though those that give search engines full text but paywall most of it when I look, and then complain about the search engine showing fuller text, can do one. Also those who do the "turn off your stalker blocker or we'll not show you anything" thing.
----
[0] Or ternary for/indifferent/against one.
[1] I'm now finally moving away, currently experimenting with Kagi, as a number of little things that kept me there are no longer true and more and more irritations² keep appearing.
[2] Like most of the first screen full of a result being adverts and an AI summary that I don't want, just give me the relevant links please…
immibis 122 days ago [-]
Cached was only a fallback option when the original site was broken. When the original site works, nearly everyone clicks on it.
meiraleal 122 days ago [-]
People on HN will always find a way to see a good side of Google's terrible product and engineering.
izacus 122 days ago [-]
Which is why it's great that there's one less of their products: it's terrible anyway.
ysofunny 122 days ago [-]
ownership over websites does not work the way people expect or want
I'm tired of saying this, yelling at clouds
Retr0id 122 days ago [-]
I most commonly run into this issue when the search keyword was found in some dynamically retrieved non-body text - maybe it was on a site's "most recent comments" panel or something.
luckylion 122 days ago [-]
That policy was never actually enforced that way, however. They'd go after you if you had entirely different content for Google vs for users, but large scientific publishers already had "full PDF to Google, HTML abstract + paywall to users" 20 years ago and it was never an issue.
It makes some sense, too, because the edges are blurry. If a user from France receives a French version on the same URL where a US user would receive an English version, is that already different content? What if (as usually happens) one language gets prioritized and the other only receives updates once in a while?
And while Google recommends treating them like you'd treat any other user when it comes to e.g. geo-targeting, in reality that's not possible if you do anything that requires compliance and isn't available in California. They do smartphone and desktop crawling, but they don't do any state- or even country-level crawling. Which is understandable as well: few sites really need or want to do that, it would require _a lot_ more crawling (e.g. in the US you'd need to hit each URL once per state), and there's no protocol to indicate it (and there probably won't be one, because it's too rare).
kaoD 122 days ago [-]
> It makes some sense, too, because the edges are blurry. If a user from France receives a French version on the same URL where a US user would receive an English version, is that already different content?
The recommended (or rather correct) way to do this is to have multiple language-scoped URLs, be it a path fragment or entirely different (sub)domains. Then you cross-link them with <link> tags with rel="alternate" and hreflang (for SEO purposes) and give the user some affordance to switch between them (only if they want to do so).
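As a rough sketch of what that cross-linking could look like (the example.com domain and the locale list are just made-up placeholders, not anything prescribed):

    // Hypothetical locales and domain; substitute your own.
    const locales = ["en", "fr", "de"] as const;

    // Build the <link rel="alternate" hreflang="..."> tags for one page path,
    // plus an x-default pointing at the language-less root version.
    function alternateLinks(path: string): string {
      const tags = locales.map(
        (l) => `<link rel="alternate" hreflang="${l}" href="https://example.com/${l}${path}" />`,
      );
      tags.push(
        `<link rel="alternate" hreflang="x-default" href="https://example.com${path}" />`,
      );
      return tags.join("\n");
    }

    // Every localized version emits the same full set of tags, so /en/pricing
    // and /fr/pricing each point at themselves and at each other.
    console.log(alternateLinks("/pricing"));

The key bit is that each language version lists all the others (and itself), which is what lets the search engine pick the right one per query language.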
Public URLs should never show different content depending on anything other than the URL and current server state. If you really need to do this, 302 redirect to a different URL.
But really, don't do that.
If the URL is language-qualified but it doesn't match whatever language/region you guessed for the user (which might very well be wrong and/or conflicting, e.g. my language and IP's country don't match, people travel, etc.) just let the user know they can switch URLs manually if they want to do so.
You're just going to annoy me if you redirect me away to a language I don't want just because you tried being too smart.
luckylion 122 days ago [-]
> Public URLs should never show different content depending on anything other than the URL and current server state.
As a real-world example: you're providing some service that is regulated differently in multiple US states. Set up /ca/, /ny/, etc. and let them be indexed and you'll have plenty of duplicate content and all sorts of trouble that comes with it. Instead you'll geofence like everyone else (including Google's SERPs), and a single URL now has content that depends on the perceived IP location, because both SEO and legal will be happy with that solution, and neither will be entirely happy with the state-based URLs.
jfoster 121 days ago [-]
> You're just going to annoy me if you redirect me away to a language I don't want just because you tried being too smart.
So what do you propose that such a site shows on their root URL? It's possible to pick a default language (eg. English), but that's not a very good experience when the browser has already told you that they prefer a different language, right? It's possible to show a language picker, but that's not a very good experience for all users, then, as their browser has already told you which language they prefer.
> This header serves as a hint when the server cannot determine the target content language otherwise (for example, use a specific URL that depends on an explicit user decision). The server should never override an explicit user language choice. The content of Accept-Language is often out of a user's control (when traveling, for instance). A user may also want to visit a page in a language different from the user interface language.
So basically: don't try to be too smart. I'm more often than not bitten by this as someone whose browser is configured in English but who often wants to visit pages in their native language. My government's websites do this and it's infuriating, often showing me broken English webpages.
The only acceptable use would be if you have a canonical language-less URL that you might want to redirect to the language-scoped URL (e.g. visiting www.example.com and redirecting to example.com/en or example.com/fr) while still allowing the user to manually choose what language to land in.
If I arrive through Google with English search terms, believe it or not, I don't want to visit your French page unless I explicitly choose to do so. Same when I send some English webpage to my French colleague. This often happens with documentation sites and it's terrible UX.
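For the canonical-URL redirect case above, a minimal sketch of the kind of logic I mean (the cookie-style "chosen" value and the two-locale list are assumptions for illustration, not anything a framework or Google prescribes):

    const supported = ["en", "fr"] as const;
    type Locale = (typeof supported)[number];

    // Pick a locale for the bare root URL only: an explicit earlier choice
    // (e.g. stored in a cookie) wins, Accept-Language is treated as a hint,
    // and we fall back to a default instead of guessing from IP geolocation.
    function localeForRoot(acceptLanguage: string | undefined, chosen?: string): Locale {
      if (chosen && (supported as readonly string[]).includes(chosen)) {
        return chosen as Locale;
      }
      const hinted = (acceptLanguage ?? "")
        .split(",")
        .map((part) => part.split(";")[0].trim().slice(0, 2).toLowerCase())
        .find((lang) => (supported as readonly string[]).includes(lang));
      return (hinted ?? "en") as Locale;
    }

    // "/" would then 302 to `/${localeForRoot(...)}/`; language-scoped URLs
    // like /fr/docs are served as-is and never redirected.
    console.log(localeForRoot("fr-CA,fr;q=0.9,en;q=0.8")); // "fr"
    console.log(localeForRoot("fr-CA,fr;q=0.9", "en"));     // "en" (explicit choice wins)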
immibis 121 days ago [-]
"Accept" specifies a MIME type preference.
kaoD 121 days ago [-]
You said accept headerS and since the thread was about localization I assumed you meant Accept-Language.
To answer your comment: yes, you should return the same content (resource) from that URL (note the R in URL). If you want/can, you can honor the Accept header to return it in another representation, but the content should be the same.
So /posts should return the same list of posts whether in HTML, JSON or XML representation.
But in practice content negotiation isn't used that often and people just scope APIs in their own subpath (e.g. /posts and /api/posts) since it doesn't matter that much for SEO (since Google mostly cares about crawling HTML, JSON is not going to be counted as duplicate content).
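As a toy illustration of that kind of content negotiation (same resource, different representation; the Post shape and data here are made up):

    interface Post { id: number; title: string; }

    const posts: Post[] = [
      { id: 1, title: "Hello" },
      { id: 2, title: "World" },
    ];

    // One resource (/posts), several representations: the Accept header only
    // changes the serialization, never which posts come back.
    function renderPosts(accept: string): { contentType: string; body: string } {
      if (accept.includes("application/json")) {
        return { contentType: "application/json", body: JSON.stringify(posts) };
      }
      const items = posts.map((p) => `<li>${p.title}</li>`).join("");
      return { contentType: "text/html", body: `<ul>${items}</ul>` };
    }

    console.log(renderPosts("application/json").body);                         // the same list, as JSON
    console.log(renderPosts("text/html,application/xhtml+xml").contentType);   // "text/html"

In practice, as above, most people just put the JSON version at its own path like /api/posts instead.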
immibis 120 days ago [-]
Why are XML and JSON alternates of the same resource but French and German are two different resources?
kaoD 120 days ago [-]
Because the world is imperfect, and having separate URLs instead of using content negotiation makes for a far better user and SEO experience, so that's what we do in practice.
IOW pragmatism.
immibis 120 days ago [-]
Giving the content in the user agent's configured language preference also seems pragmatic.
> If your site has locale-adaptive pages (that is, your site returns different content based on the perceived country or preferred language of the visitor), Google might not crawl, index, or rank all your content for different locales. This is because the default IP addresses of the Googlebot crawler appear to be based in the USA. In addition, the crawler sends HTTP requests without setting Accept-Language in the request header.
> Important: We recommend using separate locale URL configurations and annotating them with rel="alternate" hreflang annotations.
mtkd 122 days ago [-]
Content delivery is becoming so dynamic and targeted, there is no way that can work effectively now -- even for a first impression, as one or more MVTs (multivariate tests) may be in place.
amy-petrik-214 122 days ago [-]
>Internet Archive is a hassle and not as reliable
The paradox of the internet archive's existence is that if all that data were easily searchable and integrated (i.e. if people really used it) they would not exist, by way of no more money for bandwidth and by way of lawsuit hell. So they exist to share archived data, but if they shared archived data they would not exist.
and so it is a wonderful magical resource, absolutely, but your "power user" level as well as "free time level" has to be such that you build your own internet archive search engine and google cache plugin alternative... and not share it with anyone for the above existential reasons
avar 122 days ago [-]
I don't have a current example of this, but I'd just like to add that they also do this for images.
E.g. images deleted from Reddit can sometimes be seen in Google image results, but when following the links they might lead to deleted posts.
dspillett 122 days ago [-]
That is because the image and the post are treated as separate entities, with the post providing metadata for the image so the search engine will find it when you query. Even on Reddit's side, if the post is gone the image may remain, so you could be being served that by Reddit (or a third party) rather than from Google's cache (any thumbnail will come from Google, of course).
Retr0id 122 days ago [-]
I've been puzzled by this move for a long time, since they first announced it. None of their provided reasoning justifies removal of such a useful feature.
The simplest answer could be that making the cache accessible costs them money and now they're tightening their purse strings. But maybe it's something else...
For sites that manipulate search rankings by showing a non-paywalled article to Google's search bot, while serving paywalled articles to regular users, the cache acts as a paywall bypass. Perhaps Google was taking heat for this, and/or they're pre-emptively reducing their legal liabilities?
Now IA gets to take that heat instead...
dageshi 122 days ago [-]
I assume it's to stop people using the cached copies as source material for LLM's.
The cache is arguably a strategic resource for google now.
Retr0id 122 days ago [-]
If it is a scraping thing, I'd rather they added captchas than took the feature away entirely. I know captchas can be bypassed, but so can paywalls.
amelius 122 days ago [-]
I always wondered if it was legal for Google to store those cached pages.
lysace 122 days ago [-]
The killing presumably has something to do with the legal costs of maintaining this service.
compiler1410 122 days ago [-]
[flagged]
geuis 122 days ago [-]
This breaks my heart a bit. My first browser extension, Cacheout, was from around 2005. Back in the days of sites getting hugged to death from Slashdot. The extension gave right-click context menu options to try loading a cached version of the dead site. Tried Google cache first, then another CDN caching service I can't remember, and finally the Wayback Machine. The extension even got included on a CD packaged with Macworld magazine at one point.
This has always been one of Google's best features. Really sad they killed it.
giantrobot 122 days ago [-]
> Tried Google cache first, then another CDN caching service I can't remember, and finally the Wayback Machine.
Coral Cache maybe? The caches you listed were my manual order to check when a link was Slashdotted.
Google's cache, at least in the early days, was super useful in that the cache link for a search result highlighted your search terms. It was often more helpful to hit the cached link than the actual link since 1) it was more likely to be available and 2) it had your search terms readily apparent.
terinjokes 122 days ago [-]
I remember CacheFly being popular on Digg for a while to read sites that got hugged.
geuis 120 days ago [-]
Coral Cache! Thanks. Totally forgot that one.
Nyr 122 days ago [-]
I am surprised that no one has mentioned the most obvious alternative: Bing Cache.
It is not as complete as Google's, but it is usually good enough.
stuffoverflow 122 days ago [-]
Yandex also has a pretty extensive cache, although recently they seem to have disabled caching for reddit. Otherwise it is good for finding deleted stuff, I've seen cached pages go as far back as a couple of years for some smaller/deleted websites.
earslap 121 days ago [-]
I only ever used cache to find what google thought was in the site (at the time of crawling) as these days it is common to not find that info in the updated page. For everything else, there is the Internet Archive.
relaxing 122 days ago [-]
Thanks! I never go to Bing but I probably will now.
accrual 122 days ago [-]
I recall the links disappearing quite a while ago. It's a bummer because cached links are genuinely useful - they help one visit a site if it's temporarily or recently down, can sometimes bypass some weak internet filters, and can let one view the content of some sites without actually visiting the server, which may be desirable (and maybe undesirable for the server if they rely on page hits).
saaaaaam 122 days ago [-]
The article is from February 2024, so you probably noticed them going around the time it was published! For some reason people seem to be talking about it again as though it only just happened, I’ve seen this and similar articles/threads posted a couple of other places this week.
romanhn 122 days ago [-]
It's odd, because I haven't seen cache links on Google for years. I used to rely on them quite a bit and once in a while would try again and run into "oh yeah, they seem to have dropped this feature." This whole thread is strange to me, sounds like they've been around for people much more recently? Or maybe moved location and I haven't found them (which is weird cause I looked...)
terinjokes 122 days ago [-]
Not just you, I haven't seen links to the feature, even when I've gone looking for it, in years. Even the link on archive.is to use Google's cache if the page wasn't already archived hasn't worked in quite a while.
dspillett 122 days ago [-]
> It's odd, because I haven't seen cache links on Google for years
For quite a time they stopped being a simple obvious link but were available in a drop-down list of options for results for which a cached copy was available.
zo1 122 days ago [-]
Not OP, and yes they "hid" it that way too. But I got the distinct sense that they removed it many years ago for certain websites (and more and more over the years I guess till now). They probably had some sort of flag on their analytics dashboard that website owners were given the privilege of changing so that people couldn't see the cache. Or for all we know it was some sort of "privacy" feature similar to "right to be forgotten".
praisewhitey 122 days ago [-]
For some sites. For years many search results didn't have a cache link in the drop down.
Xenoamorphous 122 days ago [-]
Same here, I found this submission really odd since I haven't seen them in years. Maybe they did a slow rollout by country?
seydor 122 days ago [-]
That was never Google's job anyway. It boggles my mind how there is very little public investment in maintaining information, while tons of money is wasted keeping ancillary things alive that nobody uses. We should have multiple publicly funded internet archives, and public communication infrastructure fallbacks, like email.
interactivecode 122 days ago [-]
Part of the idea behind the EU’s “very large online platform” rules. Basically, if your platform gets big enough that it's essentially important infrastructure, that comes with responsibilities.
I would welcome some rules and regulations about this kind of stuff. Can you imagine if Google woke up one day and decided to kill Gmail? It would cause so many problems. It’s already the case that even as a paid Gmail user you can’t get proper support if something goes awry. Sure, you can argue they can do whatever they want with their business. But if you have this many users, I do think at some point that comes with some obligations of support, quality and continued service.
seydor 122 days ago [-]
I wouldn't trust google to decide what information should be stored and what should be condemned to damnatio memoriae
_aaed 122 days ago [-]
What is Google's job? Is it only to leech off the public internet?
pacifika 122 days ago [-]
Broker ads intelligence
withzombies 122 days ago [-]
The cache link predates Google's ads business
amy-petrik-214 122 days ago [-]
As a private sector company their job is maximizing revenue forever.
Their old slogan was "don't be evil"
they have a new slogan, which is "vanquish your enemies, to chase them before you, to rob them of their wealth, to see those dear to them bathed in tears"
karlgkk 122 days ago [-]
Use the internet for a week without any search engine.
tux3 122 days ago [-]
Google is not, in fact, the only search engine.
For most users the internet has 5, maybe 10 web sites. I can use Wikipedia search or LLMs when I have questions.
fuzzy_biscuit 122 days ago [-]
I see your point with Wikipedia, but the writing is on the wall for LLMs: since they are replacing search engines for some users, it's only a matter of time before that experience gets polluted with "data-driven" ad clutter too.
karlgkk 121 days ago [-]
Every search engine and also LLMs engage in the same problematic behavior.
Maybe just use Wikipedia search only then!
onetokeoverthe 122 days ago [-]
[dead]
meiraleal 122 days ago [-]
Compared to the experience of using Google without an ad blocker?
rat9988 122 days ago [-]
Well, you can put it like that, or you can answer in good faith.
Almondsetat 122 days ago [-]
Most people want to know what is happening here and now, and if they want information about the past they prefer the latest version. Archival is a liability, not an asset, in Google's case
tetris11 122 days ago [-]
But they made it their job, got people to depend on it, and then yanked it away without telling anyone first.
Retr0id 122 days ago [-]
This article is from February. Since then, the IA partnership did materialize, and the "on its way out" `cache:` search workaround (which is still wholly necessary imho) still works.
Looks like cached pages just got more useful, not less.
varun_ch 122 days ago [-]
search “cache:https:// gizmodo(.)com/google-has-officially-killed-cache-links-1851220408” on Google.. the cache is still around, just the links are gone. also this article is from February
rbut 122 days ago [-]
From the article:
For now, you can still view Google’s cache by typing “cache:” before the URL, but that’s on its way out too.
varun_ch 122 days ago [-]
Oops, didn’t see that
alwa 122 days ago [-]
> There’s another solution, but it’s on shaky ground. The Internet Archive’s Wayback Machine preserves historic copies of websites as a public service, but the organization is in a constant battle to stay solvent. Google’s Sullivan floated the idea of a partnership with the Internet Archive, though that’s nothing close to an official plan.
Man, wish the Internet Archive hadn't staked it all tilting at copyright windmills...
TBH this is why I'm partial to Microsoft Recall or something similar, because inevitably it's going to get monetized to address link rot... and private data. Too bad there isn't a P2P option where you can "request" screenshots of cached webpages from other people's archives. Maybe it's all embedded in LLM training data sets and will be made public one day.
photonthug 122 days ago [-]
Hah, this is definitely going to happen. First LLMs kill the original public internet by simultaneously plagiarizing and disincentivizing everything original; then, after it disappears, they can sell it back to us by unpacking the now-proprietary model data, which has become the only “archive” of the pre-LLM internet. In other words: a product so perfect that even avoiding the product requires you to use the product. What a complete nightmare.
user_7832 122 days ago [-]
I think I've seen an extension (?) that would auto-save every webpage to your device, probably on r/datahoarder, that I'm still trying to find. I also have used a relatively easier auto-archive-to-wayback-machine extension that's probably close enough for most people.
freedomben 122 days ago [-]
I have this set up with ArchiveBox. Unfortunately, if you do much browsing, it will very quickly saturate memory and CPU on whichever machine is running ArchiveBox. It also gets really big, really fast. There are also, increasingly, websites that block it, so when you look at the archive it is either empty or worthless. Still worth it to some people, but it does have its challenges.
"Google Has Officially Killed Cache Links" (Feb 2024)
The cache is often still accessible through a "cache:url" search. There's been no official announcement, but it does seem like that could go away at some point too. That is even more likely now that Google has partnered with the Internet Archive.
What I'd really like to see, and maybe one good possible outcome of the mostly bogus antitrust suits, is a continuously updated, independent crawl resource like Common Crawl.
I once had to reconstruct a client's website from Google's cache links. It was a small business that had paid for a backup service from their ISP that turned out never to have existed.
AStonesThrow 122 days ago [-]
Cached pages were amazingly useful in my prior role where a main objective was to detect plagiarism. There were only a handful of cheater sites in play, and 100% of them were paywalled.
So searching them in Google was exactly how students found the answers, I assume, but we wouldn't have had the smoking gun without a cached, paywall-bypass, dated copy. $Employer was definitely unwilling to subscribe to services like that!
(However, the #1 most popular cheat site, by far, was GitHub itself. No paywalls there!)
lukasb 122 days ago [-]
“There’s another solution, but it’s on shaky ground. The Internet Archive’s Wayback Machine preserves historic copies of websites as a public service, but the organization is in a constant battle to stay solvent. Google’s Sullivan floated the idea of a partnership with the Internet Archive, though that’s nothing close to an official plan.”
Too lazy to find a link, but this is now public and live, although pretty well hidden. Three dots menu for a search result -> More about this page.
lysace 122 days ago [-]
It seems like just a templated link in a hidden corner.
"The Wayback Machine has not archived that URL."
A large part of the usefulness of the cache links came from the inherent freshness and completeness of the Google indexing.
DarkCrusader2 122 days ago [-]
If the partnership with Internet archive happens, I would be glad that IA will get better funding to keep operating.
But I am also concerned about a Firefox-like situation happening with IA, where Google pulling funding might pose an existential risk to it.
mellow-lake-day 121 days ago [-]
If Google doesn't want to maintain their own cache why would they pay to maintain someone else's cache?
It has been invisible in the search UI for some time now, but the service itself is still accessible.
Agingcoder 122 days ago [-]
I’m behind a corporate proxy. This means that a very very large portion of the internet is now unavailable to me.
Shank 122 days ago [-]
If you need to access these sites for work, I suggest requesting them sequentially. Generally, people don’t adjust filters until people complain. After you become the number one ticket creator for mundane site requests, they’ll usually bend the rules for you or learn to adjust the policy.
The reality is that people who create these filter policies often do so with very little thought, and sans complaints, they don’t know what their impact is.
freedomben 122 days ago [-]
If your company actually does this, that's impressive. The vast majority of big corporates that I have seen do not even really review these requests unless they come from a high-ranking person. When they do actually review them, it's usually a cursory glance or even just a quick lookup of the category that their web filters have it under, followed by a rapid and uninformed decision to deny the request. Oftentimes they won't even read the justification written by the employee before they deny the request.
God help you if you need something that's not TCP on port 443. Yes, I'm still a little bit bitter, but I have spent a lot of time explaining the difference between TCP and UDP to IT guys who have little interest in actually understanding it, who ultimately won't understand it, and who will just deny the request. Sometimes after conferring with another IT person who informs them that UDP is insecure and/or unsafe, just like anything not on port 443.
Terr_ 122 days ago [-]
I see this as a continued sad slide away from Google as a research tool towards Google as a marketing funnel.
That explains why they added a link in the results' additional info to the Internet Archive.
And some people considered that a "victory" for IA.
They'll just foot the bill while Google reap the rewards
ruthmarx 122 days ago [-]
I don't think I've used a cache link in some time. It stopped being reliable years ago, and the archive.ph type of services seemed to pick up the slack and do a much better job.
DarkmSparks 122 days ago [-]
Really just one more, if not the final, nail in Google Search's coffin tbh.
VERY rare these days that a Google search result actually contains what was searched for - for anything with a page number in the URL, the cache was guaranteed to be the only way to access it.
Combine that with the already absolutely epic collapse of their search result quality and MS Copilot locally caching everything people do on Windows, and this may well be recorded in history as the peak of Google before its decline.
very sad day.
Diti 122 days ago [-]
Remember that the clients of most search engines are advertisers, which incentivizes the engines not to serve the most relevant results right away. You could give a (free) try to paid search engines and see if they would be worth your money.
SquareWheel 122 days ago [-]
Didn't they do that like... six months ago? Thus why they partnered with the Internet Archive recently.
xyst 122 days ago [-]
Wonder if this is really just a cost cutting measure. Those “cache links” were essentially site archives.
benguild 122 days ago [-]
Seems like a really good opportunity for a browser extension to offer links to other sources
Fire-Dragon-DoL 121 days ago [-]
Just paid for 1 year of kagi. See ya
davidgerard 122 days ago [-]
fwiw, Yandex still frequently has cached versions, and you can save the cache in archive.today.
wwarner 122 days ago [-]
so depressing. but bing still provides a link back to the cached version.
FabHK 122 days ago [-]
Many complaints about the passive voice are overblown: it’s a perfectly fine construction and most appropriate in some places. (It’s also frequently misidentified, or applied to any evasive or obfuscatory sentence, whether grammatically active or passive.)
But here is an instance where all the opprobrium is justified:
> So, it was decided to retire it.
“It was decided”? Not you decided or Google decided, but it was decided? Come on.
drzzhan 122 days ago [-]
What??? Oh no. I love that feature so much. What should I use in the future then? IA can be a solution, but often the link I am interested in is not there. For example, foreign news from a developing country.
izacus 122 days ago [-]
Nothing; modern website owners think that Google, IA and similar sites are stealing their IP by archiving it, and the law agrees.
You wouldn't want to be a thief... right?
drzzhan 122 days ago [-]
I see. I was sad but you are right. This is an understandable decision.
meiraleal 122 days ago [-]
[flagged]
izacus 122 days ago [-]
What do you mean? I'm not supporting Google at all, it's great that another of their services has been turned off so they won't evilly download webpages anymore.
meiraleal 122 days ago [-]
You have a not-very-smart way of demonstrating you don't support Google: blaming the content creators.
freedomben 122 days ago [-]
Why must it be a binary? Either you support Google or you support the content creators?
You don't think it's possible for someone to simultaneously think that Google has made a terrible call, while also thinking that the IP industry, copyright people, and yes, many content creators have gotten insane with their "rights"? (And I say this as a content creator)
meiraleal 121 days ago [-]
> You don't think it's possible for someone to simultaneously think that Google has made a terrible call
Yes it is possible, but it is not what you did in not just one but multiple posts. And it is also offtopic so your justification is nonsense
politelemon 122 days ago [-]
Bing search engine has a cache
LightBug1 122 days ago [-]
JFC ... another nail in the coffin ...
jmclnx 122 days ago [-]
I left google a while ago, removing cache is yet another reason to leave.
temptemptemp111 122 days ago [-]
[dead]