NHacker Next
Surfer: Open-Source Personal Data Warehouse (github.com)
xnx 3 days ago [-]
Love this idea. It's crazy that every company knows more about my online activity than I know about it myself.

Definitely worried that running the extractors would get my accounts banned for violating policies against automated requests.

j45 3 days ago [-]
No need to worry - it's your data.

Large organizations do a takeout/checkout process that sends you a notification when the data is ready for manual exports.

slalani304 3 days ago [-]
This hasn't happened for any of my accounts and I've been working on this for the last few months.
imglorp 3 days ago [-]
Even so, would a rate limit be wise?
fc417fc802 2 days ago [-]
Rate limits are almost always wise, regardless of whether you happen to be the client or the server.
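For anyone wondering what a client-side rate limit could look like in an extractor, here is a minimal sketch. It is not taken from Surfer's code: the `MinIntervalLimiter` class and its parameters are hypothetical names, and it only enforces a minimum gap between requests.

```python
import time


class MinIntervalLimiter:
    """Allow at most one call every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock          # injectable so the limiter can be tested
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next call may proceed

    def wait(self):
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            # After sleeping we are at (or past) the allowed time.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.min_interval
        return delay
```

Calling `limiter.wait()` before each request in an extractor loop would space the requests out; a real extractor would probably also want to back off when the server returns an HTTP 429.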
mdrzn 8 hours ago [-]
Looks fantastic, but I think to be really useful it would need more data sources: import my Chrome navigation history (since Chrome only stores the last 90 days), my Facebook takeout, and my whole Google Takeout (not just Gmail; what about YouTube?). If it could even organize my whole exported WhatsApp history, that'd be golden.
alok-g 2 days ago [-]
>> Data Liberation: Export your personal data from platforms like YouTube, GitHub, Notion, Twitter, and LinkedIn.

Is there a complete list of supported platforms somewhere? Thanks.

snthpy 2 days ago [-]
Love the idea!

I still need to look at the implementation later but just wondering if there is any overlap / possibility for leveraging atproto from Bluesky? Probably different use cases but it just made me think of PDSs.

slalani304 2 days ago [-]
Have not heard of that, but is it similar to the Solid project / other personal data warehouse projects?
snthpy 2 days ago [-]
So it's not really a Personal Data Warehouse (PDW?) at all but rather an open replacement for Twitter/X; at least, that's what Bluesky is. However, Bluesky is really an MVP for ATProtocol, which is meant to serve as an open protocol for decentralized data sharing where each user is in control of their data (at least that's my understanding of it).

Consider the spectrum from centralization to decentralization for current web apps and data. Things like Twitter/X and any web 2.0 app are fully centralized: the app controls all your data.

Things like Mastodon and the Fediverse are decentralized but suffer from fragmentation and a lack of a global picture of the Fediverse.

ATProto tries to find a middle ground to reap the benefits of both. Users have a decentralized identity (DID) which they control, and their content is tied to that and stored in a PDS (Personal Data Server). To bootstrap the process, Bluesky creates and hosts PDSs for new users, but you can self-host one if you like (see https://atproto.com/guides/self-hosting). To reap the web 2.0 benefit of a global picture, there are relays which catalog and index the data from the PDSs. The architecture is explained quite well in https://atproto.com/articles/atproto-for-distsys-engineers.
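As a rough illustration of that split, here is a toy in-memory model. It is not the real atproto API: the `PDS` and `Relay` classes, the DIDs, and the hostnames are all made up for the sketch. It only shows records living on separate servers, keyed by DID, with a relay building one global index over them.

```python
from dataclasses import dataclass, field


@dataclass
class PDS:
    """Toy Personal Data Server: holds records for the DIDs it hosts."""
    host: str
    repos: dict = field(default_factory=dict)  # did -> list of records

    def put_record(self, did, record):
        self.repos.setdefault(did, []).append(record)


@dataclass
class Relay:
    """Toy relay: crawls many PDSs and keeps a single global index."""
    index: list = field(default_factory=list)

    def crawl(self, pds_list):
        # Rebuild the index as (did, host, record) tuples from every PDS.
        self.index = [
            (did, pds.host, record)
            for pds in pds_list
            for did, records in pds.repos.items()
            for record in records
        ]

    def search(self, term):
        # The "global picture": one query spans data from all crawled PDSs.
        return [(did, rec) for did, _host, rec in self.index if term in rec]


# Hypothetical hosts and DIDs, one hosted and one self-hosted.
hosted = PDS("pds.example.com")
self_hosted = PDS("pds.my-own-box.example.org")
hosted.put_record("did:plc:alice", "hello atproto")
self_hosted.put_record("did:plc:bob", "hello world")

relay = Relay()
relay.crawl([hosted, self_hosted])
```

Here `relay.search("hello")` finds both posts even though they live on different servers, while each record stays tied to its author's DID rather than to the host.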

So Bluesky is in a sense just an atproto MVP and people are building other things on top of it. Some examples:

* WhiteWind (https://whtwnd.com/): atproto blogging platform

* Smoke Signal (https://smokesignal.events/): atproto events platform (Meetup clone)

I think by default all the atproto/PDS data is public so maybe not what you want for a PDW but maybe you could make it work with per item encryption keys so that you can selectively share data with people if you want? Depending on what the goals are that might not be suitable from a performance perspective but I'm just brainstorming here.

I'm still trying to get my head around it all, and so far the main innovation I see is around providing the decentralized ID (DID), although its creation appears to still be centralized (but this is the part I understand least so far). Of course, the other big thing they bring to the table is that they have about 25M users now, so there is an actual network of users and this could take off. There are many great decentralized schemes, but most of them never grow beyond a niche set of users.

I hope something like ATProto takes off so that users and humans/meat-persons can regain sovereignty of their data and we can create a free and open internet.

P.S. There's an interesting GitHub discussion in the ATProto repo: #3049 Call for Developer Projects (https://github.com/bluesky-social/atproto/discussions/3049#)

jdmg94 2 days ago [-]
Great project! Are there any plans to add Linux support?
slalani304 2 days ago [-]
I'm not on Linux but if others want to contribute / are very interested we can definitely make that happen!
outlaw42 2 days ago [-]
I have been using the Python SDK on Ubuntu and it's been working well, so far at least!
bix6 3 days ago [-]
Love it and the name!

Is this the best option out there right now? I’ve seen other projects from time to time but not sure what’s stuck.

slalani304 2 days ago [-]
Haven't seen anything personally that's as intuitive / easy to use as this one, but I'm biased
secstate 2 days ago [-]
Haha, I thought this looked like an awesome idea, then I looked through the supported services and realized I already don't use any of them anymore!
ErikBjare 4 days ago [-]
Nice to finally see this on HN! Excited about its future.
Carrok 3 days ago [-]
Edwardzhang9 3 days ago [-]
huge