all lasting work is done in files/data (can be parsed permissionlessly, still useful if partially corrupted), but economic incentives keep pushing us to keep things in code (brittle, dies basically when one of maintainer|buildtools|hardware substrate dies).
when standards emerge (forcing code to accept/emit data) that is worth so much to a civilization. a developer ecosystem tipping the incentive scales such that companies like the Googles/Microsofts/OpenAIs/Anthropics of the world WANT to contribute/participate in data standards rather than keep things proprietary is one of the most powerful levers we as a developer community collectively hold.
(At the same time we should also watch out for companies embracing/extending/extinguishing standards... although honestly, outside of Chrome, I struggle to think of a truly successful example)
I think it's on open social apps to show that they're actually meaningfully better products, and that is possible because they're open. With luck, this may lead to an ecosystem where it's worth staying compatible and interoperable, and where users scoff if someone is trying to break it, and where users have an easy way to walk away. I know this sounds super idealistic but this did essentially happen with open source over a long time. At some point, people were just as skeptical of open source as we might be about open social.
I think that's an overly charitable take. Giving Google/MSFT/OpenAI/Anthropic what they want does not guarantee any dividends. Standards are nice, but Apple is a giant testament to the fact that all the standards in the world won't move an adequately entrenched business.
POSSE and AT Protocol can be understood as interoperable marketplaces. Platforms like Reddit and Instagram already function this way: the product is user content, the payment is attention, and the platform’s cut is ads or behavioral data. Dan argues that this structure is not inevitable. If social data is treated as something people own and store themselves, applications stop being the owners of social graphs and become interfaces that read from user-controlled data instead.
I am working on a similar model for commerce. Sellers deploy their own commerce logic such as orders, carts, and payments as a hosted service they control, and marketplaces integrate directly with seller APIs rather than hosting sellers. This removes platform overhead, lowers fees, and shifts ownership back to the people creating value, turning marketplaces into interoperable discovery layers instead of gatekeepers.
This article goes into a lot of detail, more than is really needed to get the point across. Much of that could have been moved to an appendix? But it's a great metaphor. Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.
So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.
So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.
My goal with writing is generally to move things out of my head in the shape that they existed in my head. If it's useful but too long, I trust other people to pick what they find valuable, riff on it, and so on.
>Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
You can use pdsfs[0] to mount a user's PDS locally, read-only, using FUSE. It's mentioned in the blog post. I remember seeing a tool posted for mounting them read-write, if you sign in, but can't remember where to find it.
Interesting - I just spent all day on this on an app which I'm using. My architecture is a little different (probably worse).
The app lives on a single OpenBSD server. All user data is stored in /srv/app/[user]. Authentication is done through OpenBSD's auth helper functions.
Users can access their data through the UI normally. Or they can use a web based filesystem browser to edit their data files. Or, alternately, they can ssh into the server and have full access to their files with all the advantages this entails. Hopefully, this raises the ceiling a bit for what power users of the system can accomplish.
I wanted to unify the OS ecosystem and the web app ecosystem and play around with the idea of what happens if those things aren't separate. I'm sure I'm introducing all kinds of security concerns which I'm not currently aware of.
Another commenter brought up Perkeep, which I think is very interesting. Even though I love Plan 9 conceptually, I do sort of wonder if "everything is a file" was a bit of a wrong turn. If I had my druthers, I think building on top of an OS which had DB and blob storage as the primary concept would be interesting and perhaps better.
If anybody cares, it's the POOH stack: Postgres, OCaml, OpenBSD, and htmx.
Is there anything stopping me from backdating my own records? Since the createdAt is just an arbitrary field, I can just write whatever I want in there, right? Is there a way for the viewing application to verify when the record was created (and not modified since), maybe by looking at the mentioned signing?
You can indeed backdate records. Since the application knows when it has first seen (i.e. indexed) your record, it can decide what to do with that information. If there's a difference, the Bluesky app, for example, shows its own indexing time, but also shows a separate panel saying the post is backdated to some other date. Other apps could choose to show something else.
It is possible to create links asserting a specific version by making a "strong ref" which includes content hash.
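For illustration, here's a minimal sketch of the strong-ref idea in Python. The hashing is simplified (AT Protocol actually encodes records as DAG-CBOR and uses CIDs, not SHA-256 over JSON), and the URI is made up, but the principle is the same: a link that pins a content hash can detect later edits.

```python
import hashlib
import json

def content_hash(record: dict) -> str:
    # Canonicalize by sorting keys so the same record always hashes the same.
    # (Real AT Protocol uses DAG-CBOR encoding and CIDs; JSON + SHA-256 here
    # is a simplification for illustration.)
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def make_strong_ref(uri: str, record: dict) -> dict:
    # A strong ref asserts a specific version: location plus content hash.
    return {"uri": uri, "cid": content_hash(record)}

def verify_strong_ref(ref: dict, record: dict) -> bool:
    # If the record was edited after the ref was created, the hash won't match.
    return content_hash(record) == ref["cid"]

post = {"text": "hello", "createdAt": "2024-01-01T00:00:00Z"}
ref = make_strong_ref("at://did:example/app.bsky.feed.post/abc", post)
assert verify_strong_ref(ref, post)

backdated = {**post, "createdAt": "2020-01-01T00:00:00Z"}
assert not verify_strong_ref(ref, backdated)  # backdating breaks the ref
```

So while the createdAt field itself is arbitrary, any app that stored a strong ref at indexing time can prove the record hasn't changed since.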
I've been thinking about this for some time, conceptually, but perhaps from a more fundamental angle. I think the idea of "files" is pretty dated and can be thrown out. Treat everything as data blobs (inspired by Perkeep[0]) addressed by their hashes, and many of the issues described in the article just aren't even a thing. If it really makes sense, or for compatibility's sake, relevant blobs can be exposed through a filesystem abstraction.
Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.
Something in particular that's been popping up fairly often for me is I'm in a messaging app, and I'd like to lookup certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window that I'm chatting with others. Like it'd be awesome if I could embed browser controls alongside the message bubbles with the lookup material, and optionally make some of those controls directly accessible to the other part(y|ies), which may even potentially lead to some kind of adhoc content collaboration as they make their own updates.
It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.
I'm using filesystem more as a metaphor than literally.
I picked this metaphor because "apps" are many-to-many to "file formats". I found "file format" to be a very powerful analogy for lexicons so I kind of built everything else in the explanation around that.
The repository data structure is content-addressed (a Merkle-tree), and every mutation of repository contents (eg, addition, removal, and updates to records) results in a new commit data hash value (CID). Commits are cryptographically signed, with rotatable signing keys, which allows recursive validation of content as a whole or in part. Repositories and their contents are canonically stored in binary DAG-CBOR format, as a graph of data objects referencing each other by content hash (CID Links). Large binary blobs are not stored directly in repositories, though they are referenced by hash (CID).
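A toy model of that flow, with heavy simplifications: JSON + SHA-256 stands in for DAG-CBOR + CIDs, the Merkle tree is flattened to a dict, and HMAC stands in for a real rotatable signing key. It only illustrates the invariant that every mutation yields a new signed root hash.

```python
import hashlib
import hmac
import json

def cid(obj) -> str:
    # Toy content identifier: hash of a canonical encoding. The real protocol
    # uses DAG-CBOR and multiformat CIDs; JSON + SHA-256 stands in here.
    data = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

class Repo:
    def __init__(self, signing_key: bytes):
        self.signing_key = signing_key
        self.blocks = {}   # cid -> object (the content-addressed store)
        self.records = {}  # record key -> cid (the tree, flattened for brevity)

    def put(self, rkey: str, record: dict) -> str:
        """Any addition, removal, or update produces a new signed commit."""
        record_cid = cid(record)
        self.blocks[record_cid] = record
        self.records[rkey] = record_cid
        # The root covers every record by hash, so changing any record
        # changes the root (a flattened stand-in for the Merkle tree).
        root = cid(sorted(self.records.items()))
        # HMAC stands in for a real public-key signature with rotatable keys.
        sig = hmac.new(self.signing_key, root.encode(), hashlib.sha256).hexdigest()
        commit = {"root": root, "sig": sig}
        self.blocks[cid(commit)] = commit
        return root

repo = Repo(signing_key=b"demo-key")
root1 = repo.put("post/1", {"text": "hello"})
root2 = repo.put("post/2", {"text": "world"})
assert root1 != root2  # every mutation produces a new root hash
```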
Re: apps, I'd say AT is actually post-app to some extent because Lexicons aren't 1:1 to apps. You can share Lexicons between apps and I totally can see a future where the boundaries are blurring and it's something closer to what you're describing.
I've always thought walled gardens are the effect of consumer preferences, not the cause.
The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.
I am so happy that my IG posts aren't available on my HN, or that my IG posts aren't easily cross-posted to a service I don't want to use, like Truth Social. If you want it to be open, just post it to the web.
I think I don't really understand the benefit of data portability in the situation. It feels like in crypto when people said I want to use my Pokemon in game item in Counterstrike (or any game) like, how and why would that even be valuable without the context? Same with a Snap post on HN or a HN post on some yet-to-be-created service.
>I am so happy that my IG posts arent available on my HN or that my IG posts arent being easily cross posted to a service I dont want to use like truth social.
ATProto apps don't automatically work like this and don't support all types of "files" by default. The app's creator has to build support for a specific "file type". My app https://anisota.net supports both Bluesky "files" and Leaflet "files", so my users can see Bluesky posts, Leaflet posts, and Anisota posts. But this is because I've designed it that way.
Anyone can make a frontend that displays the contents of users' PDSs.
I also have a little side project called Aturi that helps provide "universal links" so that you can open ATProto-based content on the client/frontend of your choice: https://aturi.to/anisota.net
Except that a lot of the app builders in ATProto seem to think the protocol was designed to make their lives easier when bootstrapping their network from the Bluesky userbase.
(Imo, that is a perverse interpretation, it's about user choice, which they are effectively taking away from me by auto importing and writing to my Bsky graph)
re: the debates on reusing follows from Bluesky in other apps instead of their own
I agree. I don't understand the driving force here.
I have all of the raw image files that I've uploaded to Instagram. I can screenshot or download the versions that I created in their editor. Likewise for any text I've published anywhere. I prefer this arrangement, where I have the raw data in my personal filesystem and I (to an extent) choose which projections of it are published where on the internet. An IG follow or HN upvote has zero value to me outside of that platform. I don't feel like I want this stuff aggregated in weird ways that I don't know about.
For me, part of it is that we have no power collectively against products turning their back on users because coordination to "export data all at once and then import it into specific other place" is near-impossible. So this creates a perverse cycle where once you capture enough of the market, competition has very little chance unless they change the category entirely.
What AT enables is forking products with their data and users. So, if some product is going down a bad road, a motivated team can fork it with existing content, and you can just start using the new thing while staying interoperable with the old thing. I think this makes the landscape a lot more competitive. I wrote about this in detail in https://overreacted.io/open-social/#closed-social which is another longread but specifically gets into this problem.
I hear you re: not wanting "weird aggregation"; that may just be a matter of taste. I kind of feel like if I'm posting something on the internet, I might as well have it on the open web, aggregatable by other apps.
Thanks for your thoughts. I do feel like this is all very specifically connected to Twitter. Which tech people really adopted, but I never used much, so it is interesting that these different perspectives are somewhat tied to one's megaplatform(s) of choice. I don't know of another social network that has inspired such a consistent effort to be forked or cloned. I do kind of feel like "change the category" and create a new network with the traits you want to see is the right move.
For better or worse, Twitter built its network. Many people willingly signed up and posted, and continue to post there. I don't think anyone really should be able to fork it, because the users didn't collectively agree to that, and they don't all agree on what a good road or a bad road is. Ultimately, they can choose to leave if and when they want. Are these networks sticky? Yes, of course, but that's life.
We've seen lots of social networks come and go, things do change over time, there's ample opportunity for new ideas to flourish. In that sense, AT is perfectly welcome to throw their hat in the ring and see if that resonates and sticks. If people want their social network to be forkable, that concept will succeed.
I do think it misses what a lot of people find valuable about the tangibility and constraints of "I am making this content specifically for this platform and this audience at this point in time." I don't think most people think of their social media posts as a body of work that they want to maintain and carry over in an abstract sense independent of platform, and give open license to anyone to cook up into whatever form they can dream of.
Just the other week, another service that people actively used, Bento, announced it was shutting down: https://bento.me/. This sucks for the users.
Someone created an alternative called Blento (https://blento.app/) on AT. Of course, by itself, this doesn't mean they'll be successful. But the thing is that, if Blento shuts down, someone can put it right back up because (1) it's open source, and (2) the data is outside Blento. Any new app can kickstart with that data and get people's sites back up and running. And two platforms can even compete on top of the same data.
I agree content is tailored to the platform, and resurrecting something doesn't necessarily make sense. But that's the point of lexicons. You get the choice of what makes sense to resurrect (actively moving to an alternative) vs what doesn't (something with a style that doesn't work elsewhere) vs new recontextualizations we haven't even tried or thought of. I think it's early to dismiss this before trying.
> I think I don't really understand the benefit of data portability in the situation.
Twitter was my home on the web for almost 15 years when it got taken over by a ... - well you know the story. At the time I wished I could have taken my identity, my posts, my likes, and my entire social graph over to a compatible app that was run by decent people. Instead, I had to start completely new. But with ATProto, you can do exactly that - someone else can just fork the entire app, and you can keep your identity, your posts, your likes, your social graph. It all just transfers over, as long as the other app is using the same ATProto lexicon (so it's basically the same kind of app).
But what if your entire social graph didn't choose to transfer over as well? What if they don't want to be on that app? What if someone that was very indecent made a compatible app? Would you want your entire Twitter history represented on there?
For better or worse, I don't think it makes sense to decentralize social. The network of each platform is inherently imbued with the characteristics and culture of that platform.
And I feel like Twitter is the anomalous poster child for this entire line of thinking. Pour one out, let it go, move on, but I don't think creating generalized standards for social media data is the answer. I don't want 7 competing Twitter-like clones for different political ideologies that all replicate each others' data with different opt-in/opt-out semantics. That sounds like hell.
The framing of "portability" is a bit confusing. Your data is not actually "transferring" anywhere, it's always in your PDS. These other apps and clients are just frontends that are displaying the data that is in your PDS. The data is public and open, though private data is in the works and hopefully will arrive in 2026.
The data is not transferring, but the user is. When I sign up for e.g. Twitter, I don't want to sign up for Mastodon, or Bluesky, or Truth Social, or whatever other platform someone might create later. Thus I would not choose to put my data in a PDS. I feel like that would actually leave me with less ownership and control than I have now.
My point is that I don't believe the separation of frontend and data is desirable for a social network. I want to know that I am on a specific platform that gives me some degree of control and guarantee (to the extent that I trust that platform) over how my data is represented. I don't really have to worry that it's showing up in any number of other places that I didn't sign up for (technically I do since everything public can be scraped of course, but in practice there are safeguards that go out the window when you explicitly create something like a PDS).
Do I understand correctly that your main concern is that some random service would serve a page when asked for /your-handle, and so, given a link, someone may assume that you actively use that service? Just trying to understand the exact scenario.
Generally it's good practice for AT apps to treat you as not signed up if you have not explicitly signed up to this app before. So I think a well-behaved site you never used shouldn't show your profile page on request. If this is right, maybe it needs to be more emphasized in docs, templates, etc. Same as they should correctly handle lifecycle events like account getting deactivated.
Realistically it's possible some services don't do this well but it's not clear to me that this is going to be a major problem. Like if it gets really bad, it seems like either this will get fixed or people will adjust expectations about how profile pages work.
I think part of the issue is that humans are hyper conditioned to expect a certain UX and set of conditions thanks to the past 3 decades of the internet + legacy social media. It only feels weird that you could publish content on Bluesky and have it show up on some other app without your consent because of how we've been conditioned. There will be a lot of unconditioning and reconditioning that has to take place over a span of time if the ATmosphere (or any new vision of social media) wants to succeed.
https://anisota.net will display any Bluesky content and profiles without the consent of the user — no one cares at the moment though because either 1) they don't know about such a niche project, or 2) they aren't concerned cause I'm not a controversial figure.
If Truth Social was suddenly a part of the ATmosphere, or a part of some other wide network of users, most people would catch on eventually and hopefully be conditioned to realize that the mere presence of someone's content on an app/site doesn't mean they use that app/site.
FWIW, I think Anisota is a bit different because conceptually people see it as a Bluesky client. So it is expected that you're "projecting" Bluesky, for better or worse. Whereas if it's some fanart exchange service or something, maybe it makes less sense. Maybe it just depends on what you think the user would expect.
This sounds like I need to host my PDS. Easy for me with no public profile but if I was someone famous wouldn't that mean I needed enterprise class hosting?
You don't need to host your own PDS for any of this to work. It works the same way regardless of who hosts your PDS.
I think what may be confusing you is that Bluesky (the company) acts in two different roles. There's hosting (PDS) and there's an app (bsky.app). You can think of these conceptually as two different services or companies.
Yes, when you sign up on Bluesky, you do get "Bluesky hosting" (PDS). But hosting doesn't know anything about apps. It's more like a Git repo under the hood.
Different apps (Bluesky app is one of them) can then aggregate data from your hosting (wherever it is) and show different projections of it.
Finally, no, if you're famous, you don't need enterprise hosting. Hosting a PDS can be extremely cheap (maybe $1/mo?). A PDS doesn't get traffic spikes on viral content because that load is amortized by the app (which serves from its own DB).
>The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, thats what IG and Snap are. Segmentation all the way down.
I actually agree with that. See from the post:
>For some use cases, like cross-site syndication, a standard-ish jointly governed lexicon makes sense. For other cases, you really want the app to be in charge. It’s actually good that different products can disagree about what a post is! Different products, different vibes. We’d want to support that, not to fight it.
AT doesn't make posts from one app appear in all apps by default, or anything like that. It just makes it possible for products to interoperate where that makes sense. It is up to whoever's designing the products to decide which data from the network to show. E.g. HN would have no reason to show Instagram posts. However, if I'm making my own aggregator app, I might want to process HN stuff together with Reddit stuff. AT gives me that ability.
To give you a concrete example where this makes sense. Leaflet (https://leaflet.pub/) is a macroblogging platform, but it ingests Bluesky posts to keep track of quotes from the Leaflets on the network, and display those quotes in a Leaflet's sidebar. This didn't require Leaflet and Bluesky to collaborate, it's just naturally possible.
Another reason to support this is that it allows products to be "forked" when someone is motivated enough. Since data is on the open network, nothing is stopping from a product fork from being perfectly interoperable with the original network (meaning it both sees "original" data and can contribute to it). So the fork doesn't have to solve the "convince everyone to move" problem, it just needs to be good enough to be worth running and growing organically. This makes the space much more competitive. To give an example, Blacksky is a fork of Bluesky that takes different moderation decisions (https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...) but remains interoperable with the network.
I’ve been reading “The Unix Programming Environment”. It’s made me realize how much can be accomplished with a few basic tools and files (mostly plain text). I want to spend some time thinking of what a modern equivalent would look like. For example, what would Slack look like if it was file (and text) oriented and UNIXy? Well, UNIX had a primitive live chat in the form of live inter-user messaging. I’d love to see a move back to simpler systems that composed well.
Unix gave the world a lot of good ideas about architecture, but I think it really hamstrung itself by treating all the data as plain text and resisting the idea of having any kind of structured, formatted data passed within a pipeline. It's nice to be able to serialize to something human-readable and -editable, but constantly re-parsing and re-formatting it becomes a real pain.
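To give a sense of the alternative, here's a sketch of pipeline stages that pass one JSON object per line, so structure survives between stages instead of being re-parsed with ad-hoc text splitting. The stage names and the login-file format are invented for the example.

```python
import io
import json

# Each stage reads and writes one JSON object per line ("JSON lines"),
# so downstream stages get structured records instead of re-splitting text.

def parse_logins(lines):
    # First stage: parse raw "user:shell" text into structured records.
    for line in lines:
        user, shell = line.strip().split(":")
        yield json.dumps({"user": user, "shell": shell})

def only_bash(lines):
    # Later stage: no re-parsing heuristics needed, just read the fields.
    for line in lines:
        rec = json.loads(line)  # structure survived the pipe
        if rec["shell"] == "/bin/bash":
            yield line

raw = io.StringIO("alice:/bin/bash\nbob:/bin/sh\n")
out = list(only_bash(parse_logins(raw)))
assert len(out) == 1
assert json.loads(out[0])["user"] == "alice"
```

This is essentially what tools like jq (or PowerShell's object pipelines) bolt onto the Unix model after the fact.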
I'm skeptical of these kinds of, like, self-describing data models. Like, I generally like atproto--because I like IPFS--but I think the whole "just add a lexicon for your service and bickety bam, clients appear" is a leap too far.
For example, gaze upon dev.ocbwoy3.crack.defs [0] and dev.ocbwoy3.crack.alterego [1]. If you wanted to construct a UI around these, realistically you're gonna need to know wtf you're building (it's a twitter/bluesky clone); there simply isn't enough information in the lexicons to do a good job. And the argument can't be "hey you published a lexicon and now people can assume your data validates", because validation isn't done on write, it's done on read. So like, there really is no difference between this and like, looking up the docs on the data format and building a client. There are no additional guarantees.
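To make the read-side point concrete, here's roughly what "validation on read" looks like for a hypothetical `app.example.post` lexicon with a 300-character limit. This is a simplification (real lexicon validation checks graphemes and full schemas), but it shows why a client gets no guarantees at write time.

```python
# Validation happens when a consumer reads a record, not when the author
# writes it, so a client must tolerate records that don't match the schema.
# "app.example.post" is a made-up lexicon name for illustration.

def validate_post(record: dict) -> bool:
    if record.get("$type") != "app.example.post":
        return False
    text = record.get("text")
    # Simplified length check; a real lexicon counts graphemes, not chars.
    return isinstance(text, str) and len(text) <= 300

good = {"$type": "app.example.post", "text": "hi"}
too_long = {"$type": "app.example.post", "text": "x" * 301}
assert validate_post(good)
assert not validate_post(too_long)  # the PDS stored it anyway; the reader rejects it
```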
Maybe there's an argument for moving towards some kind of standardization, but... do we really need that? Like are we plagued by dozens of slightly incompatible scrobbling data models? Even if we are, isn't this the job of like, an NPM library and not a globally replicated database?
Anyway, I appreciate that, facially, at proto is trying to address lock in. That's not easy, and I like their solution. But I don't think that's anywhere near the biggest problem Twitter had. Just scanning the Bluesky subreddit, there's still problems like too much US politics and too many dick pics. It's good to know that some things just never change I guess.
Not sure I fully get you... In your example, isn't the problem that nobody cares about this data? So there is no motivation to build a client. Whereas if these were beloved notes or minisites or whatever that got wiped out by the latest acquisition (e.g. see https://bento.me/ shutting down), people would know exactly what those are, and there would be incentive for someone to compete for the userbase.
E.g. Blento (https://blento.app/) is atproto Bento that I only saw a couple of days ago. But the cool thing is that if it shuts down, not only can someone else set it up again (it's open source), they're also gonna be able to render all of the users' existing content. I think that's a meaningful step forward for this use case.
Yes, there's gonna be tons of stuff on the network that's too niche, but then there's no harm in it either. Whereas wherever there is enough interest, someone can step in and provide the code for the data.
Interesting concept for all the new social platforms that already live in federated, distributed environments and share communication protocols and data formats.
I bet it's much more difficult to get existing commercial platforms to even consider it.
It would make marketing tools for managing social communications and posting across popular social media much easier. Nevertheless, social marketing tools have already invented a similar analogy themselves, just to keep control over their own content and feedback across instances and networks.
We still live in a world where some say Bluesky and some say Mastodon is the future... while everybody still has Facebook and Instagram, and youngsters TikTok too. Those are closed platforms where only tools to hack them persist, not standards.
remoteStorage seems aimed at apps that don't aggregate data across users.
AT aims to solve aggregation, which is when many users own their own data, but what you want to display is something computed from many of them. Like social media or even HN itself.
remoteStorage is still occasionally getting updates. https://solidproject.org is a somewhat newer, similar project backed by Tim Berners-Lee. (With its own baggage.)
I think of those projects as working relatively well for private data, but public data is kinda awkward. ATProto is the other way around: it has a lot of infra to make public data feasible, but private data is still pretty awkward.
It's a lot more popular though, so maybe has a bigger chance of solving those issues? Alternatively, Bluesky keeps its own extensions for that, and starts walling those bits off more and more as the VCs amp up the pressure. That said, I know very little about Bluesky, so this speculation might all be nonsense.
This, Local-first Software [1], the Humane Web Manifesto [2], etc. make me optimistic that we're moving away from the era of "you are the product" dystopian enshittification to a more user-centric world. Here's hoping.
Bluesky is not huge, but 40M users is not nothing either. You don't get people to want this, you just try to build better products. The hope is that this enables us all to build better products by making them more interoperable by default. Whether this pans out remains to be seen.
I also don't think the average user gets the value of the protocol yet. Most of those users were looking for a new, more politically palatable home but with the same features as Twitter. The new generation of apps on the protocol will be vital in showing users what's possible. IMO the two most valuable features at a practical level are:
- social graph portability, which might look like having an onboarding experience that bootstraps your community on that app
- lexicon cross-compatibility, i.e. your data from app A shows up in a contextually relevant spot in app B. Or app B writes records that show up in app A. This is pretty key to get right because it might confuse or anger users if they aren't conditioned to expect it.
Once the average user groks these features though, I'd be surprised if they voluntarily switch back to the standard corpo apps that eventually exit to some company who tries to monetize the shit out of every feature.
I think most people do want this. They want to own their data. If you ask someone if they post on IG, if they should own that, or IG, they'll tell you it's them.
The hard problem IMO is how you incentivize companies to adopt this, since walled gardens help reduce competition.
We want more control over data that we've created, and more control over data that's about us. I'm not sure either of these concepts align well with "ownership" though. Property and data are concepts that don't mix.
Language nitpicking aside... you subvert the walls of their gardens and aggregate the walled-off data without the walls, so users face a choice not between:
- facebook
- everything else
but instead between
- facebook and everything else
- just facebook
But that approach only works if we can solve the "data I created" problems in a way that doesn't also require us to acknowledge Facebook's walls.
To share is to lose control. Once shared, it can't be undone. You can't retract a published novel. You can't retract broadcast music or a show. What makes you think you can do it over the internet?
I don't think my article makes any claims that one can undo sharing. What I'm saying is that we benefit collectively from being able to untether data from applications. It's the same logic as https://stephango.com/file-over-app but applied to the aggregating web applications.
This was a nice intro to AT (though I feel it could have been a bit shorter).
The whole thing seems a bit over-engineered, with poor separation of concerns.
It feels like it'd be smarter to flatten the design and embed everything in the Records. And then other layers can be built on top of that
Making every record includes the author's public-key (or signature?). Anything you need to point at you'd either just give its hash, or hash + author-public-key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embed it in the Record.
Lexicons/Collections are just a field in the Record. Reverse-looking up a hash to find what it refers to is also a separate problem.
Yes. SSB and ANProto do this. We can actually simply link to a hash of a pubkey+signature, which opens to a timestamped hashlink to a record. Everything is a hash lookup this way, and thus all nodes can store data.
{:record {:person-key **public-key**
:type :twitter-post
:message "My friend {:person-key **danabramov-public-key**} suggested I make this on this HN post {:link **record-of-hn-post-hash**}. Look at his latest post {:link **danabramov-newtwitter-post-hash** :person-key **danabramov-public-key**} it's very cool!"}
:hash **hash-of-the-record**
:signature **signature-by-author**}
So everything is self-contained. The other features you'd build on top of this basic primitive:
- Getting the @danabramov username would be done by having some lookup service that does person-key->username. You could have several. Usernames can be changed with the service. But you can have your own map if you want, or infer it from github commits :)) There are some interesting ideas about usernames out there. How this is done isn't specified by the Record
- Lexicon is also done separately. This is some validation step that's either done by a consumer app/editor of the record or by a server which distributes records (could be based on the :type or something else). Such a server can check if you have less than 300 graphemes and reject the record if it fails. How this is done isn't specified by the Record
- Collection.. This I think is just organizational? How this is done isn't specified by the Record. It's just aggregating all records of the same type from the same author I guess?
- Hashes.. they can point at anything. You can point at a webpage or an image or another record (where you can indicate the author). For dynamic content you'd need to point at webpage that points at a static URL which has the dynamic content. You'd also need to have a hash->content mapping. How this is done isn't specified by the Record
This kind of setup makes the Record completely decoupled from the rest of the "stack". It becomes much more of an independent, movable "file" (in the original sense that you have at the top) than the interconnected setup you end up with at the end.
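A minimal sketch of this flat, content-addressed record idea (all function and field names here are hypothetical, invented to mirror the comment above; real signing with something like ed25519 is elided, since the standard library has no public-key crypto):

```python
import hashlib
import json

def make_record(author_key: str, record_type: str, payload: dict) -> dict:
    """Build a self-contained, content-addressed record. The hash is
    computed over a canonical JSON serialization of the body; a real
    system would also attach a :signature made with the author's
    private key, omitted here."""
    body = {"person-key": author_key, "type": record_type, **payload}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return {
        "record": body,
        "hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }

def verify(record: dict) -> bool:
    """Anyone can re-derive the hash, so any node can store and relay
    records without trusting any particular server."""
    canonical = json.dumps(record["record"], sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest() == record["hash"]

post = make_record("alice-public-key", "twitter-post", {"message": "hello"})
assert verify(post)
```

Anything that wants to point at this record just uses its hash; tampering with the body breaks verification, which is what makes the record independently movable.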
- How do you rotate keys? In AT, the user updates the identity document. That doesn't break their old identity or links.
- When you have a link, how do you find its content? In AT, the URL has identity, which resolves to hosting, which you can ask for stuff.
- When aggregating, how do you find all records an application can understand? E.g. how would Bluesky keep track of "Bluesky posts". Does it validate every record just in case? Is there some convention or grouping?
Btw, you might enjoy https://nostr.com/, it seems closer to what you're describing!
1. It's an important problem, but I think this just isn't done at the Record layer. Nor can you? You'd probably want to do that on the person-key->username service (which would have some log-in and way to tie two keys to one username)
2. In a sense that's also not something you think about at the Record level either. It'd be at a different layer of the stack. I'll be honest, I haven't wrapped my head entirely around `did:plc`, but I don't see why you couldn't have essentially the same behavior, but instead of having these unique DID IDs, you'd just use public keys here. pub-key -> DID magic stuff.. and then the rest you can do the same as AT. Or more simply, the server that finds the hashed content uses attached meta-data (like the author) to narrow the search
Maybe there is a good reason the identity `did:plc` layer needs to be baked into the Record, but I didn't catch it from the post. I'd be curious to hear why you feel it needs to be there?
3. I'm not 100% sure I understand the challenge here. If you have a soup of records, you can filter your records based on the type. You can validate them as they arrive. You send your records to the Bluesky server and it validates them as they arrive.
2. The point of the PLC is to avoid tying identity to keys, specifically so that losing your keys doesn't mean losing your identity. In reality, nobody wants that as part of the system
3. The soup means you need to index everything. There is no Bluesky server to send things to, only your PDS. Your DID is how I know what PDS to talk to to get your records
We can have both a directory and use content addressable storage and give people the option of using their own keypairs. They are not mutually exclusive. Bluesky chooses to have a central directory and index.
It seems like the biggest downside of this world is iteration speed.
If the AT instagram wants to add a new feature (e.g. posts now support video!), can they easily update their "file format"? How do they update it in a way that is compatible with every other company who depends on the same format, without the underlying record becoming a mess?
Adding new features is usually not a problem because you can always add optional fields and extend open unions. So, you just change `media: Link | Picture | unknown` to `media: Link | Picture | Video | unknown`.
You can't remove things, true, so records do end up with some deprecated fields.
Re: updating safely, the rule is that you can't change which records a lexicon would consider valid after it gets used in the wild. So you can't change whether some field is optional or required; you can only add new optional fields. The https://github.com/bluesky-social/goat tool has a linting command that instantly checks whether your changes pass the rules. In general it would be nice if lexicon tooling matured a bit, but I think with time it should get really good because there's explicit information the tooling can use.
If you have to make a breaking change, you can make a new Lexicon. It doesn't have to cause tech debt because you can make all your code deal with a new version, and convert it during ingestion.
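A toy sketch of why the open-union rule keeps old apps working (field and type names are made up for illustration; this is not the real lexicon tooling): a validator only classifies union members it knows about, and treats everything else as `unknown` instead of rejecting the record.

```python
# An open union means consumers must tolerate members they don't know.
KNOWN_MEDIA_V1 = {"link", "picture"}           # app built before video
KNOWN_MEDIA_V2 = {"link", "picture", "video"}  # app built after video shipped

def classify_media(record: dict, known: set) -> str:
    media = record.get("media")
    if media is None:
        return "none"            # optional field: absence is always valid
    kind = media.get("$type", "")
    # Open union: unknown members degrade gracefully instead of failing.
    return kind if kind in known else "unknown"

new_post = {"text": "look!", "media": {"$type": "video", "src": "blob-ref"}}
# The old app still accepts the record; it just can't render the video.
assert classify_media(new_post, KNOWN_MEDIA_V1) == "unknown"
assert classify_media(new_post, KNOWN_MEDIA_V2) == "video"
```

This is why adding `Video` to `media: Link | Picture | unknown` is a non-breaking change: the `unknown` arm was always part of the contract.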
Most apps reading records will validate a record against the schema for that type. e.g. there's nothing stopping you from making an app.bsky.feed.post record with more than 300 graphemes in the "text" field, but that post won't appear in the "official" app/website because it fails schema validation.
Similarly, there's nothing stopping you from adding another field in your post. It'll just get ignored because the app you're using doesn't know about it. e.g. posts bridged from mastodon by bridgy have an extra field containing the full original post's text, which you can display in your app if desired. reddwarf.app does this with these posts.
Lexicon validation works the same way. The com.tumblr in com.tumblr.post signals who designed the lexicon, but the records themselves could have been created by any app at all. This is why apps always treat records as untrusted input, similar to POST request bodies. When you generate type definitions from a lexicon, you also get a function that will do the validation for you. If some record passes the check, great—you get a typed object. If not, fine, ignore that record.
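Roughly what treating records as untrusted input looks like (a hand-rolled sketch, not the validators generated from a real lexicon; `len()` stands in for proper grapheme counting):

```python
def validate_post(record: dict, max_graphemes: int = 300):
    """Return the record if it passes the post schema check, else None.
    Note len() is a stand-in for grapheme counting here. Extra fields
    are left alone, since other apps may understand them; records that
    fail are simply skipped, like a malformed POST body."""
    text = record.get("text")
    if not isinstance(text, str) or len(text) > max_graphemes:
        return None
    return record

# A bridged post with an extra field still validates, and the extra
# field survives for apps (like reddwarf.app) that want to show it.
ok = validate_post({"text": "hi", "bridgyOriginalText": "full mastodon text"})
assert ok is not None and "bridgyOriginalText" in ok

# An over-long post fails validation and just gets ignored.
assert validate_post({"text": "x" * 301}) is None
```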
I know this is somewhat covered in another comment, but the concepts described in the post could have been reduced quite a bit, no offense Dan. While I like the writing generally, I would consider writing, letting it sit for a few days, rereading, and then cutting chaff (editing). This feels like a great first draft, but one made without feedback; it could have greatly benefited from an editing process. And I don't think the argument that you want to put out something for others to take and refine is really a strong one… a bit more time and refinement could have made a big difference here (something worth keeping in mind given you have a decently sized audience).
From my perspective, there is no chaff. I've already read the entire thing from top to bottom over 20 times (as I usually do with my writing), I've done several full edit passes, and I've removed everything inessential that I could find. The rest is what I wanted included in this article.
I know my style is verbose but I try to include enough details to substantiate the argument at the level that I feel confident it fully stands for itself. If others find something useful in it, I trust that they can riff on those bits or simplify.
For the commenters who have made similar comments, I'd be curious to hear what they think could be cut. I suspect different readers will have different opinions on this, which means it's probably a good thing you didn't make cuts.
I'd say some of the worldview is shared but the architecture and ethos is very different. Some major differences:
- AT tries to solve aggregation of public data first. I.e. it has to be able to express modern social media. Bluesky is a proof that it would work in production. AFAIK, Solid doesn't try to solve aggregation, and is focused on private data first. (AT plans private data support but not now.)
- AT embraces "apps describe their own formats" (Lexicons). Solid uses RDF, which is a very different model. My impression is RDF may be more powerful but is a lot more abstract. Lexicon is more or less like *.d.ts for JSON.
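To make the *.d.ts analogy concrete, a lexicon is itself a JSON document describing a record schema, along these lines (a simplified sketch using an invented `com.example.post` id, not copied from any real lexicon):

```json
{
  "lexicon": 1,
  "id": "com.example.post",
  "defs": {
    "main": {
      "type": "record",
      "record": {
        "type": "object",
        "required": ["text", "createdAt"],
        "properties": {
          "text": { "type": "string", "maxGraphemes": 300 },
          "createdAt": { "type": "string", "format": "datetime" }
        }
      }
    }
  }
}
```

Tooling can generate type definitions and validators from this, much like a .d.ts file describes the shape of a JS module.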
The more I read and consider Bluesky and this protocol, the more pointless -- and perhaps DANGEROUS -- I find the idea.
It really feels like no one is addressing the elephant in the room: okay, someone who makes something like this is interested in "decentralized" or otherwise bottom-up-ish levels of control.
Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.
This is why I say that most of Mastodon's limitations and bugs in this regard (by leaving everything to the "servers") are actually features. The ability to forget and delete et al. is actually important, and this makes that HARDER.
I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better and the kinks are more well thought about and/or solved.
Author here. I think it's fair to say that AT protocol's model is "everyone is a scraper", including first party. Which has both bad and good. I share your concern here. For myself, I like the clarity of "treat everything you post as scraped" over "maybe someone is scraping but maybe not" security by obscurity. I also like that there is a way for me to at least guarantee that if I intentionally make something public, it doesn't get captured by the container I posted it into.
This seems like the tension between normal/practical and "opsec"-style privacy thinking… Really, we can never be sure that anything posted on the internet won't be captured by somebody outside our control. So, if we want to be fully paranoid, we should act like it will be.
But practically lots of people have spent a long time posting their opinions carelessly on the internet. Just protected by the fact that nobody really has (or had) space to back up every post or time to look at them too carefully. The former has probably not been the case for a long time (hard drives are cheap), and the latter is possibly not true anymore in the LLM era.
To some extent maybe we should be acting like everything is being put into a perfect distributed record. Then, the fact that one actually exists should serve as a good reminder of how we ought to think of our communications, right?
Exactly. Anything that's ever been public on the internet is never really gone anyways, and it's unsafe to assume so. This is similar to publishing a website or a blog post. Plus, from a practical (non-opsec) point of view, you can delete items (posts, likes, reposts, etc.) on ATProto, and those items will disappear from whatever ATProto app you are using - usually even live. You need to dive into the protocol layer to still see deleted items.
Your last point is one that I used to be very strongly in favor of, and today?
Nooooooooooo. No. No. No.
It's not going to happen and we shouldn't even consider it. Seriously. This thing we are doing here, which is "connecting people to each other," those forces for MANY will be far more powerful than "let me stop and think about the fact that this is forever." I just don't think we are wired for it, we're wired for a world in which we can just talk?
I think it's better to try to engineer some specific counter to "everything is recorded all the time" (or, as in here, not try to usher it into existence even more) than to try to say "welp, everything is recorded all the time, better get used to it."
It would be nice to engineer a way around this, but I don’t see it. Fundamentally if we want to be able to talk to random people, we’ll have to expect that some might be capturing communications, right?
It's true that Mastodon is somewhat better if you don't want to be found, though it's hardly a guarantee. From a "seeing like a state" perspective, Bluesky is more "legible" and that has downsides.
But I think there's room for both models. There are upsides to more legibility too. Sometimes we want to be found. Sometimes we're even engaging in self-promotion.
Also, I'll point out that Hacker News is also very legible. Everything is immutable after the first hour and you can download it. We just live with it.
It's not about the access, it's about the completeness. Imagine this paradigm takes off (I hope it does!), everyone has their own PDS and finally owns their data. Social apps link into their PDS to publish and share data exactly as they're supposed to.
Well, now someone's PDS is a truly complete record of their social activity, neatly organized for anyone that's interested. It's not a security issue, after all the data was still public before, but the barrier to entry is now zero. It's so low that you can just go to stalker.io, put in their handle, and it will analyze their profile and print out a scarily accurate timeline of their activity and location, leveraging AI's geoguesser skill.
If that's your threat model, then I think the way forward is to maintain separate identities. There are trade-offs there also of course: fragment yourself too much and the people who trust you will now only trust a portion of what you have to say... unless you have the time and energy to rebuild that trust multiple times.
Of course that's the same with the web we have today, the only difference is that you get control over which data goes with which identity rather than having that decision made for you by the platform boundaries.
That is how it works, but people shouldn't be posting their location or sensitive information publicly if they don't want it exposed like that. That's basic opsec. Private data is currently being worked on for ATProto and will hopefully begin existing in 2026.
> people shouldn't be posting their location or sensitive information publicly if they don't want it exposed like that
They shouldn't, but they still could: accidentally paste in the wrong browser tab; have been stupid when they were 12 years old; have gotten drunk; or a number of other things.
In theory it should be possible to allow users to upload ciphertext and then share a decryption key with their intended audience. I believe atproto has advised against this, with the argument that ciphertext shouldn't be in public view, but this seems to hinge on the idea that the cipher is insecure, or will be in the future. I don't see why a post-quantum encryption scheme shouldn't provide the appropriate security. It may still not be foolproof, but it certainly would make indexing the data much more difficult.
This is a line of thinking that just supposes we shouldn’t post things on the internet at all. Which, sure, is probably the right move if you’re that concerned about OPSEC, but just because ActivityPub has a flakier model doesn’t mean it isn’t being watched
When it comes to the internet, tech is law. There is no way to publicly share something and maintain control over it. Even on the Fediverse, if either a client or server wants to ignore part of the protocol or model, it can. Like a system message to delete particular posts for anti-surveillance reasons can simply be ignored by any servers or clients that were designed/modified for surveillance. Ultimately the buck lies with the owner of some given data to not share that data in the first place if there's a chance of misuse.
agree! Social-media contributions as files on your system: owned by you, served to the app. Just as the .svg specification allows editing in Inkscape or Illustrator, a post on my computer would be portable to Mastodon or Bluesky or a fully distributed p2p network.
AT Proto seems very overengineered. We already have websites with RSS feeds, which more or less covers the publishing end in a way far more distributed and reliable than what AT offers. Then all you need is a kind of indexer to provide people with notifications and discovery and you're done. But I suppose you can't sell that to shareholders because real decentralised technology probably isn't going to turn as much of a profit as a Twitter knockoff with a vague decentralised vibe to it that most users don't understand or care about.
Why so much cynicism? The people working there genuinely care about this stuff. Maybe you disagree with technical decisions but why start by projecting your fantasies about their motivations?
RSS is OK for what it does, but it isn't realtime, isn't signed, and doesn't support arbitrary structured data. Whereas AT is signed, works with any application-defined data structures, and lets you aggregate over millions of users in real time with subsecond end-to-end latency.
Yes, which is why by default, key management is done by your hosting. You log into your host with login/password or whatever mechanism your host supports.
Adding your own emergency rotational key in case your hosting goes rogue is supported, but is a separate thing and not required for normal usage. I'd like this to be more ergonomical though.
As someone who has been explicitly designing social protocols since 2011, and who met Tim Berners-Lee and his team when they were building SOLID (before he left MIT and got funded to turn it into a for-profit, Inrupt), I can tell you that files are NOT the best approach. (And neither is SPARQL, by the way, Tim :) SOLID was publishing ACLs, for example, as web resources. Presumably you'd manage all this with CalDAV-type semantics.
But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.
Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.
Multi-writer feeds were hard to do and were abandoned in hypercore, but you can layer them on top of single-writer feeds. That's where you get into joint ownership and consensus.
ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy w that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:
To be clear, I'm using files in a relatively loose sense to focus on the "apps : formats are many-to-many" angle. AT does not literally implement a full filesystem. As the article progresses, I restrict some freedoms in the metaphor (no directories except collections, everything is JSON, etc). If you're interested in the actual low-level repository format, it is described here: https://atproto.com/specs/repository
yeah yeah yeah, everyone get on the AT protocol, so that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money) while still maintaining the original, largest, and currently only portal to actually publish the content (which makes money[0]). let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.
if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.
but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.
I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.
>that the bluesky org can quickly get all of these filthy users off of their own servers (which costs money)
That's not correct, actually hosting user data is cheap. Most users' repos are tiny. Bluesky doesn't save anything by having someone move to their own PDS.
What's expensive is stuff like video processing and large scale aggregation. Which has to be done regardless of where the user is hosting their data.
come on, man, let's be real. you're talking modern, practical application; I'm talking reasonable user buy-in at big-boy social media levels. the video hosting IS what I'm talking about being expensive. you think bsky is going to be successful while ignoring the instagram crowd forever? what are we doing here?
bsky saves the video processing and bandwidth by not hosting that content on bsky. it's a smaller problem, but in a large enough pool, images become heavy, too. and, either way, the egress of that content is expensive if you're doing it for the entire world, instead of letting each individual's computer (their pds) do it.
I'm happy to admit that text is cheap and bsky isn't looking to offload their data as it stands now. but let's be honest about the long term, which is what my original comment takes aim at.
I still don't think this is correct. The Bluesky app always processes video, whether you're self-hosting or not. The personal data server stores the original blob, but Bluesky's video service will have to pick it up and transcode it (to serve it from CDN) either way.
Also, this:
>let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.
is also wrong, because AT solves completely different problems. None of the stuff I wrote about in the post can be solved or is being solved by ActivityPub. Like, AP is just message passing. It doesn't help with aggregation, identity, signing, linking cross-application data, any of that.
right on, man, that must be the only way transcoding can be done and completely future proof so it will never change to let the user transcode their own damn content. I get it; you're frustrated so you're nitpicking. funny how dang doesn't swoop in to tut tut you for not steelmanning instead of strawmanning.
in any case, you're completely right about the activity pub comment. that was absolutely mockery and not actually a complaint. artistic license and all that. god forbid we express ourselves. but sure, I can recognize that AT proto is useful in that it provides mechanisms we didn't really have before. that said, it's not novel (just new) and it's not irreplaceable. LIKE I SAID: there's always another protocol.
any time you want to actually address the point of the comments, I'd be happy to get your take on why it's fine, actually, for the CEO to imply she doesn't have to care what users think as long as they aren't paying her. but if you're not ready to have the real conversation, I'll let you be satisfied with whatever other potshots you want to take at my reasonable indignation.
> until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage
Quite a few BSky users are publishing on their own PDS (Personal Data Server) right now. They have been for a while. There are already projects that automate moving or backing up your PDS data from BSky, like https://pdsmoover.com/
Microblogging is also the least interesting part of the ATProto ecosystem. I've switched all my git hosting over to https://tangled.org and am loving it, not least of which is that my git server (a 'knot' in Tangled parlance) is under my control as a PDS and has no storage limits!
yeah, tangled seems like a pretty well-designed piece of tech. I've never used it, myself, but I did an audit and found that it's not only analogous to github as far as UX, but it also includes features like CI/CD, which other public/social repo servers have struggled with.
only reason I backed away from it is that when the bsky team had a big "fuck the users" moment, the user purporting to be the tangled founder was happy to cheer them on. so between having to use AT proto, and assuming that the tangled dev doesn't really disagree with bsky's "fuck the users" sentiment, I moved on. but, obviously, whiny moral grandstanding is irrelevant to whether or not someone made a good product. if you've got a use for it, I'd certainly recommend giving it a try!
Tangled founder here; it's just as easy! For example, here's the entire Tangled codebase monorepo: https://tangled.org/tangled.org/core — you can clone this directly as you would a git repo anywhere else.
New user sign up is a bit wonky. It asked for an email, login and password, then it's asking for a bsky sign-in too? This seems a little weird.
(Minor nit: for some reason, Google didn't auto-suggest a strong password for the password field.)
Then I got to the screen where it asks for full read-write access to my PDS and stopped there. It's kind of a lot to ask! I believe this is Bluesky's fault, but I don't think I can really use third-party bluesky apps until they implement finer-grained permissions.
yeah, I was one of them. developers are not the endgame, though. true social media needs people who are not going to do anything more complicated than "go to website, sign up". there's no world where setting up your own pds is that simple without an organized piece of software to do that kind of thing.
personally, I could probably get behind recommending something like umbrel[0], if it included something like a "include a pds" option during config. but even that is asking for a lot of mind-share for a non-tech user. it would take a super smooth setup process for that to be realistic. point is, though, I'm not saying it can't be done; I'm saying no one is doing it and what people are doing is not getting the job done for wider adoption.
[0] https://umbrel.com/
*and, naturally, at this point, I'd prefer they include something that isn't based on AT proto for social publication. I wouldn't mind if they had both, but just an AT proto implementation wouldn't attract me.
> When great thinkers think about problems, they start to see patterns. They look at the problem of people sending each other word-processor files, and then they look at the problem of people sending each other spreadsheets, and they realize that there’s a general pattern: sending files. That’s one level of abstraction already. Then they go up one more level: people send files, but web browsers also “send” requests for web pages. And when you think about it, calling a method on an object is like sending a message to an object! It’s the same thing again! Those are all sending operations, so our clever thinker invents a new, higher, broader abstraction called messaging, but now it’s getting really vague and nobody really knows what they’re talking about any more.
Hi, so I generally actually agree with you and your criticisms of this blog post (in your thread with the author). I think there's something pretty true in the blog post you shared from Joel (true in that it applies to more than just the software world) and looked at some of his more recent posts.
This one in particular reads similar to what this comment section is about, it looks like Joel is basically becoming an architecture astronaut himself? Not sure if that's actually an accurate understanding of what his "block protocol" is, but I'm curious to hear from you what you think of that? In the 25 years since that post, has he basically become the thing he once criticized, and is that the result of just becoming a more and more senior/thinker within the industry?
Author here! I grew up reading Joel's blog and am familiar with this post. Do you have a more pointed criticism?
I agree something like "hyperlinked JSON" maybe sounds too abstract, but so does "hyperlinked HTML". But I doubt you see web as being vague? This is basically web for data.
After taking the time to re-read the article since I initially posted my (admittedly shallow) dismissal, I realized this article is really a primer/explainer for the AT protocol, which I don't really have enough background in to criticize.
My criticism is more about the usefulness of saying "what if we treated social networking as a filesystem": which is that this doesn't actually solve any problems or add any value. The idea of modeling a useful thing (social media)[0] as a filesystem is generalizing the not-useful parts of it (i.e. the minutiae of how you actually read/write to it) and not actually addressing any of the interesting or difficult parts of it (how you come up with relevant things to look at, whether a "feed" should be a list of people you follow or suggestions from an algorithm, how you deal with bad actors, sock puppets, the list goes on forever.)
This is relevant to Joel's blog because of the point he makes about Napster: It was never about the "peer to peer" or "sharing", that was the least interesting part. The useful thing about Napster was that you could type in a song and download it. It would have been popular if it wasn't peer to peer, so long as you could still get any music you wanted for free.
Modeling social media as a filesystem, or constructing a data model about how to link things together, and hypergeneralizing all the way to "here's how to model any graph of data on the filesystem!" is basically a "huh, that's neat" little tech demo but doesn't actually solve anything. Yes, you can take any graph-like structured data and treat it as files and folders. I can write a FUSE filesystem to browse HN. I can spend the 20 minutes noodling on how the schema should work, what a "symlink" should represent, etc... but at the end of the day, you've just taken data and changed how it's presented.
There's no reason for the filesystem to be the "blessed" metaphor here. Why not a SQL database? You can `SELECT * FROM posts WHERE like_count > 100`, how neat! Or how about a git repo? You can represent posts as commits, and each person's timeline as a branch, and ooh then you could cherry-pick to retweet!
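For instance, the SQL version of the same exercise is just as easy (table, columns, and numbers all invented for illustration):

```python
import sqlite3

# Modeling "social media as a SQL database" is exactly as possible as
# modeling it as a filesystem, which is the point: the substrate swap
# is cheap, and by itself doesn't solve any of the hard problems.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (author TEXT, text TEXT, like_count INTEGER)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [("alice", "hello world", 150), ("bob", "hi", 3)],
)

# "SELECT * FROM posts WHERE like_count > 100", how neat!
popular = conn.execute(
    "SELECT author FROM posts WHERE like_count > 100"
).fetchall()
assert popular == [("alice",)]
```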
These kinds of exercises basically just turn into nerd-sniping: You think of a clever "what if we treated X as Y" abstraction, then before you really stop to think "what problem does that actually solve", you get sucked into thinking about various implementation details and how to model things.
The AT protocol may be well-designed, it may not be, but my point is more that it's not protocols that we're lacking. It's a lack of trust, lack of protection from bad actors, financial incentives that actively harm the experience for users, and the negative effects on what social media does to people. Nobody's really solved any of this: Not ActivityPub, not Mastodon, not Bluesky, not anyone. Creating a protocol that generalizes all of social media so that you can now treat it all homogeneously is "neat", but it doesn't solve anything that you couldn't solve via a simple (for example) web browser extension that aggregated the data in the same way for you. Or bespoke data transformations between social media sites to allow for federation/replication. You can just write some code to read from site A and represent it in site B (assuming sites A and B are willing.) Creating a protocol for this? Meh, it's not a terrible idea but it's also not interesting.
- [0] You could argue whether social media is "useful", let's just stipulate that it is.
I think there was a bit of a communication failure between us. You took the article as a random "what if X was Y" exploration. However, what I tried to communicate was something more like:
1. The file-first paradigm has some valuable properties. One is that apps can't lock data away from each other, so the user can always change which apps they use.
2. The web social app paradigm doesn't have these properties, and we observe the corresponding problems: we're collectively stuck with specific apps because our data lives inside those apps rather than being saved somewhere under our control.
3. The question: Is there a way to add the properties of the file-first paradigm (data lives outside apps) to web social apps? And if it is indeed possible, does this actually solve the problems we currently have?
The rest of the article explores this (with the AT protocol being a candidate solution that attempts to solve exactly this problem). I'm claiming that:
1. Yes, it is possible to add file-first paradigm properties to web social apps
2. That is what AT protocol does (by externalizing data and adding mechanisms for aggregation from user-controlled source of truth)
3. Yes, this does solve the original stated problems — we can see in demos from the last section that data doesn't get trapped in apps, and that developers can interoperate with zero coordination. And that it's already happening, it's not some theoretical thing.
I don't understand your proposed alternative with a web extension, but I suspect you're thinking about solving some other problems than the ones I'm describing.
Overall I agree that I sacrificed some "but why" in this article to focus on "here's how". For a more "but why" article about the same thing, you might be curious to look at https://overreacted.io/open-social/.
The problems with social media are not at all the fact that things are “locked up in apps”.
Again, you missed my point. Data sharing is the least interesting thing imaginable, has already been solved countless times, and is not the reason social media sinks or swims.
Social media sinks or swims based on one thing and one thing only: is it enjoyable to use. Are all the people on here assholes or do they have something interesting to say? Can I post something without being overrun by trolls? How good are the moderation standards? How do I know if the people posting aren’t just AI bots? What are the community standards? In short: what kind of interactions can I expect to have on the platform?
The astronaut types look at the abysmal landscape social media has become, and think “you know what the fundamental problem is? That all this is locked up in apps! Let’s make a protocol, that’ll fix it!”
Never mind that the profit-seeking platforms have zero interest in opening up their APIs to competing sites. Never mind that the sites that are interested in openness/federation universally have no answer to the problem of content moderation, or at least nothing that's any different from what we've seen before.
The problem in social media is not that things are locked up behind an app. There are apps/readers that combine multiple platforms for me (I remember apps that consolidated Facebook and Twitter fully eighteen years ago. It's not hard.)
The problem with social media is that it’s a wasteland full of bots and assholes.
A HN poster said it best 8 years ago about twitter, and I think it applies to all of social media: it’s a planetary scale hate machine: https://news.ycombinator.com/item?id=16501147
I actually agree with you on a lot of these things, I just think that they do relate to the technological shape.
To give you an example, Blacksky is setting up their alternative server that is effectively forking the product, which gives them the ability to make different moderation decisions (they've restored the account of a user that is banned from Bluesky: https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...).
However, unlike Mastodon and such, anyone on the Blacksky server will continue living in the same "world" as the Bluesky users, it's effectively just a different "filter" on the global data.
Before AT, it was not possible to do that.
You couldn't "fork" moderation of an existing product. If you wanted different rules, you had to create an entire social network from scratch. All past data used to stay within the original product.
AT enables anyone motivated to spin up a whole new product that works with existing data, and to make different decisions on the product level about all of the things you mentioned people care about. How algorithms run, how moderation runs, what the standards are, what the platform affordances are.
What AT creates here is competition because normally you can't compete until you convince everyone to move. Whereas with AT, everybody is always "already there" so you can create or pick the best-run prism over the global data.
Does this make more sense? It's all in service of the things you're talking about. We just need to make it possible to try different things without always starting from scratch.
I was hoping this was literally just going to be some safe version of a BBS/Usenet sort of filesharing that was peer-based, kind of like torrents, but just simple and straightforward, with no porn, infected warez, ransomware, crypto-mining, racist/terrorist/nazi/maga/communist/etc. crap, where I could just find old computing magazines, homebrew games, recipes, and things like that.
yes: https://www.swyx.io/data-outlasts-code-but
all lasting work is done in files/data (can be parsed permissionlessly, still useful if partially corrupted), but economic incentives keep pushing us to keep things in code (brittle, dies basically when one of maintainer|buildtools|hardware substrate dies).
when standards emerge (forcing code to accept/emit data) that is worth so much to a civilization. a developer ecosystem tipping the incentive scales such that companies like the Googles/Microsofts/OpenAIs/Anthropics of the world WANT to contribute/participate in data standards rather than keep things proprietary is one of the most powerful levers we as a developer community collectively hold.
(At the same time we should also watch out for companies embracing/extending/extinguishing standards... although honestly outside of Chrome I struggle to think of a truly successful example)
> Files are the source of truth—the apps would reflect whatever’s in your folder.
Now that the "app" is a web site that supports itself with advertising revenue, it has no incentive whatsoever to work this way.
I am working on a similar model for commerce. Sellers deploy their own commerce logic such as orders, carts, and payments as a hosted service they control, and marketplaces integrate directly with seller APIs rather than hosting sellers. This removes platform overhead, lowers fees, and shifts ownership back to the people creating value, turning marketplaces into interoperable discovery layers instead of gatekeepers.
https://openship.org/about
I'll add that, like a web server that's just serving up static files, a Bluesky PDS is a public filesystem. Furthermore it's designed to be replicated, like a Git repo. Replicating the data is an inherent part of how Bluesky works. Replication is out of your control. On the bright side, it's an automatic backup.
So, much like with a public git repo, you should be comfortable with the fact that anything you put there is public and will get indexed. Random people could find it in a search. Inevitably, AI will train on it. I believe you can delete stuff from your own PDS but it's effectively on your permanent record. That's just part of the deal.
So, try not to put anything there that you'll regret. The best you could do is pick an alias not associated with your real name and try to use good opsec, but that's perilous.
>Someone should write a user-friendly file browser for PDS's so you can see it for yourself.
You can skip to the end of the article where I do a few demos: https://overreacted.io/a-social-filesystem/#up-in-the-atmosp.... I suggest a file manager there:
>Open https://pdsls.dev. [...] It’s really like an old school file manager, except for the social stuff.
And yes, the paradigm is essentially "everyone is a scraper".
https://pdsls.dev/ can serve this purpose IMO :) it's a pretty neat app, open source, and is totally client-side
edit: whoops, pdsls is already mentioned at the end of the article
0: https://tangled.org/oppi.li/pdsfs
What's the SOTA on atproto encryption, Dan? Just publish encrypted stuff with SHA-256 and that's it?
The app lives on a single OpenBSD server. All user data is stored in /srv/app/[user]. Authentication is done by accessing OpenBSD Auth helper functions.
Users can access their data through the UI normally. Or they can use a web based filesystem browser to edit their data files. Or, alternately, they can ssh into the server and have full access to their files with all the advantages this entails. Hopefully, this raises the ceiling a bit for what power users of the system can accomplish.
I wanted to unify the OS ecosystem and the web app ecosystem and play around with the idea of what happens if those things aren't separate. I'm sure I'm introducing all kinds of security concerns which I'm not currently aware of.
Another commenter brought up Perkeep, which I think is very interesting. Even though I love Plan 9 conceptually, I do sort of wonder if "everything is a file" was a bit of a wrong turn. If I had my druthers, I think building on top of an OS which had DB and blob storage as the primary concept would be interesting and perhaps better.
If anybody cares, it's the POOH stack: Postgres, OCaml, OpenBSD, and htmx.
It is possible to create links asserting a specific version by making a "strong ref" which includes content hash.
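To illustrate the idea, here is a minimal sketch of a strong ref: a link that pins both a record's location and the exact version of its content via a hash. The URI and CID values below are made-up placeholders, not real network data.

```python
# Sketch of a "strong ref": a link that carries both the record's at:// URI
# and a content hash (CID) asserting a specific version of that record.
# Both values below are made-up placeholders for illustration.
strong_ref = {
    "uri": "at://did:plc:example123/app.bsky.feed.post/3kabc",
    "cid": "bafyreihexamplehashexamplehashexamplehash",
}

def matches(ref, record_cid):
    """A consumer verifies it is seeing the asserted version by comparing
    the record's current content hash against the ref's CID."""
    return ref["cid"] == record_cid

# The original version matches; if the record is edited, its CID changes
# and the strong ref no longer matches.
assert matches(strong_ref, "bafyreihexamplehashexamplehashexamplehash")
assert not matches(strong_ref, "bafyreidifferenthashafteredit")
```

The design choice here is that a plain `at://` link follows the record wherever it goes, while a strong ref additionally freezes which version you meant.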
Also, users don't really want apps. What users want are capabilities. So not Bluesky, or YouTube for example, but the capability to easily share a life update with interested parties, or the capability to access yoga tutorial videos. The primary issue with apps is that they bundle capabilities, but many times particular combinations of capabilities are desired, which would do well to be wired together.
Something in particular that's been popping up fairly often for me is I'm in a messaging app, and I'd like to lookup certain words in some of the messages, then perhaps share something relevant from it. Currently I have to copy those words over to a browser app for that lookup, then copy content and/or URL and return to the messaging app to share. What I'd really love is the capability to do lookups in the same window that I'm chatting with others. Like it'd be awesome if I could embed browser controls alongside the message bubbles with the lookup material, and optionally make some of those controls directly accessible to the other part(y|ies), which may even potentially lead to some kind of adhoc content collaboration as they make their own updates.
It's time to break down all these barriers that keep us from creating personalized workflows on demand. Both at the intra-device level where apps dominate, and at the inter-device level where API'd services do.
[0] https://perkeep.org/
I picked this metaphor because "apps" are many-to-many to "file formats". I found "file format" to be a very powerful analogy for lexicons so I kind of built everything else in the explanation around that.
You can read https://atproto.com/specs/repository for more technical details about the repository data structure:
The repository data structure is content-addressed (a Merkle-tree), and every mutation of repository contents (eg, addition, removal, and updates to records) results in a new commit data hash value (CID). Commits are cryptographically signed, with rotatable signing keys, which allows recursive validation of content as a whole or in part. Repositories and their contents are canonically stored in binary DAG-CBOR format, as a graph of data objects referencing each other by content hash (CID Links). Large binary blobs are not stored directly in repositories, though they are referenced by hash (CID).
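As a toy illustration of the content-addressing idea in that spec excerpt — not the real DAG-CBOR/CID encoding, just SHA-256 over canonical JSON — any mutation of a record's content yields a different hash, which is what makes linking and validating by hash possible:

```python
import hashlib
import json

def content_hash(record):
    # Toy stand-in for a CID: hash a canonical (sorted-key) JSON encoding.
    # Real atproto repos use DAG-CBOR and CIDs, not JSON and hex SHA-256.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

post = {"$type": "app.bsky.feed.post", "text": "hello",
        "createdAt": "2024-01-01T00:00:00Z"}
edited = {**post, "text": "hello, world"}

# Any change to the content produces a new hash, so a link by hash pins
# an exact version of the data...
assert content_hash(post) != content_hash(edited)
# ...while identical content always hashes the same, regardless of how
# the in-memory object was built.
assert content_hash(post) == content_hash(dict(post))
```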
Re: apps, I'd say AT is actually post-app to some extent because Lexicons aren't 1:1 to apps. You can share Lexicons between apps and I totally can see a future where the boundaries are blurring and it's something closer to what you're describing.
The effect of the internet (everything open to everyone) was to create smaller pockets around a specific idea or culture. Just like you have group chats with different people, that's what IG and Snap are. Segmentation all the way down.
I am so happy that my IG posts aren't available on my HN, or that my IG posts aren't being easily cross-posted to a service I don't want to use like Truth Social. If you want it to be open, just post it to the web.
I think I don't really understand the benefit of data portability in the situation. It feels like in crypto when people said I want to use my Pokemon in game item in Counterstrike (or any game) like, how and why would that even be valuable without the context? Same with a Snap post on HN or a HN post on some yet-to-be-created service.
ATProto apps don't automatically work like this and don't support all types of "files" by default. The app's creator has to build support for a specific "file type". My app https://anisota.net supports both Bluesky "files" and Leaflet "files", so my users can see Bluesky posts, Leaflet posts, and Anisota posts. But this is because I've designed it that way.
Anyone can make a frontend that displays the contents of users' PDSs.
Here's an example...
Bluesky Post on Bluesky: https://bsky.app/profile/dame.is/post/3m36cqrwfsm24
Bluesky Post on Anisota: https://anisota.net/profile/dame.is/post/3m36cqrwfsm24
Leaflet post on Leaflet: https://dame.leaflet.pub/3m36ccn5kis2x
Leaflet post on Anisota: https://anisota.net/profile/dame.is/document/3m36ccn5kis2x
I also have a little side project called Aturi that helps provide "universal links" so that you can open ATProto-based content on the client/frontend of your choice: https://aturi.to/anisota.net
(Imo, that is a perverse interpretation, it's about user choice, which they are effectively taking away from me by auto importing and writing to my Bsky graph)
re: the debates on reusing follows from Bluesky in other apps instead of their own
I have all of the raw image files that I've uploaded to Instagram. I can screenshot or download the versions that I created in their editor. Likewise for any text I've published anywhere. I prefer this arrangement, where I have the raw data in my personal filesystem and I (to an extent) choose which projections of it are published where on the internet. An IG follow or HN upvote has zero value to me outside of that platform. I don't feel like I want this stuff aggregated in weird ways that I don't know about.
What AT enables is forking products with their data and users. So, if some product is going down a bad road, a motivated team can fork it with existing content, and you can just start using the new thing while staying interoperable with the old thing. I think this makes the landscape a lot more competitive. I wrote about this in detail in https://overreacted.io/open-social/#closed-social which is another longread but specifically gets into this problem.
I hear you re: not wanting "weird aggregation", that's just a matter of taste. I kind of feel like if I'm posting something on the internet, I might as well have it on the open web, aggregatable by other apps.
For better or worse, Twitter built their network. That many people willingly signed up, posted, and continue to post there. I don't think anyone really should be able to fork it, because the users didn't collectively agree to that, and they don't all agree on what a good road or a bad road is. Ultimately, they can choose to leave if and when they want. Are these networks sticky? Yes, of course, but that's life.
We've seen lots of social networks come and go, things do change over time, there's ample opportunity for new ideas to flourish. In that sense, AT is perfectly welcome to throw their hat in the ring and see if that resonates and sticks. If people want their social network to be forkable, that concept will succeed.
I do think it misses what a lot of people find valuable about the tangibility and constraints of "I am making this content specifically for this platform and this audience at this point in time." I don't think most people think of their social media posts as a body of work that they want to maintain and carry over in an abstract sense independent of platform, and give open license to anyone to cook up into whatever form they can dream of.
Just the other week, another service that people actively used called Bento announced shutdown: https://bento.me/. This sucks for the user.
Someone created an alternative called Blento (https://blento.app/) on AT. Of course, by itself, this doesn't mean they'll be successful. But the thing is that, if Blento shuts down, someone can put it right back up because (1) it's open source, and (2) the data is outside Blento. Any new app can kickstart with that data and get people's sites back up and running. And two platforms can even compete on top of the same data.
I agree content is tailored to the platform and resurrecting something doesn't necessarily make sense. But that's the point of lexicons. You get the choice of what makes sense to resurrect (actively moving to an alternative) vs what doesn't (something with a style that doesn't work elsewhere) vs new recontextualizations we haven't even tried or thought of. I think it's too early to dismiss this before trying.
Twitter was my home on the web for almost 15 years when it got taken over by a ... - well you know the story. At the time I wished I could have taken my identity, my posts, my likes, and my entire social graph over to a compatible app that was run by decent people. Instead, I had to start completely new. But with ATProto, you can do exactly that - someone else can just fork the entire app, and you can keep your identity, your posts, your likes, your social graph. It all just transfers over, as long as the other app is using the same ATProto lexicon (so it's basically the same kind of app).
For better or worse, I don't think it makes sense to decentralize social. The network of each platform is inherently imbued with the characteristics and culture of that platform.
And I feel like Twitter is the anomalous poster child for this entire line of thinking. Pour one out, let it go, move on, but I don't think creating generalized standards for social media data is the answer. I don't want 7 competing Twitter-like clones for different political ideologies that all replicate each others' data with different opt-in/opt-out semantics. That sounds like hell.
My point is that I don't believe the separation of frontend and data is desirable for a social network. I want to know that I am on a specific platform that gives me some degree of control and guarantee (to the extent that I trust that platform) over how my data is represented. I don't really have to worry that it's showing up in any number of other places that I didn't sign up for (technically I do since everything public can be scraped of course, but in practice there are safeguards that go out the window when you explicitly create something like a PDS).
Generally it's good practice for AT apps to treat you as not signed up if you have not explicitly signed up to this app before. So I think a well-behaved site you never used shouldn't show your profile page on request. If this is right, maybe it needs to be more emphasized in docs, templates, etc. Same as they should correctly handle lifecycle events like account getting deactivated.
Realistically it's possible some services don't do this well but it's not clear to me that this is going to be a major problem. Like if it gets really bad, it seems like either this will get fixed or people will adjust expectations about how profile pages work.
https://anisota.net will display any Bluesky content and profiles without the consent of the user — no one cares at the moment though because either 1) they don't know about such a niche project, or 2) they aren't concerned cause I'm not a controversial figure.
If Truth Social was suddenly a part of the ATmosphere or a part of some other wide network of users, most people would catch on eventually and be hopefully conditioned to realize that the mere presence of someone's content on an app/site doesn't mean they use that app/site
I think what may be confusing you is that Bluesky (the company) acts in two different roles. There's hosting (PDS) and there's an app (bsky.app). You can think of these conceptually as two different services or companies.
Yes, when you sign up on Bluesky, you do get "Bluesky hosting" (PDS). But hosting doesn't know anything about apps. It's more like a Git repo under the hood.
Different apps (Bluesky app is one of them) can then aggregate data from your hosting (wherever it is) and show different projections of it.
Finally, no, if you're famous, you don't need enterprise hosting. Hosting a PDS can be extremely cheap (maybe like $1/mo?). A PDS doesn't get traffic spikes on viral content because they're amortized by the app (which serves from its own DB).
I actually agree with that. See from the post:
>For some use cases, like cross-site syndication, a standard-ish jointly governed lexicon makes sense. For other cases, you really want the app to be in charge. It’s actually good that different products can disagree about what a post is! Different products, different vibes. We’d want to support that, not to fight it.
AT doesn't make posts from one app appear in all apps by default, or anything like that. It just makes it possible for products to interoperate where that makes sense. It is up to whoever's designing the products to decide which data from the network to show. E.g. HN would have no reason to show Instagram posts. However, if I'm making my own aggregator app, I might want to process HN stuff together with Reddit stuff. AT gives me that ability.
To give you a concrete example where this makes sense. Leaflet (https://leaflet.pub/) is a macroblogging platform, but it ingests Bluesky posts to keep track of quotes from the Leaflets on the network, and display those quotes in a Leaflet's sidebar. This didn't require Leaflet and Bluesky to collaborate, it's just naturally possible.
Another reason to support this is that it allows products to be "forked" when someone is motivated enough. Since data is on the open network, nothing is stopping a product fork from being perfectly interoperable with the original network (meaning it both sees "original" data and can contribute to it). So the fork doesn't have to solve the "convince everyone to move" problem, it just needs to be good enough to be worth running and growing organically. This makes the space much more competitive. To give an example, Blacksky is a fork of Bluesky that makes different moderation decisions (https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...) but remains interoperable with the network.
That's just how it works and I accept the risk.
People concerned about that probably shouldn't publish on Bluesky. Private chat makes more sense for a lot of things.
For example, gaze upon dev.ocbwoy3.crack.defs [0] and dev.ocbwoy3.crack.alterego [1]. If you wanted to construct a UI around these, realistically you're gonna need to know wtf you're building (it's a twitter/bluesky clone); there simply isn't enough information in the lexicons to do a good job. And the argument can't be "hey you published a lexicon and now people can assume your data validates", because validation isn't done on write, it's done on read. So like, there really is no difference between this and like, looking up the docs on the data format and building a client. There are no additional guarantees.
Maybe there's an argument for moving towards some kind of standardization, but... do we really need that? Like are we plagued by dozens of slightly incompatible scrobbling data models? Even if we are, isn't this the job of like, an NPM library and not a globally replicated database?
Anyway, I appreciate that, facially, atproto is trying to address lock-in. That's not easy, and I like their solution. But I don't think that's anywhere near the biggest problem Twitter had. Just scanning the Bluesky subreddit, there are still problems like too much US politics and too many dick pics. It's good to know that some things just never change, I guess.
[0]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...
[1]: https://lexicon.garden/lexicon/did:plc:s7cesz7cr6ybltaryy4me...
E.g. Blento (https://blento.app/) is atproto Bento that I only saw a couple of days ago. But the cool thing is that if it shuts down, not only someone else can set it up again (it's open source), but they're also gonna be able to render all of the users' existing content. I think that's a meaningful step forward for this use case.
Yes, there's gonna be tons of stuff on the network that's too niche, but then there's no harm in it either. Whereas wherever there is enough interest, someone can step in and provide the code for the data.
I bet it's even more difficult to push existing commercial platforms to consider this at all.
It would make marketing tools that manage social communications and posting across popular social media much easier to build. Nevertheless, social marketing tools have already invented a similar analogy of their own, just to keep control over content and feedback across instances and networks.
We still live in a world where some would say Bluesky and some would say Mastodon is the future… while everybody still has Facebook and Instagram, and youngsters TikTok too. Those are closed platforms where only tools that hack them, not standards, persist.
Github/Gitlab would be a provider of the filesystem.
The problem is app developers like Google want to own your files.
[0]: https://remotestorage.io/
remoteStorage seems aimed at apps that don't aggregate data across users.
AT aims to solve aggregation, which is when many users own their own data, but what you want to display is something computed from many of them. Like social media or even HN itself.
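A minimal sketch of what "aggregation" means here (the collection name and record shapes are invented for illustration): an indexer walks many users' repos and computes a combined view that no single user's data contains.

```python
# Hypothetical: each user's repo holds their own records. An aggregator
# ("app view") computes cross-user state, e.g. like counts per post.
repos = {
    "did:plc:alice": [{"$type": "example.like", "subject": "at://bob/post/1"}],
    "did:plc:bob":   [{"$type": "example.like", "subject": "at://bob/post/1"}],
    "did:plc:carol": [{"$type": "example.like", "subject": "at://alice/post/9"}],
}

def aggregate_likes(repos):
    """Walk every user's records and tally likes per subject post."""
    counts = {}
    for did, records in repos.items():
        for rec in records:
            if rec["$type"] == "example.like":
                counts[rec["subject"]] = counts.get(rec["subject"], 0) + 1
    return counts

# The like count for bob's post exists nowhere in any single repo; it only
# emerges from aggregating across users. That's the problem AT targets.
assert aggregate_likes(repos) == {"at://bob/post/1": 2, "at://alice/post/9": 1}
```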
I think of those projects as working relatively well for private data, but public data is kinda awkward. ATProto is the other way around: it has a lot of infra to make public data feasible, but private data is still pretty awkward.
It's a lot more popular though, so maybe has a bigger chance of solving those issues? Alternatively, Bluesky keeps its own extensions for that, and starts walling those bits off more and more as the VCs amp up the pressure. That said, I know very little about Bluesky, so this speculation might all be nonsense.
[1]: https://www.inkandswitch.com/essay/local-first/
[2]: https://humanewebmanifesto.com/
The hard problem IMO is how you incentivize companies to adopt this, since walled gardens help reduce competition.
You subvert the walls of their gardens and aggregate the walled-off data without the walls, so users face a choice not between:
- facebook
- everything else
but instead between
- facebook and everything else
- just facebook
But that approach only works if we can solve the "data I created" problems in a way that doesn't also require us to acknowledge facebook's walls.
The whole thing seems a bit over-engineered with poor separation of concerns.
It feels like it'd be smarter to flatten the design and embed everything in the Records, and then other layers can be built on top of that.
Make every record include the author's public key (or signature?). Anything you need to point at, you'd either just give its hash, or hash + author public key. This way you completely eliminate this goofy filesystem hierarchy. Everything else is embedded in the Record.
Lexicons/Collections are just a field in the Record. Reverse lookup of a hash to find what it is would also be a separate problem.
- Getting the @danabramov username would be done by having some lookup service that does person-key->username. You could have several. Usernames can be changed with the service. But you can have your own map if you want, or infer it from GitHub commits :)) There are some interesting ideas about usernames out there. How this is done isn't specified by the Record
- Lexicon is also done separately. This is some validation step that's either done by a consumer app/editor of the record or by a server which distributes records (could be based on the :type or something else). Such a server can check if you have less than 300 graphemes and reject the record if it fails. How this is done isn't specified by the Record
- Collection.. This I think is just organizational? How this is done isn't specified by the Record. It's just aggregating all records of the same type from the same author I guess?
- Hashes.. they can point at anything. You can point at a webpage or an image or another record (where you can indicate the author). For dynamic content you'd need to point at webpage that points at a static URL which has the dynamic content. You'd also need to have a hash->content mapping. How this is done isn't specified by the Record
This kind of setup makes the Record completely decoupled from the rest of the "stack". It becomes much more of an independent, moveable "file" (in the original sense that you have at the top) than the interconnected setup you end up with at the end.
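A sketch of the flattened design being proposed (all field names are invented for illustration): everything lives in the record itself, identified by a content hash plus author key, with no repo hierarchy around it.

```python
import hashlib
import json

def make_record(author_pubkey, record_type, body, refs=()):
    """The flattened-record proposal, roughly: a self-contained record
    carrying its author key and type inline; links to other content are
    just plain content hashes (optionally paired with an author key)."""
    record = {
        "author": author_pubkey,  # who owns/signed it
        "type": record_type,      # stands in for collection/lexicon
        "body": body,
        "refs": list(refs),       # hashes of other content this points at
    }
    digest = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return digest, record

h1, post = make_record("pubkey-abc", "example.post", {"text": "hi"})
h2, reply = make_record("pubkey-def", "example.post", {"text": "re: hi"},
                        refs=[h1])

assert reply["refs"] == [h1]  # links are plain content hashes
assert h1 != h2               # different content, different identity
```

The reply comments that follow probe exactly what this sketch leaves unspecified: key rotation, content discovery, and grouping for aggregation.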
- How do you rotate keys? In AT, the user updates the identity document. That doesn't break their old identity or links.
- When you have a link, how do you find its content? In AT, the URL has identity, which resolves to hosting, which you can ask for stuff.
- When aggregating, how do you find all records an application can understand? E.g. how would Bluesky keep track of "Bluesky posts"? Does it validate every record just in case? Is there some convention or grouping?
Btw, you might enjoy https://nostr.com/, it seems closer to what you're describing!
2. In a sense, that's also not something you think about at the Record level either. It'd be at a different layer of the stack. I'll be honest, I haven't wrapped my head entirely around `did:plc`, but I don't see why you couldn't have essentially the same behavior, except instead of having these unique DID IDs, you'd just use public keys here: pub-key -> DID magic stuff... and then the rest you can do the same as AT. Or more simply, the server that finds the hashed content uses attached metadata (like the author) to narrow the search.
Maybe there is a good reason the identity `did:plc` layer needs to be baked into the Record, but I didn't catch it from the post. I'd be curious to hear why you feel it needs to be there?
3. I'm not 100% sure I understand the challenge here. If you have a soup of records, you can filter your records based on the type. You can validate them as they arrive. You send your records to the Bluesky server and they validate them as they arrive.
3. The soup means you need to index everything. There is no Bluesky server to send things to, only your PDS. Your DID is how I know what PDS to talk to to get your records
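To make that last point concrete, resolving a DID yields an identity document that lists the user's hosting. Here is a sketch of extracting the PDS endpoint from an already-fetched DID document; the document below is a trimmed, made-up example following the general shape atproto uses:

```python
# A trimmed, made-up DID document in roughly the shape atproto uses:
# the "#atproto_pds" service entry says where the user's repo is hosted.
did_doc = {
    "id": "did:plc:example123",
    "service": [
        {
            "id": "#atproto_pds",
            "type": "AtprotoPersonalDataServer",
            "serviceEndpoint": "https://pds.example.com",
        }
    ],
}

def pds_endpoint(doc):
    """Find the PDS service endpoint declared in a DID document."""
    for svc in doc.get("service", []):
        if svc.get("id") == "#atproto_pds":
            return svc["serviceEndpoint"]
    return None

# Identity resolves to hosting; from there you can fetch the records.
assert pds_endpoint(did_doc) == "https://pds.example.com"
assert pds_endpoint({"service": []}) is None
```

This is why the DID has to travel with the link: it is the stable name that survives the hosting moving, while the endpoint behind it can change.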
If the AT Instagram wants to add a new feature (e.g. posts now support video!), can they easily update their "file format"? How do they update it in a way that is compatible with every other company that depends on the same format, without the underlying record becoming a mess?
Adding new features is usually not a problem because you can always add optional fields and extend open unions. So, you just change `media: Link | Picture | unknown` to `media: Link | Picture | Video | unknown`.
You can't remove things, true, so records do get some deprecated fields.
Re: updating safely, the rule is that you can't change which records it would consider valid after it gets used in the wild. So you can't change whether some field is optional or required, you can only add new optional fields. The https://github.com/bluesky-social/goat tool has a linting command that instantly checks whether your changes pass the rules. In general it would be nice if lexicon tooling matures a bit, but I think with time it should get really good because there's explicit information the tooling can use.
If you have to make a breaking change, you can make a new Lexicon. It doesn't have to cause tech debt because you can make all your code deal with a new version, and convert it during ingestion.
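The "convert during ingestion" idea might look like this sketch (both record shapes and the `ingest` function are invented for illustration, assuming a v1 lexicon with a single image and a hypothetical v2 lexicon that allows several):

```typescript
// v1 of a hypothetical lexicon: a single optional image per post.
type PostV1 = { $type: "example.post"; text: string; image?: string };

// v2, a new lexicon created after a breaking change: multiple images.
type PostV2 = { $type: "example.post.v2"; text: string; images: string[] };

// Normalize everything to v2 as records arrive, so the rest of the
// codebase only ever deals with one shape and the old lexicon never
// leaks past the ingestion boundary.
function ingest(record: PostV1 | PostV2): PostV2 {
  if (record.$type === "example.post.v2") return record;
  return {
    $type: "example.post.v2",
    text: record.text,
    images: record.image ? [record.image] : [],
  };
}
```

Because the conversion lives at one boundary, the "new lexicon" escape hatch doesn't have to spread version checks through the application.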
Similarly, there's nothing stopping you from adding another field in your post. It'll just get ignored because the app you're using doesn't know about it. e.g. posts bridged from mastodon by bridgy have an extra field containing the full original post's text, which you can display in your app if desired. reddwarf.app does this with these posts.
Here's the schema for bluesky posts, if you're interested: https://pdsls.dev/at://did:plc:4v4y5r3lwsbtmsxhile2ljac/com....
Lexicon validation works the same way. The com.tumblr in com.tumblr.post signals who designed the lexicon, but the records themselves could have been created by any app at all. This is why apps always treat records as untrusted input, similar to POST request bodies. When you generate type definitions from a lexicon, you also get a function that will do the validation for you. If some record passes the check, great—you get a typed object. If not, fine, ignore that record.
So, validate on read, just like files.
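A hand-rolled sketch of "validate on read": in practice the lexicon tooling generates these checks for you, but a type guard like this (the `Post` shape is made up) shows the idea of treating every record as untrusted input:

```typescript
interface Post {
  $type: "example.post";
  text: string;
  createdAt: string;
}

// Treat every incoming record as untrusted, like a POST request body.
function isPost(record: unknown): record is Post {
  if (typeof record !== "object" || record === null) return false;
  const r = record as Record<string, unknown>;
  return (
    r.$type === "example.post" &&
    typeof r.text === "string" &&
    typeof r.createdAt === "string"
  );
}

// Records that fail validation are simply ignored, not treated as errors.
const soup: unknown[] = [
  { $type: "example.post", text: "hi", createdAt: "2024-01-01T00:00:00Z" },
  { $type: "example.like", subject: "at://..." },
  "garbage",
];
const posts = soup.filter(isPost);
console.log(posts.length); // 1
```

After the filter, `posts` is typed as `Post[]`, which is exactly the "pass the check, get a typed object" flow described above.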
I know my style is verbose but I try to include enough details to substantiate the argument at the level that I feel confident it fully stands for itself. If others find something useful in it, I trust that they can riff on those bits or simplify.
It's like saying this MR could use some work but not citing a specific example.
https://solidproject.org/
- AT tries to solve aggregation of public data first. I.e. it has to be able to express modern social media. Bluesky is a proof that it would work in production. AFAIK, Solid doesn't try to solve aggregation, and is focused on private data first. (AT plans private data support but not now.)
- AT embraces "apps describe their own formats" (Lexicons). Solid uses RDF which is a very different model. My impression is RDF may be more powerful but is a lot more abstract. Lexicon is more or less like *.d.ts for JSON.
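To make the "*.d.ts for JSON" analogy concrete, here's roughly how a simplified, made-up lexicon fragment maps onto a TypeScript declaration (real lexicons have more machinery, e.g. record keys and refs):

```typescript
// A simplified, made-up lexicon fragment for a post record.
const lexicon = {
  id: "com.example.post",
  defs: {
    main: {
      type: "record",
      record: {
        type: "object",
        required: ["text", "createdAt"],
        properties: {
          text: { type: "string", maxLength: 300 },
          createdAt: { type: "string", format: "datetime" },
          langs: { type: "array", items: { type: "string" } }, // optional
        },
      },
    },
  },
};

// The *.d.ts-style view of the same schema: required fields are plain,
// anything not in "required" becomes optional.
interface ComExamplePost {
  text: string;
  createdAt: string;
  langs?: string[];
}
```

The schema is data, so tooling can generate the interface, the validator, and the lint rules from the same source of truth.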
It really feels like no one is addressing the elephant in the room: someone who makes something like this is interested in "decentralized" or otherwise bottom-up-ish levels of control.
Good goal. But then, when you build something like this, you're actually helping build a perfect decentralized surveillance record.
This is why I say that most of Mastodon's limitations and bugs in this regard (by leaving everything to the "servers") are actually features. The ability to forget and delete et al is actually important, and this makes that HARDER.
I'm just kind of like, JUST DO MASTODON'S MODEL, like email. It's better and the kinks are more well thought about and/or solved.
But practically lots of people have spent a long time posting their opinions carelessly on the internet. Just protected by the fact that nobody really has (or had) space to back up every post or time to look at them too carefully. The former has probably not been the case for a long time (hard drives are cheap), and the latter is possibly not true anymore in the LLM era.
To some extent maybe we should be acting like everything is being put into a perfect distributed record. Then, the fact that one actually exists should serve as a good reminder of how we ought to think of our communications, right?
Nooooooooooo. No. No. No.
It's not going to happen and we shouldn't even consider it. Seriously. This thing we are doing here, which is "connecting people to each other," those forces for MANY will be far more powerful than "let me stop and think about the fact that this is forever." I just don't think we are wired for it, we're wired for a world in which we can just talk?
I think it's better to try to engineer some specific counter to "everything is recorded all the time" (or, as in here, not try to usher it into existence even more) than to try to say "welp, everything is recorded all the time, better get used to it."
But I think there's room for both models. There are upsides to more legibility too. Sometimes we want to be found. Sometimes we're even engaging in self-promotion.
Also, I'll point out that Hacker News is also very legible. Everything is immutable after the first hour and you can download it. We just live with it.
a record of what? Posts I wish to share with the public anyway?
Well now someone's PDS is a truly complete record of their social activity, neatly organized for anyone that's interested. It's not a security issue, after all the data was still public before, but the barrier to entry is now zero. It's so low that you can just go to stalker.io, put in their handle, and it will analyze their profile and print out a scarily accurate timeline of their activity and location, leveraging AI's geoguesser skill.
Of course that's the same with the web we have today, the only difference is that you get control over which data goes with which identity rather than having that decision made for you by the platform boundaries.
They shouldn't, but they still could: accidentally paste in the wrong browser tab; have been stupid when they were 12 years old; have gotten drunk; or a number of other things.
RSS is OK for what it does, but it isn't realtime, isn't signed, and doesn't support arbitrary structured data. Whereas AT is signed, works with any application-defined data structures, and lets you aggregate over millions of users in real time with subsecond end-to-end latency.
Adding your own emergency rotation key in case your hosting goes rogue is supported, but is a separate thing and not required for normal usage. I'd like this to be more ergonomic though.
That said it’s a very elegant way to describe AT protocol.
But one good thing did come out of that effort. Dmitri Zagidulin, the chief architect on the team, worked hard at the W3C to get departments together to create the DID standard (decentralized IDs) which were then used in everything from Sidetree Protocol (thanks Dan Buchner for spearheading that) to Jack Dorsey’s “Web5”.
Having said all this… what protocol is better for social? Feeds. Who owns the feeds? Well that depends on what politics you want. Think dat / hypercore / holepunch (same thing). SLEEP protocol is used in that ecosystem to sync feeds. Or remember scuttlebutt? Stuff like that.
Multi-writer feeds were hard to do and abandoned in hypercore but you can layer them on top of single-writer. That’s where you get into joint ownership and consensus.
ps: Dan, if you read this, visit my profile and reach out. I would love to have a discussion, either privately or publicly, about these protocols. I am a huge believer in decentralized social networking and build systems that reach millions of community leaders in over 100 countries. Most people don’t know who I am and I’m happy w that. Occasionally I have people on my channel to discuss distributed social networking and its implications. Here are a few:
Ian Clarke, founder of Freenet, probably the first decentralized (not just federated) social network: https://www.youtube.com/watch?v=JWrRqUkJpMQ
Noam Chomsky, about Free Speech and Capitalism (met him same day I met TimBL at MIT) https://www.youtube.com/watch?v=gv5mI6ClPGc
Patri Friedman, grandson of Milton Friedman on freedom of speech and online networks https://www.youtube.com/watch?v=Lgil1M9tAXU
if they were decent people, that would be one thing. but if they're going to be poisoned with the same faux-libertarian horseshit that strangled twitter, I don't see any value in supporting their protocol. there's always another protocol.
but assuming I was willing to play ball and support this protocol, they STILL haven't solved the actual problem that no one else is solving either: your data exists somewhere else. until there's a server that I can bring home and plug in with setup I can do using my TV's remote, you're not going to be able to move most people to "private" data storage. you're just going to change which massive organization is exploiting them.
I know, I know: hardware is a bitch and the type of device I'm even pitching seems like a costly boondoggle. but that's the business, and if you're not addressing it, you're not fomenting real change; you're patting yourself on the back for pretending we can algorithm ourselves out of late-stage capitalism.
[0] *potentially/eventually
That's not correct, actually hosting user data is cheap. Most users' repos are tiny. Bluesky doesn't save anything by having someone move to their own PDS.
What's expensive is stuff like video processing and large scale aggregation. Which has to be done regardless of where the user is hosting their data.
bsky saves the video processing and bandwidth by not hosting that content on bsky. it's a smaller problem, but in a large enough pool, images become heavy, too. and, either way, the egress of that content is expensive if you're doing it for the entire world, instead of letting each individual's computer (their pds) do it.
I'm happy to admit that text is cheap and bsky isn't looking to offload their data as it stands now. but let's be honest about the long term, which is what my original comment takes aim at.
Also, this:
>let them profit from a technical "innovation" that is 6 levels of indirection to mimic activity pub.
is also wrong because AT solves completely different problems. None of the stuff I wrote about in the post can be solved or is being solved by ActivityPub. Like, AP is just message passing. It doesn't help with aggregation, identity, signing, linking cross-application data, any of that.
in any case, you're completely right about the activity pub comment. that was absolutely mockery and not actually a complaint. artistic license and all that. god forbid we express ourselves. but sure, I can recognize that AT proto is useful in that it provides mechanisms we didn't really have before. that said, it's not novel (just new) and it's not irreplaceable. LIKE I SAID: there's always another protocol.
any time you want to actually address the point of the comments, I'd be happy to get your take on why it's fine, actually, for the CEO to imply she doesn't have to care what users think as long as they aren't paying her. but if you're not ready to have the real conversation, I'll let you be satisfied with whatever other potshots you want to take at my reasonable indignation.
Quite a few BSky users are publishing on their own PDS (Personal Data Server) right now. They have been for a while. There are already projects that automate moving or backing up your PDS data from BSky, like https://pdsmoover.com/
only reason I backed away from it is that when the bsky team had a big "fuck the users" moment, the user purporting to be the tangled founder was happy to cheer them on. so between having to use AT proto, and assuming that the tangled dev doesn't really disagree with bsky's "fuck the users" sentiment, I moved on. but, obviously, whiny moral grandstanding is irrelevant to whether or not someone made a good product. if you've got a use for it, I'd certainly recommend giving it a try!
New user sign up is a bit wonky. It asked for an email, login and password, then it's asking for a bsky sign-in too? This seems a little weird.
(Minor nit: for some reason, Google didn't auto-suggest a strong password for the password field.)
Then I got to the screen where it asks for full read-write access to my PDS and stopped there. It's kind of a lot to ask! I believe this is Bluesky's fault, but I don't think I can really use third-party bluesky apps until they implement finer-grained permissions.
personally, I could probably get behind recommending something like umbrel[0], if it included something like a "include a pds" option during config. but even that is asking for a lot of mind-share for a non-tech user. it would take a super smooth setup process for that to be realistic. point is, though, I'm not saying it can't be done; I'm saying no one is doing it and what people are doing is not getting the job done for wider adoption.
[0] https://umbrel.com/ *and, naturally, at this point, I'd prefer they include something that isn't based on AT proto for social publication. I wouldn't mind if they had both, but just an AT proto implementation wouldn't attract me.
https://www.joelonsoftware.com/2001/04/21/dont-let-architect...
https://www.joelonsoftware.com/2022/12/19/progress-on-the-bl...
This one in particular reads similar to what this comment section is about, it looks like Joel is basically becoming an architecture astronaut himself? Not sure if that's actually an accurate understanding of what his "block protocol" is, but I'm curious to hear from you what you think of that? In the 25 years since that post, has he basically become the thing he once criticized, and is that the result of just becoming a more and more senior/thinker within the industry?
https://news.ycombinator.com/newsguidelines.html
I agree something like "hyperlinked JSON" maybe sounds too abstract, but so does "hyperlinked HTML". But I doubt you see web as being vague? This is basically web for data.
Sure.
After taking the time to re-read the article since I initially posted my (admittedly shallow) dismissal, I realized this article is really a primer/explainer for the AT protocol, which I don't really have enough background in to criticize.
My criticism is more about the usefulness of saying "what if we treated social networking as a filesystem": which is that this doesn't actually solve any problems or add any value. The idea of modeling a useful thing (social media)[0] as a filesystem is generalizing the not-useful parts of it (i.e. the minutiae of how you actually read/write to it) and not actually addressing any of the interesting or difficult parts of it (how you come up with relevant things to look at, whether a "feed" should be a list of people you follow or suggestions from an algorithm, how you deal with bad actors, sock puppets, the list goes on forever.)
This is relevant to Joel's blog because of the point he makes about Napster: It was never about the "peer to peer" or "sharing", that was the least interesting part. The useful thing about Napster was that you could type in a song and download it. It would have been popular if it wasn't peer to peer, so long as you could still get any music you wanted for free.
Modeling social media as a filesystem, or constructing a data model about how to link things together, and hypergeneralizing all the way to "here's how to model any graph of data on the filesystem!" is basically a "huh, that's neat" little tech demo but doesn't actually solve anything. Yes, you can take any graph-like structured data and treat it as files and folders. I can write a FUSE filesystem to browse HN. I can spend the 20 minutes noodling on how the schema should work, what a "symlink" should represent, etc... but at the end of the day, you've just taken data and changed how it's presented.
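To illustrate that point, here's the kind of trivial mapping such a FUSE toy would do, sketched in TypeScript (the item shape is loosely modeled on the HN API; the path scheme and `itemToPath` helper are made up):

```typescript
// A loose sketch of an HN-style item: stories at the top level,
// comments nested under a parent.
interface Item {
  id: number;
  type: "story" | "comment";
  parent?: number;
}

// Map an item to a pseudo-filesystem path. A comment's path nests
// under its parent's path; a story sits under /items directly.
// That's the whole "filesystem" - the data hasn't changed at all.
function itemToPath(item: Item, byId: Map<number, Item>): string {
  if (item.type === "story" || item.parent === undefined) {
    return `/items/${item.id}`;
  }
  const parent = byId.get(item.parent);
  const parentPath = parent ? itemToPath(parent, byId) : "/items/orphaned";
  return `${parentPath}/${item.id}`;
}

const story: Item = { id: 1, type: "story" };
const comment: Item = { id: 2, type: "comment", parent: 1 };
const byId = new Map<number, Item>([
  [1, story],
  [2, comment],
]);

console.log(itemToPath(comment, byId)); // "/items/1/2"
```

Twenty lines, and the "filesystem view" exists; which is exactly the argument that the re-presentation itself isn't the hard or valuable part.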
There's no reason for the filesystem to be the "blessed" metaphor here. Why not a SQL database? You can `SELECT * FROM posts WHERE like_count > 100`, how neat! Or how about a git repo? You can represent posts as commits, and each person's timeline as a branch, and ooh then you could cherry-pick to retweet!
These kinds of exercises basically just turn into nerd-sniping: You think of a clever "what if we treated X as Y" abstraction, then before you really stop to think "what problem does that actually solve", you get sucked into thinking about various implementation details and how to model things.
The AT protocol may be well-designed, it may not be, but my point is more that it's not protocols that we're lacking. It's a lack of trust, lack of protection from bad actors, financial incentives that actively harm the experience for users, and the negative effects of what social media does to people. Nobody's really solved any of this: Not ActivityPub, not Mastodon, not BlueSky, not anyone. Creating a protocol that generalizes all of social media so that you can now treat it all homogeneously is "neat", but it doesn't solve anything that you couldn't solve via a simple (for example) web browser extension that aggregated the data in the same way for you. Or bespoke data transformations between social media sites to allow for federation/replication. You can just write some code to read from site A and represent it in site B (assuming sites A and B are willing.) Creating a protocol for this? Meh, it's not a terrible idea but it's also not interesting.
- [0] You could argue whether social media is "useful", let's just stipulate that it is.
1. File-first paradigm has some valuable properties. One property is apps can't lock data out of each other. So the user can always change which apps they use.
2. Web social app paradigm doesn't have these properties. And we observe the corresponding problems: we're collectively stuck with specific apps. This is because our data lives inside those apps rather than saved somewhere under our control.
3. The question: Is there a way to add properties of the file-first paradigms (data lives outside apps) to web social apps? And if it is indeed possible, does this actually solve the problems we currently have?
The rest of the article explores this (with AT protocol being a candidate solution that attempts to square exactly this problem). I'm claiming that:
1. Yes, it is possible to add file-first paradigm properties to web social apps
2. That is what AT protocol does (by externalizing data and adding mechanisms for aggregation from user-controlled source of truth)
3. Yes, this does solve the original stated problems — we can see in demos from the last section that data doesn't get trapped in apps, and that developers can interoperate with zero coordination. And that it's already happening, it's not some theoretical thing.
I don't understand your proposed alternative with web extension but I suspect you're thinking about solving some other problems than I'm describing.
Overall I agree that I sacrificed some "but why" in this article to focus on "here's how". For a more "but why" article about the same thing, you might be curious to look at https://overreacted.io/open-social/.
Again, you missed my point. Data sharing is the least interesting thing imaginable, has already been solved countless times, and is not the reason social media sinks or swims.
Social media sinks or swims based on one thing and one thing only: is it enjoyable to use. Are all the people on here assholes or do they have something interesting to say? Can I post something without being overrun by trolls? How good are the moderation standards? How do I know if the people posting aren’t just AI bots? What are the community standards? In short: what kind of interactions can I expect to have on the platform?
The astronaut types look at the abysmal landscape social media has become, and think “you know what the fundamental problem is? That all this is locked up in apps! Let’s make a protocol, that’ll fix it!”
Never mind that the profit seeking platforms have zero interest in opening up their API to competing sites. Never mind that any of the sites that are interested in openness/federating all universally have no answer to the problem of how you address content moderation, or at least nothing that’s any different from what we’ve seen before.
The problem in social media is not that things are locked up behind an app. There are apps/readers that combine multiple platforms for me (I remember apps that consolidated Facebook and twitter fully eighteen years ago. It’s not hard.)
The problem with social media is that it’s a wasteland full of bots and assholes.
A HN poster said it best 8 years ago about twitter, and I think it applies to all of social media: it’s a planetary scale hate machine: https://news.ycombinator.com/item?id=16501147
To give you an example, Blacksky is setting up their alternative server that is effectively forking the product, which gives them the ability to make different moderation decisions (they've restored the account of a user that is banned from Bluesky: https://bsky.app/profile/rude1.blacksky.team/post/3mcozwdhjo...).
However, unlike Mastodon and such, anyone on the Blacksky server will continue living in the same "world" as the Bluesky users, it's effectively just a different "filter" on the global data.
Before AT, it was not possible to do that.
You couldn't "fork" moderation of an existing product. If you wanted different rules, you had to create an entire social network from scratch. All past data used to stay within the original product.
AT enables anyone motivated to spin up a whole new product that works with existing data, and to make different decisions on the product level about all of the things you mentioned people care about. How algorithms run, how moderation runs, what the standards are, what the platform affordances are.
What AT creates here is competition because normally you can't compete until you convince everyone to move. Whereas with AT, everybody is always "already there" so you can create or pick the best-run prism over the global data.
Does this make more sense? It's all in service of the things you're talking about. We just need to make it possible to try different things without always starting from scratch.
Why can’t we have nice things?
I guess that’s what Internet Archive is for.