I solved a distributed queue problem after 15 years

(dbos.dev)

32 points | by jedberg 3 days ago

5 comments

yencabulator 0 minutes ago
[delayed]
na4ma4 8 hours ago
The article is interesting, but what does the title have to do with it ?
> What we really needed to make distributed task queueing robust are durable queues ...
> Durable queues were rare when I was at Reddit, but they’re more and more popular now.
It sounds like the answer was known at the time but there wasn't the resources to solve it ?
[-]
- jedberg 2 hours ago
  Back then, you couldn't build a performant queue in a database without a huge number of resources, both hardware wise and people wise. That's why only huge enterprises were using them then.
  Only recently (last decade or so), has the performance of an open source database on modest hardware caught up to the alternatives.
noident 13 hours ago
Hi, OP. Any thoughts on why to prefer using your OLTP for queueing as opposed to Kafka? In my mind it would be a drop in ease of observability (though there is still KSQL and a number of UI wrappers) in exchange for much better performance. I'm interested if you had other reasons.
[-]
- KraftyOne 17 minutes ago
  Observability is a big advantage, another advantage (in the context of DBOS specifically) is integration with durable workflows, so you can write a large workflow that enqueues and manages many smaller tasks.
redditor98654 7 hours ago
From what I recall, Reddit uses AWS extensively. Could they not have replaced RabbitMQ with SQS? You get the near unlimited horizontal scalability, extremely good uptime, guaranteed at least once message delivery and for the case of a worker crash, the messages will become visible again after the visibility timeout (since they wouldn’t have been deleted by the worker).
[-]
- jedberg 2 hours ago
  SQS could not handle the volume or latency requirements we had, and it was too expensive compared to running it ourselves at the time.
- 4ndrewl 4 hours ago
  I think there is a hard limit on the number of in-flight requests (that is items that have been dequeued by a worker, but whose job has not been completed). I wouldn't be surprised if Reddit hit those sorts of volumes.
belZaah 11 hours ago
I was architecture team lead at Skype 2005-2011 and persistent queues were the only ones around. Basically, because they knew how to scale databases, our DBA team (Hannu Krosing comes to mind) built queues into and on top of Postgres. It happened not just at Skype, too: eBay guys (Dan Prichett in particular) had built a very cool Oracle-based queue solution. Scaling persistent queues seems to be something that needs to be reinvented periodically. Maybe it’s too nuanced of a problem?
[-]
- danieldc 53 minutes ago
  FWIW, SQL Server had built in queuing support from ~2005. The feature is called Service Broker (https://learn.microsoft.com/en-us/sql/database-engine/servic...)
- BSVogler 3 hours ago
  This article resonated with my experience building what was essentially a distributed task queue using Redis+PostgreSQL with Python workers in Kubernetes. It seems like these systems naturally evolve different patterns based on their specific use cases. The logic of our queue was intertwined with a rule engine. I wrote about building a rule engine here [1]. Another difference to this article is that it did not report back to the client as the events were delivered via web hooks.
  There are some different approaches here and there which come from making it application specific, e.g. we added a periodic reconciliation check. I also built a debouncer into the queue to give special treatment to burst in the load.
  [1] https://blog.benediktsvogler.com/blog/building-a-distributed...
- jedberg 10 hours ago
  I actually worked on security for Skype in 2005/2006. And I knew Dan at eBay too.
  Using databases for queues isn't a new idea, but a couple things make it different now. One is that I think this is the first time a solution has been open-sourced (not 100% sure on that).
  And two, Postgres added SKIP LOCKED in 9.5 (around 2016), which was the performance unlock to make this work without needing a whole DBA team like Skype had.
  [-]
  - arthurcolle 4 hours ago
    is this referring to Dan Tarman?
- asplake 10 hours ago
  I know from firsthand experience that there were investment banks doing this at least as early as the mid 1990s - on top of Sybase in the particular case I have in mind. Perhaps not the best performance, but it was trivially easy to inspect, and you could integrate with other database features, for example stored procedures.