marcus_holmes 3 months ago

I tried a similar approach in a previous startup - treat data as documents and store in a JSONB field.

Postgres was awesome and handled this brilliantly, but the lack of schema and typing killed it. We just ended up fighting data quality the whole time. We couldn't assume that any document had all the required fields, or that they were in a format that made sense, e.g. the Price column sometimes had currency symbols, sometimes commas and periods in UK/US format, and sometimes in Euro format - sorting by Price involved some complicated parsing of all the records first.

We moved back to relational tables.

I won't say I'd never do this again, but I would definitely not just throw JSON documents to a database and expect good things to happen.

  • aden1ne 3 months ago

    Had a similar experience. What killed it for me is that no statistics can be gathered for JSONB columns. This in turn really messes with the query planner once you do something like `select a.* from a join b on a.id = b.a_id where b.my_jsonb_column ->> 'foo' = 'bar';`.

    Given the lack of statistics, the query planner loves going for a nested loop rather than a hash or merge join where those would be appropriate, leading to abysmal performance.
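
    One partial workaround (just a sketch on my part, with made-up names): an index on the extracted expression, since ANALYZE collects statistics for indexed expressions and the planner can use them when a query uses that exact expression:

        CREATE INDEX b_foo_idx ON b ((my_jsonb_column ->> 'foo'));
        ANALYZE b;  -- pg_stats now has frequencies for the indexed expression

        -- the join can now get a sane row estimate and pick a hash/merge join
        SELECT a.* FROM a JOIN b ON a.id = b.a_id
        WHERE b.my_jsonb_column ->> 'foo' = 'bar';

    It only helps predicates on that exact expression, but that's often enough to get the join strategy right.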

    There is a thread[0] on the PostgreSQL mailing list to add at least some statistics for JSONB columns, but it has gone nowhere since 2022.

    [0]: https://www.postgresql.org/message-id/flat/c9c4bd20-996c-100...

    • jeffffff 3 months ago

      it's kind of dumb that postgres uses a nested loop join instead of a hash join there. hash join almost always has the best worst-case behavior, and without stats it should be the default choice.

  • winternewt 3 months ago

    If you have schema requirements, why _not_ do it with tables and columns instead? The point of the jsonb column would be precisely to store arbitrary stuff that has no schema.

    • bruce343434 3 months ago

      I usually see this (dynamic scripting langs, schemaless databases) play out as a very good environment to prototype in, very few barriers to change things or make stuff up as you go. Then the prototype is not discarded but "iterated" to form the actual production code. The (now needlessly so) flexible JSON store is grand-fathered in.

      I have yet to come across an actual use case for JSON stores in a production app with an established design. What the hell do you mean you have no idea what the data might hold?? Why are you storing unknown, undefined, undefinable things??? Or perhaps, there actually is a schema i.e. fields we rely on being present, but we were too lazy to define it formally?

      • eropple 3 months ago

        Well, there are a decent number of those use cases, but the one I find most compelling is storing an exact record of what another service sent me in case I need it later. I pull out and store in my typed schema the relevant bits of, say, an OpenID Connect response, but I'll often store the entire response as well in case I want to add functionality in the future based on additional fields, etc. in that response and roll forward without having to have an intermediate state.

      • tracker1 3 months ago

        Adjacent data that is very alike for most usage but different in subtle ways.

        Classified ads: the additional details for a Car are different than for a Shirt, but both would be ads. And adding a nearly infinite number of fields, or a flexible system built on a set of schema or detail tables, is significantly worse than unstructured JSON.

        Another would be records from different, but related systems. Such as the transaction details for an order payment. Paypal data will be different from your CC processor, or debit transaction, but you can just store it as additional details for a given "payment" record.

        Another still would be in healthcare, or any number of other systems where the structures will be different from one system to another depending on data interchange, where in the storage you don't care too much, only in the application layer will it make any necessary difference.
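
        For the Car/Shirt example above, roughly (a sketch, made-up names): shared ad fields as columns, category-specific attributes as JSONB:

            CREATE TABLE ads (
                id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
                category    text    NOT NULL,              -- 'car', 'shirt', ...
                title       text    NOT NULL,
                price_cents integer NOT NULL,
                attributes  jsonb   NOT NULL DEFAULT '{}'  -- mileage for cars, size/colour for shirts
            );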

      • mexicocitinluez 3 months ago

        > I have yet to come across an actual use case for JSON stores in a production app with an established design

        Healthcare. Clinical data is unstructured and constantly changing. Would you build out a table with 2000 columns that changed yearly? What about 5000?

      • goosejuice 3 months ago

        Data owned by another service/application/org that is written and read without modification.

        For example, a healthcare document that wasn't built or technically owned by the application storing it.

        For example, a web text editor that serializes its state as JSON.

        Not json, but a web scraper storing html documents.

        These have structure, it's only that the structure is built/maintained outside of the application storing it. You could of course transform it but I think it's a bit obvious where that might not be worth the cost/complexity.

      • cryptonector 3 months ago

        You can always change the source of truth so instead of being the JSON texts it's the properly normalized schema and then you can have a VIEW that returns JSON for apps that need JSON. And if need be you can denormalize a bit and have a JSONB column that stores the document as the serialization of the rows' other columns.

        That's what's really nice about SQL: you can perform these sorts of schema surgeries and still retain backwards-compatibility using VIEWs.
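
        A minimal sketch of what I mean (names made up): normalized columns as the source of truth, with a VIEW serializing them back to JSON for the apps that still want documents:

            CREATE TABLE customers (
                id    uuid PRIMARY KEY,
                name  text NOT NULL,
                email text NOT NULL
            );

            -- backwards-compatible "document" view built from the normalized rows
            CREATE VIEW customer_docs AS
            SELECT id AS _id,
                   jsonb_build_object('name', name, 'email', email) AS data
            FROM customers;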

      • cruffle_duffle 3 months ago

        A use case for me is I’m calling a lot of different 3rd party APIs that return similar things but in widely different data structures with a lot of information specific to that site. I take what is standard across all the API’s and store that in my relational database but anything I didn’t use I store in a JSONB field just in case I need it some time in the future.

        In other words I’m not really using anything in that JSONB field… at least right now.

      • SSLy 3 months ago

        We're evaluating solutions to ingest business app events and infrastructure logs. Document-ish JSON-ish "DB"/search engine seems like best solution, because no one does structured logs properly.

        Now we have to pick the stack. ELK, Loki + Graphana, Graylog or maybe just dump into MongoDB?

      • okr 3 months ago

        Customers sent custom things.

    • marcus_holmes 3 months ago

      I think my learning was "all data has schema, eventually"

      We thought it was just a bunch of data documents. But it turned out that to actually use that data in an application it had to have predictability, we had to know certain fields were present and in a fixed format. You know: a schema.

      • mexicocitinluez 3 months ago

        I'm confused about this. Since when can't you enforce a schema on a document you're storing? Did you have no way to change the data before it hit the database?

        NoSql stores still have schemas.

  • jimmyl02 3 months ago

    the view I now have is that for a relational table, yes you have to suffer through migrations but at least they are in sql. for document based stores, you still have to have migrations, but they are just implemented in code

    json documents sound great, especially initially, but end up being a maintenance nightmare

    • hans_castorp 3 months ago

      > for document based stores, you still have to have migrations, but they are just implemented in code

      The problem with that - in my experience - is that migrating the structure for thousands (if not millions) of documents is way slower than running a DDL command (as it means reading each document, parsing it, modifying it and writing it back). Many DDL commands are just metadata update to the system catalogs so they are quite fast (e.g. adding a new column with a default value). With documents you wind up with millions of single row updates.

      This can be mitigated by doing a "lazy" migration when a document with the old structure is first read. But that makes the code much more complicated.
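
      To make the difference concrete (a rough example with made-up table names):

          -- relational: a catalog-only change since PG 11, effectively instant
          ALTER TABLE orders ADD COLUMN status text NOT NULL DEFAULT 'new';

          -- documents in a JSONB column: reads and rewrites every affected row
          UPDATE order_docs
          SET data = data || '{"status": "new"}'
          WHERE NOT data ? 'status';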

    • globular-toast 3 months ago

      Or, to put it another way, yes you have to write and maintain an upfront schema, but a document-based system has a schema too, it's just distributed amongst 50 codebases and 10 people's heads.

  • raverbashing 3 months ago

    > the Price column sometimes had currency symbols, and sometimes commas-and-periods in UK/US format and sometimes in Euro format

    Well this is not a relational issue is it? It is a data normalization issue

    • marcus_holmes 3 months ago

      Yes, true. But that was a single example off the top of my head. There were lots of others.

      Not saying the approach isn't suitable for some use cases. Just that I'd be really careful that this is one of those use cases next time.

  • jamil7 3 months ago

    I don’t do much backend work at the moment but I found using JSONB fields for prototyping endpoints to be quite handy. Once I had a feature figured out and working, I’d create a proper schema.

    • marcus_holmes 3 months ago

      I've found Postgres' Hstore type useful for this, too.

      I often create a "metadata" hstore field on a table and use it for random bits of data I don't want to create an actual field for yet. When I find that the application needs that bit of data and in a certain format, I'll move it into an actual field.
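
      Roughly this pattern (a sketch, made-up names):

          CREATE EXTENSION IF NOT EXISTS hstore;

          ALTER TABLE projects ADD COLUMN metadata hstore NOT NULL DEFAULT '';

          -- stash an ad-hoc value
          UPDATE projects SET metadata = metadata || hstore('beta_flag', 'true') WHERE id = 1;

          -- later, promote it to a real column once the shape settles
          ALTER TABLE projects ADD COLUMN beta_flag boolean;
          UPDATE projects SET beta_flag = (metadata -> 'beta_flag')::boolean;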

      • cryptonector 3 months ago

        hstore is basically the flat precursor to JSONB, and IMO it's as good as obsolete. I wouldn't use hstore in a new project, just JSONB.

        • marcus_holmes 3 months ago

          I get that, but I think the temptation to add too much structure to the JSONB would be too great; this is just a temporary or prototype field, so it just needs a name and a value. Hstore works great for this.

          • cryptonector 3 months ago

            And I get that, but also hstore is ugly. Just be disciplined :)

  • tracker1 3 months ago

    This is an issue regardless of your type of document storage. There are different options for verifying document details and versioning when working with document-oriented databases. A typical approach is to have a per-record version number and pass documents through a verifier as part of insert, update or query to ensure all migrations to the current version have occurred.
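
    In Postgres JSONB terms that could look something like this (just a sketch of the pattern, made-up names):

        -- keep a schema version alongside each document
        ALTER TABLE documents ADD COLUMN IF NOT EXISTS schema_version int NOT NULL DEFAULT 1;

        -- one-off catch-up migration, v1 -> v2 (a field rename, say)
        UPDATE documents
        SET data = (data - 'fullName') || jsonb_build_object('full_name', data -> 'fullName'),
            schema_version = 2
        WHERE schema_version = 1;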

  • goosejuice 3 months ago

    Sounds like most/all of these issues would be solved by validation at the app layer.

    • ivandenysov 3 months ago

      True, but for example with db-level validations I don’t have to worry that a new developer ships a batch job that runs raw inserts for performance reasons and bypasses app layer validations.

      • goosejuice 3 months ago

        I'm not advocating for no database level validations. One can and should have both in my opinion.

        Postgres supports constraints on jsonb as well.
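
        For example, a CHECK constraint on the document shape (made-up names):

            ALTER TABLE payments
            ADD CONSTRAINT payment_doc_has_amount CHECK (
                data ? 'amount'
                AND jsonb_typeof(data -> 'amount') = 'number'
            );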

        I would generally advocate for no write access outside of the app as well. Certainly for new developers. Get some tests on that batch job.

        • corytheboyd 3 months ago

          > I would generally advocate for no write access outside of the app as well.

          FWIW I think OP was referring to app code still, but code opting in to skipping app data validations. Rails for example makes this very easy to do, and there are tons and tons of people out there selling this as a performance improvement (this one trick speeds up bulk operations!!!!). There are times where it’s useful, but it’s a very sharp knife that people can easily misuse.

          So yeah, anyway, have both db constraints and app validations.

          • goosejuice 3 months ago

            Thanks for pointing that out, I did misread that and have seen/written such things in rails land. We used active record import which gives the option of running validations.

            But yeah, layering your safety nets is generally wise. I would include testing and code review in that as well.

      • zepolen 3 months ago

        What? You should be exposing your data access layer as a microservice and all your db level validations should be done there...

        You can even write a small DSL syntax language to make it easier to use for developers, and perhaps an Obviously Redundant Middleware that sits between them to convert their programming language objects to the DSL. Add some batch support, perhaps transactional locks (using mongo, we want to be webscale after all) and perhaps a small terminal based client and voila, no one should ever need to deal with petty integrity in the db again.

        • eropple 3 months ago

          This post is a fantastic example of Poe's Law. You had me for a second.

    • pennomi 3 months ago

      App layer validation is for the user’s sanity. DB layer validation is for the developer’s sanity.

  • cruffle_duffle 3 months ago

    You know I’ve found that LLM’s are awfully good at taking random JSON crap and “converting it” to a consistent format. You don’t even need a large LLM… something like ChatGPT 3.5 or an equivalent class of LLM can do the work just fine.

    It’s not perfect and requires a fair amount of effort to nail the prompt but when it works it works.

zulban 3 months ago

Neat. When I migrated a project from mongo to postgres I took a similar approach, except I only implemented the mongo feel I needed within my own project instead of building a proper library as done here. I was surprised how much performance improved despite using a hacky wrapper.

https://blog.stuartspence.ca/2023-05-goodbye-mongo.html

Personally tho, I plan to just drop all similarity to mongo in future projects.

  • throwaway2037 3 months ago

    Cripes, this is a good blog post. Did it get posted to HN for discussion? If not, it should have been. I especially like this part:

        > I'm trying to stay humble. Mongo must be an incredibly big project with lots of nuance. I'm just a solo developer and absolutely not a database engineer. I've also never had the opportunity to work closely with a good database engineer. However I shouldn't be seeing improvements like this with default out of the box PostgreSQL compared to all the things I tried over the years to fix and tune Mongo.
    
    Humbleness: That is rare to see around here. It is impressive that you got such speed-ups for your use case. Congrats, and thank you for sharing the blog post.

    EDIT

    The Morgan Freeman meme at the end gave me a real laugh. I would say the same about my experience with GridGain ("GridPain").

    • dayjaby 3 months ago

      > I knew it was time to migrate when my non-programmer wife was becoming familiar with Mongo shortcomings

      That part was so relatable

  • oskar_dudycz 3 months ago

    Yup, I might not reach full compliance, but I will try to follow the Pareto principle. Thanks for the link and kind feedback!

  • jrochkind1 3 months ago

    > I was surprised how much performance improved

    Are you saying you got better performance from postgres jsonb than from mongodb itself?

    • zo1 3 months ago

      From the article, they mention this alongside a neat graph: "API calls generally take 8 ms now, not 150 ms."

      His endpoints went from 150ms (with Mongo) to 8ms after moving to Postgres.

      • jrochkind1 3 months ago

        Thanks. How embarrassing for mongo, woah.

  • Thaxll 3 months ago

    I remember your post back then and it did not make sense at all; many pointed out it was lacking information and that you probably did something wrong with mongo.

    All the stuff under Mongo Problems is garbage, sorry.

harel 3 months ago

I regularly find the hybrid model is a sweet spot. I keep core fields as regular columns and dynamic data structures as JSONB. It brings the best of both worlds together.

  • Waterluvian 3 months ago

    I do this too with Postgres and it is just the best of both.

    A robot is a record. A sensor calibration is a record. A warehouse robot map with tens of thousands of geojson objects is a single record.

    If I made every map entity its own record, my database would be 10_000x more records and I’d get no value out of it. We’re not doing spatial relational queries.

    • hobs 3 months ago

      It's great when you have no reason EVER to decompose the data.

      That being said, when you start going "wait why is one record like this? oh no we have a bug and have to fix one of the records that looks like this across all data" and now you get to update 10,000x the data to make one change.

      • harel 3 months ago

        Small price to pay in my opinion. How often will that happen vs how often the database is used. Migrations like that can be done incrementally over time. It's a solved problem.

        • Waterluvian 3 months ago

          It’s also trivial to do. My JSON fields are all backed by JSON Schema. And I just write a data migration that mutates the data in some way and have the migration run by one host in a rate limited manner. It’s not quite as good as a traditional change in schema but it’s such a non-issue.

          • hobs 3 months ago

            I am glad it works! I have just been subject to several systems that have grown over time that worked very well until it became a problem (and then a huge one) so I am glad you are taking a disciplined approach.

            • Waterluvian 3 months ago

              Yup you’re absolutely right. There is no one size fits all. The more you can plan, and the more clear your case is, the less you need to pay for having flexibility.

  • webprofusion 3 months ago

    Yeah we do this as well. Have previously been a heavy MongoDB user but when we migrated to Postgres (using some relational schema with some JSONB for nested objects etc) it just made querying so much easier and reliability has been great.

willsmith72 3 months ago

It's technologically cool, but I would love a "why" section in the README. Is the idea you're a mongo Dev/love the mongo api and want to use it rather than switch to pg apis? Or want to copy some code over from an old project?

I'm sure there are use cases, I'm just struggling to grasp them. Especially if it's about reusing queries from other projects, AI is pretty good at that

  • megadal 3 months ago

    > AI is pretty good at that

    Good at what? Rewriting the queries?

    I think the point of Pongo is you can use the exact same queries for the most part and just change backends.

    I've worked a job in the past where this would have been useful (they chose Mongo and regretted it).

    • willsmith72 3 months ago

      For sure, but it feels really risky. If it's a small codebase, I would be more confident knowing what queries I was using and just switch them. If it's a large codebase, I'd want some really comprehensive test coverage, including performance tests

ilius2 3 months ago

If I were to start a new project, I would directly use postgres, and possibly add a JSONB column ONLY FOR OPTIONAL fields that you don't query frequently. Throwing everything in a document is just fermenting chaos and pain. That being said, I do love the syntax and structure of Mongo pipelines over SQL.

  • throwaway76324 3 months ago

    At $WORK, we use the same approach for integrations with 3rd party systems.

    The data that is common to all integrations is stored as columns in a relational table. Data that is specific to each integration is stored in JSONB; this is typically metadata used to manage each integration, which varies.

    It works great and you get the combination of relational safety and no-schema flexibility where it matters.

  • stanislavb 3 months ago

    "Throwing everything in a document is just fermenting chaos and pain." - I LOVE THIS.

Squarex 3 months ago

How does it compare with FerretDB[0]?

[0] https://www.ferretdb.com/

  • aleksi 3 months ago

    (I'm FerretDB co-founder)

    As far as I can tell, Pongo provides an API similar to the MongoDB driver for Node that uses PostgreSQL under the hood. FerretDB operates on a different layer – it implements MongoDB network protocol, allowing it to work with any drivers and applications that use MongoDB without modifications.

    • Sytten 3 months ago

      I dont want to sound rude, but as a bootstrap founder it kinda boggles my mind how much money people can raise for a product like ferretdb. I just don't see how it can make VC level return without at the very least changing licenses, which seems to be the premise behind creating this MongoDB proxy. I am sure there is a narrative for it though so best of luck!

      Also check your managed service links on GitHub, half are dead.

      • aleksi 3 months ago

        If bait-and-switch were our strategy, we would have chosen a different license from the beginning. The Apache license allows everyone to fork FerretDB away and do whatever they like with it. It is unlike MongoDB with their initial AGPL that, in theory, allows everyone to, say, run MongoDB SaaS, but in practice, has enough strings attached to scare people off.

        We want to have a piece of a bigger pie, not a bigger piece of an existing pie. Providing alternatives makes the whole market bigger.

        > Also check you managed service links on GitHub, half are dead.

        Thank you.

  • Zambyte 3 months ago

    The posted project looks like a client that connects to pg but behaves like Mongo, where Ferret is a server that accepts Mongo client connections and uses pg as backend storage.

  • oskar_dudycz 3 months ago

    Yes, I'm using MongoDB API in Pongo to keep the muscle memory. So, it's a library that translates the MongoDB syntax to native PostgreSQL JSONB queries.

    • remram 3 months ago

      "how does it compare?" "yes"

      Are you sure you replied to the right comment?

      • oskar_dudycz 3 months ago

        Yes to "The posted project looks like a client that connects to pg but behaves like Mongo, where Ferret is a server that accepts Mongo client connections and uses pg as backend storage."

        • remram 3 months ago

          Yeah, FYI that's a sibling comment, not the one you replied to.

hdhshdhshdjd 3 months ago

I use JSONB columns a lot, it has its place. It can fit certain applications, but it does introduce a lot of extra query complexity and you lose out on some ways to speed up query performance that you could get from a relational approach.

Which is to say JSONB is useful, but I wouldn’t throw the relational baby out with the bath water.

  • noisy_boy 3 months ago

    I think if a field in a JSONB column is used frequently in the most common query patterns, it might be time to "surface" it up among other non-JSONB relational columns. There will be some additional overhead e.g. it should be made a read-only column for consistency's sake; if you update the value of this field in the JSONB column, this surfaced column should also be updated.

    However, a bit of duplication is not a terrible trade-off for significantly improved query performance.
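
    One way to get that surfaced, read-only copy without keeping it in sync by hand is a stored generated column (PG 12+; rough sketch, made-up names):

        ALTER TABLE events
        ADD COLUMN customer_id uuid GENERATED ALWAYS AS ((data ->> 'customerId')::uuid) STORED;

        CREATE INDEX events_customer_id_idx ON events (customer_id);

    The generated column can't be written directly and is recomputed whenever the row is updated, which removes the manual-sync worry.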

  • doctor_eval 3 months ago

    I’ve been doing some reasonably serious playing with the idea of using jsonb columns as a kind of front end to relational tables. So basically, external interactions with the database are done using JSON, which gives end users some flexibility, but internally we effectively create a realtime materialised view of just those properties we need from the json.

    Anyone else tried this approach? Anything I should know about it?

    • Zenzero 3 months ago

      My mental model has always been to only use JSONB for column types where the relations within the json object are of no importance to the DB. An example might be text editor markup. I imagine if you start wanting to query within the json object you should consider a more relational model.

      • doctor_eval 3 months ago

        My experience has been that users like to include a lot of information that's not relevant to the application I'm working on.

        My application couldn't really care less about customer names, for example. But the people who buy my software naturally do care - and what's worse, each of my potential customers has some legacy system which stores their customer names in a different way. So one problem I want to address is, how do I maintain fidelity with the old system, for example during the data migration, while enabling me to move forward quickly?

        My solution has been to keep non-functional data such as customer names in JSON, and extract only the two or three fields that are relevant to my application, and put them into a regular SQL database table.

        So far this has given me the best of both worlds: a simple and highly customisable JSON API for these user-facing objects with mutable shapes, but a compact SQL backend for the actual work.

      • mbesto 3 months ago

        I always thought JSONB is a great use case for client-specific customizations. For example - a CRM application where you have structured DB tables like "account", "opportunity", etc. but then a client-specific table might be "cohort" that is linked to "account" and has fields "cohort size", "cohort name", etc. You just chuck the cohort table in JSONB and now the company who uses the CRM that has the custom cohort table can change that table without creating/migrating your RDMS table structure.

        Curious what other people think here?

        • Zenzero 3 months ago

          That sounds like something that would be better tailored to well normalized tables and custom row entries. The question I'd ask is if something you're concerned about being a column could instead be a row in a different table.

    • hdhshdhshdjd 3 months ago

      I do something similar, building a lightweight search index over very large relational datasets.

      So the tables are much simpler to manage, much more portable, so I can serve search off scalable hardware without disturbing the underlying source of truth.

      The downside is queries are more complex and slower.

  • martinald 3 months ago

    I've had extremely disappointing performance with postgres JSONB columns. I've used it a lot for MVPs and betas where it's much easier to use JSONB and lock in your exact relational schema later.

    I've now decided this path is a mistake because the performance is so bad. Even with low thousands/tens of thousands of rows it becomes a huge problem, queries that would take <1ms on relational stuff quickly start taking hundreds of ms.

    Optimizing these with hand-rolled queries is painful (I do not like the syntax it uses for jsonb querying), for some queries it doesn't really fix much, and indexes often don't help.

    It seems that jsonb is just many, many orders of magnitude slower, but I could be doing something wrong. Take for example storing a dictionary of string and int (number?) in a jsonb column. Adding the ints up in jsonb takes thousands of times longer rather than having these as string and int in a standard table.

    Perhaps I am doing something wrong; and I'd love to know it if I am!

    • ak217 3 months ago

      Did you have arrays in your jsonb data? I don't currently use jsonb for production performance sensitive queries, but in the past what I learned was that it's great at automatically indexing everything except arrays. You have to manually declare indexes on your arrays.

      When jsonb works, it's incredible. I've had many... suboptimal experiences with mongo, and jsonb is just superior in my experience (although like I said, I haven't used it for performance critical stuff in production). For a long time, it kinda flew under the radar, and still remains an underappreciated feature of Postgres.

    • phamilton 3 months ago

      > indexes often don't help.

      An "index on expression" should perform the same regardless of the input column types. All that matters is the output of the expression. Were you just indexing the whole jsonb column or were you indexing a specific expression?

      For example, an index on `foo(user_id)` vs `foo(data->'user_id')` should perform the same.
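
      In plain terms, an expression index over the extracted value (made-up names):

          -- btree index on the extracted value; the planner treats it like any scalar index
          CREATE INDEX orders_user_id_idx ON orders ((data ->> 'user_id'));

          -- this predicate can now use the index
          SELECT * FROM orders WHERE data ->> 'user_id' = '42';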

frithsun 3 months ago

Programmers would be better served by learning nothing except SQL instead of their current strategy of trying to learn everything except SQL.

  • dboreham 3 months ago

    They should learn about b-trees and how indexed queries can be done with them either with or without an explicit query language. Then they can decide what kind of data storage service they need. Understand what's happening inside the black box.

    • cryptonector 3 months ago

      Yes, for sure, though I'd still start with SQL.

  • dudeinjapan 3 months ago

    While I’d agree that understanding SQL basics is an important fundamental for novices to learn, I started using MongoDB 11 years ago and haven’t looked back.

karmakaze 3 months ago

What makes mongo mongo is its distributed nature, without it you could just store json(b) in an RDBMS.

  • rework 3 months ago

    > What makes mongo mongo is its distributed nature

    Since when? Mongo was popular because it gave the false perception it was insanely fast, until people found out it was only fast if you didn't care about your data, and the moment you ensured the write actually happened it ended up being slower than an RDB....

    • jokethrowaway 3 months ago

      Since forever. Sharding/distributing postgres or mysql was not easy. There were a few proprietary extensions. Nowadays it's more accessible.

      This was typical crap you had to say to pass a FAANG-style interview: "oh of course I'd use mongo because this use case doesn't have relations and because it's easy to scale", while you know postgres will give you way less problems and allow you to make charts and analytics in 30m when finance comes around.

      I made the mistake of picking mongo for my own startup, because of propaganda coming from interviewing materials and I regretted it for the entire duration of the company.

      • threeseed 3 months ago

        > Nowadays it's more accessible

        Distributing PostgreSQL still requires proprietary extensions.

        The most popular is Citus, which is owned by Microsoft, so questions should definitely remain about how long they will support it instead of pushing users to Azure.

        People like to bash MongoDB but at least they have a built-in, supported and usable HA/Clustering solution. It's ridiculous to not have this in 2024.

        • foobarkey 3 months ago

          Trying to use MongoDB by default for new projects because of the built in HA, dumped Postgres because the HA story is so bad on bare metal (alright if you are ok burning money on RDS or similar).

          Current preference:

          1. HA MongoDB
          2. HA MariaDB (Galera) or MySQL Cluster
          3. Postgres Rube Goldberg Machine HA with Patroni
          4. No HA Postgres

  • richwater 3 months ago

    > store json(b) in an RDBMS

    I actually did this for a small HR application and it worked incredibly well. JSONB GIN indexes are pretty nice once you get the hang of the syntax.

    And then, you also have all the features of Postgres as a freebie.
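
    For anyone curious, the basic shape is something like (made-up names):

        CREATE INDEX docs_data_gin ON docs USING gin (data jsonb_path_ops);

        -- containment queries like this can use the GIN index
        SELECT * FROM docs WHERE data @> '{"department": "HR"}';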

    • eddd-ddde 3 months ago

      Personally, I like postgres json syntax much better than whatever mongo invented.

      Big fan of jsonb columns.

      • oskar_dudycz 3 months ago

        I'm planning to add methods for raw JSON path or, in general, raw SQL syntax to enable such fine-tuning and not need to always use MongoDB API. I agree that for many people, this would be better.

  • anonzzzies 3 months ago

    So how easy is it to distribute it? I don’t have experience with it but the tutorials look terrible compared to, say, Scylla, Yuga, Cockroach, TiDB etc. Again, honest question?

    • tracker1 3 months ago

      Relatively easy... though, and I may be out of date, you have to choose replication or sharding at any given level... so if you want horizontal scale plus redundancy, you will wind up with slightly more complexity. My experience trying to recover a broken cluster after 5 hours of Azure going up/down in series was anything but fun.

      Would have to test, but the library for this post may well work with CockroachDB if you wanted to go that route instead of straight PostgreSQL. I think conceptually the sharding + replication of other DBs like Scylla/Cassandra and CockroachDB is a bit more elegant and easier to reason with. Just my own take though.

      • karmakaze 3 months ago

        I don't see any point in emulating MongoDB (NoSQL) document store on CockroachDB (NewSQL) which is already sharded and replicated.

        • redwood 3 months ago

          Right if you want to use a sharded and replicated system with a document data model, might as well use the real thing rather than introduce the risk and impedance mismatch of an emulation layer. The whole point here is for folks to not have to have the cognitive and performance downsides of a tabular data model

    • rad_gruchalski 3 months ago

      Pongo seems to be a middleware between your app and Postgres. So it will most certainly work absolutely fine on YugabyteDB, if one’s okay with occasional latency issues.

      One could optimise it more for a distributed sql by implementing key partition awareness and connecting directly to a tserver storing the data one’s after.

    • theteapot 3 months ago

      Does "distributed" mean sharded or just replicated? In either case it's a bit quirky but easy enough.

      > Scylla, Yuga, Cockroach, TiDB etc.

      You have experience "distributing" all these DBs? That's impressive.

      • anonzzzies 3 months ago

        We evaluated these over the years for different purposes/clients. And I don’t expect the ‘quirky’ part in 2024 I guess; it gets hard of course when the loads get specific; but in general I expect things to be automatic (multi master, sharding and replication) and simple to set up. Point out other nodes and done.

    • Too 3 months ago

      It couldn’t be simpler. Just configure all clients and servers to be aware of each other and go.

      Say what you want about the rest of mongo. This is an area where it actually shines.

      • anonzzzies 3 months ago

        Alright, I'll try it. Maybe I had the wrong tutorials or maybe these were old.

  • darby_nine 3 months ago

    but then you wouldn't have the joy of using the most awkward query language invented by mankind

  • zihotki 3 months ago

    But RDBMS'es are often also distributed. So what is mongo now?

    • marcosdumay 3 months ago

      People don't usually distribute Postgres (unless you count read replicas and cold HA replicas). But well, people don't usually distribute MongoDB either, so no difference.

      In principle, a cluster of something like Mongo can scale much further than Postgres. In practice, Mongo is full of issues even before you replicate it, and you are better off with something that abstracts a set of incoherent Postgres (or sqlite) instances.

      • zozbot234 3 months ago

        Postgres supports foreign data wrapper (FDW), which is the basic building block for a distributed DB. It doesn't support strong consistency in distributed settings as of yet, although it does provide two-phase commit which could be used for such.

        • williamdclt 3 months ago

          > strong consistency in distributed settings

          I doubt it ever will. The point of distributing a data store is latency and availability, both of which would go down the drain with distributed strong consistency

      • hibikir 3 months ago

        I think of the Stripe Mongo install, as it was a decade or so ago. It really was sharded quite wide, and relied on all shards having multiple replicas, as to tolerate cycling through them on a regular basis. It worked well enough to run as a source of truth for a financial company, but the database team wasn't small, dedicated to keeping all that machinery working well.

        Ultimately anyone doing things at that scale is going to run a small priesthood doing custom things to keep the persistence layer humming, regardless of what the underlying database is. I recall a project abstracting over the Mongo API, so as to allow for swapping the storage layer if they ever needed to.

    • brabel 3 months ago

      Often?? In my experience it's really hard to do it and still maintain similar performance, which kind of voids any benefit you may be looking for.

    • otabdeveloper4 3 months ago

      Postgres doesn't do replication correctly.

      Mongo does. (In fact, replication is about the only thing Mongo does correctly.)

      If you actually want a replicated log then Mongo is a very good choice. Postgres isn't.

  • lkdfjlkdfjlg 3 months ago

    > What makes mongo mongo is its distributed nature, without it you could just store json(b) in an RDBMS.

    Welllllllll I think that's moving the goalposts. Being distributed might be a thing _now_ but I still remember when it was marketed as the thing to have if you wanted to store unstructured documents.

    Now that Postgres also does that, you're marketing Mongo as having a different unique feature. Moving the goalposts.

    • thfuran 3 months ago

      It doesn't really seem reasonable to accuse someone of moving goalposts that you've just brought into the conversation, especially when they were allegedly set by a third party.

      • coldtea 3 months ago

        Parent didn't "just bring them", they merely referenced the pre-existing goalposts used to advocate for Mongo and reasons devs adopted it.

        • lkdfjlkdfjlg 3 months ago

          Exactly this, very eloquent, thank you.

          Yes, I'm still bitter because I was one of those tricked into it.

mvsdiego 3 months ago

Nice.

Oracle has a similar library-based documents/collections API named SODA, which has been around for years:

https://docs.oracle.com/en/database/oracle/simple-oracle-doc...

There are separate drivers for Java, node.js, python, REST, etc.

In addition to that, it has Mongo API, which is fully Mongo compatible - you can use standard Mongo tools/drivers against it, without having to change Mongo application code.

Both are for Oracle Database only, and both are free.

salomonk_mur 3 months ago

What would be the advantage of using this instead of simple jsonb columns?

  • joshmanders 3 months ago

    It uses JSONB under the hood. Just gives you a very "mongo" feel to using PostgreSQL. Not sure how I feel about it.

        CREATE TABLE IF NOT EXISTS %I (_id UUID PRIMARY KEY, data JSONB)
    • wood_spirit 3 months ago

      Can they make it use uuid7 for ids for better insert_becomes_append performance?

      • lgas 3 months ago

        Yes

        • oskar_dudycz 3 months ago

          Yes, I'm using JSONB underneath and translating the MongoDB syntax to native queries. As they're not super pleasant to deal with, I thought it'd be nice to offer the MongoDB API that's familiar to many.

          Regarding IDs, you can use any UUID-compliant format.

  • lopatin 3 months ago

    jsonb isn't web scale. Mongo is web scale.

    • digger495 3 months ago

      I see what you did there

  • imnotjames 3 months ago

    Looks like it matches the mongo node API

aussieguy1234 3 months ago

I'd like to know why AWS went with Aurora DB for their DocumentDB backend. Did the Mongo license change trigger a rush to build something Mongo compatible, but not quite MongoDB?

sberder 3 months ago

This looks great, I'll definitely give it a try. As many mentioned already, having classic columns and a JSON(B) column seems to be a common solution. How do you handle data validation for the JSON documents? My current project uses Django for metadata. I've been thinking about creating a layer similar to model fields in Django. You would declare a JSON "model" through those fields and assign it to the actual model JSON field.

  • throwaway76324 3 months ago

    You can just specify a model/DTO object and serialize it as JSON when saving. Many frameworks do that automatically so you don't need to think about it. At work we just annotate the field in the model as a json-field, and the framework will handle the json-conversion automatically and store the other fields in the model as regular database columns.

    pseudo code (to not trigger language wars):

       class Foo {
       
           @Id
           UUID id;
    
           String name;
    
           @Json
           MyCustomModel model;
       }
    
    Adding fields is not an issue, as it will simply be missing a value when de-serializing. Your business logic will need to handle its absence, but that is no different than using MongoDB or "classic" table columns
    • sberder 3 months ago

      That's a very low cost approach, I love it! I still think the Django ecosystem would benefit from a standardized/packaged approach including migrations. I'll ponder a bit more

  • oskar_dudycz 3 months ago

    Thank you! I'm planning to add support for JSON Schema and run the validation upon insert/update operations.

314156 3 months ago

https://docs.oracle.com/en/database/oracle/mongodb-api/mgapi...

Oracle database has had a MongoDB compatible API for a few years now.

  • cyberpunk 3 months ago

    And it only costs 75k a seat per year per developer, with free biyearly license compliance audits, a million in ops and hardware to get near prod, and all the docs paywalled. What a deal!

    • slau 3 months ago

      A client had a DB hosted by Oracle. The client was doing most of their compute on AWS, and wanted to have a synchronised copy made available to them on AWS. Oracle quoted them a cool $600k/year to operate that copy, with a 3 year contract.

      DMS + Postgres did it for $5k/year.

      • cyberpunk 3 months ago

        Client of mine wanted to shift a rac cluster from some aging sparc gear into a VMware or openstack or whatever farm they had on premise; oracle demanded they pay CPU licenses for every single CPU in the cluster as each one could “potentially” run the oracle database, quoted them seven figures.

        They rewrote the app instead.

vmfunction 3 months ago

Hmmm how does this compare to https://www.ferretdb.com ?

How does this handle large files? Is it enough to replace GridFS? One of the main attractions of MongoDB is its handling of large files.

pipe_connector 3 months ago

MongoDB has supported the equivalent of Postgres' serializable isolation for many years now. I'm not sure what "with strong consistency benefits" means.

  • zihotki 3 months ago

    Or is it? Jepsen reported a number of issues like "read skew, cyclic information flow, duplicate writes, and internal consistency violations. Weak defaults meant that transactions could lose writes and allow dirty reads, even downgrading requested safety levels at the database and collection level. Moreover, the snapshot read concern did not guarantee snapshot unless paired with write concern majority—even for read-only transactions."

    That report (1) is 4 years old; many things could have changed. But so far, every reviewed version has been faulty with regard to consistency.

    1 - https://jepsen.io/analyses/mongodb-4.2.6

    • pipe_connector 3 months ago

      Jepsen found a more concerning consistency bug than the above results when Postgres 12 was evaluated [1]. Relevant text:

      We [...] found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable

      I have run Postgres and MongoDB at petabyte scale. Both of them are solid databases that occasionally have bugs in their transaction logic. Any distributed database that is receiving significant development will have bugs like this. Yes, even FoundationDB.

      I wouldn't not use Postgres because of this problem, just like I wouldn't not use MongoDB because they had bugs in a new feature. In fact, I'm more likely to trust a company that is paying to consistently have their work reviewed in public.

      1. https://jepsen.io/analyses/postgresql-12.3

    • endisneigh 3 months ago

      That’s been resolved for a long time now (not to say that MongoDB is perfect, though).

      • nick_ 3 months ago

        I just want to point out that 4 years is not a long time in the context of consistency guarantees of a database engine.

        I have listened to Mongo evangelists a few times despite my skepticism and been burned every time. Mongo is way oversold, IMO.

  • throwup238 3 months ago

    > I'm not sure what "with strong consistency benefits" means.

    "Doesn't use MongoDB" was my first thought.

  • danpalmer 3 months ago

    MongoDB had "strong consistency" back in 2013 when I studied it for my thesis. The problem is that consistency is a lot bigger space than being on or off, and MongoDB inhabited the lower classes of consistency for a long time while calling it strong consistency which lost a lot of developer trust. Postgres has a range of options, but the default is typically consistent enough to make most use-cases safe, whereas Mongo's default wasn't anywhere close.

    They also had a big problem trading performance and consistency, to the point that for a long time (v1-2?) they ran in default-inconsistent mode to meet the numbers marketing was putting out. Postgres has never done this, partly because it doesn't have a marketing team, but again this lost a lot of trust.

    Lastly, even with the stronger end of their consistency guarantees, and as they have increased their guarantees, problems have been found again and again. It's common knowledge that it's better to find your own bugs than have your customers tell you about them, but in database consistency this is more true than normal. This is why FoundationDB are famous for having built a database testing setup before a database (somewhat true). It's clear from history that MongoDB don't have a sufficiently rigorous testing procedure.

    All of these factors come down to trust: the community lacks trust in MongoDB because of repeated issues across a number of areas. As a result, just shipping "strong consistency" or something doesn't actually solve the root problem, that people don't want to use the product.

    • pipe_connector 3 months ago

      It's fair to distrust something because you were burned by using it in the past. However, both the examples you named -- Postgres and FoundationDB -- have had similar concurrency and/or data loss bugs. I have personally seen FoundationDB lose a committed write. Writing databases is hard and it's easy to buy into marketing hype around safety.

      I think you should reconsider your last paragraph. MongoDB has a massive community, and many large companies opt to use it for new applications every day. Many more people want to use that product than FoundationDB.

      • daniel-grigg 3 months ago

        Can you elaborate on why ‘many large companies’ are choosing MongoDB over alternatives and what their use cases are? I’ve been using Mdb for a decade and with how rich the DB landscape is for optimising particular workloads I just don’t see what the value proposition for Mdb is compared to most of them. I certainly wouldn’t use it for any data intensive application when there’s other fantastic OLAP dbs, nor some battle hardened distributed nodes use case, so that leaves a ‘general purpose db with very specific queries and limited indexes’. But then why not just use PG, as others say?

      • azurelake 3 months ago

        I’d be curious to hear more detail about the FoundationDB data loss issue that you saw? Do you remember what version / what year that you saw it?

    • nijave 3 months ago

      Have you looked at versions in the last couple years to see if they've made progress?

      • danpalmer 3 months ago

        This kinda misses my point. By having poor defaults in the past, marketing claims at-odds with reality, and being repeatedly found to have bugs that reduce consistency, the result is that customers have no reason to trust current claims.

        They may have fixed everything, but the only way to know that is to use it and see (because the issue was trusting marketing/docs/promises), and why should people put that time in when they've repeatedly got it wrong, especially when there are options that are just better now.

        • nijave 3 months ago

          Right, I was curious if you put even more time in :)

          I see lots of comments from people insisting it's fixed now but it's hard to validate what features they're using and what reliability/durability they're expecting.

    • throwaway2037 3 months ago

          > my thesis
      
      Can you share a link? I would like to read your research.
  • Izkata 3 months ago

    > MongoDB has supported the equivalent of Postgres' serializable isolation for many years now.

    That would be the "I" in ACID

    > I'm not sure what "with strong consistency benefits" means.

    Probably the "C" in ACID: Data integrity, such as constraints and foreign keys.

    https://www.bmc.com/blogs/acid-atomic-consistent-isolated-du...

  • lkdfjlkdfjlg 3 months ago

    > Pongo - Mongo but on Postgres and with strong consistency benefits.

    I don't read this as saying it's "MongoDB but with...". I read it as saying that it's Postgres.

  • jokethrowaway 3 months ago

    Have you tried it in production? It's absolute mayhem.

    Deadlocks were common; it uses a system of retries if the transaction fails; we had to disable transactions completely.

    Next step is either writing a writer queue manually or migrating to postgres.

    For now we fly without transactions and fix the occasional concurrency issue.

    • pipe_connector 3 months ago

      Yes, I have worked on an application that pushed enormous volumes of data through MongoDB's transactions.

      Deadlocks are an application issue. If you built your application the same way with Postgres you would have the same problem. Automatic retries of failed transactions with specific error codes are a driver feature you can tune or turn off if you'd like. The same is true for some Postgres drivers.

      If you're seeing frequent deadlocks, your transactions are too large. If you model your data differently, deadlocks can be eliminated completely (and this advice applies regardless of the database you're using). I would recommend you engage a third party to review your data access patterns before you migrate and experience the same issues with Postgres.

      • akoboldfrying 3 months ago

        >Deadlocks are an application issue.

        Not necessarily, and not in the very common single-writer-many-reader case. In that case, PostgreSQL's MVCC allows all readers to see consistent snapshots of the data without blocking each other or the writer. TTBOMK, any other mechanism providing this guarantee requires locking (making deadlocks possible).

        So: Does Mongo now also implement MVCC? (Last time I checked, it didn't.) If not, how does it guarantee that reads see consistent snapshots without blocking a writer?

        • devit 3 months ago

          Locking doesn't result in deadlocks, assuming that it's implemented properly.

          If you know the set of locks ahead of time, just sort them by address and take them, which will always succeed with no deadlocks.

          If the set of locks isn't known, then assign each transaction an increasing ID.

          When trying to take a lock that is already held: if the lock owner has a higher ID, signal it to terminate and retry after waiting for this transaction to terminate, and sleep waiting for it to release the lock.

          Otherwise, if it has a lower ID, abort the current transaction, wait for the conflicting transaction to finish and then retry it.

          This guarantees that all transactions will terminate as long as each would terminate in isolation and that a transaction will retry at most once for each preceding running transaction.

          It's also possible to detect deadlocks by keeping track of which thread every thread is waiting for, and signaling either the highest transaction ID in the cycle or the one the lowest ID is waiting for to abort, wait for the transaction it was waiting on to terminate, and retry.

          • akoboldfrying 3 months ago

            Yes, I'm aware that deadlock can be avoided if the graph having an edge uv whenever a task tries to acquire lock v while already holding lock u is acyclic, and this property can either be guaranteed by choosing a total order on locks and then only ever acquiring them in this order, or dynamically maintained by detecting tasks that potentially violate this order and terminating them, plus retries.

            However, those techniques apply only to application code where you have full control over how locks are acquired. This is generally not the case when feeding declarative SQL queries to a DBMS, part of whose job is to decide on a good execution plan. And even in application code, assuming a knowledgeable programmer, they need to either know about all locks in the world or run complex and expensive bookkeeping to detect and break deadlocks.

            The fundamental problem is that locks don't compose the way other natural CS abstractions (like, say, functions) do: https://stackoverflow.com/a/2887324

        • pipe_connector 3 months ago

          MongoDB (via WiredTiger) has used MVCC to solve this problem since transactions were introduced.

    • threeseed 3 months ago

      > Next step is either writing a writer queue manually

      You can just use a connection pool and limit writer threads.

      You should be using one to manage your database connections regardless of which database you are using.

DonnyV 3 months ago

Would love a C# version of this. I usually use Mongodb for all of our projects. But we need to use Postgres for a project. This would come in very handy.

  • oskar_dudycz 3 months ago

    You can check Marten, which I was co-maintaining: https://martendb.io/. It doesn't have a MongoDB-compliant API, but it's mature, stable and efficient.

deanCommie 3 months ago
  • oskar_dudycz 3 months ago

    Yes, a similar idea, but I don't aim to be 100% MongoDB compliant or a full replacement. My goal is to use as many PostgreSQL features as possible. Having the translation at the library level will allow more scenarios, e.g. sharing connections and using PostgreSQL hosting.

tracker1 3 months ago

Has this been tested with CockroachDB or any other databases that use a mostly compatible PostgreSQL wire protocol and query language?

Tao3300 3 months ago

Ditch the dalmatian before Disney rips your face off.

rework 3 months ago

Looks sort of like MartenDB but trying to mimic the mongo api, unsure why anyone would want to do that... mongo api is horrible...

  • JanSt 3 months ago

    Wouldn't that allow to switch from Mongo to Postgres without having to rewrite all of your app?

    • oskar_dudycz 3 months ago

      Hint: I'm an ex-Marten maintainer, so the similarity is not accidental ;)

      As OP said, not needing to rewrite applications, and keeping the muscle memory from using Mongo, is beneficial. I'm not planning to be strict and support only MongoDB API; I will extend it when needed (e.g. to support raw SQL or JSON Path). But I plan to keep the shim's API compliant for the above reasons.

      MongoDB API has its quirks but is also pretty powerful and widely used.

      • rework 3 months ago

        Oh, so you are, then we can rest assured this will end up being a solid project!

        I personally can't stand mongodb, it's given me a lot of headaches. I joined a company and the same week I joined we lost a ton of data, and the twat who set it up resigned in the middle of the outage. Got it back online and spent 6 months moving to postgresql.

        • oskar_dudycz 3 months ago

          Thanks, that's the goal: to bring the solid and verified approach in Marten to Node.js land. The concept is similar, but the feature set will be different.

ramchip 3 months ago

Have you tried it with CockroachDB?

  • oskar_dudycz 3 months ago

    I did not, but I'm not using any fancy syntax so far besides JSONB operators. If it won't work, then I'm happy to adjust it to make it compliant.

joeyagreco 3 months ago

Good work! I would like to see a section on the README outlining the benefits of Pongo

  • oskar_dudycz 3 months ago

    Thanks, I'll try to cover that, good call!

posix_monad 3 months ago

Does MongoDB have serious market share compared to DynamoDB (and similar clones from Azure, GCP) at this point?

  • maxdo 3 months ago

    Mongodb and dynamodb are completely different dbs. One is an unlimited-scale KV store but very expensive; the other is a document nosql db that sells you the idea “it just works” for lots of features: indexes on anything, aggregation, time series, vector db, sharding, replicas, etc. It’s a very powerful db for sure.