On Having a Data Object

(natemeyvis.com)

36 points | by Theaetetus 5 days ago

7 comments

  • codemonkey-zeta 14 hours ago
    Author is on the verge of having a Clojure epiphany.

    > 1. You should often be using different objects in different contexts.

    This is because "data" are just "facts" that your application has observed. Different facts are relevant in different circumstances. The User class in my application may be very similar to the User class in your application; they may even have identical "login" implementations. But neither captures the "essence" of a "User", because the set of facts one could observe about Users is unbounded and combinatorially explosive. This holds for subsets of facts as well. Maybe our login method only cares about a User's email address and password, but to support all the other stuff in our app, we have to either:

      1. Pass along every piece of data and behavior the entire app specifies, or
      2. Create another data object that captures only the facts that login cares about (e.g. a LoginPayload, LoginUser, or Credential object).

    Option 1 is a nightmare because refactoring requires taking into consideration ALL usages of the object, regardless of whether or not the changes are relevant to the caller. Option 2 sucks because your Object hierarchy is combinatorial on the number of distinct _callers_. That's why it is so hard to refactor large systems programmed in this style.
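
    Option 2 can be made concrete with a minimal Python sketch. The User fields, the Credentials class, and the credentials_of/login helpers are hypothetical, and SHA-256 stands in for a proper password KDF purely to keep it short. The point is that login receives only the two facts it needs, so changes to the rest of User cannot affect it.

```python
import hashlib
import hmac
from dataclasses import dataclass


# Hypothetical "full" User carrying facts the wider app cares about (option 1).
@dataclass(frozen=True)
class User:
    user_id: int
    email: str
    password_hash: str
    display_name: str
    billing_plan: str


# Hypothetical narrow object: only the facts login needs (option 2).
@dataclass(frozen=True)
class Credentials:
    email: str
    password_hash: str


def hash_password(password: str) -> str:
    # Illustration only: a real system should use a salted, slow KDF.
    return hashlib.sha256(password.encode()).hexdigest()


def credentials_of(user: User) -> Credentials:
    # The extra mapping step option 2 costs: project User down to login's facts.
    return Credentials(email=user.email, password_hash=user.password_hash)


def login(creds: Credentials, email: str, password: str) -> bool:
    # login cannot depend on billing_plan or display_name: it never sees them.
    return creds.email == email and hmac.compare_digest(
        creds.password_hash, hash_password(password)
    )


alice = User(1, "alice@example.com", hash_password("s3cret"), "Alice", "pro")
creds = credentials_of(alice)
print(login(creds, "alice@example.com", "s3cret"))  # True
```

    The cost the comment describes shows up as one such narrow class plus one mapping function per distinct caller.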

    > 3. The classes get huge and painful.

    The author observed the combinatorial explosion of facts!

    If you have a rich information landscape that is relevant to your application, you are going to have a bad time if you try modeling it with Data Objects. Full stop.

    See Rich Hickey's talks, in particular this section on the shortcomings of data objects compared to plain data structures (maps, in this case):

    https://www.youtube.com/watch?v=aSEQfqNYNAc
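
    For contrast, here is a minimal sketch of the plain-map style in Python, with a dict standing in for a Clojure map. The keys and the select_keys helper (loosely modelled on Clojure's select-keys) are assumptions for illustration. Each function reads only the subset of facts it cares about, and adding a new fact requires no new class.

```python
from typing import Any, Mapping


def select_keys(m: Mapping[str, Any], keys: list[str]) -> dict[str, Any]:
    # Analogue of Clojure's select-keys: keep only the requested facts.
    return {k: m[k] for k in keys if k in m}


def login_facts(user: Mapping[str, Any]) -> dict[str, Any]:
    # Login looks at two keys; every other fact passes by untouched.
    return select_keys(user, ["email", "password_hash"])


def greeting(user: Mapping[str, Any]) -> str:
    # A different caller reads a different subset of the same map.
    return f"Hello, {user['display_name']}"


user = {
    "email": "alice@example.com",
    "password_hash": "not-a-real-hash",  # placeholder value
    "display_name": "Alice",
    "billing_plan": "pro",
}

# A newly observed fact is just another key; existing functions ignore it.
user_with_extra = {**user, "last_login": "2024-01-01"}
```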

    • bccdee 11 hours ago
      > Option 2 sucks because your Object hierarchy is combinatorial on the number of distinct _callers_.

      I kinda like that. Suppose we do something like `let mut authn = UserLoginView.build(userDataRepository); let session = authn.login(user, pwd)`. You no longer get to have one monolithic user object—you need a separate UserDataRepository and UserLoginView—but the relationship between those two objects encodes exactly what the login process does and doesn't need to know about users. No action-at-a-distance.
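
      The pseudocode above can be fleshed out as a rough Python sketch. UserDataRepository, UserLoginView.build, and login come from the comment; the fields, the Session type, the SHA-256 placeholder hashing, and the choice to snapshot password hashes at build time are all assumptions.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass, field
from typing import Optional


def hash_password(password: str) -> str:
    # Illustration only: use a salted, slow KDF in practice.
    return hashlib.sha256(password.encode()).hexdigest()


@dataclass
class UserDataRepository:
    # Hypothetical store of everything the app knows about users, keyed by email.
    rows: dict[str, dict[str, str]] = field(default_factory=dict)

    def add(self, email: str, password: str, display_name: str) -> None:
        self.rows[email] = {
            "password_hash": hash_password(password),
            "display_name": display_name,
        }


@dataclass(frozen=True)
class Session:
    email: str
    token: str


class UserLoginView:
    """Hypothetical view exposing only what login needs: email -> password hash."""

    def __init__(self, hashes: dict[str, str]) -> None:
        self._hashes = hashes

    @classmethod
    def build(cls, repo: UserDataRepository) -> "UserLoginView":
        # The projection is explicit: display_name never reaches the view.
        # This snapshots the repository; a live view could query it lazily.
        return cls({email: row["password_hash"] for email, row in repo.rows.items()})

    def login(self, email: str, password: str) -> Optional[Session]:
        stored = self._hashes.get(email)
        if stored is None or not hmac.compare_digest(stored, hash_password(password)):
            return None
        return Session(email=email, token=secrets.token_hex(16))


repo = UserDataRepository()
repo.add("alice@example.com", "s3cret", "Alice")
authn = UserLoginView.build(repo)
session = authn.login("alice@example.com", "s3cret")
```

      Reading UserLoginView's constructor is enough to see everything login depends on, which is the structural guarantee the comment is after.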

      I've never used Clojure, but the impression I get of its "many functions operating over the same map" philosophy is that you trade away your ability to make structural guarantees about which functions depend on which fields. It's the opposite of the strong structural guarantees I love in Rust or Haskell.

      • codemonkey-zeta 6 hours ago
        > you trade away your ability to make structural guarantees about which functions depend on which fields

        You might make this trade-off if you use plain map keys like strings or unqualified keywords, but not if you use namespace-qualified keywords like ::my-namespace/id in combination with something like spec.alpha or malli. In that case you can easily make those structural guarantees, in a way that is more expressive than an ordinary type system.
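
        As a rough Python analogue only (this is not the spec or malli API), the idea of attaching a schema to each namespace-qualified key and letting each function declare which keys it requires can be sketched as below. KEY_SPECS, requires, and the key names are hypothetical, and unlike a type system these checks run at call time.

```python
import functools
from typing import Any, Callable, Mapping

# Hypothetical per-key schemas, in the spirit of registering a spec per qualified keyword.
KEY_SPECS: dict[str, Callable[[Any], bool]] = {
    "user/email": lambda v: isinstance(v, str) and "@" in v,
    "user/password-hash": lambda v: isinstance(v, str) and len(v) > 0,
    "billing/plan": lambda v: v in {"free", "pro"},
}


def requires(*keys: str):
    """Declare, and check at call time, which qualified keys a function reads."""

    def wrap(fn):
        @functools.wraps(fn)
        def checked(m: Mapping[str, Any]):
            for k in keys:
                if k not in m:
                    raise KeyError(f"{fn.__name__} requires {k}")
                if not KEY_SPECS[k](m[k]):
                    raise ValueError(f"{k} failed its spec: {m[k]!r}")
            return fn(m)

        checked.required_keys = keys  # the dependency list is inspectable
        return checked

    return wrap


@requires("user/email", "user/password-hash")
def login_identity(m: Mapping[str, Any]) -> str:
    return m["user/email"]


@requires("billing/plan")
def plan_label(m: Mapping[str, Any]) -> str:
    return m["billing/plan"].upper()


facts = {"user/email": "a@example.com", "user/password-hash": "x", "billing/plan": "pro"}
```

        Both functions take the same map, yet each one's dependencies are declared and inspectable via required_keys.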

  • oftenwrong 14 hours ago
    There is something to be said for having some basic data access libraries already in place, even if they are not ideal, so that developers can bang out functionality more quickly. That is the typical selling point of ORMs, isn't it? While there are well-known downsides, you can skip the ORM when it's not a good fit, or later when you realise that it is causing a problem.

    Generally, I prefer to create functions for specific queries rather than for specific "entity" types, with each query's return type matching the shape of that query's result. This fits the reality that queries often involve multiple entity types.
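
    One reading of "a function per query, typed by the query's result" is sketched below in Python with the standard-library sqlite3 module. The schema, the OrderSummary type, and the data are hypothetical. The result type spans two tables rather than mirroring either entity.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderSummary:
    # Shaped like this query's result, not like any single table/entity.
    order_id: int
    customer_name: str
    total_cents: int


def order_summaries(conn: sqlite3.Connection, min_total_cents: int) -> list[OrderSummary]:
    rows = conn.execute(
        """
        SELECT o.id, c.name, o.total_cents
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        WHERE o.total_cents >= ?
        ORDER BY o.id
        """,
        (min_total_cents,),
    ).fetchall()
    return [OrderSummary(*row) for row in rows]


# Hypothetical in-memory schema and data so the sketch runs standalone.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total_cents INTEGER NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 2500), (11, 2, 900), (12, 1, 4000);
    """
)
```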

    My favourite application-layer database tool so far is https://www.jooq.org/ because it generates code from the database schema, which allows type-safe construction of queries. I find this makes it easier to create and maintain queries. It is a relatively unopinionated power tool, with minimal attempts at "automagic" behaviour. I find myself missing jOOQ now that I am not working much with Java.

  • hexbin010 16 hours ago
    It's a trade-off, like everything. More DTOs means more mapping, more naming, more files, etc. There's definitely a middle ground.

    You can (and should) also apply it selectively. E.g., for auth, I'd never want a single UserDTO used for both creating and displaying a user: creating a user requires a password, and to avoid mistakes you don't want that field present when retrieving a user.
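
    A minimal Python sketch of that split, with hypothetical class names and fields; the SHA-256 hashing is a placeholder, not a recommendation. Because UserView has no password field at all, there is nothing to leak when a user is retrieved.

```python
import hashlib
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CreateUserRequest:
    # Inbound DTO: the only place a plaintext password appears.
    email: str
    password: str


@dataclass(frozen=True)
class UserView:
    # Outbound DTO: no password field, so it cannot leak one.
    user_id: int
    email: str


@dataclass(frozen=True)
class StoredUser:
    # Hypothetical persistence shape.
    user_id: int
    email: str
    password_hash: str


def create_user(req: CreateUserRequest, next_id: int) -> StoredUser:
    # Placeholder hashing: use a salted, slow KDF (e.g. bcrypt or argon2) in practice.
    digest = hashlib.sha256(req.password.encode()).hexdigest()
    return StoredUser(user_id=next_id, email=req.email, password_hash=digest)


def to_view(user: StoredUser) -> UserView:
    return UserView(user_id=user.user_id, email=user.email)


stored = create_user(CreateUserRequest("alice@example.com", "s3cret"), next_id=1)
print(asdict(to_view(stored)))  # {'user_id': 1, 'email': 'alice@example.com'}
```

    This is the "more mapping" cost from the comment above in miniature: two extra classes and two mapping functions, applied only where the safety is worth it.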

    I know DDD advocates would say that you're then not being true to DDD, but that's business. It's very, very hard to get everyone to accept the reduced velocity of 100% adherence to DDD for an extended period. In my experience it starts off as "this is great"; then people start to hate reviewing PRs for simple changes that add 28 new files (particularly in Java), and they quietly moan to the boss about being slowed down by DDD.

  • rokkamokka 10 hours ago
    In my experience (in our rather large MVC-style Laravel code base) DTOs are almost always an unnecessary abstraction. I'm much more content just shuffling actual Models around, with small methods that map these to whatever format the client then requires. I've refactored away many a DTO added by junior developers and the code is always much simplified.
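
    A minimal Python analogue of "pass the models around, with small mapping methods per client format". The comment is about Laravel/PHP; the Invoice class here is a hypothetical stand-in for an ORM model, not Eloquent's API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Invoice:
    # Stand-in for an ORM model (hypothetical fields); no separate DTO class.
    id: int
    customer_name: str
    amount_cents: int
    issued_on: date

    def to_list_item(self) -> dict:
        # Small mapping method for one client format (e.g. a table row).
        return {"id": self.id, "customer": self.customer_name}

    def to_detail(self) -> dict:
        # Another format for another screen, built from the same model.
        # Integer arithmetic avoids float rounding; assumes a non-negative amount.
        amount = f"{self.amount_cents // 100}.{self.amount_cents % 100:02d}"
        return {
            "id": self.id,
            "customer": self.customer_name,
            "amount": amount,
            "issued_on": self.issued_on.isoformat(),
        }


inv = Invoice(7, "Acme", 12345, date(2024, 3, 1))
print(inv.to_detail()["amount"])  # 123.45
```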

  • Noumenon72 9 hours ago
    I'm not sure I understand the different approaches being compared here. Are you opposing 1 and supporting 2?

      1. HatsService with methods .get_hats(), .throw_hats()
      2. wearable_hats = db.query(Hats).map(hat => WearableHatDto(hat))
         throwable_hats = db.query(Hats).map(hat => ThrowableHatDto(hat))
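
    The two options can be made concrete with a Python sketch. The Hat fields (size, weight_g) and the in-memory list standing in for db.query(Hats) are assumptions; the class and method names follow the comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hat:
    # Hypothetical row type standing in for whatever db.query(Hats) returns.
    name: str
    size: int
    weight_g: int


HATS = [Hat("fedora", 57, 120), Hat("beanie", 56, 80)]  # stand-in for the DB


# Option 1: one service owns all hat behaviour and works on the full Hat.
class HatsService:
    def __init__(self, hats: list[Hat]) -> None:
        self._hats = hats

    def get_hats(self) -> list[Hat]:
        return list(self._hats)

    def throw_hats(self) -> list[str]:
        return [f"threw {h.name} ({h.weight_g} g)" for h in self._hats]


# Option 2: each use case maps rows into its own narrow DTO.
@dataclass(frozen=True)
class WearableHatDto:
    name: str
    size: int

    @classmethod
    def from_hat(cls, hat: Hat) -> "WearableHatDto":
        return cls(name=hat.name, size=hat.size)


@dataclass(frozen=True)
class ThrowableHatDto:
    name: str
    weight_g: int

    @classmethod
    def from_hat(cls, hat: Hat) -> "ThrowableHatDto":
        return cls(name=hat.name, weight_g=hat.weight_g)


wearable_hats = [WearableHatDto.from_hat(h) for h in HATS]
throwable_hats = [ThrowableHatDto.from_hat(h) for h in HATS]
```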

  • reillyse 16 hours ago
    The fact that people expect a data object, as argued by the author, is a very strong argument in favor of having one.

    Onboarding new programmers to your codebase and making the codebase simpler for developers to reason about are massive non-functional benefits. Unless you have a very strong reason to do otherwise, follow the principle of "least surprise". Vibe coding adds another layer to this: an LLM generally expects the most common pattern, so maintenance and testing will be orders of magnitude easier.