Quick link: When it comes to causal knowledge, every little bit (of lowered entropy) helps

The paper: Not Causal or Descriptive But Some Secret, Other Thing: Entropy as a Criterion for Causal Learning

What it says: The paper offers the view that the difference between causal and descriptive studies, and between different concepts of causal modeling, isn’t a binary but a continuum, and that studying changes in the entropy of causal models can be a good way to map where you are and where you might need to go.

My take: He’s not wrong. I’d qualify his analysis with the observation that sometimes we only care about the entropy of certain subsets of the causal model: there are variables we’re perfectly happy to know very little about, as long as what we do know is enough to infer what we need to know (although that always hints at possibly interesting structure hidden there). And of course you always want to graph the posteriors and look at them; entropy is a good constraint, but it doesn’t tell you the whole story. Generally speaking, though, it’s a good thing to keep an eye on, especially to quantify localized changes to your model after an experiment; see the sketch below.
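
To make that concrete, here’s a minimal, purely illustrative sketch (not from the paper; the hypotheses, likelihoods, and numbers are all made up): a toy posterior over two binary structural questions, updated with an experiment that only informs one of them, so the entropy drop shows up in that edge’s marginal and almost nowhere else.

```python
import numpy as np
from scipy.stats import entropy  # Shannon entropy of a discrete distribution

# Hypothetical example: a posterior over two binary structural questions,
# "does A cause B?" and "does B cause C?". The four joint hypotheses are
# (no, no), (no, yes), (yes, no), (yes, yes). All numbers are made up.
prior = np.array([0.25, 0.25, 0.25, 0.25])

# Likelihood of data from an experiment that intervenes on A and observes B:
# it is informative about the A->B edge, and says nothing about B->C.
likelihood = np.array([0.1, 0.1, 0.9, 0.9])

posterior = prior * likelihood
posterior /= posterior.sum()

def marginal(p, which):
    """Marginal over one of the two binary questions (0: A->B, 1: B->C)."""
    p = p.reshape(2, 2)  # axis 0 indexes A->B, axis 1 indexes B->C
    return p.sum(axis=1 - which)

for name, dist in [("prior", prior), ("posterior", posterior)]:
    print(name,
          "| joint H =", round(entropy(dist, base=2), 3),
          "| H(A->B) =", round(entropy(marginal(dist, 0), base=2), 3),
          "| H(B->C) =", round(entropy(marginal(dist, 1), base=2), 3))
```

Running it, the joint entropy drops from 2 bits to about 1.47, and essentially all of that reduction comes out of the A→B marginal (1 bit down to roughly 0.47), while the B→C marginal stays at 1 bit: exactly the kind of localized, subset-level bookkeeping I mean.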

Why you should care in the big picture of things: The technical approach is useful, but it’s mostly worth reading for the larger point that the goal of modeling is to build knowledge, and that there aren’t magical pipelines where you just throw in data on one end and get understanding out of the other (never mind text; half of the scientific revolution was based on the observation that human language is a very awkward way to describe or think about big parts of the world).

Building knowledge is a hard, careful, annoying struggle, and you can and should use every trick and resource you have, from abstract thought experiments to raw observations to careful experiments, to move forward an inch. What makes it work isn’t snobbery in your inputs but carefulness in your process: use everything and trust nothing, least of all yourself.

Statistical modeling methods, both conceptual and computational, are one of the best tools we have to do this sort of integration and make use of what we find. One of my main bets for the future — or rather one of my main suggestions for competitive advantage whether you’re competing against others or just against the stubborn limits of our minds — is a wide collective shift from good databases and ersatz analysis as the default mode of organizational knowledge building and usage to good databases and good analysis. The current spike of interest in LLM-based AI is, I think, strategically correct if operationally askew; the tools are wrong but the desire is now there (sort of), and sometimes the desire is the difficult part.