
My question stems from this interesting accepted answer posted by @causative, a snippet of which is:

An explanation is analogous to data compression. We have a large amount of data, that we explain with a short, simple rule. For example, you may plot a lot of (x, y) points and draw a regression line y = Ax + B through them. The regression line can be described just by two numbers, A and B, even if you have thousands of (x, y) points; we have compressed the data (lossily), and also partially explained it.

The laws of physics are a few simple equations that describe the behavior of many different phenomena. They are data compression as well; it is much simpler to write down the equations than to write down all the details of the phenomena they describe.

The more fundamental the laws, the greater the compression, and the more fundamental the explanation is.

After some digging, I found a paper that essentially outlines a similar concept. A snippet of which is:

Unpacking this slogan, what it says is that the best explanation of the fact s (i.e., of some data) is the shortest. Given some data that you want to explain, the Principle of Minimum Message Length tells you to infer the theory which can be stated with the data in the shortest two-part message, where the first part of the message states the theory, and the second part of the message encodes the data under the assumption that the theory is true.
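A deliberately trivial sketch of the two-part message idea (the data, the theory string, and the residual coding are all invented for illustration; real MML measures both parts in bits under a probabilistic model):

```python
import random
import zlib

# Toy data: 2,000 integer observations clustered tightly around 50.
random.seed(1)
data = [50 + random.choice((-1, 0, 0, 0, 1)) for _ in range(2000)]

# One-part message: just transmit the raw observations.
raw = ",".join(map(str, data)).encode()

# Two-part message: part one states the theory, part two encodes the
# data under the assumption that the theory is true (only residuals).
part1 = b"every value is 50 plus a small residual"
residuals = bytes(v - 50 + 1 for v in data)  # map {-1, 0, 1} to {0, 1, 2}
part2 = zlib.compress(residuals, 9)

# The better theory yields the shorter total message.
assert len(part1) + len(part2) < len(raw)
```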

After some self-reflection, this is probably accurate and seems consistent with how the history of science has unfolded. However, I am interested in counterexamples, or in whether any philosophers have advocated for the notion of an explanation that doesn't compress data.

Presumably, this involves computer science concepts, and many philosophers may not use computer science concepts to characterize explanations, but is there an example of a valid explanation that does not effectively analogize to data compression (i.e. an explanation that does not somehow shorten the data we are trying to explain)?

  • 3
Just peruse all the accepted answers on this site that are longer than their respective questions... Commented 17 hours ago
  • 2
    For certain- just consider the reams of guff written about time, for instance. Commented 14 hours ago

7 Answers

2

It depends on how one defines size (in other words, what attribute the compression would reduce), on what one considers an explanation, and on which explanation you're dealing with. Suppose one describes the set

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}

with the explanation

The positive integers less than 16.

Then while the data (the set enumeration) weighs in at 37 characters (spaces omitted), the explanation is a mere 35 characters (even counting the spaces). And if the data set were to grow to include more and more of the consecutive positive integers, then the correspondingly adjusted explanation would be ever more of a compression.
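The counts can be checked by machine (a trivial sketch; note that the explanation runs 35 characters with its final period, 34 without):

```python
enumeration = "{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}"
explanation = "The positive integers less than 16."

assert len(enumeration.replace(" ", "")) == 37  # data, spaces omitted
assert len(explanation) == 35                   # explanation, spaces counted
```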

But another explanation of the same data set—and one that in some circumstances would be considered a “better” explanation—is

The roots of the polynomial x¹⁵ − 120x¹⁴ + 6,580x¹³ − … − 1,307,674,368,000,

where that polynomial is the product (x – 1)(x – 2) … (x – 15). That explanation, once you restore all twelve of the terms that I’ve replaced above by the ellipsis, is way, way longer than 37 characters (even with the unexpanded ellipsis—so in this incomplete form—it’s already well more than 50 characters). So I don’t see how one could consider it a compression.

(Note that it is a good explanation in exactly the same way that a good answer to the question Where did the numbers 3 and 5 come from in this situation? might well be Oh, the situation corresponds to the expression x² − 8x + 15, and they're the values of x at which that expression equals 0.)
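For the curious, the expansion of (x − 1)(x − 2)…(x − 15) can be verified mechanically with exact integer arithmetic (a sketch; none of this is from the answer itself):

```python
import math

# Expand (x - 1)(x - 2)...(x - 15) by repeatedly multiplying by (x - r);
# coefficients are kept in descending powers, as exact integers.
coeffs = [1]
for r in range(1, 16):
    coeffs = [a - r * b for a, b in zip(coeffs + [0], [0] + coeffs)]

assert coeffs[0] == 1                     # x^15
assert coeffs[1] == -120                  # -(1 + 2 + ... + 15)
assert coeffs[2] == 6580                  # sum of products of pairs of roots
assert coeffs[-1] == -math.factorial(15)  # constant term, -15!
```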

  • Can you not just directly reverse this example, though? Your explanation is directly swappable with the information provided. I feel like the explanation must somehow translate the information provided, but I'm not sure it can easily be identified as doing so. Commented 2 hours ago
  • Well @djsmiley2kStaysInside, I constructed a very simple example, so yes, the data set seems as intuitively “nice” as does the first explanation of it. But that is a side-effect of my example’s simplicity. In general, explanations of finite sets of real numbers in terms of the roots of polynomials seem to deserve that label, explanations, much more than would finite sets of real numbers as (putative) “explanations” of the roots of polynomials. Commented 2 hours ago
  • I doubt people would consider “the subtraction of the expression 10 - 9” as a good explanation of the number 1. Rather, they would just say that they are equivalent expressions in math. But that sounds similar to your polynomial example.
    – Syed
    Commented 1 hour ago
  • @Syed, that’s what I meant by “the situation corresponds to…”. The observed data might be that there is exactly one beer left. And the explanation could be, “Oh, we originally had ten beers, but each of nine people took one beer.” That’s a situation corresponding to the expression 10 – 9. And in that situation, “10 – 9 = 1” is a perfectly good explanation of the data. Commented 54 mins ago
  • @PaulTanenbaum fair enough, the real-world example you pointed out seems to be a good counterexample. A preceding event in the causal history of an event certainly counts as an explanation, and there is no guarantee that the explanation summarizes the subsequent data
    – Syed
    Commented 52 mins ago
2
  1. Each basic differential equation of a successful theory from physics explains and forecasts infinitely many data points, observed or yet to be observed. Examples are Newton's differential equation relating force and acceleration, Maxwell's equations of electrodynamics, Einstein's field equations of General Relativity, Schrödinger's equation from quantum mechanics, and Dirac's equation of the electron.

    In that sense, the differential equation together with specific boundary conditions compresses the data.

  2. IMO that can be understood without much ado.
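A minimal sketch of point 1, assuming nothing beyond introductory mechanics (the values of g and v0 are arbitrary): a law plus boundary conditions regenerates as many data points as we like from just two numbers.

```python
# The "law" is constant acceleration, y'' = g, and the boundary conditions
# are y(0) = 0 and y'(0) = v0 -- two numbers that stand in for however many
# observations we care to generate.
g, v0 = -9.8, 20.0

def y(t):
    # Closed-form solution of y'' = g with the boundary conditions above.
    return v0 * t + 0.5 * g * t * t

# 10,000 "observations", all regenerated on demand from g and v0.
observations = [y(0.01 * k) for k in range(10_000)]
```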

2

You seem to be conflating an explanation of an event with a theory of an event type. An event is anything that happens in the physical world. An event type is a pattern of events that have something in common. For example, a specific bullet fired from a specific gun at a specific time is an event. There are various event types of which this event is an instance. In increasing order of generality, here are some of them:

  1. Any bullet fired from that gun.
  2. Any bullet fired from any gun of that model.
  3. Any bullet fired from any gun.
  4. Any projectile moving through earth's atmosphere.
  5. Any object in motion with respect to another object.

A theory of an event type is a mathematical (or at least measurable) description that describes any event of that type within certain bounds of accuracy, assuming nothing affects the event that the theory does not cover. A theory of an event type can be analogized to a sort of compressed expression of an approximation of the data gathered on that event type, although there is really more to it than that. A theory by its nature assumes that there is a cause of some sort behind the event, one which ensures that future events will follow approximately the same course as past events.

The theory is not an explanation in the intuitive sense, though; it is only a description of how the event typically goes. Calling it an explanation would be like answering "Why did Jack and Jill go up the hill to fetch a pail of water every day for the last year?" with the explanation, "Because Jack and Jill always go up the hill to fetch a pail of water every day." That's not an explanation in the usual understanding of the word; it's just an expression of the existence of a pattern. An explanation would be something like, "Because the water in the valley is too contaminated to drink."

However, that's not the whole story. A nomological explanation is one that is based on laws of nature (as opposed to, say, teleology or human intention). All of the modern physical sciences are based on nomological explanations, and probably the large majority of scientists believe that all of science should be based on nomological explanations. There is a case to be made that ultimately the only sort of nomological explanation we can give is essentially equivalent to a theory of event types. That is, the only sort of explanation we can give is to say "this is how nature is observed to act."

One can argue that the limitations of the human condition mean that all we can do is observe patterns and construct theories based on those observations, so although we may want a deeper explanation than merely expressing the existence of a particular pattern of events, we may have to settle for no more.

  • You had me at, "this is how nature is observed to act." Seems pretty final.
    – Scott Rowe
    Commented 4 hours ago
  • The "Because the water in the valley is too contaminated to drink" explanation comes from a theory though, no? We have a theory of how contamination can cause people to be infected and die, based in biology or germ theory. Without this theory, the quoted explanation would not be so intuitive
    – Syed
    Commented 1 hour ago
1

Do explanations ever not compress the data they explain?

An explanation is always a type of summary of events. In most instances, the explanation is more compact, and requires less analysis, than the raw data. This "compression of data" is the basis of Occam's Razor.

1

The English concept of "explanation" almost maps to compression, but not exactly. For example, let's choose our data to be a very large randomly generated set of planar graphs, with labels based on whether each graph can be colored with only four colors. The graphs themselves can't be compressed because they are randomly generated, so all effort must go into explaining the labels.

Explanation A: "All are labelled True"

Explanation B: the famous 600-page computer-generated proof of the four color theorem + "Therefore, all the graphs are labelled True"

Explanation C: the 20-page human-written program that generates the proof of the four color theorem + "Therefore, all the graphs are labelled True"

Explanation D: the 1-page program that enumerates all programs that generate proofs, procedurally checking whether each proves the four color theorem until one succeeds + "Therefore, all the graphs are labelled True"

All explanations compress the data.

Explanation A does the best job compressing the labels, but it feels like Explanation B is better than A in some ineffable sense of "actually" being an explanation.

Explanation C is clearly better than B, since it contains the same information in less space, and if you had C and wanted B, you could just run the program.

Explanation D is "better" than C for the same reason: It's shorter, and if you wanted the proof and not the program that generates the proof, just run the program. However, clearly D is unsatisfactory. One reason it's unsatisfactory is that C generates the proof in a few days while D takes longer to run than the age of the universe. Is this quantitative difference in run time enough to justify a claim of qualitative difference, that D isn't an explanation even though C is?
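A toy stand-in for the C-versus-D contrast (the three-letter target is arbitrary; a real instance would enumerate proofs, not strings):

```python
import itertools
import string

target = "abc"  # toy stand-in for "the proof"

def direct():
    # Like Explanation C: a longer description that yields the artifact fast.
    return "abc"

def enumerate_until_found():
    # Like Explanation D: a shorter description -- try every string in
    # order until one matches -- with runtime exponential in the target's
    # length. Feasible here only because the target is three letters.
    for n in itertools.count(1):
        for letters in itertools.product(string.ascii_lowercase, repeat=n):
            candidate = "".join(letters)
            if candidate == target:
                return candidate

# Both "explanations" produce the same artifact; only description length
# and runtime differ.
assert direct() == enumerate_until_found() == "abc"
```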

-1

This might be the reason why we cannot (yet) explain the content of the human brain. Jeff Lichtman said in a recent podcast (I am paraphrasing) that we can make an all-encompassing model of the brain, a connectome, but we can't understand it because we can't compress it more than it already is. It's like a dictionary.

Similar ideas exist around LLMs: a model such as llama-8b is roughly 4 GB, yet it encodes a remarkable share of the written knowledge humanity has created.

I think these are possible candidates for complex systems, the content and function of the brain among them, whose data cannot be compressed more than it already is, and which therefore cannot be explained by compression.
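The incompressibility point can be illustrated with off-the-shelf compression (random bytes stand in, loosely, for data that is already near-maximally dense):

```python
import os
import zlib

# Highly regular data compresses enormously...
regular = b"neurons that fire together wire together; " * 300
assert len(zlib.compress(regular, 9)) < len(regular) // 50

# ...but data that is already near-maximally dense -- a stand-in for a
# connectome or a trained model's weights -- barely compresses at all.
dense = os.urandom(len(regular))
assert len(zlib.compress(dense, 9)) > 0.99 * len(dense)
```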

-1

Your use of "compress information" limits you to situations where one starts from information. I think many explanations don't fit this mold, especially those explaining physical phenomena: they instead claim that all future observations will fit a pattern. This is an interesting corner case, where to see "compression" we need to consider future observations.

However, if we just stick to the realm of pure mathematics, where the concept of "compression" applies, I think we can find counterexamples. This is particularly true in the world of proofs. In this realm it is the norm that the explanation, in form of a proof, is quite a lot longer than the original statement. There's quite a lot of interesting content on this:

  • Fermat's Last Theorem can be written in 215 characters (as phrased on Wikipedia), but the proof by Wiles is 192 pages long
  • The Four color theorem from graph theory was proven using a computer. The proof includes explicit searches of 1482 classes of graphs, each of which got its own computer-generated proof. At the time, computer-aided proofs were not trusted, so they had to be checked for correctness by hand: all 400 pages of microfiche.

Looking at these examples and others, I think we can identify a class of "explanations" that are not always compressions. Why do we value these explanations? We value them because they are airtight. It wasn't so much that we made them smaller (compression); it's that we reframed them upon another structure that we trusted more. I would argue this is a class of thing we call an "explanation" which is not obliged to be shorter.

The mathematical examples are interesting because it is possible to rephrase them so that we think of them as "infinite" compression. To use the first example, the proof shows the theorem is true for n = 3, n = 4, n = 5, n = 6, … all the way to infinity. But in reality, we never had an infinite amount of data. We had a conjecture, phrased in the language of mathematics and infinity, and perhaps a collection of cases hand-tested as a sanity check to make sure the pattern holds.

This also points to an interesting corner case: non-constructive proofs. In a non-constructive proof, it is often infeasible to exhibit the object without exhausting the space, yet we can prove that something interesting must exist in that space. This is interesting to me because I'm not entirely sure how one would choose to measure information in this case. Were one to find the counterexample, we could use it to measure our quantity of information, but it never actually gets found.

  • Interesting, but I wonder if proofs in math even count as explanations. You have the axioms of math and everything implied by the axioms of math. In a "proof", what you're ultimately doing is finding out whether certain things we don't already know about are implied by the axioms. It's more of a "discovery" than an "explanation"
    – Syed
    Commented 1 hour ago
  • @Syed Is one not discovering a relationship found in the data in the examples of explanation you gave in the original question? (It is a semantic detail, but it's worth pointing out that if you're looking for counterexamples, one has to be very careful with the definition of the thing. Otherwise you no longer have a question about "explanation," you have a definition of it)
    – Cort Ammon
    Commented 1 hour ago
  • this does get into the semantics of what ultimately counts as an explanation, and semantics can unfortunately always be debated @Cort Ammon
    – Syed
    Commented 1 hour ago
