# Section 2

**Measurements are generally imperfect**. Let a system be specified by the value of a single parameter, call it a. This is a fixed and definite number (see below). Examples of a are the photon position and the cancer age described above. It is convenient here to take a to be the ideal position of an electron.

Suppose that an observer who wants to gain information about the electron measures its value of a. Since any measurement is, in general, imperfect, the measured value is not a but some other number y. This number is also the “message” (as above) of the communication channel. Let the departure of y from a be the random value x, so that

y = a + x. (3)

**Any such error x in the measurement generally consists of random instrument “noise” plus any fluctuation that is characteristic of the measured effect.**

Consider next a special measurement of a, made with an ideal or noise-free measuring device. In the absence of detection noise, x is a fluctuation that is purely characteristic of the measured effect. **It is these fluctuations x that concern us. They define the physics of the effect, and are given the special name “intrinsic” fluctuations**. With this promotion in rank, x is no longer regarded as mere noise. For example, x could be the actual fluctuation in position or momentum of a quantum particle. Let the relative number of times a fluctuation x occurs be represented by a function p(x). This is called the “probability law” for x. Its square root is an amplitude law q(x) that becomes the famous “wave function” ψ(x) in application to quantum mechanics. **The ultimate aim of this essay is finding the probability law p(x) (and its wave function).**

**The Fisher information value I for a typical such measurement y is defined to obey [2]**

I = < [(d/dx) log p(x)]² >. (4)

This is the average < > over all values of x of the square of the rate of change d/dx of the logarithm of p(x). Why it should have this peculiar form is derived in either of refs. [1a-c]. The average < > in eq. (4) may be evaluated as a simple multiplication of the squared term by p(x) and integration over all x; for brevity we do not show this here (see eq. (11) ). For the moment we assume that p(x) is known, so as to observe how differently shaped laws p(x) give rise, via eq. (4), to different values for I.

In evaluating eq. (4) for various laws p(x) it becomes apparent that

**I is a measure of the width of p(x) (or of the amplitude law q(x)).**

For example, if **p(x) is a normal law** its use in eq. (4) gives I as simply 1 divided by the variance. The variance is roughly the squared width of p(x). Hence the wider the law p(x) the smaller the information value. But, what does the width of a probability law signify?

The wider or broader the probability law p(x) on the fluctuation is, the more “random” the values of x are. Therefore, the less accurately can parameter a be estimated from an observation y. We would expect this to define a case of low information I. As mentioned above, this is precisely what eq. (4) gives in this scenario.

Therefore, Fisher information I measures the information about an unknown parameter a that is present in a typical data value y.
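As a concrete check of the normal-law case, eq. (4) can be evaluated numerically on a grid. This is a sketch in Python; the grid width and spacing are our own arbitrary choices, not anything specified in the text:

```python
import numpy as np

# Verify numerically that for a normal law p(x) with variance sigma^2,
# eq. (4) gives I = 1/sigma^2.
sigma = 2.0
x, dx = np.linspace(-8 * sigma, 8 * sigma, 160_001, retstep=True)
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

dlogp = np.gradient(np.log(p), dx)   # (d/dx) log p(x)
I = np.sum(p * dlogp**2) * dx        # the average < > of eq. (4), as an integral

print(I)   # close to 1/sigma^2 = 0.25
```

Widening the law (increasing sigma) drives I down as 1/sigma², matching the statement that broader probability laws carry less information.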

Eq. (4) holds for the usual case where the probability law p(x) obeys shift invariance. This means that it stays the same regardless of the absolute position of the entire information channel in space-time. To review, the “information channel” is a rigid system consisting of a source (say, a source of potential), a measuring device, and an observer. Hence, under shift invariance, the probability law p(x) holds regardless of any shift that might be imparted to this rigid system. Applications of EPI have been made to both shift-invariant and shift-variant systems.

If shift invariance doesn’t hold, or if more generally a vector of measurements is made, a slightly more complex definition of I is used. This is defined to be the trace of the Fisher information matrix. (The trace is the sum of elements down the diagonal.) The trace form (not shown) also amounts to a “channel capacity” or maximized form of the Fisher information. Thermodynamically, this corresponds to an “unmixed state” of the measured system, i.e., one of maximum order. The EPI principle (1) utilizes precisely this type of information I. Finally, this trace form of I arises out of the “knowledge game” aspect of measurement, as below.
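The matrix form can be illustrated numerically. The following is our own sketch, using an assumed two-parameter normal law: the Fisher information matrix is estimated as the average outer product of the score vector, and the scalar information is taken as its trace:

```python
import numpy as np

# Monte Carlo estimate of the 2x2 Fisher information matrix for a normal
# law p(x; mu, sigma), and its trace (the scalar information in the
# vector-parameter case).
rng = np.random.default_rng(0)
mu, sigma = 0.0, 2.0
x = rng.normal(mu, sigma, 1_000_000)

# Score vector: derivatives of log p(x; mu, sigma) with respect to mu, sigma
s_mu = (x - mu) / sigma**2
s_sigma = ((x - mu)**2 - sigma**2) / sigma**3
score = np.stack([s_mu, s_sigma])

F = score @ score.T / x.size   # average outer product of the score
trace_F = np.trace(F)          # sum of elements down the diagonal

print(trace_F)   # close to 1/sigma^2 + 2/sigma^2 = 0.75
```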

I as a measure of **disorder**

Consider a system with the intrinsic fluctuation x following the probability law p(x) (“intrinsic” meaning, as above, purely noise of the physical effect, not of the detector in use). For example, the system might be the above electron, located at a true position a but observed at a position y = a+ x, with x purely due to quantum fluctuation. The broader p(x) is the smaller must each p value be, since its total area must be fixed (at value unity). Therefore, the less predictable is each value of x; and hence of y. Aside from giving rise to inaccurate data y (as above), such a system is also said to have a high degree of “disorder”. Thus, the “width” of a probability law associates with the degree of disorder of the system it describes. On the face of it, this is a qualitative statement. But, in fact, **Eq. (4) allows the degree of disorder of the system to be quantified, i.e., represented by the number I**.

However, we usually measure the disorder of a system by its level of entropy H. How, then, does H relate to I? If the system is diffusive, i.e. its probability law p(x) obeys the Fokker-Planck differential equation, then the level of disorder of the system monotonically increases with time. Measuring as usual the level of disorder by the level of entropy H, the entropy must then increase (the relation dH/dt ≥ 0, called the “Boltzmann H-theorem” or the Second law of thermodynamics). In fact, correspondingly, **the Fisher information decreases [3,4]**:

dI/dt ≤ 0 (5)

This might be considered a “**Fisher I-theorem**”, corresponding to the **Boltzmann H-theorem**. It follows that both measures I and H change monotonically with the level of disorder. The entropy H is therefore NOT a unique measure of disorder, as has been erroneously taught for over 100 years. The Second law is described by Fisher I as well as by entropy H.

Eq. (5) also indicates the direction of an “arrow of time” [1a,b]. That is, if the Fisher information level of a system is observed to decrease, dI < 0, its history is necessarily moving forward, dt > 0.
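The I-theorem (5) can be watched directly in a toy simulation (our own sketch; the grid, diffusion constant, and time step are arbitrary choices): evolve p(x, t) under pure diffusion, the simplest Fokker-Planck equation, and evaluate eq. (4) at successive times:

```python
import numpy as np

def fisher_info(p, dx):
    # Eq. (4) on a grid: I = integral of p(x) [(d/dx) log p(x)]^2 dx
    dlogp = np.gradient(np.log(p), dx)
    return np.sum(p * dlogp**2) * dx

# Evolve dp/dt = D d^2p/dx^2 with an explicit finite-difference scheme.
x, dx = np.linspace(-20, 20, 2001, retstep=True)
p = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # start as a unit-variance normal
D = 1.0
dt = 0.4 * dx**2 / D                         # stable explicit time step

I_values = [fisher_info(p, dx)]
for step in range(1, 12_501):
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dx**2  # discrete Laplacian
    p = p + D * dt * lap     # (periodic ends; p there is essentially zero)
    if step % 2500 == 0:
        I_values.append(fisher_info(p, dx))

print(I_values)   # monotonically decreasing, per dI/dt <= 0
```

For a diffusing normal law the variance grows as 1 + 2Dt, so I falls as 1/(1 + 2Dt), from 1.0 toward 0.2 at t = 2 here.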

The I-theorem also implies that the information efficiency constant κ of the EPI principle lies between 0 and 1, as will be discussed below eq. (14).

Finally, the I-theorem leads to a statement of the Second law that expresses how well things may be measured as time progresses; see below.

I as a measure of **complexity**

In a scenario of multiple PDFs pn(x), n = 1,…,N, the size of N is one determinant of **the complexity** of the system. Intuitively, the larger N is, the more complex the system. It usually turns out that the Fisher information I is proportional to N. This indicates that **I measures the degree of complexity of the measured system**, aside from indicating its level of disorder (as above). This property has many applications [1a,b]. For example, minimal complexity is used as a defining property of cancer growth (see paper by Gatenby and Frieden below). The basis for this property is that **cancerous tissue can grow but no longer function properly, and hence has given up a large amount of complexity**. This property, when used in an application of EPI to cancer growth, gives rise to the correct law of growth for breast cancer; see prediction (8) below.
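The proportionality of I to N can be illustrated in the simplest setting (our own Monte Carlo sketch, assuming N independent draws from one normal law): the score of the joint law is the sum of the per-draw scores, so its mean square, i.e. the Fisher information, grows as N:

```python
import numpy as np

# For N independent draws from N(mu, sigma^2), the Fisher information about
# mu in the joint law is N times the single-draw value 1/sigma^2.
rng = np.random.default_rng(3)
mu, sigma = 0.0, 2.0

I_hat = {}
for N in (1, 4, 16):
    x = rng.normal(mu, sigma, (200_000, N))
    score = ((x - mu) / sigma**2).sum(axis=1)   # d/dmu of log joint law
    I_hat[N] = np.mean(score**2)                # eq. (4)-style average

print(I_hat)   # close to N / sigma^2 = 0.25, 1.0, 4.0
```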

I as a measure of the **ability to know**

Information I also determines how well the position of a particle can be estimated by an imperfect observation of its position. The mean-squared error e² in any unbiased estimate obeys [1,2]

e² ≥ 1/I. (6a)

This is called **the “Cramer-Rao inequality”**. It shows that the larger the level of Fisher information is, the smaller the mean-squared error e² in the estimate of the electron position tends to be. This satisfies intuition as well.
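A minimal Monte Carlo sketch of inequality (6a), using the Gaussian measurement model y = a + x from eq. (3) (the particular values of a and sigma are our own choices): for a normal law the bound is in fact attained.

```python
import numpy as np

# Measurement model y = a + x with x ~ N(0, sigma^2), so I = 1/sigma^2.
# The unbiased estimate "a_hat = y" here attains e^2 = 1/I exactly.
rng = np.random.default_rng(1)
a, sigma = 5.0, 1.5
y = a + rng.normal(0.0, sigma, 1_000_000)

e2 = np.mean((y - a)**2)   # mean-squared error of the estimate a_hat = y
I = 1 / sigma**2           # Fisher information of one Gaussian observation

print(e2, 1 / I)           # e2 close to 1/I = 2.25: the bound is saturated
```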

Eq. (6a) shows that the minimum possible value e²min of the mean-squared error obeys e²min = 1/I. (The minimum is attained by some optimum processing approach.) Differentiating this relation with respect to time and using the Fisher I-theorem (5) gives

d(e²min)/dt ≥ 0. (6b)

Thus, the direction or “arrow” of time is such that minimized (optimized) errors increase with it. Or, for diffusive systems, **error tolerances and instrument quality inexorably run down with time.** This is another way of stating the 2nd law of thermodynamics.

I as a measure of **value**

Quantity I is always information about something, and also has units. By comparison, Shannon information (SI) is a unitless measure. SI measures the ability to pass a variety of distinguishable signals through a channel. The signals can be of any type, and one cannot tell from a given value of SI whether, e.g., the channel was the human birth canal or the prices and sales on the stock market during a given day of trade. This leads to the well-known problem of interpreting the value of a given amount of SI (usually measured in “bits”, a fictitious unit, since SI has no units). One bit is one bit regardless of what event it actually represents (“a boy” being born, or collapse of the stock market, for example). Therefore the known level of SI for a channel cannot indicate the level of value of the channel to the observer. Theories of “value” of SI have been proposed over the years, but no universally accepted one has yet been found.

Information I, by comparison, is directly about something, namely, the numerical size of a required system parameter. This, in essence, solves the “value” problem. The “value” to an observer of an acquired value of I is something definite: It defines the observer’s average level of uncertainty in knowledge of the parameter, and in definite units (by eq. (6a), those of the reciprocal of the parameter-squared).

What is a **law** of physics? To what extent do information and knowledge relate to physics?

Physics is fundamentally tied into measurement. In fact:

(c) One may regard “physics”, by which we mean the equations of physics, as simply a manmade code that represents all past (and future) measurement as briefly, concisely and correctly as is possible. Thus **physics is a body of equations that describes measurements** (for example, the value of datum y in eq. (3)).

In fact we could equally well replace the word “physics” in the foregoing with “chemistry”, “biology,” or any other quantitative science. All describe measurements and potential measurements by manmade codes called “equations.”

(d) But measurements are generally made for the purpose of knowing, in particular knowing the state of a system. Thus, physics presumes definite states to exist. We characterize these by definite values of a parameter such as a above (for example a position or time value). A definite parameter value is presumed to characterize a definite system or “object”. (Taken the other way around, how could a definite parameter value characterize an indefinite or ill-defined system?)

Thus, by (c) and (d) the aim of physical measurement is to define as accurately as possible concrete objects that have a definite existence (see also a further discussion in section preceding eq. (16)). These objects have defining parameters – positions, energies, etc. – that have definite values. As we saw above by the Cramer-Rao inequality (6a), the Fisher information I governs how correctly we can estimate such a parameter.

This is obviously a classical view of things. Nevertheless it gives rise to all of science, including even quantum mechanics. (A proviso for the latter is that, as required by special relativity, certain time or space coordinates are to be represented as imaginary numbers, sometimes called a Wick rotation; an example is the Wick coordinate ict for the time.)

This suggests (although it does not prove) that:

**Ideally accurate versions of the laws of science should follow from maximally accurate estimates, and therefore by eq. (6a) maximum (note: not minimum) Fisher information.**

Notice that the entropy H has not entered this analysis. Entropy does not measure degree of accuracy. It measures something else, namely, the degree of degeneracy of a system. Or, as others have aptly noted:

**Fisher information measures how much information is present, whereas entropy measures how much is missing.**

At this point the development branches to the derivations of two fundamentally differing types of physical laws. One type is exemplified by the **Heisenberg uncertainty principles**; a general category of scientific uncertainty principles is derived in the next main section. A second type of physical law is an equation of dynamics. This defines the evolution of an effect over time and space (**a wave equation**, e.g., is of this type). **These are derived** starting in the section following the next.

**(e) The Cramer-Rao inequality (6a) governs how accurately any measurement can be made. Thus, it is a measure of our inability to acquire knowledge. Such an inability can be quantified as** the numerical uncertainty in knowledge of a parameter, as in an “uncertainty principle.” The Heisenberg is one example. Three such principles follow from particular uses of Fisher information I (also see refs. [1a,b]):

**Uncertainty** principles

(I) If the measurement is of the position of a particle, the information I turns out to obey the inequality I ≤ 4<μ²>/ħ², where μ is the particle’s momentum and ħ is Planck’s constant divided by 2π. As usual, the notation < > means an average. Using this in eq. (6a) **gives the Heisenberg uncertainty principle**

e²<μ²> ≥ (ħ/2)². (7)

The product of the left-hand side uncertainties in position and momentum exceeds a universal constant. Thus the principle naturally arises out of Fisher information considerations.
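As an illustration (ours, in units with ħ = 1, and with an assumed width sigma), a Gaussian wave function saturates the bound (7); the position variance e² and momentum second moment <μ²> can be computed directly on a grid:

```python
import numpy as np

# Gaussian wave function psi(x): position variance <x^2> = sigma^2 and
# momentum second moment <mu^2> = hbar^2 * integral |psi'(x)|^2 dx.
hbar = 1.0
sigma = 0.7
x, dx = np.linspace(-12, 12, 48_001, retstep=True)
psi = (2 * np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (4 * sigma**2))

e2 = np.sum(x**2 * psi**2) * dx                        # <x^2>
mu2 = hbar**2 * np.sum(np.gradient(psi, dx)**2) * dx   # <mu^2>

print(e2 * mu2)   # close to (hbar/2)^2 = 0.25
```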

(II) Let the measurement instead be the random drawing of a biological organism of type n from a population consisting of organisms of types n = 1,…,N which have been evolving for an unknown time duration t. The organisms have respective fitness values (net number of offspring per generation) wn. Denote the overall variance in fitness over types n as <w²>. This generally varies with generation number, or time, as a function <w²(t)>. Suppose that from observation of n the evolutionary age t of the population is to be inferred. It turns out that I = <w²(t)>, so that eq. (6a) gives

e²(t)<w²(t)> ≥ 1 (8)

governing the error e²(t) in the age. This is a new uncertainty principle, one of **“biological uncertainty”**, stating that the error e²(t) in the inferred evolutionary age t tends to be small if the variance in the overall fitness is large at that age. Such a principle was predicted (and long sought) by the celebrated biophysicist Max Delbrück.

(III) There are various measures of the **volatility v in price of a financial security.** Suppose that the fluctuations x in the sales price of the security follow a law p(x). (See item (9) below.) By eq. (4) this represents a certain level of Fisher information I. Then eq. (6a) states that the mean-squared fluctuation in the price of the security is at least as large as 1/I. Thus 1/I is a conservative measure of the uncertainty in the price, and so represents a useful measure of its volatility v,

v = 1/I. (9)
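A sketch of eq. (9) with simulated data (not a real security; the normal model and the fluctuation scale are our own assumptions): if the price fluctuations are modeled as a normal law, then I = 1/σ² as in the Gaussian case of eq. (4), and v = 1/I reduces to the familiar variance of the fluctuations:

```python
import numpy as np

# Simulated daily price fluctuations, modeled as normal with an assumed
# scale sigma_true; eq. (9) then gives v = 1/I = variance of the fluctuations.
rng = np.random.default_rng(2)
sigma_true = 0.02
x = rng.normal(0.0, sigma_true, 250)    # one year of daily fluctuations

I_hat = 1 / np.var(x, ddof=1)           # Fisher information of fitted normal
v = 1 / I_hat                           # eq. (9): volatility as 1/I

print(v)   # close to sigma_true**2 = 4e-4
```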