Section 4
Extreme Physical Information (EPI)
The following is a sketch of the derivation of the EPI principle. The full derivation is given in B.R. Frieden and R.A. Gatenby, “Principle of maximum Fisher information from Hardy’s axioms applied to statistical systems,” Phys. Rev. E 88, 042144 (2013). It is based on L. Hardy’s recent papers showing that all modern physics follows from 5 mathematical axioms. The key axiom for our purposes is that the number N of distinguishable states attainable by a system is a maximum. Next, consider for simplicity a one-dimensional system of coordinate x with fixed total length L. Let its ultimate resolution length be some small size δx. Then the number of distinguishable states is N = L/δx, so that by the Hardy axiom L/δx = maximum. Now, the ability to distinguish states is governed by the Cramer-Rao inequality, according to which δx² ≥ 1/I, with I the Fisher information. Combining the preceding three relations gives
max. = N = L/δx ≤ L I^(1/2). (13)
Since L is fixed, if N is to be a maximum then so must I: I = max. This in itself is a powerful result. It says that any physical system of finite size tends to have maximum Fisher information I. The specifics of the maximum depend upon whatever constraints are imposed by the system effect (e.g. the Schrodinger wave equation) that governs its output data. We’re halfway there. Information J is considered next.
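The chain of reasoning above can be checked numerically. The sketch below is a toy illustration (the Gaussian family and the values of σ and L are assumptions made for the example): it computes the Fisher information of a Gaussian location family, for which I = 1/σ² exactly, and confirms the Cramer-Rao resolution limit and the bound of eq. (13).

```python
import numpy as np

# Toy check of the chain N = L/dx <= L * I^(1/2), using a Gaussian
# location family p(x) = N(theta, sigma^2) as the (assumed) test case.
# For this family the Fisher information is exactly I = 1/sigma^2,
# so the Cramer-Rao resolution limit is dx >= sigma.
sigma, L = 0.5, 10.0
x = np.linspace(-6.0, 6.0, 20001)
h = x[1] - x[0]
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Fisher information I = integral (p')^2 / p dx  (the eq. (4) form)
dp = np.gradient(p, x)
I = np.sum(dp**2 / p) * h

dx_min = 1.0 / np.sqrt(I)   # Cramer-Rao bound on the resolution length
N_max = L / dx_min          # maximum number of distinguishable states

print(I)       # ~ 1/sigma^2 = 4
print(N_max)   # ~ L * I^(1/2) = 20, per eq. (13)
```

For this family the bound N ≤ L I^(1/2) is attained with equality, since the Cramer-Rao inequality is tight for Gaussian statistics.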
The system, call it A, is observed, or otherwise interacted with, by another system that ‘coarse grains’ it. This means that some information from A is lost, or δI ≤ 0. As usual in variational derivations, let the magnitude of δI be small. Define J as the level of information that is intrinsic to system A. As such, J is the maximum possible level of information that could ever be collected from A. Due to the coarse graining, the actual level of collected information is I = J + δI. Then, since δI ≤ 0, it must be that I ≤ J. But, by Eq. (13), I is maximized. It follows that J (which is at least as large as I) is likewise maximized. Finally, since both I and J are maximized, and their difference δI is small, or minimal, it must be that
I – J = minimum. (14)
The minimum is attained through variation of the shape of p(x). More generally, the extremum can be a maximum, a minimum, or a point of inflection. Eq. (14) is the extremum principle we use to find solution probability laws p(x). It is also the first part (called axiom 1) of an overall principle called “extreme physical information,” or EPI. The naming comes from regarding the information change I – J as a new information, denoted K and called the “physical information.” It is alternatively called the “Kantian” (see below).
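A minimal numerical illustration of what “variation of the shape of p(x)” means (a toy stand-in only, not the full EPI variation, which also involves J): among all unit-variance densities, the Fisher information I = ∫ (p′)²/p dx is minimized, with value 1, by the Gaussian shape, a consequence of the Cramer-Rao inequality. The Laplace comparison density below is an assumption chosen for the example.

```python
import numpy as np

# Compare the Fisher information I = integral (p')^2/p dx across two
# unit-variance density shapes. The Gaussian attains the minimum I = 1;
# any other shape with the same variance has larger I.
x = np.linspace(-12.0, 12.0, 48001)
h = x[1] - x[0]

def fisher(p):
    p = p / (p.sum() * h)                    # normalize on the grid
    dp = np.gradient(p, x)
    return np.sum(dp**2 / np.maximum(p, 1e-300)) * h

gauss = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # variance 1
b = 1 / np.sqrt(2)
laplace = np.exp(-np.abs(x) / b) / (2 * b)       # variance 2b^2 = 1

print(fisher(gauss))    # ~ 1.0 (the minimizing shape)
print(fisher(laplace))  # ~ 2.0 (same variance, larger I)
```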
Inadequacy of the conventional Lagrangian energy action approach
There is an old adage regarding misleading first impressions. The principle (14) resembles the usual Lagrangian action approach to physics, with the actions superficially renamed “informations.” However, as we have discussed in depth, this is certainly not the case conceptually. But, how about practically? Is there anything to be gained by regarding the terms as informations? Why bother with this unconventional interpretation?
In the action approach, the Lagrangian is the difference of two action quantities. These are to be based upon the concept of energy – for example, kinetic and potential energies in classical mechanics. However, a problem arises. In many applications of Lagrange action, such as quantum mechanics, “the function whose integral is to be stationary is actually of a rather intricate and artificial character” [15]. To justify it requires re-defining what we mean by kinetic and potential energies. Also, such Lagrangians are formed merely so as to satisfy a known differential equation (say, the SWE). They are notably not unique, since a gradient term of variable form may be added to any Lagrangian without changing its Euler-Lagrange solution. Hence the Lagrangian does not have a straightforward physical meaning in quantum mechanics, nor is it always straightforward to apply, even in basic physical problems.
By contrast, in the EPI principle (14) the Lagrangian does not explicitly depend upon energies and, hence, does not require us to contrive energy-dependent terms. This is fortunate, since non-energy based Lagrangians occur commonly, e.g. in econophysics (Hawkins and Frieden, 2004), population dynamics (Frieden et al., 2001) and cancer growth (Gatenby and Frieden, 2000). Indeed, progress in forming Lagrangians in these fields was held up precisely because investigators had limited their search to energy-dependent action terms (of which none could be found). There is no doubting the law of conservation of energy, but, for purposes of forming Lagrangians over diverse fields of science the concept of information has much more application than that of energy. The basis for these applications is again that science generally depends upon observation, and observation arises out of a flow of information. Of course, observation arises out of a flow of energy as well, but that flow has proven to be not as easily related to observables as is the flow of information.
It should be further mentioned in this regard that conventional Lagrangians require, for their validity as energy-based actions, systems where energy is conserved [17]. This rules out nonholonomic and nonconservative systems, such as occur for open systems or for dissipative systems. Even the Rayleigh dissipation function that is introduced in classical mechanics is only a phenomenological description of dissipation, and does not provide an energy-based Lagrangian theory [16]. If such systems are describable by Lagrangians, the Lagrangians are not identifiable as energy terms.
Regarding dissipative systems, the whole subject of thermodynamics has been traditionally judged by physicists to be out of bounds for a Lagrangian approach [15]. A notable exception was the celebrated Helmholtz [15]. Our contention is of course that he was right.
In fact, the biologist Bertalanffy demanded that the Lagrange approach be generalized to include systems of all kinds — open or closed, living or inanimate [15]. In rebuttal, the authors of [15] declare that such a generalized approach must (by virtue of its generality) be non-physical, i.e. not correspond with empirical data. What they are apparently saying is that there is no one physically meaningful quantity that could be used to form all these Lagrangians. Yet, as we have found, that quantity exists, and is the Fisher information (4). The EPI approach (14) that results centers upon the acquisition of empirical data, and contains an information term J that describes a vital physical property of the given system. Regarding irreversible systems specifically, Bertalanffy is in fact vindicated by A. Plastino and co-workers, who have explicitly used the EPI Lagrange approach to develop thermodynamics. This is without recourse to entropy (see numerous recent papers on the web). We believe that the EPI approach is the generalization Bertalanffy anticipated.
Another important difference between the EPI principle (14) and the usual Lagrangian action approach is that the EPI principle arises physically, out of a real perturbation of the subject of the observation by the probe particle (as discussed above). By comparison, the Lagrangian energy-action approach arises as purely a mathematical property, owing to an imaginary displacement of the system (akin to the expression of D’Alembert’s principle of virtual work). Whereas EPI describes an actual physical event, the usual Lagrangian action approach describes a hypothetical event.
To paraphrase the preceding, Lagrangian action theory is a mathematical theory of physics that arises from a purely mathematical criterion, whereas EPI is a mathematical theory of physics that arises out of physical measurement. EPI is not just mathematics but, rather, has reality. It describes a real physical process. It is also a worldview, as discussed in closing sections below.
Levels of accuracy. Central role of the invariance principle
As was mentioned below eq. (1), one does not generally acquire knowledge out of nothing. In particular, each use of EPI requires some prior knowledge about the information source or effect. The prior knowledge takes the form of either an invariance principle or empirical inputs. (In comparison to the standard Lagrangian action approach to physics, the invariance principle is used actively in EPI, as a postulate or ansatz, rather than passively as an output of the theory.) As a consequence, solutions to the EPI Eq. (14) are of three possible types, corresponding respectively to three alternative scenarios of prior knowledge:
(i) The source information J can be expressed in a physically meaningful space (such as conjugate momentum space for quantum effects) where it maintains the same information “length” value I. This particular invariance is called unitarity. It is allowed in cases where eq. (4) segues into a simple sum of squares, eq. (11), and therefore has a well-defined length. The conjugate space is accordingly called the unitary space corresponding to data space.
(j) A physically meaningful invariance principle exists that permits Eqs. (14) and (16) (below) to be solved simultaneously. Examples of such principles are continuity of flow, invariance to magnification, etc. See [1a,b] for details.
The preceding approach (i) gives exact answers. By comparison, approach (j) usually gives inexact answers, as indicated by a scenario where I < J. That is, information is lost so that the resulting theory cannot be perfect.
(k) If neither a unitary transform space (i) nor an invariance principle (j) can be found, an empirical approach may be taken. This is of course inexact. The observer simply sets J = 0 (zero). In place of J, known constraints on the unknown PDF are used [1b]. These are empirical inputs. This describes, e.g., the purely “technical” approach to investment often taken in econophysics (Hawkins and Frieden, 2004). Such use of arbitrary, empirical constraints is obviously Bayesian in nature, i.e., it brings in the biases of the observer. Hence it is an approximate use of EPI. Serendipitously, the same approach also serves to derive general non-equilibrium statistical physics (Flego et al., 2003; Frieden et al., 2002).
It is fascinating that the three approaches (i), (j) and (k) defining descending levels of knowledge were actually anticipated by the philosopher Charles Peirce over 100 years ago. (Peirce is also the father of the familiar term “pragmatism,” the concept of the “gedanken experiment,” and the idea of the observer and nature as players in a game.) These levels of knowledge correspond, respectively, to the three attributes he called [1c] “Abduction”, “Deduction” and “Induction.” Thus, the exact and most fundamental level (i) is Abduction (firstness). At a slightly lower level, approach (j) corresponds to Deduction (secondness), i.e. a qualitative or conceptual understanding of the effect that is deduced from some abduction. Therefore a Deduction is not necessarily a primal truth. Finally the lowest level (k) corresponds to “Induction” (thirdness), which allows otherwise qualitative knowledge to be at least partially quantified by empirical knowledge.
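The unitarity of case (i) can be sketched numerically. Writing p = q² for a real probability amplitude q(x), the sum-of-squares form of eq. (11) gives I = 4∫ q′(x)² dx, and by the Parseval/Plancherel theorem this equals 4∫ k²|Q(k)|² dk in the Fourier-conjugate (“momentum”) space: the information “length” is preserved. The Gaussian amplitude and grid below are illustrative assumptions.

```python
import numpy as np

# Fisher information for an amplitude q(x) with p = q^2, computed in
# data space and in the Fourier-conjugate space. Unitarity of the
# transform means the two values agree.
n = 8192
x = np.linspace(-20.0, 20.0, n, endpoint=False)
h = x[1] - x[0]

q = (1 / np.pi) ** 0.25 * np.exp(-x**2 / 2)   # Gaussian amplitude, p = q^2

# I in data space: 4 * integral q'(x)^2 dx
dq = np.gradient(q, x)
I_x = 4 * np.sum(dq**2) * h

# I in conjugate space: 4 * integral k^2 |Q(k)|^2 dk
Q = np.fft.fftshift(np.fft.fft(q)) * h / np.sqrt(2 * np.pi)
k = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=h))
I_k = 4 * np.sum(k**2 * np.abs(Q)**2) * (k[1] - k[0])

print(I_x, I_k)   # equal up to discretization error
```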
Escher-like property of creative observation
The solution p(x) to the EPI principle (14) is also the PDF (probability density function) defining the measured data. The result is that carrying through a measurement elicits both the measurement itself and the probability law p(x) from which the measurement is a random sample. Or,
EPI has the Escher-like property of defining the observation, on one hand, and the physics of the observation, on the other.
What’s more, since the progression of eqs. (12) to (14) defines a physical process as well, the physics is defined as it is observed. The activity of carrying through a measurement is self-realized or autopoietic. This is also called creative observation. Creative observation consists of:
(l) observed real measurements of a physical effect, and
(m) a known theory of the effect.
Actually, the full EPI solution requires both outputs (l) and (m). The output (m) is a purely analytical, general solution to the problem whereas (l) is a particular output from (m). Together, these completely describe the physical process. The analytical solution (m) has the form of a differential equation whose initial conditions are left unspecified (e.g., the ordinary Schrodinger wave equation). Thus, it is an incomplete solution until the initial conditions are provided by inputs (l) (Refs. [1a or b], Chap. 10). This completes the solution, in that the future statistical behavior of the effect can now be predicted numerically.
Does EPI create, or merely derive, physical objects and effects?
In this section we distinguish between objects, such as a mass particle, and physical laws that are obeyed by the objects such as the Schrodinger wave equation that is obeyed by the mass particle. In general such objects are the sources of effects, such as sources of mass (as in the preceding), charge, current, etc.
The EPI solutions derive physical laws. These arise out of analyzing measurements, and the latter emanate from physical objects. Does, then, making the measurements somehow create both the physical objects and their laws? That is, (α) do the physical objects and laws exist “out there” as fixed entities, with their natures unknown until these are derived mathematically by EPI, or (β) are they both created by EPI?
This question forces us to confront the question, what constitutes “reality”? One extreme view is that physics does not exist until it is requested by carrying through a measurement. This also seems to be justifiable out of a strict Copenhagen or positivist view of measurement: that nothing truly exists, either in substance or in theory, but observations. This view backs up the above notion (β) that the physical objects and their laws are created by the measurement.
At the other extreme is that substance and theory do pre-exist the measurement, and are ever ready to show their existence at each requested measurement. That is, each measurement simply reverifies a pre-existing reality. This view backs up the above notion (α) that the physical effect pre-existed its measurement.
An in-between view is that substance pre-exists the measurement, but its theory does not. This is in fact the view provided by EPI.
Consider the opposing view that an object does not pre-exist its measurement. If this were the case, how could it be specified (as above eq. (3) or at point (b) preceding) as having a definite parameter value? That is, after all, the basic premise of Fisher theory (which is classical). Also, if the object did not have a definite existence, why would the same parameter value exist from one observer to another, and why would both observers necessarily derive, via EPI, the same effect? And finally (a Gödel-type argument), if the statement “nothing exists but measurement” is valid, then this statement itself does not exist, negating itself. From these considerations we conclude that physical objects pre-exist their observation.
These objects are postulated to have the primitive attributes of
length, time, mass-energy, and e.m. charge.
(The various potential energies emerge from EPI as particular probability amplitudes. Force fields are defined, as usual, as gradients of these.) EPI is epistemic in nature. It is an aim of EPI to discover the laws that these primitive attributes mutually obey, for example the well-known dynamical law E² = c²P² + m²c⁴ connecting mass m, energy E and momentum P. (This law is obtained during the EPI derivation of quantum mechanics [1a,b].) Such laws are often called holonomic constraints. Of course, to find such a law, it is first necessary to know that it exists a priori, i.e. is valid in all physical scenarios. This is consistent with the knowledge-acquisition viewpoint of EPI. As another example, the parameter a in eq. (3) has likewise to be assumed to exist before an attempt is made at measuring it (also see below). EPI assumes that the observer “knows what he doesn’t know.” That is, he admits ignorance, but is not stupid. The EPI user thereby operates according to a maxim of Thomas Carlyle –
“A man doesn’t know what he knows until he knows what he doesn’t know.”
This raises the question of how the user would know that the holonomic law exists. This ultimately depends upon the quality of the intuition of the user. If his intuition is valid, so that he correctly suspects that a holonomic law exists, then his use of EPI will give correct results; if not, it will be incorrect. However, this is a self-correcting process, since ultimately the predicted result is subjected to observation. This decides the truth.
Notice that the objects of observation are not presumed to have some detailed structure like being closed or open strings (see next section), or “Cooper pairs” (Higgs mass effect). This lack of the need for specific particle models or structure is one of the strengths of the theory.
The other aspect of the question – whether the theory pre-exists its derivation – can be answered as in the preceding. That is, EPI is defined as being fundamentally epistemic, a learning process. It is a means by which an “ideal” observer who has no previous knowledge of the theory in question can learn, or derive, it. The “school room” for this learning process is a universe in which only the observer and the object exist. Hence, for this isolated observer with no prior knowledge the theory does not exist prior to the analysis. He creates it as he derives it. Furthermore, since such an observer by definition has zero memory of the effect, he creates it anew each time he derives it. For example, the Schrodinger wave equation is re-created each time EPI is applied to a position measurement.
Perhaps an equally important question is how the particle itself “knows” what physical law it is to obey in generating the datum demanded by the observer. A particle has no memory, so it can’t “recall” what law it obeyed the last time it was observed. Hence it cannot obey a “pre-existing” law. The law must be created anew each time the particle is observed. This is the law that EPI produces.
The EPI view of measurement
A related question to the preceding is, what does the wave function of a system actually represent? These two questions are taken up in Ref. [1a,b], Sec. 10.9 and summarized here.
One school states that the wave function represents the state of the system under observation, while another states that it represents the state of uncertainty of the observer of that system. The first view is ontological, the second epistemological. The EPI view is the following. Consider the wave function ψ+(x, tmeas) governing the state of knowledge of the observer of a system whose intrinsic fluctuation is amount x at a time of measurement tmeas. At this time let a measurement fluctuation xmeas occur, due to the nonideal nature of the measuring instrument. The latter is defined by the finite width of the instrument’s point amplitude response function w(x), at that time. The observer’s wave function ψ+(x, tmeas) relates, in fact, to both the intrinsic wave function ψ0(x, tmeas) of the system and the measuring instrument’s response function. The relation is quite simple,
ψ+(x, tmeas) = Cψ0(x, tmeas )w(xmeas – x). (15)
(This result is originally due to J. von Neumann and D. Bohm.) Thus, the observer’s state of knowledge ψ+(x, tmeas) at the measurement is not just the wave function ψ0(x, tmeas) of the system. Rather, the latter is modified by simple multiplication by the instrument’s response function. This modification degrades fidelity to the system state; nevertheless, the product is generally narrower than either constituent, as the use of Gaussian wave functions readily shows. This means that the observer’s uncertainty is reduced after the measurement, so that some finite amount of Fisher information about position is acquired. This was of course the overall aim of the measurement. In summary, the observer’s state of knowledge after a measurement is the product of the state of the system and the state of the observer’s measuring instrument. The narrower (higher quality) the response function w of the instrument, the narrower is the product, and the more completely does the observer’s state of knowledge equate to the state of the system.
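The Gaussian remark above is quickly verified. In the sketch below (the widths s0, sw and the value x_meas = 0 are illustrative assumptions), the product of eq. (15) with two Gaussian factors is again Gaussian, with width s+ given by 1/s+² = 1/s0² + 1/sw², narrower than either factor.

```python
import numpy as np

# Numeric check: the product of the system amplitude psi0 and the
# instrument response w(x_meas - x) is narrower than either factor.
x = np.linspace(-10.0, 10.0, 40001)
s0, sw, x_meas = 1.0, 0.5, 0.0

psi0 = np.exp(-x**2 / (2 * s0**2))              # system wave function
w = np.exp(-(x_meas - x)**2 / (2 * sw**2))      # instrument response
psi_plus = psi0 * w                             # eq. (15), up to C

def rms_width(amp):
    p = amp**2                  # PDF corresponding to the amplitude
    p = p / p.sum()
    mu = np.sum(x * p)
    return np.sqrt(np.sum((x - mu)**2 * p))

w0, ww, wp = rms_width(psi0), rms_width(w), rms_width(psi_plus)
s_plus = s0 * sw / np.sqrt(s0**2 + sw**2)       # predicted product width

print(wp < w0 and wp < ww)      # True: product narrower than both
print(wp, s_plus / np.sqrt(2))  # measured vs predicted rms width
```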
Next we return to the original question of whether laws of physics exist, i.e. execute, even without our observation of them. The same theory in [1a,b] that gives rise to eq. (15) also shows that, in between measurements, the system’s wave function ψ0 evolves via the Schrodinger wave equation as a kind of background effect. Therefore the EPI view does not take the extreme positivist position that nothing exists until we measure it. Rather, things continually occur in the unobserved background universe as well. Evolution of the wave function proceeds even without our viewing it. This is of course comforting to a conventional view of reality.
However, this puts us in the peculiar situation of having to admit that nonhumans can initiate EPI as well. By the EPI principle (14), (16), all laws of physics arise out of the perturbations caused by measurements. Therefore, all existing background evolution, even prior to the existence of humans, must likewise have been initiated by observations. But who or what made them?
There is a simple way out of this dilemma. Before life existed, what we are calling a “measurement” was just an interaction between particles. The “probe particle” existed and perturbed the “subject,” but in the absence of any observer the resulting data could not be collected in a memory device as now. The data simply were particle events that, in turn, caused other particle perturbations and evolutions to arise and continue as well, unheeded by man.
What probability laws are amenable to derivation by EPI?
The preceding section was partly concerned with the question of what parameters are appropriate for finding by EPI. The current section takes up the corresponding question for probability laws.
The intended aim of EPI is the derivation of probability laws of the general form p(x1, x2, …, xM). Here (x1, x2, …, xM) are the random variables of the problem. The EPI approach also attempts to gain knowledge of any particular relations that exist among the random variables. Therefore, these variables must be free a priori to take on any such relations. They must not be known a priori to obey some one or more universal, fixed relations (relations that hold independent of the particular scenario). If such relations are known to exist, the coordinates are constrained, i.e. not free to fluctuate during the derivation. Then there is no knowledge-acquisition problem, since the probability law is known a priori (as being infinitely sharp about these values).
Two examples from [1a,b] clarify these points. First, consider deriving by EPI a probability law p(E,P) for the momentum P and energy E of a free-field particle of mass m. The observer is presumed to know that there is some universal, fixed relation connecting coordinates P and E (see discussion above on holonomic constraints). In fact one of the aims of EPI is to find this relation. Therefore, coordinates P and E are not free to take on all values, so that, by the preceding paragraph, EPI cannot be applied to finding p(E,P). By comparison, consider the spatial coordinates x,y,z,t of the particle. These are known to not obey any a priori fixed relation. Therefore they are free to fluctuate, and consequently p(x,y,z,t) is amenable to derivation by EPI. By serendipity, deriving this law leads to knowledge of the fixed relation connecting the very coordinates P and E that were presumed to exist in the preceding case. It is the famous mass-energy formula of Einstein, E² = c²P² + m²c⁴.
How does EPI differ from ontological theories?
EPI is basically a principle of acquired knowledge. The acquired information I is always of one form, Eq. (4) or one of its equivalent forms. Hence, EPI neither provides nor derives a detailed ontological model of reality. Instead, certain primitive object properties are assumed to exist (mass, charge, etc., as previously listed) and to obey a known invariance principle. In this way EPI differs from theories that require a detailed ontology, such as (i) the Cooper pair model of Higgs mass theory, or (ii) string theory. String theory, e.g., assumes a detailed model for all particles — that they are composed of strings. These can be open or closed, and of various shapes. EPI makes no such stipulations about particles, or indeed about reality, other than the preceding. The sole aims of EPI are to learn the correct probability law and the dynamics of that assumed and observed reality.
Dimensionality is another interesting issue. In string theory, the strings exist in a space with a definite number of dimensions – thought to be 10 by many. By comparison, EPI leaves the dimensionality (called M) of every problem open. The answer is instead found relative to a stipulated number of dimensions. This is again because EPI is simply a learning process based upon observation, and any observation is limited by the dimensionality of its viewing space. The goal of EPI is to provide the correct answer for that arbitrarily dimensioned viewing space. Indeed, uses of EPI have shown that its answers are correct under projection into that space (see below). In this regard it fits into the ideas of Plato and other philosophers on the question of whether there is an ultimate reality with an ultimate dimensionality (see below). In principle ultimate dimensionality is unknowable. However, if we are to be effective EPI observers there must in fact be a definite dimensionality: 3 space dimensions plus 1 time dimension (see prediction (19) below).
Finally, does EPI have anything to say about the possibility of string structure in particular? The scope of EPI is limited to observable data. EPI is also complete, in predicting its own limitations. As examples, EPI gives rise to the Heisenberg uncertainty principle eq. (7), and also to the Compton wavelength limit h/mc of spatial resolution for a particle of mass m [1a,b]. These indicate that EPI cannot predict structure finer than the size h/mc. However, strings are supposed to be much smaller than this. Accordingly, EPI can make no statement about the possibility of string structure. This exercise of caution seems to be justified by the current status of string theory. The theory is not unique but, in fact, consists of a multitude – possibly an infinite number – of candidate string theories! And if there are an infinity of them, is this not equivalent to having no theory at all? Such fatal ambiguity is bound to arise when a theoretical model cannot be verified by observation.
Why does EPI work?
It is perhaps this
lack of an ontological model
that makes the correct predictions of EPI seem “magical” to some, “impossible” to others (see various internet chat rooms). Why does it work? Basically, it works for three reasons:
First and foremost, it is a correct theory of measurement. It provides a philosophically sound, and physically correct, description of what happens during an observation, including the central and active roles played by the perturbing probe particle and the observer.
Second, it does require that the user know something about the observed effect – in the form of a statement of invariance. However, the invariance is not by itself sufficient for deriving the sought-after effect. See [1b] for details.
Finally, the information that describes the quality of the observation is, by a requirement of invariance of information to reference frame, the Fisher information for a four-vector measurement. This information has two convenient mathematical properties for purposes of deriving physical laws:
• it has the correct form for giving rise to d’Alembertian wave equations when it is varied; and
• it obeys the Cramer-Rao inequality which, when applied to various scenarios, gives rise to a Heisenberg-type uncertainty principle for each. Hence the derivations of wave equations and uncertainty principles in [1a,b].
The information is unique. The trial use of any other “information” (including entropy) in its place would simply be wrong for describing the parameter-measurement problem. (Refs. [1a-c] discuss such trial alternative uses of entropy.) It also gives wrong results, except in the special case of classical statistical mechanics (where entropy also works).
Information efficiency constant κ
There is a second part of the principle, which describes how I relates to J. Only under ideal conditions of knowledge acquisition is relation (10) true. More generally, the measuring system is passive, so that the acquired information level I cannot exceed that at the source, J. This intrinsic information level J therefore acts as an upper bound to the amount I that actually occurs in the data value,
I = κJ, 0 ≤ κ ≤ 1. (16)
This is the 2nd part of the EPI theory, of which eq. (14) is the first. It states that the coefficient κ measures the efficiency with which knowledge J can be acquired about the given system. For example, if the measuring scale in use is too coarse to see quantum effects, obviously some information is missing. In confirmation, the coefficient κ then turns out to be less than 1 and the theory turns out to be classical. Notice that this observer, by his choice of coarse graining, is thereby participating in forming his local physics (see also Wheeler’s ideas below): his observed systems obey classical, rather than quantum, mechanics. Classical mechanics has, of course, a benefit of simplicity. However, because κ is less than 1, information is lost, and therefore the output theory is approximate. In general the proper value of κ is set by the application and cannot be discussed further here. See refs. [1a-c] for details.
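A toy model of the coarse graining described above (the Gaussian source and instrument kernel are assumptions of the example, not the full EPI determination of κ): smoothing a location family with an instrument kernel can only reduce its Fisher information, so the measured efficiency κ = I/J falls in [0, 1].

```python
import numpy as np

# Coarse graining as convolution with an instrument kernel: the
# observed density carries less Fisher information than the source,
# so kappa = I/J < 1.
x = np.linspace(-15.0, 15.0, 8193)
h = x[1] - x[0]

def fisher(p):
    p = p / (p.sum() * h)
    dp = np.gradient(p, x)
    return np.sum(dp**2 / np.maximum(p, 1e-300)) * h

def gaussian(s):
    return np.exp(-x**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))

source = gaussian(1.0)                            # intrinsic p(x), J = 1
kernel = gaussian(0.5)                            # coarse-graining kernel
coarse = np.convolve(source, kernel, mode="same") * h

J, I = fisher(source), fisher(coarse)
kappa = I / J
print(J, I, kappa)   # kappa < 1: information is lost
```

For Gaussian factors the loss obeys 1/I = 1/J + 1/I_kernel exactly (Stam's inequality, tight here), giving κ = 0.8 for the widths chosen.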
Notice that the I-theorem (5) when applied to the basic flow (2) of information J → I requires that I be less than or equal to J. Hence, κ obeys the restricted range (16) on this basis as well.
The special case where I = J (or κ = 1) has important consequences of its own. It gives relativistic quantum phenomena, including the possibility of entangled realities, as described next.
(m) The output PDF is exactly known, but only within the space defined by the measurement, e.g., four-space in the case of a space-time measurement. If the measurement should instead really be (say) 10-dimensional then a use of EPI based upon only 4 dimensions could not of course give the 10-dimensional answer for the PDF. However,
EPI is robust to projection.
The 4-dimensional PDF that is so obtained will simply be the projection into 4-space of the “correct” 10-space one. For example, the 3-dimensional Schrodinger wave equation is the EPI output of a 3-dimensional analysis, whereas the 4-dimensional Klein-Gordon equation is the output of a 4-dimensional analysis. By the projection property the latter is also the projection into 4 dimensions of some (perhaps) higher-dimensioned PDF.
Likewise, if the observations are at a coarse grained spacing the solution will be the appropriate coarse grained solution; etc. for fine graining.
(n) The space of J – the physical effect (e.g., the spin of the unseen particle in the EPR-Bohm experiment) – is equivalent to the space of I – the observed effect (say, the spin of the data particle in that experiment). That is,
if information I = J the output PDF obeys a scenario of entangled realities.