What does the term entropy mean from the point of view of information theory? Entropy as a measure of information

Information and entropy

When discussing the concept of information, one cannot avoid the closely related concept of entropy. The two concepts were first linked by C. Shannon.

Claude Elwood Shannon (1916-2001), a distant relative of Thomas Edison, was an American engineer and mathematician and an employee of Bell Laboratories from 1941 to 1972. In his paper "A Mathematical Theory of Communication" (http://cm.bell-labs.com/cm/ms/what/shannonday/), published in 1948, he defined for the first time a measure of the information content of a message and the concept of a quantum of information, the bit. These ideas formed the basis of the theory of modern digital communication. Shannon's other work, "Communication Theory of Secrecy Systems", published in 1949, helped turn cryptography into a scientific discipline. He is the founder of information theory, which has found application in modern high-tech communication systems. Shannon also made major contributions to the theory of probabilistic schemes, automata theory, and the theory of control systems, the sciences united under the heading of "cybernetics".

Physical definition of entropy

The concept of entropy was first introduced by Clausius in 1865 as a function of the thermodynamic state of a system,

dS = δQ / T,

where Q is heat and T is absolute temperature.

The physical meaning of entropy is that part of the internal energy of a system that cannot be converted into work. Clausius obtained this function empirically, by experimenting with gases.

L. Boltzmann (1872) derived a theoretical expression for entropy by the methods of statistical physics:

S = K · ln W,

where K is a constant and W is the thermodynamic probability (the number of permutations of ideal-gas molecules that does not affect the macrostate of the system).

The Boltzmann entropy is derived for an ideal gas and is interpreted as a measure of disorder, a measure of the chaos of a system. For an ideal gas the Boltzmann and Clausius entropies are identical. Boltzmann's formula became so famous that it was inscribed as an epitaph on his grave. Entropy and chaos came to be regarded as one and the same, and although the Boltzmann entropy strictly describes only ideal gases, it began to be applied uncritically to far more complex objects.

Boltzmann himself, in 1886, tried to use entropy to explain what life is. In Boltzmann's view, life is a phenomenon capable of reducing its own entropy. According to Boltzmann and his followers, all processes in the Universe move in the direction of chaos, and the Universe is heading toward thermal death. This gloomy forecast dominated science for a long time, but the deepening of our knowledge of the surrounding world gradually shattered this dogma.

The classics did not associate entropy with information.

Entropy as a measure of information

Note that the concept of "information" is often interpreted simply as "data" or "knowledge", and its transfer is carried out by means of communication. K. Shannon regarded entropy as a measure of the useful information in processes of signal transmission over wires.

To calculate the entropy, Shannon proposed an equation resembling the classical expression for entropy found by Boltzmann. Consider an independent random event x with N possible states, where p_i is the probability of the i-th state. Then the entropy of the event x is

H(x) = − Σ_{i=1…N} p_i log₂ p_i.

This quantity is also called the average entropy. For example, consider the transmission of a message in a natural language. Different letters carry different amounts of information: the amount of information per letter is related to the frequency with which that letter is used in the messages generated in the language. The rarer the letter we transmit, the more information it carries.

The quantity

H_i = p_i log₂(1 / p_i) = −p_i log₂ p_i

is called the partial entropy, which characterizes only the i-th state.

Let us explain with examples. When a coin is tossed, heads or tails comes up; this is definite information about the result of the toss.

For a coin the number of equiprobable possibilities is N = 2. The probability of getting heads (or tails) is 1/2, so the entropy is log₂ 2 = 1 bit.

When a die is thrown, we receive information about which number of points came up (for example, a three). In which case do we receive more information?

For a die the number of equiprobable possibilities is N = 6. The probability of getting a three is 1/6, and the entropy is log₂ 6 ≈ 2.58 bits. The realization of the less probable event provides more information: the greater the uncertainty before receiving a message about an event (tossing a coin, throwing a die), the more information is received with the message.
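As a quick check of these numbers, here is a minimal sketch in Python (illustrative, not part of the original text) that evaluates H = −Σ p_i log₂ p_i for the coin and the die:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

coin = [1/2, 1/2]   # two equiprobable outcomes
die = [1/6] * 6     # six equiprobable outcomes

print(shannon_entropy(coin))  # 1.0 bit
print(shannon_entropy(die))   # ~2.585 bits
```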

This approach to the quantitative expression of information is far from universal, since the adopted units do not take into account such important properties of information as its value and meaning. Abstracting from the specific properties of information (its meaning and value) about real objects, as it turned out later, made it possible to identify general regularities of information. The units (bits) proposed by Shannon for measuring the amount of information are suitable for evaluating any messages (the birth of a son, the result of a sports match, and so on). Later, attempts were made to find measures of the amount of information that would take its value and meaning into account, but universality was immediately lost: for different processes the criteria of value and meaning are different. In addition, definitions of the meaning and value of information are subjective, whereas the measure of information proposed by Shannon is objective. For example, a smell carries an enormous amount of information for an animal but is elusive for a human; the human ear does not perceive ultrasonic signals, yet they carry a great deal of information for a dolphin, and so on. Therefore the measure of information proposed by Shannon is suitable for studying all kinds of information processes, regardless of the "tastes" of the consumer of information.

Measuring information

You know from physics that before measuring the value of any physical quantity, a unit of measurement must be introduced. Information also has such a unit, the bit, but its meaning differs depending on the approach taken to defining the concept of "information".

There are several different approaches to the problem of measuring information.

Claude Elwood Shannon (1916-2001), American engineer and mathematician, founder of information theory, i.e. the theory of the processing, transmission, and storage of information.

Claude Shannon was the first to treat transmitted messages and noise in communication channels from the standpoint of statistics, considering both finite and continuous sets of messages. Claude Shannon is called "the father of information theory".

One of Claude Shannon's most famous scientific works is his article "A Mathematical Theory of Communication", published in 1948.

In this work, investigating the problem of rational transmission of information through a noisy communication channel, Shannon proposed a probabilistic approach to understanding communication, created the first truly mathematical theory of entropy as a measure of randomness, and introduced a measure of the discrete probability distribution p on the set of alternative states of the transmitter and receiver of messages.

Shannon set requirements for measuring entropy and derived a formula that became the basis of quantitative information theory:

H(p) = − Σ_{i=1…n} p_i log₂ p_i.

Here n is the number of symbols in the alphabet from which the message can be composed, and H is the information binary entropy.

In practice, the probabilities p_i in the above formula are replaced by their statistical estimates: p_i = N_i / N is the relative frequency of the i-th symbol in the message, where N is the total number of symbols in the message and N_i is the absolute frequency of the i-th symbol, i.e. the number of its occurrences in the message.
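A small sketch in Python, assuming a plain text message, of how the probabilities are estimated by relative frequencies N_i / N and substituted into the formula:

```python
from collections import Counter
import math

def empirical_entropy(message: str) -> float:
    """Estimate H = -sum((N_i/N) * log2(N_i/N)) from symbol frequencies."""
    counts = Counter(message)   # absolute frequencies N_i
    n_total = len(message)      # total number of symbols N
    return -sum((n / n_total) * math.log2(n / n_total) for n in counts.values())

print(empirical_entropy("abracadabra"))  # entropy per symbol, in bits
```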

In the introduction to his article "A Mathematical Theory of Communication", Shannon notes that he extends the theory of communication whose main principles are contained in the important works of Nyquist and Hartley.

Harry Nyquist (1889-1976), American engineer of Swedish origin, one of the pioneers of information theory.

Nyquist's early results on determining the width of the frequency band required for transmitting information laid the foundation for Claude Shannon's later success in developing information theory.

In 1928 Hartley introduced the logarithmic measure of information H = K log₂ N, which is often called the Hartley amount of information.

Hartley is also the author of the following important theorem on the required amount of information: if a given set M consisting of N elements contains an element x about which it is known only that it belongs to M, then to find x it is necessary to obtain an amount of information about this set equal to log₂ N bits.
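Hartley's theorem can be illustrated by the familiar halving strategy: to locate one element among N, about log₂ N yes/no questions suffice. A hypothetical Python sketch (the function names are illustrative):

```python
import math

def questions_needed(n_elements: int) -> int:
    """Number of yes/no questions sufficient to find one element among n: ceil(log2 n)."""
    return math.ceil(math.log2(n_elements))

def locate(x: int, n: int) -> int:
    """Locate x in range(n) by halving; returns how many questions were actually asked."""
    lo, hi, asked = 0, n, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        asked += 1              # each "is x < mid?" question yields one bit of information
        if x < mid:
            hi = mid
        else:
            lo = mid
    return asked

print(questions_needed(64))  # 6 questions, i.e. log2(64) bits
print(locate(37, 64))        # 6
```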

Incidentally, the name bit comes from the English abbreviation of BInary digiT. The term was first proposed by the American mathematician John Tukey in 1946. Hartley and Shannon used the bit as the unit of measurement of information.

In general, the Shannon entropy is the entropy of a set of probabilities p_1, p_2, …, p_n.

Ralph Vinton Lyon Hartley (1888-1970), American electronics scientist.

Strictly speaking, if X is a finite discrete random variable and p_1, p_2, …, p_n are the probabilities of all its possible values, then the function H(X) defines the entropy of this random variable; although X itself is not an argument of the entropy, one writes H(X).

Similarly, if Y is a finite discrete random variable and q_1, q_2, …, q_m are the probabilities of all its possible values, then for this random variable one can write H(Y).

John Wilder Tukey (1915-2000), American mathematician. Tukey proposed the word "bit" to denote a single binary digit.

Shannon named the function H(X) "entropy" on the advice of John von Neumann.

Von Neumann argued that the function should be called entropy "for two reasons. In the first place, your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more importantly, no one really knows what entropy is, so in a debate you will always have the advantage."

One must assume that this advice of von Neumann's was not merely a joke. Most likely, both John von Neumann and Claude Shannon were aware of the informational interpretation of the Boltzmann entropy as a quantity characterizing the incompleteness of information about a system.

In Shannon's definition, entropy is the amount of information per elementary message of a source that generates statistically independent messages.

7. Entropy of Kolmogorov

Andrey Nikolaevich Kolmogorov (1903-1987), Soviet scientist, one of the greatest mathematicians of the 20th century.

A. N. Kolmogorov obtained fundamental results in many areas of mathematics, including the theory of the complexity of algorithms and information theory.

In particular, he played a key role in transforming information theory, formulated by Claude Shannon as an engineering discipline, into a rigorous mathematical science, and in building information theory on a foundation fundamentally different from Shannon's.

In his works on information theory and in the field of the theory of dynamical systems, A.N. Kolmogorov generalized the concept of entropy to ergodic random processes through the limiting probability distribution. To understand the meaning of this generalization, it is necessary to know the basic definitions and concepts of the theory of random processes.

The value of the Kolmogorov entropy (also called K-entropy) specifies an estimate of the rate of loss of information and can be interpreted as a measure of the "memory" of the system, or a measure of the rate of "forgetting" of the initial conditions. It can also be viewed as a measure of the randomness of the system.

8. Renyi's entropy

Alfred Renyi (1921-1970), Hungarian mathematician, founder of the Institute of Mathematics in Budapest, which now bears his name.

He introduced a one-parameter spectrum (family) of Renyi entropies.

On the one hand, the Renyi entropy is a generalization of the Shannon entropy; on the other hand, it simultaneously generalizes the Kullback-Leibler distance (divergence). Note also that it was Renyi who gave the complete proof of Hartley's theorem on the required amount of information.

The Kullback-Leibler distance (information divergence, relative entropy) is an asymmetric measure of how far apart two probability distributions are.

Typically, one of the compared distributions is the "true" distribution, and the second distribution is the inferred (tested) distribution, which is an approximation of the first.

Let X and Y be finite discrete random variables whose possible values belong to a given set and whose probability functions are known: P(X = a_i) = p_i and P(Y = a_i) = q_i.

Then the value of the Kullback-Leibler distance D_KL is calculated by the formulas

D_KL(X, Y) = Σ_i p_i log₂(p_i / q_i),   D_KL(Y, X) = Σ_i q_i log₂(q_i / p_i).

In the case of absolutely continuous random variables X, Y given by their distribution densities, in the formulas for calculating the value of the Kullback-Leibler distance, the sums are replaced by the corresponding integrals.

The Kullback-Leibler distance is always a non-negative number; it equals zero, D_KL(X, Y) = 0, if and only if the equality X = Y holds for the given random variables (i.e. their distributions coincide).
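A sketch in Python of the Kullback-Leibler distance, assuming base-2 logarithms and distributions given as lists of probabilities; it also illustrates the asymmetry mentioned above:

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) = sum(p_i * log2(p_i / q_i)); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]  # "true" distribution
q = [1/3, 1/3, 1/3]    # approximating distribution

print(kl_divergence(p, q))  # differs from kl_divergence(q, p): the measure is asymmetric
print(kl_divergence(q, p))
print(kl_divergence(p, p))  # 0: the distance vanishes only when the distributions coincide
```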

In 1960 Alfred Renyi proposed his generalization of entropy.

The Renyi entropy is a family of functionals that quantify the diversity of the randomness of a system. Renyi defined his entropy as the moment of order α of the measure of an ε-partition (covering).

Let α be a given real number satisfying α ≥ 0, α ≠ 1. Then the Renyi entropy of order α is defined by the formula

H_α = H_α(X) = (1 / (1 − α)) · ln( Σ_{i=1…n} p_i^α ),

where p_i = P(X = x_i) is the probability of the event that the discrete random variable X takes the corresponding possible value, and n is the total number of distinct possible values of the random variable X.

For the uniform distribution, when p_1 = p_2 = … = p_n = 1/n, all Renyi entropies are equal: H_α(X) = ln n.

Otherwise, the values of the Renyi entropies weakly decrease as the parameter α increases. Renyi entropies play an important role in ecology and statistics as indices of diversity.

The Renyi entropy is also important in quantum information, where it can be used as a measure of complexity.

Consider some special cases of the Renyi entropy for specific values ​​of the order α:

1. Hartley entropy: H_0 = H_0(X) = ln n, where n is the cardinality of the range of possible values of the finite random variable X, i.e. the number of distinct elements in the set of its possible values;

2. Shannon information entropy: H_1 = H_1(X) = H_1(p) = − Σ_{i=1…n} p_i ln p_i (defined as the limit as α → 1, which is easy to find, for example, using L'Hôpital's rule);

3. Correlation entropy, or collision entropy: H_2 = H_2(X) = −ln Σ_{i=1…n} p_i² = −ln P(X = Y), where Y is an independent copy of X;

4. Min-entropy: H_∞ = H_∞(X) = −ln max_i p_i.

Note that for any non-negative order (α ≥ 0) the inequality H_∞(X) ≤ H_α(X) holds. In addition, H_2(X) ≤ H_1(X) and H_∞(X) ≤ H_2(X) ≤ 2 H_∞(X).
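The special cases and inequalities listed above can be checked numerically. Below is a sketch in Python of the Renyi entropy of order α, using natural logarithms as in this section:

```python
import math

def renyi_entropy(probs, alpha):
    """H_alpha = ln(sum(p_i**alpha)) / (1 - alpha), in nats.

    alpha = 1 is handled as the Shannon limit, alpha = inf as the min-entropy.
    """
    if alpha == 1:                 # Shannon entropy (limit alpha -> 1)
        return -sum(p * math.log(p) for p in probs if p > 0)
    if math.isinf(alpha):          # min-entropy
        return -math.log(max(probs))
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)

p = [0.5, 0.3, 0.2]
for a in (0, 1, 2, float("inf")):
    print(a, renyi_entropy(p, a))          # the values do not increase as alpha grows

# Uniform distribution: every order gives ln(n)
print(renyi_entropy([1/4] * 4, 2), math.log(4))
```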

Alfred Renyi introduced not only his absolute entropies; he also defined a spectrum of divergence measures generalizing the Kullback-Leibler divergence.

Let α be a given real number satisfying α > 0, α ≠ 1. Then, in the notation used for the Kullback-Leibler distance D_KL, the Rényi divergence of order α is determined by the formulas

D_α(X, Y) = (1 / (α − 1)) · ln Σ_i ( p_i^α / q_i^(α−1) ),   D_α(Y, X) = (1 / (α − 1)) · ln Σ_i ( q_i^α / p_i^(α−1) ).

The Renyi divergence is also called alpha-divergence or α-divergence. Rényi himself used a logarithm to the base 2, but, as always, the value of the base of the logarithm is completely unimportant.

9. Entropy of Tsallis

Constantino Tsallis (born 1943), Brazilian physicist of Greek origin.

In 1988 he proposed a new generalization of entropy, convenient for developing a theory of nonlinear thermodynamics.

The generalization of entropy proposed by him may possibly play a significant role in theoretical physics and astrophysics in the near future.

The Tsallis entropy S_q, often called nonextensive (nonadditive) entropy, is defined for n microstates by the following formula:

S_q = S_q(X) = S_q(p) = K · (1 − Σ_{i=1…n} p_i^q) / (q − 1).

Here K is a dimensional constant, used when dimensionality matters for the problem at hand.
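A sketch in Python (with K = 1 assumed) showing the Tsallis entropy and its nonadditivity for two independent subsystems:

```python
def tsallis_entropy(probs, q, k=1.0):
    """S_q = k * (1 - sum(p_i**q)) / (q - 1); tends to the Shannon entropy (in nats) as q -> 1."""
    return k * (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

p_a = [0.5, 0.5]
p_b = [0.25, 0.75]
p_ab = [pa * pb for pa in p_a for pb in p_b]   # joint distribution of independent subsystems

q = 2.0
s_a, s_b, s_ab = (tsallis_entropy(x, q) for x in (p_a, p_b, p_ab))

print(s_ab, s_a + s_b)                   # not equal: the entropy is nonadditive
print(s_a + s_b + (1 - q) * s_a * s_b)   # pseudo-additivity reproduces S_q of the joint system
```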

Tsallis and his supporters propose to develop "nonextensive statistical mechanics and thermodynamics" as a generalization of these classical disciplines to the case of systems with long memory and / or long-range forces.

The Tsallis entropy differs from all other types of entropy, including the Renyi entropy, in that it is not additive. This is a fundamental and important difference.

Tsallis and his supporters believe that this feature makes it possible to build a new thermodynamics and a new statistical theory capable of describing, simply and correctly, systems with long memory and systems in which each element interacts not only with its nearest neighbours but with the system as a whole or with large parts of it.

An example of such systems, and therefore a possible object of research using the new theory, are cosmic gravitating systems: star clusters, nebulae, galaxies, galaxy clusters, etc.

Since 1988, when Constantino Tsallis proposed his entropy, a significant number of applications of the thermodynamics of anomalous systems (with long memory and/or long-range forces) have appeared, including in the thermodynamics of gravitating systems.

10. Quantum entropy of von Neumann

John (Janos) von Neumann (1903-1957), American mathematician and physicist of Hungarian descent.

Von Neumann entropy plays an important role in quantum physics and astrophysical research.

John von Neumann made a significant contribution to the development of such branches of science as quantum physics, quantum logic, functional analysis, set theory, computer science and economics.

He was a member of the Manhattan Nuclear Weapons Project, one of the founders of mathematical game theory and the concept of cellular automata, and the founder of modern computer architecture.

Von Neumann's entropy, like any entropy, is associated with information: in this case, with information about a quantum system. And in this regard, it plays the role of a fundamental parameter that quantitatively characterizes the state and direction of evolution of a quantum system.

Currently, von Neumann entropy is widely used in various forms (conditional entropy, relative entropy, etc.) within the framework of quantum information theory.

Various measures of entanglement are directly related to the von Neumann entropy. Nevertheless, a number of works have recently appeared that criticize the Shannon entropy as a measure of information and point to its possible inadequacy, and hence to the possible inadequacy of the von Neumann entropy as a generalization of the Shannon entropy.

This review (unfortunately, cursory, and sometimes not sufficiently mathematically rigorous) of the evolution of scientific views on the concept of entropy allows us to answer important questions related to the true essence of entropy and the prospects for using the entropy approach in scientific and practical research. We will confine ourselves to considering the answers to two such questions.

First question: Do the numerous varieties of entropy, both considered and not considered above, have anything in common other than the same name?

This question arises naturally if we take into account the diversity that characterizes the existing different concepts of entropy.

To date the scientific community has not developed a single, universally accepted answer to this question: some scientists answer it in the affirmative, others in the negative, and still others regard the commonality of entropies of various kinds with a noticeable degree of doubt.

Clausius, apparently, was the first scientist who was convinced of the universal nature of entropy and believed that it plays an important role in all processes occurring in the Universe, in particular, determining their direction of development in time.

Incidentally, one of the formulations of the second law of thermodynamics belongs to Rudolf Clausius: "A process is impossible whose only result would be the transfer of heat from a colder body to a hotter one."

This formulation of the second law of thermodynamics is called the Clausius postulate, and the irreversible process referred to in it is called the Clausius process.

Since the discovery of the second law of thermodynamics, irreversible processes have played a unique role in the physical picture of the world. Thus, William Thomson's famous 1849 article, containing one of the first formulations of the second law of thermodynamics, was entitled "On a Universal Tendency in Nature to the Dissipation of Mechanical Energy".

Note also that Clausius was forced to use the cosmological language: "The entropy of the universe tends to the maximum".

Ilya Romanovich Prigogine (1917-2003), Belgian-American physicist and chemist of Russian origin, winner of the 1977 Nobel Prize in chemistry.

Ilya Prigogine came to similar conclusions. Prigogine believed that the entropy principle is responsible for the irreversibility of time in the Universe and possibly plays an important role in understanding the meaning of time as a physical phenomenon.

To date, many studies and generalizations of entropy have been carried out, including from the standpoint of rigorous mathematical theory. However, the considerable activity of mathematicians in this area has not yet been much in demand in applications, with the possible exception of the works of Kolmogorov, Renyi, and Tsallis.

Undoubtedly, entropy is always a measure (degree) of chaos and disorder. It is the variety of manifestations of the phenomenon of chaos and disorder that determines the inevitability of a variety of entropy modifications.

Second question: Is it possible to recognize the scope of application of the entropy approach as extensive, or are all applications of entropy and the second law of thermodynamics limited to thermodynamics itself and related areas of physical science?

The history of the scientific study of entropy indicates that entropy is a scientific phenomenon discovered in thermodynamics, and then successfully migrated to other sciences and, above all, to information theory.

Undoubtedly, entropy plays an important role in almost all areas of modern natural science: in thermal physics, in statistical physics, in physical and chemical kinetics, in biophysics, astrophysics, cosmology, and information theory.

When speaking about applied mathematics, one cannot fail to mention the applications of the principle of maximum entropy.

As already noted, quantum mechanical and relativistic objects are important fields of application of entropy. In quantum physics and astrophysics, such applications of entropy are of great interest.

We mention only one striking result of black-hole thermodynamics: the entropy of a black hole is equal to a quarter of its surface area (the area of the event horizon), measured in Planck units.

In cosmology, it is believed that the entropy of the Universe is equal to the number of relic radiation quanta per nucleon.

Thus, the scope of the entropy approach is very broad and embraces a wide variety of branches of knowledge: thermodynamics and other areas of physical science, computer science, and even, for example, history and economics.

A.V. Segal, Doctor of Economics, Crimean University named after V.I. Vernadsky

1. Introduction.

2. What did Claude Shannon measure?

3. Limits of evolutionary variability of information systems.

4. Limited adaptation of biological species.

5. Stages of development of the theory of entropy.

6. Methods for calculating the amount of structural information and informational entropy of texts.

7. Information-entropic relationships of adaptation and development processes.

8. Information and energy.

9. Conclusion.

10. Bibliography.

INTRODUCTION

In the second half of the 20th century, two events took place, which, in our opinion, largely determine the further paths of scientific comprehension of the world. We are talking about the creation of information theory and the beginning of research into the mechanisms of anti-entropy processes, for the study of which synergetics draws on all the latest achievements of nonequilibrium thermodynamics, information theory and general systems theory.

The fundamental difference between this stage in the development of science and the preceding ones is that, before the creation of the fields listed above, science could explain only the mechanisms of processes leading to increasing chaos and growing entropy. As for the biological and evolutionary concepts developed since the time of Lamarck and Darwin, they still lack strict scientific substantiation and contradict the Second Law of Thermodynamics, according to which the growth of entropy accompanying all processes in the world is an inviolable physical law.

The merit of nonequilibrium thermodynamics lies in the fact that it was able to reveal mechanisms of anti-entropy processes that do not contradict the Second Law of Thermodynamics, since a local decrease of entropy inside a self-organizing system is always paid for by a larger (in absolute value) increase in the entropy of the external environment.

The most important step toward understanding the nature and mechanisms of anti-entropy processes was the introduction of a quantitative measure of information. Initially this measure was intended only for solving purely applied problems of communication technology. However, subsequent research in physics and biology revealed the universal character of the measure proposed by K. Shannon, making it possible to establish the relationship between the amount of information and physical entropy and, ultimately, to arrive at a new scientific interpretation of the concept of "information" as a measure of the structural ordering of systems of the most diverse nature.

Using a metaphor, one can say that before the introduction of a single quantitative measure of information into science, the world represented in natural-science concepts seemed to "rest on two whales": energy and matter. The "third whale" is now information, which participates in all processes occurring in the world, from microparticles, atoms, and molecules to the functioning of the most complex biological and social systems.

Naturally, the question arises: do the latest data of modern science confirm or deny the evolutionary paradigm of the origin of life and biological species?

To answer this question, it is necessary first of all to understand which properties and aspects of the multifaceted concept of "information" are reflected by the quantitative measure that K. Shannon introduced into science.

Using this measure of the amount of information makes it possible to analyze the general mechanisms of information-entropy interactions that underlie all spontaneously occurring processes of information accumulation in the surrounding world, processes that lead to the self-organization of the structure of systems.

At the same time, information-entropy analysis also makes it possible to identify gaps in evolutionary concepts, which are nothing more than untenable attempts to reduce the problem of the origin of life and of biological species to simple mechanisms of self-organization, without taking into account the fact that systems of this level of complexity can be created only on the basis of the information that was originally laid down in the plan preceding their creation.

Studies of the properties of information systems carried out by modern science give every reason to assert that all such systems can be formed only according to rules descending from the upper hierarchical levels, and that these rules themselves existed before the systems in the form of an initial plan (idea of creation).

WHAT DID CLAUDE SHANNON MEASURE?

Information theory is based on the method, proposed by C. Shannon, of calculating the amount of new (unpredictable) and redundant (predictable) information contained in messages transmitted through technical communication channels.

The method for measuring the amount of information proposed by Shannon turned out to be so universal that its application is no longer limited to the narrow framework of purely technical applications.

Contrary to the opinion of Shannon himself, who warned scientists against the hasty spread of the method proposed by him beyond the applied problems of communication technology, this method began to find an ever wider application in studies of physical, biological, and social systems.

The key to a new understanding of the essence of the phenomenon of information and of the mechanism of information processes was the relationship between information and physical entropy established by L. Brillouin. This relationship was originally laid in the very foundation of information theory, since Shannon proposed using the probabilistic entropy function, borrowed from statistical thermodynamics, for calculating the amount of information.

Many scholars (starting with Shannon himself) were inclined to view such borrowing as a purely formal device. L. Brillouin showed that the connection between the amount of information calculated according to Shannon and physical entropy is not formal but substantive.

In statistical physics, the probabilistic function of entropy is used to study the processes leading to thermodynamic equilibrium, in which all states of molecules (their energies, velocities) approach equiprobable, and the entropy tends to a maximum value.

Thanks to information theory, it became obvious that using the same function it is possible to investigate such systems that are far from the state of maximum entropy, such as, for example, a written text.

Another important finding is that

With the help of the probabilistic function of entropy, it is possible to analyze all stages of the system's transition from a state of complete chaos, which corresponds to equal values ​​of probabilities and the maximum value of entropy, to a state of limiting ordering (rigid determination), which corresponds to the only possible state of its elements.

This conclusion turns out to be equally valid for such systems dissimilar in nature as gases, crystals, written texts, biological organisms or communities, etc.

Moreover, whereas for a gas or a crystal the calculation of entropy compares only the microstate (i.e. the state of the atoms and molecules) with the macrostate of the system (i.e. the gas or the crystal as a whole), for systems of a different nature (biological, intellectual, social) the entropy can be calculated at one or another arbitrarily chosen level. In this case the calculated entropy of the system under consideration, and the amount of information characterizing the degree of ordering of the system and equal to the difference between the maximum and the actual value of the entropy, will depend on the probability distribution of the states of the elements of the lower level, i.e. of those elements that together form the system.

In other words,

the amount of information stored in the structure of the system is proportional to the degree of deviation of the system from the state of equilibrium due to the order preserved in the structure of the system.

Without realizing it, Shannon armed science with a universal measure, suitable in principle (provided the values of all probabilities can be determined) for assessing the degree of orderliness of any system existing in the world.

Defining the information measure introduced by Shannon as a measure of the orderliness of motion, one can establish the relationship between information and energy by regarding energy as a measure of the intensity of motion. The amount of information stored in the structure of systems is proportional to the total energy of the internal bonds of these systems.

Simultaneously with the identification of the general properties of information as a phenomenon, fundamental differences related to different levels of complexity of information systems are also revealed.

So, for example, all physical objects, unlike biological ones, do not possess special organs of memory, recoding of signals coming from the outside world, or information communication channels. The information stored in them is, as it were, "smeared" throughout their structure. At the same time, if crystals were not capable of storing information in the internal bonds that determine their ordering, there would be no possibility to create artificial memory and technical devices designed for information processing based on crystal structures.

At the same time, it should be borne in mind that the creation of such devices became possible only thanks to the human mind, which was able to use the elementary information properties of crystals to build complex information systems.

The simplest biological system surpasses in its complexity the most perfect information system created by man. Already at the level of the simplest unicellular organisms, the most complex informational genetic mechanism necessary for their reproduction is involved. In multicellular organisms, in addition to the information system of heredity, there are specialized organs for storing information and processing it (for example, systems that transcode visual and auditory signals coming from the outside world before sending them to the brain, systems for processing these signals in the brain). The most complex network of information communications (nervous system) permeates and transforms into a whole the entire multicellular organism.

The concept of entropy was first introduced in 1865 by R. Clausius in thermodynamics as a measure of irreversible energy dissipation. Entropy is used in various branches of science, including information theory, as a measure of the uncertainty of an experiment or trial that can have different outcomes. These definitions of entropy are deeply connected: on the basis of ideas about information one can derive all the most important propositions of statistical physics. [BES. Physics. M: Great Russian Encyclopedia, 1998]

The information binary entropy for independent (not necessarily equiprobable) random events x with n possible states (from 1 to n; p is the probability function) is calculated by Shannon's formula:

H(x) = − Σ_{i=1…n} p_i log₂ p_i.

This quantity is also called the average entropy of the message. The entropy in Shannon's formula is an average characteristic: the mathematical expectation of the distribution of a random variable.
For example, in a sequence of letters that make up any sentence in Russian, different letters appear with different frequencies, so the uncertainty of appearance for some letters is less than for others.
In 1948, exploring the problem of the rational transmission of information through a noisy communication channel, Claude Shannon proposed a revolutionary probabilistic approach to understanding communication and created the first truly mathematical theory of entropy. His sensational ideas quickly became the basis for the development of information theory that uses the concept of probability. The concept of entropy as a measure of randomness was introduced by Shannon in his article "A Mathematical Theory of Communication", published in two parts in the Bell System Technical Journal in 1948.

In the special case of equiprobable events, when all options are equally likely, only the dependence on the number of options remains; Shannon's formula then simplifies greatly and coincides with the Hartley formula, first proposed by the American engineer Ralph Hartley in 1928 as one of the scientific approaches to evaluating messages:

I = log₂ N = log₂ (1 / p) = − log₂ p,

where I is the amount of transmitted information, p is the probability of the event, and N is the possible number of distinct (equiprobable) messages.

Task 1. On equiprobable events.
There are 36 cards in a deck. How much information is contained in the message that an ace has been drawn from the deck? That the ace of spades has been drawn?

Solution. The probabilities are p1 = 4/36 = 1/9 and p2 = 1/36. Using Hartley's formula we have:

I1 = log₂ 9 ≈ 3.17 bits,   I2 = log₂ 36 ≈ 5.17 bits.

Answer: 3.17 bits; 5.17 bits.
Note (from the second result) that 6 bits are enough to encode all the cards.
It is also clear from the results that the less likely an event is, the more information it carries (this property is called monotonicity).
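A short Python sketch verifying the answer (base-2 logarithms assumed):

```python
import math

def hartley_information(p: float) -> float:
    """I = log2(1 / p): information, in bits, in a message about an event of probability p."""
    return math.log2(1.0 / p)

print(round(hartley_information(4 / 36), 2))  # any ace: ~3.17 bits
print(round(hartley_information(1 / 36), 2))  # the ace of spades: ~5.17 bits
```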

Task 2. On non-equiprobable events.
There are 36 cards in a deck, 12 of which are "portrait" (face) cards. One card at a time is drawn from the deck and shown in order to determine whether a portrait is depicted on it; the card is then returned to the deck. Determine the amount of information transmitted each time one card is shown (see the sketch below).
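A possible solution sketch (not given in the original text): the message has two outcomes, "portrait" with probability 12/36 = 1/3 and "not a portrait" with probability 2/3, so the information transmitted per showing is the entropy of this binary distribution:

```python
import math

p_portrait = 12 / 36   # probability that the shown card is a "portrait" card
p_other = 1 - p_portrait

h = -(p_portrait * math.log2(p_portrait) + p_other * math.log2(p_other))
print(round(h, 3))     # ~0.918 bits per shown card
```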

Entropy (information theory)

Information entropy is a measure of the randomness of information, the uncertainty of the appearance of any symbol of the primary alphabet. In the absence of information losses it is numerically equal to the amount of information per symbol of the transmitted message.

For example, in the sequence of letters making up a sentence in Russian, different letters appear with different frequencies, so the uncertainty of appearance is smaller for some letters than for others. If we also take into account that some combinations of letters are very rare (in this case one speaks of the entropy of the n-th order), then the uncertainty decreases further.

To illustrate the concept of information entropy, one can also invoke an example from the field of thermodynamic entropy known as Maxwell's demon. The concepts of information and entropy are deeply connected, but despite this it took many years of development of statistical mechanics and information theory to make them consistent with each other.

Formal definitions

Definition via self-information

The entropy of a random variable can also be defined by first introducing the concept of the self-information of a random variable X with a finite number of values:

I(X) = − log P_X(X).

Then the entropy is defined as:

H(X) = E[ I(X) ] = − Σ_x P_X(x) log P_X(x).

The unit of measurement of information and entropy depends on the base of the logarithm: the bit (base 2), the nat (base e), or the hartley (base 10).

The information entropy for independent random events x with n possible states (from 1 to n) is calculated by the formula:

H(x) = − Σ_{i=1…n} p_i log₂ p_i.

This quantity is also called the average entropy of the message. The quantity H_i = − p_i log₂ p_i is called the partial entropy, characterizing only the i-th state.

Thus, the entropy of the event x is the sum, taken with the opposite sign, of the products of the relative frequencies of occurrence of the states i and their binary logarithms (base 2 was chosen only for the convenience of working with information presented in binary form). This definition for discrete random events can be extended to a probability distribution function.
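A sketch in Python of this definition: entropy as the expected value of the self-information I = −log₂ p:

```python
import math

def self_information(p: float) -> float:
    """I = -log2(p): self-information of an outcome with probability p."""
    return -math.log2(p)

def entropy(probs):
    """H(X) = E[I(X)] = sum(p * I(p)) over all outcomes."""
    return sum(p * self_information(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
```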

In general, the b-ary entropy (where b = 2, 3, …) of a source with original alphabet {a_1, …, a_n} and discrete probability distribution {p_1, …, p_n}, where p_i = p(a_i), is determined by the formula:

H_b(S) = − Σ_{i=1…n} p_i log_b p_i.

The definition of the Shannon entropy is related to the concept of thermodynamic entropy. Boltzmann and Gibbs did a great deal of work in statistical thermodynamics that contributed to the adoption of the word "entropy" in information theory, and there is a genuine connection between thermodynamic and information entropy. For example, Maxwell's demon likewise contrasts thermodynamic entropy with information, and obtaining any amount of information equals the entropy lost.

Alternative definition

Another way to define the entropy function H is to prove that H is uniquely determined (as stated earlier) if and only if H satisfies a certain set of conditions:

Properties

It is important to remember that entropy is a quantity defined in the context of a probabilistic model of a data source. For example, a coin toss has entropy −2·(0.5·log₂ 0.5) = 1 bit per toss (provided the tosses are independent), while a source that generates a string consisting only of the letter "A" has zero entropy: −1·log₂ 1 = 0. Empirically it can be established, for instance, that the entropy of English text is about 1.5 bits per character, though of course this varies from text to text. The entropy of a data source is the average number of bits per data element required to encode it without loss of information, under optimal coding.

  1. Some data bits may not carry information. For example, data structures often store redundant information, or have identical sections regardless of the information in the data structure.
  2. The amount of entropy is not always expressed as an integer number of bits.

Mathematical properties

Efficiency

Alphabets encountered in practice have probability distributions that are far from optimal. If the original alphabet has n symbols, it can be compared with an "optimized alphabet" whose probability distribution is uniform. The ratio of the entropy of the original alphabet to that of the optimized one is the efficiency of the original alphabet, which can be expressed as a percentage.

It follows that the efficiency of an original alphabet with n symbols can be defined simply as its n-ary entropy.

Entropy limits the maximum possible lossless (or almost lossless) compression, which can be realized theoretically by using a typical set or, in practice, by Huffman coding, Lempel-Ziv-Welch coding, or arithmetic coding (see the sketch below).
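As an illustration (a Python sketch using only the standard library; the sample text is made up), one can compare the order-0 entropy bound with the size actually achieved by a general-purpose compressor. Note that zlib also exploits repetitions between characters, so on highly repetitive text it may beat the memoryless (order-0) bound:

```python
import math
import zlib
from collections import Counter

text = ("to be or not to be that is the question " * 50).encode("ascii")

counts = Counter(text)
n = len(text)
entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())

order0_bound_bytes = math.ceil(entropy_bits * n / 8)   # bound for memoryless (order-0) coding
compressed_bytes = len(zlib.compress(text, 9))

print(f"entropy: {entropy_bits:.2f} bits per character")
print(f"order-0 bound: {order0_bound_bytes} bytes, zlib output: {compressed_bytes} bytes")
```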

Variations and generalizations

Conditional entropy

If successive characters of the alphabet are not independent (for example, in French the letter "q" is almost always followed by "u", and in Soviet newspapers the word "leader" was usually followed by "of production" or "of labour"), the amount of information carried by a sequence of such symbols (and hence the entropy) is obviously smaller. Conditional entropy is used to account for such facts.

The first-order conditional entropy (similarly for a first-order Markov model) is the entropy for an alphabet in which the probabilities of one letter appearing after another (i.e. the probabilities of two-letter combinations) are known:

H_1 = − Σ_i p_i Σ_j p_i(j) log₂ p_i(j),

where i is the state that depends on the preceding character and p_i(j) is the probability of j given that i was the preceding character.

Such estimates have been computed, for example, for the Russian language without the letter "ё".

Partial and general conditional entropies are used to describe completely the information losses during data transmission over a noisy channel. For this purpose, so-called channel matrices are used. To describe the losses on the source side (i.e. when the sent signal is known), one considers the conditional probability p(b_j | a_i) that the receiver receives the symbol b_j given that the symbol a_i was sent. The channel matrix then has the following form:

        b_1            b_2            …   b_j            …   b_m
a_1     p(b_1 | a_1)   p(b_2 | a_1)   …   p(b_j | a_1)   …   p(b_m | a_1)
a_2     p(b_1 | a_2)   p(b_2 | a_2)   …   p(b_j | a_2)   …   p(b_m | a_2)
…
a_i     p(b_1 | a_i)   p(b_2 | a_i)   …   p(b_j | a_i)   …   p(b_m | a_i)
…
a_m     p(b_1 | a_m)   p(b_2 | a_m)   …   p(b_j | a_m)   …   p(b_m | a_m)

Obviously, the probabilities on the diagonal describe the probability of correct reception, and the sum of the elements of a column gives the probability of the corresponding symbol appearing on the receiver side, p(b_j). The losses per transmitted signal a_i are described by the partial conditional entropy:

H(B | a_i) = − Σ_j p(b_j | a_i) log₂ p(b_j | a_i).

To calculate the transmission losses over all signals, the general conditional entropy is used:

H(B | A) = Σ_i p(a_i) H(B | a_i) = − Σ_i p(a_i) Σ_j p(b_j | a_i) log₂ p(b_j | a_i).

H(B | A) is the entropy from the source side; the entropy from the receiver side, H(A | B), is computed analogously, with p(a_i | b_j) written everywhere in place of p(b_j | a_i) (summing the elements of a row then gives p(a_i), and the diagonal elements give the probability that exactly the symbol that was received had been sent, i.e. the probability of correct transmission). A numerical sketch follows below.
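A sketch with hypothetical numbers for a binary channel, computing the partial conditional entropies H(B | a_i) and the general conditional entropy H(B | A) from a channel matrix:

```python
import math

# Hypothetical channel matrix: row i holds p(b_j | a_i) for the sent symbol a_i
channel = [
    [0.9, 0.1],   # p(b_1 | a_1), p(b_2 | a_1)
    [0.2, 0.8],   # p(b_1 | a_2), p(b_2 | a_2)
]
p_a = [0.6, 0.4]  # probabilities of the sent symbols a_i

def partial_conditional_entropy(row):
    """H(B | a_i) = -sum_j p(b_j | a_i) * log2 p(b_j | a_i)."""
    return -sum(p * math.log2(p) for p in row if p > 0)

# General conditional entropy: H(B | A) = sum_i p(a_i) * H(B | a_i)
h_b_given_a = sum(pa * partial_conditional_entropy(row) for pa, row in zip(p_a, channel))

for i, row in enumerate(channel, start=1):
    print(f"H(B|a_{i}) = {partial_conditional_entropy(row):.3f} bits")
print(f"H(B|A) = {h_b_given_a:.3f} bits")
```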

Mutual entropy

Mutual entropy, or joint entropy, is used to calculate the entropy of interconnected systems (the entropy of the joint appearance of statistically dependent messages) and is denoted H(AB), where A, as before, characterizes the transmitter and B the receiver.

The relationship between the transmitted and received signals is described by the probabilities of joint events p(a_i b_j), and only one matrix is required to describe the channel characteristics completely:

p(a_1 b_1)   p(a_1 b_2)   …   p(a_1 b_j)   …   p(a_1 b_m)
p(a_2 b_1)   p(a_2 b_2)   …   p(a_2 b_j)   …   p(a_2 b_m)
…
p(a_i b_1)   p(a_i b_2)   …   p(a_i b_j)   …   p(a_i b_m)
…
p(a_m b_1)   p(a_m b_2)   …   p(a_m b_j)   …   p(a_m b_m)

In the more general case, when what is described is not a channel but simply two interacting systems, the matrix need not be square. Obviously, the sum of all elements of the column numbered j gives p(b_j), the sum of the row numbered i gives p(a_i), and the sum of all elements of the matrix equals 1. The joint probability p(a_i b_j) of the events a_i and b_j is calculated as the product of the marginal and the conditional probability:

p(a_i b_j) = p(a_i) · p(b_j | a_i) = p(b_j) · p(a_i | b_j).

The conditional probabilities are obtained by Bayes' formula. Thus, all the data are available for calculating the entropies of the source and of the receiver:

H(A) = − Σ_i p(a_i) log₂ p(a_i),   H(B) = − Σ_j p(b_j) log₂ p(b_j).

Mutual entropy is calculated by summing over all rows (or columns) of the matrix all the probabilities multiplied by their logarithms:

H(AB) = − Σ_i Σ_j p(a_i b_j) log₂ p(a_i b_j).

The unit of measurement is the bit per two symbols, because the mutual entropy describes the uncertainty of a pair of symbols: the one sent and the one received. By simple transformations we also obtain

H(AB) = H(A) + H(B | A) = H(B) + H(A | B).

Mutual entropy has the property of information completeness: from it one can obtain all the quantities considered above.
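A sketch with a hypothetical joint probability matrix, computing H(AB) directly and checking the decomposition H(AB) = H(A) + H(B | A):

```python
import math

# Hypothetical joint probability matrix p(a_i b_j); all entries sum to 1
joint = [
    [0.30, 0.10],
    [0.15, 0.45],
]

def h(probs):
    """Entropy, in bits, of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

h_ab = h([p for row in joint for p in row])   # joint (mutual) entropy H(AB)
p_a = [sum(row) for row in joint]             # row sums give p(a_i)
h_a = h(p_a)

# H(B|A) = sum_i p(a_i) * H(B | a_i), with p(b_j | a_i) = p(a_i b_j) / p(a_i)
h_b_given_a = sum(pa * h([p / pa for p in row]) for pa, row in zip(p_a, joint))

print(round(h_ab, 6), round(h_a + h_b_given_a, 6))  # the two values coincide
```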


Links

  • Claude E. Shannon. A Mathematical Theory of Communication
  • S. M. Korotaev.