What is the computer science definition of entropy

2019-03-07 21:51发布

I've recently started a course on data compression at my university. However, I find the use of the term "entropy" as it applies to computer science rather ambiguous. As far as I can tell, it roughly translates to the "randomness" of a system or structure.

What is the proper definition of computer science "entropy"?

16条回答
Deceive 欺骗
2楼-- · 2019-03-07 22:34

alt text
(source: mit.edu)

from University of Mexico

The information theoretic notion of Entropy is a generalization of the physical notion. There are many ways to describe Entropy. It is a measure of the randomness of a random variable. It is also a measure of the amount of information a random variable or stochastic process contains. It is also a lower bound on the amount a message can be compressed. And finally it is the average number of yes/no questions that need to be asked about an random entity to determine its value.

Equation for Entropy in a sample application for probability calculation:

it is the sum over all values of a rv of the probability of that value times the log of that prob(i.e. p(x)logp(x)). This equation can be derived from first principles of the properties of information.

查看更多
Melony?
4楼-- · 2019-03-07 22:42

Entropy is like a hash code for virus researchers as well. Less entropy you get, it would mean that it is likely encrypted or compressed code which could be potentially be a virus.

A standard binary would have a higher entropy than a compressed or encrypted one.

查看更多
成全新的幸福
5楼-- · 2019-03-07 22:42

My favorite definition, with a more practical focus, is found in Chapter 1 of the excellent book The Pragmatic Programmer: From Journeyman to Master by Andrew Hunt and David Thomas:

Software Entropy

While software development is immune from almost all physical laws, entropy hits us hard. Entropy is a term from physics that refers to the amount of "disorder" in a system. Unfortunately, the laws of thermodynamics guarantee that the entropy in the universe tends toward a maximum. When disorder increases in software, programmers call it "software rot."

There are many factors that can contribute to software rot. The most important one seems to be the psychology, or culture, at work on a project. Even if you are a team of one, your project's psychology can be a very delicate thing. Despite the best laid plans and the best people, a project can still experience ruin and decay during its lifetime. Yet there are other projects that, despite enormous difficulties and constant setbacks, successfully fight nature's tendency toward disorder and manage to come out pretty well.

...

...

A broken window.

One broken window, left unrepaired for any substantial length of time, instills in the inhabitants of the building a sense of abandonment—a sense that the powers that be don't care about the building. So another window gets broken. People start littering. Graffiti appears. Serious structural damage begins. In a relatively short space of time, the building becomes damaged beyond the owner's desire to fix it, and the sense of abandonment becomes reality.

The "Broken Window Theory" has inspired police departments in New York and other major cities to crack down on the small stuff in order to keep out the big stuff. It works: keeping on top of broken windows, graffiti, and other small infractions has reduced the serious crime level.

Tip 4

Don't Live with Broken Windows

Don't leave "broken windows" (bad designs, wrong decisions, or poor code) unrepaired. Fix each one as soon as it is discovered. If there is insufficient time to fix it properly, then board it up. Perhaps you can comment out the offending code, or display a "Not Implemented" message, or substitute dummy data instead. Take some action to prevent further damage and to show that you're on top of the situation.

Text taken from: http://pragprog.com/the-pragmatic-programmer/extracts/software-entropy

查看更多
一夜七次
6楼-- · 2019-03-07 22:44

I always encountered entropy in the sense of Shannon Entropy.

From http://en.wikipedia.org/wiki/Information_entropy:

In information theory, entropy is a measure of the uncertainty associated with a random variable. The term by itself in this context usually refers to the Shannon entropy, which quantifies, in the sense of an expected value, the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.

查看更多
We Are One
7楼-- · 2019-03-07 22:44

In terms of compression and information theory, the entropy of a source is the average amount of information (in bits) that symbols from the source can convey. Informally speaking, the more unlikely a symbol is, the more surprise its appearance brings.

If your source has two symbols, say A and B, and they are equally likely, then each symbol conveys the same amount of information (one bit). A source with four equally likely symbols conveys two bits per symbol.

For a more interesting example, if your source has three symbols, A, B, and C, where the first two are twice as likely as the third, then the third is more surprising but is also less likely. There's a net entropy of 1.52 for this source, as calculated below.

You calculate entropy as the "average surprise", where the "surprise" for each symbol is its probability times the negative binary log of the probability:

                            binary
symbol  weight  probability   log    surprise
  A        2        0.4      -1.32    0.53
  B        2        0.4      -1.32    0.53
  C        1        0.2      -2.32    0.46
total      5        1.0               1.52

The negative of the binary log is used (of course) because logs of values between 0 and 1 (exclusive) are negative.

查看更多
登录 后发表回答