How do I compute the approximate entropy of a bit string?

Posted 2020-02-02 06:40

Is there a standard way to do this?

Googling "approximate entropy" bits turns up several academic papers, but I'd just like to find a chunk of pseudocode that defines the approximate entropy of a given bit string of arbitrary length.

(In case this is easier said than done and depends on the application: my application involves 16,320 bits of encrypted data, i.e. ciphertext. It was encrypted as a puzzle, though, and is not meant to be impossible to crack. I thought I'd check the entropy first but couldn't easily find a good definition of it, so it seemed like a question that ought to be on Stack Overflow. Ideas for where to begin with deciphering 16k random-seeming bits are also welcome.)

See also this related question:
What is the computer science definition of entropy?

8 answers
我欲成王,谁敢阻挡
Reply #2 · 2020-02-02 06:45

The NIST Random Number Generator evaluation toolkit has a way of calculating "Approximate Entropy." Here's the short description:

Approximate Entropy Test Description: The focus of this test is the frequency of each and every overlapping m-bit pattern. The purpose of the test is to compare the frequency of overlapping blocks of two consecutive/adjacent lengths (m and m+1) against the expected result for a random sequence.

And a more thorough explanation is available from the PDF on this page:

http://csrc.nist.gov/groups/ST/toolkit/rng/documentation_software.html
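
A minimal Python sketch of the statistic described above, following my reading of the NIST SP 800-22 Approximate Entropy Test: the sequence is wrapped around by appending its first m-1 bits, phi(m) is the sum of C*ln(C) over the relative frequencies C of the overlapping m-bit patterns, ApEn(m) = phi(m) - phi(m+1), and chi-squared = 2n(ln 2 - ApEn(m)). The function names are hypothetical. The P-value uses the closed form of the regularized upper incomplete gamma function for the integer shape 2^(m-1); if SciPy is available, scipy.special.gammaincc(a, x) computes the same function. Treat this as a sketch to check against the PDF, not as the reference implementation.

```python
import math
from collections import Counter


def _phi(bits: str, m: int) -> float:
    """Sum of C * ln(C) over the relative frequencies C of overlapping m-bit patterns."""
    n = len(bits)
    # Wrap around: append the first m-1 bits so each of the n positions starts a full block.
    extended = bits + bits[: m - 1]
    counts = Counter(extended[i : i + m] for i in range(n))
    return sum((c / n) * math.log(c / n) for c in counts.values())


def _igamc_int(a: int, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) for integer a >= 1.

    For integer a, Q(a, x) = exp(-x) * sum_{k=0}^{a-1} x**k / k!; each term is
    evaluated in log space so a large x (long inputs) cannot overflow.
    """
    if x <= 0.0:
        return 1.0
    terms = (math.exp(-x + k * math.log(x) - math.lgamma(k + 1)) for k in range(a))
    return min(1.0, sum(terms))


def approximate_entropy_test(bits: str, m: int = 2):
    """Return (ApEn(m), chi-squared, P-value) for a string of '0'/'1' characters."""
    if m < 1 or len(bits) <= m:
        raise ValueError("need m >= 1 and more than m bits")
    n = len(bits)
    apen = _phi(bits, m) - _phi(bits, m + 1)
    chi_sq = 2.0 * n * (math.log(2.0) - apen)
    p_value = _igamc_int(2 ** (m - 1), chi_sq / 2.0)
    return apen, chi_sq, p_value


apen, chi_sq, p = approximate_entropy_test("0100110101", m=3)
print(f"ApEn={apen:.6f} chi2={chi_sq:.6f} P={p:.6f}")
# Prints: ApEn=0.190954 chi2=10.043859 P=0.261961
```

For the puzzle, pass the whole 16,320-bit string as `bits`. A small P-value says the pattern frequencies deviate from what a random sequence would give; a large one only says this particular test found nothing.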

趁早两清
Reply #3 · 2020-02-02 06:45

Use the Shannon entropy of the word (string), with this formula: http://imgur.com/a/DpcIH
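
For reference, the standard Shannon entropy of a string, which is what the code in this answer computes (so presumably what the linked image shows), is given below, with c_i the count of the i-th distinct symbol and n the string length. H is measured in bits per symbol; for a string of 0s and 1s it lies between 0 and 1.

```latex
H = -\sum_{i} p_i \log_2 p_i, \qquad p_i = \frac{c_i}{n}
```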

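Here's an O(n) algorithm that calculates it:

import math
from collections import Counter


def entropy(s):
    """Shannon entropy of the sequence s, in bits per symbol."""
    n = len(s)
    # Each distinct symbol with count c contributes -p * log2(p), where p = c / n.
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())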
老娘就宠你
Reply #4 · 2020-02-02 06:48

Shannon's entropy equation is the standard method of calculation. Here is a simple implementation in Python, shamelessly copied from the Revelation codebase, and thus GPL licensed:

import math


def entropy(string):
    "Calculates the Shannon entropy of a string"

    # get probability of chars in string
    prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(string)]

    # calculate the entropy
    return -sum(p * math.log2(p) for p in prob)


def entropy_ideal(length):
    "Calculates the ideal Shannon entropy of a string with given length"

    prob = 1.0 / length

    # Equals log2(length): the maximum, reached when all `length` characters are distinct.
    return -1.0 * length * prob * math.log2(prob)

Note that this implementation treats each character of the input as one symbol, i.e. it assumes that your input bitstream is best represented as bytes. This may or may not be the case for your problem domain. What you really want is your bitstream converted into a string of numbers (symbols), and how you decide what those numbers are is domain specific. If your numbers really are just ones and zeros, convert your bitstream into an array of ones and zeros. Be aware, however, that the conversion method you choose will affect the results you get.
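
The effect of that choice can be seen with a small sketch. The helpers are hypothetical, not part of the Revelation code: `block_entropy_per_bit` splits a bit string into non-overlapping k-bit blocks, treats each block as one symbol, and reports the Shannon entropy per bit, and `bytes_to_bits` assumes most-significant-bit-first ordering within each byte.

```python
import math
from collections import Counter


def bytes_to_bits(data: bytes) -> str:
    """Render bytes as a '0'/'1' string, most significant bit of each byte first."""
    return "".join(f"{b:08b}" for b in data)


def block_entropy_per_bit(bits: str, k: int) -> float:
    """Shannon entropy of non-overlapping k-bit blocks, divided by k.

    Trailing bits that do not fill a whole block are ignored.
    """
    blocks = [bits[i : i + k] for i in range(0, len(bits) - k + 1, k)]
    n = len(blocks)
    # Sum of p * log2(1/p) over distinct blocks; a single block type gives exactly 0.0.
    h = sum((c / n) * math.log2(n / c) for c in Counter(blocks).values())
    return h / k


bits = "01" * 8160  # 16,320 bits, perfectly predictable
for k in (1, 2, 8):
    print(k, block_entropy_per_bit(bits, k))
```

Read one bit at a time, the alternating string scores the maximum of 1.0 bit per bit; read two or eight bits at a time, it scores 0.0. For real ciphertext bytes you would call `block_entropy_per_bit(bytes_to_bits(data), k)` for a few values of k and compare.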

时光不老,我们不散
Reply #5 · 2020-02-02 06:54

Sorry to take so long answering this question.

Take a look at my recent paper:

"BiEntropy - The approximate entropy of a finite binary string"

http://arxiv.org/abs/1305.0954

"We design, implement and test a simple algorithm which computes the approximate entropy of a finite binary string of arbitrary length. The algorithm uses a weighted average of the Shannon Entropies of the string and all but the last binary derivative of the string. We successfully test the algorithm in the fields of Prime Number Theory (where we prove explicitly that the sequence of prime numbers is not periodic), Human Vision, Cryptography, Random Number Generation and Quantitative Finance"

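To make the abstract concrete, here is a minimal Python sketch. The binary derivative of a string is the XOR of each pair of adjacent bits, so it is one bit shorter than the string. The weights are an assumption based on my reading of the paper's BiEn definition: the Shannon entropy of the k-th derivative (k = 0 to n-2) is weighted by 2^k and the weighted sum is divided by 2^(n-1) - 1. The function names are hypothetical; check the exact formula against arXiv:1305.0954 before relying on the numbers. Computing every derivative costs O(n^2), so running this plain-Python version on all 16,320 bits of the question will be slow.

```python
import math


def _binary_entropy(p: float) -> float:
    """Shannon entropy (bits) of a 0/1 source with P(1) = p; 0 when p is 0 or 1."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def bientropy(bits: list) -> float:
    """Hypothetical sketch of BiEn: 2**k-weighted mean of the derivatives' entropies."""
    n = len(bits)
    if n < 2:
        raise ValueError("need at least 2 bits")
    total = 0.0
    weight_sum = 0.0
    current = list(bits)
    for k in range(n - 1):  # the string and all but the last binary derivative
        p = sum(current) / len(current)
        w = 2.0 ** (k - (n - 2))  # 2**k rescaled so the largest weight is 1.0
        total += w * _binary_entropy(p)
        weight_sum += w
        # Binary derivative: XOR of each pair of adjacent bits.
        current = [a ^ b for a, b in zip(current, current[1:])]
    return total / weight_sum


print(bientropy([0, 1, 0, 1]))  # 0.142857... (= 1/7 under these weights)
print(bientropy([0, 0, 0, 0]))  # 0.0
```

The rescaled weights keep the sum finite for long strings: the early derivatives' weights simply underflow to 0.0, which matches their negligible share of the 2^k-weighted average.
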
chillily
Reply #6 · 2020-02-02 06:59

There is no single answer. Entropy is always relative to some model. When someone talks about a password having limited entropy, they mean "relative to the ability of an intelligent attacker to predict", and it's always an upper bound.

Your problem is, you're trying to measure entropy in order to help you find a model, and that's impossible; what an entropy measurement can tell you is how good a model is.

Having said that, there are some fairly generic models that you can try: they're called compression algorithms. If gzip can compress your data well, you have found at least one model that can predict it well. And gzip is, for example, mostly insensitive to simple substitution: it can handle "wkh" appearing frequently in the text as easily as it can handle "the".
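
A quick way to try this in Python is the standard-library zlib module, which implements DEFLATE, the same compression algorithm gzip uses. The sketch below compares compressed size to original size; the sample inputs and the `caesar_shift` helper are made up for illustration. Note that 16,320 bits is 2,040 bytes.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size, using zlib at maximum level."""
    if not data:
        raise ValueError("empty input")
    return len(zlib.compress(data, 9)) / len(data)


def caesar_shift(text: bytes, shift: int = 3) -> bytes:
    """Hypothetical helper: shift lowercase ASCII letters, so b"the" becomes b"wkh"."""
    return bytes(
        (c - ord("a") + shift) % 26 + ord("a") if ord("a") <= c <= ord("z") else c
        for c in text
    )


# 2,040 bytes = 16,320 bits, the size of the ciphertext in the question.
plain = (b"the quick brown fox jumps over the lazy dog " * 50)[:2040]
samples = {
    "random": os.urandom(2040),
    "plaintext": plain,
    "caesar-shifted": caesar_shift(plain),
}
for name, data in samples.items():
    # Near (or above) 1.0: DEFLATE found nothing to exploit; well below 1.0: it found structure.
    print(f"{name:15s} {compression_ratio(data):.3f}")
```

Expect the random bytes to land at or slightly above 1.0 and the two text samples to compress to a small, nearly identical fraction, which is the insensitivity to substitution described above. A ratio near 1.0 for the puzzle ciphertext does not prove it is random; it only means DEFLATE's model could not predict it.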

趁早两清
Reply #7 · 2020-02-02 06:59

Here's an implementation in Python (I also added it to the Wiki page):

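import numpy as np


def ApEn(U, m, r):
    """Approximate entropy of the series U (embedding dimension m, tolerance r)."""

    def _maxdist(x_i, x_j):
        # Chebyshev (maximum-coordinate) distance between two windows
        return max(abs(ua - va) for ua, va in zip(x_i, x_j))

    def _phi(m):
        # All N - m + 1 overlapping windows of length m
        x = [[U[j] for j in range(i, i + m)] for i in range(N - m + 1)]
        # C[i]: fraction of windows within distance r of window i (including itself)
        C = [len([1 for x_j in x if _maxdist(x_i, x_j) <= r]) / (N - m + 1.0) for x_i in x]
        # Phi^m(r) = (N - m + 1)^-1 * sum(log C[i])
        return (N - m + 1.0) ** (-1) * sum(np.log(C))

    N = len(U)

    return float(_phi(m) - _phi(m + 1))

Example:

>>> U = np.array([85, 80, 89] * 17)
>>> ApEn(U, 2, 3)
1.0996541105257052e-05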

The above example is consistent with the example given on Wikipedia.
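
For the question's bit string, running the function above on all 16,320 bits compares every window with every other one, which is slow in pure Python. A sketch of a shortcut, assuming the data are 0/1 integers and the tolerance r is below 1: two windows are then within r only when they are identical, so each C_i is just the relative frequency of that window's pattern, and the average of ln(C_i) over windows equals the sum of p*ln(p) over distinct patterns. That is the same kind of pattern-count statistic as the NIST test in the first answer, minus NIST's wrap-around. The name `apen_bits` is hypothetical.

```python
import math
import random
from collections import Counter


def apen_bits(bits, m):
    """Pincus ApEn(m, r) for a 0/1 sequence, assuming a tolerance 0 <= r < 1.

    With such an r, two windows match only if they are identical, so C_i is the
    relative frequency of window i's pattern and no pairwise comparison is needed.
    """

    def phi(k):
        windows = [tuple(bits[i:i + k]) for i in range(len(bits) - k + 1)]
        total = len(windows)
        # Mean of ln(C_i) over all windows == sum of p * ln(p) over distinct patterns.
        return sum((c / total) * math.log(c / total) for c in Counter(windows).values())

    return phi(m) - phi(m + 1)


# Usage on 16,320 pseudo-random bits (a stand-in for the puzzle data).
rng = random.Random(0)
sample = [rng.getrandbits(1) for _ in range(16320)]
print(apen_bits(sample, 2))  # should be close to ln(2), about 0.693, for unbiased random bits
```

This runs in linear time, so trying several values of m on the full ciphertext is cheap.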
