200kB file to search for 8! (40320 permutations) i

2019-08-24 05:03发布

问题:

I am working on disassembling a firmware (Siemens C165 processor - https://www.infineon.com/dgdl/Infineon-C165-DS-v02_00-en%5B8%5D.pdf?fileId=db3a304412b407950112b43a49a66fd7) in IDA.

I have the firmware, so I can also read it via Python.

I need to find a string being permutation of

0, 1, 2, 3, 4, 5, 6, 7 (0-7)

Wrote this simple program:

from itertools import permutations 
l = list(permutations(range(0,8))) 
print(len(l))

with open("firm.ori", 'rb') as f:
    s = f.read()
for i in l:
    hexstr = '\\x'.join('{:02x}'.format(x) for x in i)
    hexstrfinal = "\\0x" + hexstr
    #print hexstrfinal
    if(s.find(b'hexstrfinal')>0):
        print "found"

However, it does not find anything

I thought the sequence will be next to each other, but maybe not.

Want to just be sure that the program is correct.

Well actually, 0-7 should be nibbles, so does it mean I need to search for, for example as a one combination:

0x01, 0x23, 0x45, 0x67 

Above is bytes.

Can somebody confirm this and advise how to search for this?

Update 1:

Tried 2nd variant

from itertools import permutations 
l = list(permutations(range(0,8))) 
print(len(l))

with open("firm.ori", 'rb') as f:
  s = f.read()
for item in l:
  str1 = ''.join(str(e) for e in item)
  n = 2
  out = [str1[i:i+n] for i in range(0, len(str1), n)]

  hexstr = '\\x'.join(e for e in out)
  hexstrfinal  = "\\x" + hexstr 
  #print hexstrfinal
  if(s.find(b'hexstrfinal')>0):
    print "found"

But also no hits ...

Any ideas what I do wrong?

回答1:

There are a few misunderstandings in your code, and a major inefficiency. Let's start with the misunderstandings.

Since firm.ori is opened in binary mode (rb), the result of s = f.read() is a bytes object. Despite having similar methods to a string, this is not a string! It contains bytes, not characters. When you display it, the \x... outputs do not indicate that the bytes object containing ASCII backslashes and xes. Instead, each \x... is an escape sequence used to represent the hex value of a given byte that does not correspond to an ASCII printable character.

Inside your loop, you deal exclusively with strings: hexstr = '\\x'.join('{:02x}'.format(x) for x in i) takes your permutation and formats it to look like the string representation of a bytes object. Hopefully you understand from the previous paragraph why this won't work.

s.find(b'hexstrfinal') searches for the literal ASCII array b'hexstrfinal', not for the variable named hexstrfinal. The latter wouldn't work of course, because hexstrfinal has type str, not bytes. If you were to convert it to bytes using a simple hexstrfinal.encode('ascii'), you'd get b'\\x...', which is not at all what you want. The proper way would be

s.find(hexstrfinal.encode('ascii').decode('unicode-escape').encode('latin1'))

Hopefully you can see why it is unnecessarily inefficient to convert a string three times to get the bytes you want. Any time you start using strings as a crutch to manipulate numbers is a good time to evaluate your approach. That begins our discussion of the inefficiencies in your code.

You are currently attempting to iterate through all the possible permutations of 0-7 rather than looking for the permutations that are actually there. Given that the file is only 200KB in size, it would be unreasonable to expect all or even most permutations to appear in it. Moreover, you are searching the entire file for each possible permutation. For a file size N and K permutations, your code runs in O(N * K) time, while it's possible to do this in a single pass through the file, or O(N). With the appropriate data structures, even a loop written in plain Python will likely run faster than an optimized version of the current code.

The strategy is simple. Iterate through s. If the current character and the following seven make up a valid permutation, start a party. Otherwise, keep looking:

N = 8
allowed = set(range(N))
for index, b in enumerate(s):
    if b in allowed and set(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

There are a host of possible optimizations possible here, and you could probably do the whole thing way more efficiently with numpy or scipy.

Things would also get more complicated if you allowed repeats in the sequence. In that case, you'd have to sort the sequences:

allowed = sorted(...)
N = len(allowed)
for index, b in enumerate(s):
    if b in allowed and sorted(s[index:index + N]) == allowed:
        print(f'Found sequence {s[index:index + N]} at offset {index}')

If you are going to search for nibbles, things get more complicated still. I would drop entirely the check b in allowed, and just write a custom check that could be applied at every half-step:

N = 8

def build_set(items):
    output = set()
    for b in items:
        output.add(b & 0xF)
        output.add((b >> 4) & 0xF)
    return output

def matches(seq):
    if len(seq) == N // 2:
        return build_set(seq) == allowed
    elif len(seq) == N // 2 + 1:
        check = build_set(seq[1:-1])
        check.add(seq[0] & 0xF)
        check.add((seq[-1] >> 4) & 0xF)
        return check == allowed
    else:
        return False

allowed = set(range())
for index, b in enumerate(s):
    if matches(s[index:index + N // 2]):
        print(f'Found sequence {s[index:index + N // 2]} at offset {index}.0')
     if matches(s[index:index + N // 2 + 1]):
        print(f'Found sequence {s[index:index + N // 2 + 1]]} at offset {index}.5')

Here, build_set just splits the nibbles into a set. matches checks either an array of 8 nibbles aligned on a byte (4 elements), or an array of 8 nibbles offset by half a byte (5 elements). Both cases are reported independently.



回答2:

It is not clear what you are trying to search for but...

Each permutation of (0,1,2,3,4,5,6,7) will be a seven item tuple similar to this

t = (7, 6, 4, 1, 3, 5, 0, 2)

You can make two-item strings like this

>>> a = [''.join(map(str,thing)) for thing in zip(t,t[1:])]
>>> a
['76', '64', '41', '13', '35', '50', '02']

Then make integers of the strings and feed it to bytes

>>> b = bytes(map(int,a))
>>> b
b'L@)\r#2\x02'

Then search for it

>>> b in s
????

If it doesn't find it it's not there.


Here is a ten character bytes object (similar to your file)

>>> b = b'\xcbl\x7f|_k\x00\x9f\xa2\xcc'

It just happens to be:

>>> bytes([203, 108, 127, 124, 95, 107, 0, 159, 162, 204])

Search for 3-character (or 3-integer) sequences

>>> bytes([127,94,107]) in b
False
>>> bytes([127,95,107]) in b
False
>>> bytes([124,95,107]) in b
True
>>>

When I thing of binary files, I really think integers not characters.