
Reverse engineer a binary dictionary file to extra

Posted 2019-08-18 05:35

Question:

I have a ~600MB .DAT file that contains an Italian dictionary (accented words with their definitions).

I would like to extract all the strings from this file (a raw dump containing strings and dirty headers/binary data would be all right as long as I can read the words and definitions).

So my question is: is there software that could do this in an automated way?

I would tell it: 'I know that this file contains the strings "TREE", "DOG", "CAT", "COLLISION"... now use some brute force, statistical analysis or whatever method to try and find how these strings are encoded'

Two things I'd like to mention:

  • I am a software developer but have absolutely no experience or knowledge in reverse engineering, hex editing etc.
  • I do not want to spend hours reading reverse engineering tutorials and doing trial and error with many software tools. If I don't succeed in extracting what I need in a simple manner, I'll just abandon this task.

I realize that this task probably cannot be performed simply (if the text is encrypted, for instance); I just want to give it a try with the best tool available.

Answer 1:

It seems that such an automated tool does not exist, or if it did, it would only work for a very small set of input files.
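To illustrate why such a tool would only cover a narrow set of files, here is a minimal Python sketch of the simplest form of the idea from the question: encode a few known words under several common text encodings and check which encodings produce byte sequences that actually occur in the file. The list of candidate encodings and the function name `count_known_words` are assumptions made for illustration. The approach can only succeed if the text is stored uncompressed and unencrypted, which may well not be the case for a proprietary .DAT file.

```python
from collections import Counter

# Candidate encodings are an assumption; extend or reorder the list as needed.
CANDIDATE_ENCODINGS = [
    "utf-8", "cp1252", "cp437", "cp850", "mac_roman", "utf-16-le", "utf-16-be",
]


def count_known_words(data, known_words, encodings=CANDIDATE_ENCODINGS):
    """Return a Counter mapping each encoding to the number of known words
    whose encoded form occurs somewhere in the raw bytes `data`."""
    hits = Counter()
    for enc in encodings:
        for word in known_words:
            try:
                needle = word.encode(enc)
            except UnicodeEncodeError:
                continue  # this encoding cannot represent the word at all
            if needle in data:
                hits[enc] += 1
    return hits


if __name__ == "__main__":
    # Stand-in for the real file contents; for the actual dictionary one would
    # read the bytes from disk instead (for example with open(path, "rb")).
    sample = "\x00\x01città\x00definizione\x02perché\x00".encode("cp1252")
    for enc, n in count_known_words(sample, ["città", "perché"]).most_common():
        print(enc, n)
```

Accented words are the useful probes here: plain ASCII words such as "DOG" encode to the same bytes in most single-byte encodings and in UTF-8, so they cannot tell those encodings apart. If no encoding yields any hit, the text is probably compressed, encrypted, or stored in a custom format, and this kind of search will not help.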

I finally found a solution to my problem.

I have an EXE program that allows browsing the dictionary and displaying the definition of a word.

Using AutoHotkey, I wrote a relatively simple script that looks up the definition of every word from a 400k-word input list, copies it to the clipboard, then pastes it into an output text file.

I had to insert some Sleep statements between the keystrokes, window switching etc. to make the script stable. Estimated time to "parse" the whole dictionary: 20 days :)
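The original script was written in AutoHotkey and is not reproduced here. As a rough sketch of the same loop in Python, the code below takes the GUI interaction as a caller-supplied `lookup` function, a hypothetical placeholder for "type the word into the viewer, copy the definition, read the clipboard". The names `harvest_definitions` and `estimated_days` and the tab-separated output format are likewise assumptions. The 20-day estimate over 400k words corresponds to about 4.3 seconds per word (20 × 86,400 s / 400,000).

```python
import time


def harvest_definitions(words, lookup, out, delay_s=0.0):
    """Look up each word and append one 'word<TAB>definition' line to `out`.

    `lookup` is a hypothetical callable standing in for the GUI automation;
    `out` is any object with a write() method, such as an open text file.
    Returns the number of words processed.
    """
    count = 0
    for word in words:
        definition = lookup(word)
        # Keep each entry on a single line so the output stays easy to parse.
        one_line = " ".join(definition.splitlines())
        out.write(f"{word}\t{one_line}\n")
        count += 1
        time.sleep(delay_s)  # pause so the GUI can keep up with the script
    return count


def estimated_days(n_words, seconds_per_word):
    """Total run time in days for a given per-word cost."""
    return n_words * seconds_per_word / 86_400


if __name__ == "__main__":
    import sys

    fake_dictionary = {"cane": "dog", "gatto": "cat"}  # stand-in for the EXE
    harvest_definitions(fake_dictionary, fake_dictionary.get, sys.stdout)
    print(estimated_days(400_000, 4.32))
```

Because almost all of the per-word time is spent waiting on the GUI, shortening the pauses is the only real lever on total run time, at the cost of stability.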