Question:

I need to read the recording time from video recorded by a surveillance camera.

The time appears in the top-left area of the video. Below is a link to a screen grab of the area that shows the time. Also, the digit color (white/black) keeps changing over the course of the video.

http://i55.tinypic.com/2j5gca8.png

Please point me toward a way to approach this problem. I am a Java programmer, so I would prefer an approach that uses Java.

EDIT: Thanks, unhillbilly, for the comment. I looked at Ron Cemer's OCR library, and its performance is well below our requirements.

Since the OCR performance is lower than desired, I was planning to build a character set from screen grabs of all the digits, then use an image/pixel comparison library to compare the time shown in each frame against that character set and get a probabilistic result.

So I am looking for a good image comparison library (I would be OK with a non-Java library that I can run from the command line). Any advice on the above approach would also be really helpful.

Answer 1:

It doesn't seem like you need full-blown OCR here.
I presume that the numbers are always in the same position in the image. You only expect the digits 0-9 at each of the known positions (in either black or white).
Simple template matching at each position against each of the digits (you'll have 20 templates: the 10 digits in each of the two colors) is very fast (real-time) and should give you very accurate results.
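
As a rough illustration of the template matching described above, here is a minimal Java sketch. It assumes the templates are small `BufferedImage` crops of each digit, ordered so that index `i` holds digit `i % 10` (for example, white digits at 0-9 and black digits at 10-19), and that you already know the pixel position of each digit in the frame. The class and method names are made up for this example, and the score used (sum of absolute gray-level differences) is just one simple choice of similarity measure.

```java
import java.awt.image.BufferedImage;

public class DigitTemplateMatcher {

    // Average of the three color channels, used as a simple gray level (0-255).
    static int gray(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (r + g + b) / 3;
    }

    // Sum of absolute gray-level differences between the template and the
    // frame region whose top-left corner is (x0, y0). Lower means more similar.
    static long sad(BufferedImage frame, int x0, int y0, BufferedImage template) {
        long total = 0;
        for (int y = 0; y < template.getHeight(); y++) {
            for (int x = 0; x < template.getWidth(); x++) {
                int f = gray(frame.getRGB(x0 + x, y0 + y));
                int t = gray(template.getRGB(x, y));
                total += Math.abs(f - t);
            }
        }
        return total;
    }

    // Tries every template at the given position and returns the digit of the
    // best match. Assumes templates[i] depicts digit (i % 10).
    // Returns -1 if no templates are supplied.
    static int bestDigit(BufferedImage frame, int x0, int y0, BufferedImage[] templates) {
        int best = -1;
        long bestScore = Long.MAX_VALUE;
        for (int i = 0; i < templates.length; i++) {
            long score = sad(frame, x0, y0, templates[i]);
            if (score < bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best < 0 ? -1 : best % 10;
    }
}
```

Calling `bestDigit` once per digit position gives the full timestamp. The lowest score can also be compared with the second lowest to flag frames where the match is ambiguous.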

Answer 2:

What format is the source in (VHS, DVD, stills)? It's possible that the timestamp is encoded in the data.

Update with more detail

While I completely understand the desire to have an automated end-to-end process (especially if you're selling this app as opposed to creating an in-house tool), it'd be more efficient to have someone manually enter the start time for each video (even if there are hundreds of them) than to spend weeks of coding getting this to work automatically.

What I'd do (failing a simple, very-fast-to-implement, super-accurate OCR solution which I don't believe exists):

Create a couple of database tables, like

video           video_group
-------         -----------
id              id
filename        title
start_time      date_created
group_id        date_modified
date_created    date_deleted
date_modified
date_deleted

video_group might contain

id| title
-----------
1 | Unassigned
2 | 711 Mockingbird @ 75
3 | Kroger storage room

video would be prepopulated with the video filenames by an import script. Initially, assign everything a group_id of 1 (Unassigned).

Create a simple Winforms or WPF app (pardon my ASCII art):

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
|  Group: [=========]\/ [New group...]                            |
|                                                                 |
|  File:  [=========]\/                                           |
|                                                                 |
|  Preview                                                        |
|  |--------------------------------------| [Next Video]          |
|  | (first frame of selected video here) | [Prev]                |
|  |                                      |                       |
|  |                                      |                       |
|  |                                      |                       |
|  |--------------------------------------|                       |
|  Start Time                                                     |
|  [(enter start time value here as displayed on preview frame)]  |
|                                                                 |
|  [Update]                                                       |
-------------------------------------------------------------------

Anybody could act as the user (secretary, janitor, even a recent CS graduate). All they have to do is read the time from the preview frame, type it into the Start Time field, and click "Update" or "Next Video" to update the database and move on to the next one. Keep the Group selection from one video to the next unless the user changes it.

Assuming it takes the user 30 seconds to read, type, and click next, they could complete 100-150 videos in an hour (call it 75 for a more realistic estimate). And interns are a lot cheaper than developer time.

If you really have "hundreds" of videos, it'll still be faster to do it this way than to screw around with OCR. Even if the OCR works for the most part, you'll most likely need to have someone manually inspect everything to see whether the results are correct, which raises the question: why bother with the OCR?

Answer 3:

JavaOCR will work perfectly for your situation (Ron Cemer here). All you need to do is remove the background image, or make sure it is always less than 50% white, so that the white characters will be white and the background will be black when the image is converted to monochrome.

Train JavaOCR on the font, extract that rectangular region from the image, remove the background and you're off and running.

I suggest an algorithm that looks at the r, g, b values of each pixel and sets the pixel to black wherever r, g, and b are not exactly equal. That will leave only pixels that are perfect shades of gray. Since the image is in color and the digits are monochrome, that will leave the digits and some dust.
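
The gray-pixel filter described above could be sketched in Java roughly as follows. This is only an illustration, not part of JavaOCR: the class and method names are invented for the example, and it assumes you have already cropped the timestamp region into a `BufferedImage`.

```java
import java.awt.image.BufferedImage;

public class GrayPixelFilter {

    // Returns a copy of src in which every pixel whose red, green and blue
    // values are not all equal is set to black; pure-gray pixels are kept.
    static BufferedImage keepPureGray(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                boolean isGray = (r == g) && (g == b);
                out.setRGB(x, y, isGray ? rgb : 0x000000);
            }
        }
        return out;
    }
}
```

If the video is heavily compressed, the digit pixels may not be exactly gray; in that case a small tolerance on the channel differences (an assumption beyond what the answer states) might be needed instead of strict equality.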

JavaOCR wants to see black characters on a white background, so once you've done the above, you'll also need to invert the monochrome image (white = black and vice-versa). Then run that through the JavaOCR library, passing it reference samples of all of the characters you expect it to recognize, and your problem should be (at least mostly) solved.
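
The inversion step can be sketched the same way. Again, this is a hypothetical helper written for illustration (the names are not from JavaOCR), and it deliberately stops short of the recognition call itself, since the exact JavaOCR API is not shown in this thread.

```java
import java.awt.image.BufferedImage;

public class MonochromeInverter {

    // Returns a copy of src with every color channel inverted (c -> 255 - c),
    // so white digits on black become black digits on white.
    static BufferedImage invert(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y) & 0xFFFFFF; // drop the alpha byte
                out.setRGB(x, y, ~rgb & 0xFFFFFF);     // flip all 24 color bits
            }
        }
        return out;
    }
}
```

The inverted image is what you would then hand to the OCR library along with the reference samples.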