Friday, July 15, 2011

OCR = Optical Character Recognition (text scanning and recognition software)

A friend asked me how to scan a Chinese printout into an editable Word document.

I recall that some scanners come bundled with OCR software, meaning the scanner software can scan the page as an image and directly convert the output into text.

But is there a way to do this if the scanner doesn't come with OCR software?

After some information gathering and research...
http://fish.pixnet.net/blog/post/22621207
http://www.info-artist.net/2009/10/ocr.html

The good news is that Microsoft Office Document Imaging (found under All Programs -> Microsoft Office -> Microsoft Office Tools) is an OCR tool.

The bad news is that it doesn't seem to support Chinese out of the box, so the Chinese language engine needs to be installed:
http://www.microsoft.com/downloads/zh-tw/confirmation.aspx?FamilyID=DD172063-9517-41D8-82AF-29C38F7437B6&displaylang=zh-tw

Then I realized I could not open graphic files (e.g. jpg, gif, etc.) directly inside Microsoft Office Document Imaging. There is a program, JOCR.exe, which is able to make use of the Microsoft library to do the conversion.
http://philtzki.pixnet.net/blog/post/5189840
http://download.cnet.com/JOCR/3000-2192_4-10768898.html
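
For the curious, the Microsoft library mentioned above can also be scripted. Below is a minimal Python sketch, not how JOCR itself works (I have not seen its source). It assumes Windows, Microsoft Office Document Imaging with the Chinese language engine installed, and the pywin32 package. The helper names modi_lang_id and ocr_image are my own, made up for illustration. Whether Create accepts a given image format may depend on the installation; TIFF is the safest choice.

```python
import os

# Language IDs from MODI's MiLANGUAGES enumeration.
MODI_LANG_IDS = {
    "english": 9,
    "chinese_traditional": 1028,
    "chinese_simplified": 2052,
}


def modi_lang_id(name):
    """Map a friendly language name (hypothetical helper) to a MODI language ID."""
    key = name.strip().lower().replace(" ", "_")
    if key not in MODI_LANG_IDS:
        raise ValueError("unsupported language: %r" % name)
    return MODI_LANG_IDS[key]


def ocr_image(path, language="chinese_traditional"):
    """Run MODI OCR on an image file and return the recognized text.

    Assumption: Windows with MODI installed and registered as a COM object.
    """
    # Imported here so the rest of the module still loads on other systems.
    import win32com.client

    doc = win32com.client.Dispatch("MODI.Document")
    doc.Create(os.path.abspath(path))
    try:
        # Arguments: language ID, auto-orient the image, straighten the image.
        doc.OCR(modi_lang_id(language), True, True)
        pages = [doc.Images.Item(i).Layout.Text for i in range(doc.Images.Count)]
        return "\n".join(pages)
    finally:
        doc.Close(False)  # close without saving changes
```

Calling ocr_image("scan.tif") (a hypothetical file name) should return the recognized text as a string, which can then be pasted into Word.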

Updated 29-01-2012:
Or, the easier method: use a free online OCR service.
Google Keyword: ocr online chinese
http://free-online-ocr.com
http://www.sciweavers.org/free-online-ocr
http://googlesystem.blogspot.com/2009/09/google-docs-ocr.html

Updated 22 Feb 2012:
This one seems more powerful. It can recognize different languages, such as Chinese, and produce different file formats, including Excel. Log in using a Google/Facebook account.
http://finereader.abbyyonline.com