I got two fancy scanners in the mail. They turned out to be from a company called CZUR, which makes overhead book scanners.
It was a surprise, and this is what I found out.
Here is a picture of the two scanners after I unwrapped them.
I opened them up and found what appeared to be very awkward-looking scanners that scan books and other documents from above, not in the usual way. I thought, what am I going to do with this?
I started studying the specs and found it to be a very sophisticated book and document scanner, with optical character recognition in 187 languages.
I thought, OK, I will give it a test.
The CZUR ET16 on the left is the low-end version, without the sophisticated software, but quite capable of scanning books and documents from above. It costs about the same as a regular scanner.
The CZUR M3000 on the right is the high end scanner with lots of bells and whistles. It is a lot more expensive, but it can do some things that you may find useful.
The M3000 has an HDMI connection for a projector. This could be very useful for lecturers who just want to bring a book to class, lay it out, and let the students read it as the professor talks about it.
It had some attributes that the usual scanner does not have. It was designed to put books or documents on a base, so they can be scanned from above, as opposed to the usual way of turning the book upside down.
There is a foot switch or hand switch so that pages can be scanned in rapid succession.
I scanned one of my Chinese books at 300 dpi. It scanned double pages and converted them to .docx or .pdf: about 5 seconds to scan and about 5 seconds to convert using OCR.
That is a lot faster than the last scanner I had.
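To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes the timings above (about 5 seconds to scan and about 5 seconds to convert each double page), that the two steps happen one after the other, and that page turning takes no extra time. The function name and the 300-page book are made up for illustration.

```python
import math

def scan_time_hours(pages, scan_s=5.0, ocr_s=5.0, pages_per_scan=2):
    """Estimate the hours needed to scan and OCR a book.

    Assumes two facing pages per capture and that scanning and OCR
    run one after the other (an assumption, not a measured fact).
    """
    captures = math.ceil(pages / pages_per_scan)
    return captures * (scan_s + ocr_s) / 3600.0

# Hypothetical 300-page book: 150 captures at 10 seconds each
minutes = scan_time_hours(300) * 60
print(f"about {minutes:.0f} minutes")
```

In practice, turning pages and checking the results will add time, so treat this as a lower bound.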
I looked up the price. It is very expensive.
I thought: well, this is nice. What else can it do for that much money? So I tried out the OCR.
I scanned a .jpg image of two pages in a book with standard printed Chinese fonts.
It converted the text to Unicode in about 5 seconds, and it appeared to be correct.
This is very good and useful if you have modern books written in regular-sized Chinese in regular fonts. See the following page for an example.
It can isolate pictures from text, but it had a little trouble isolating some of the pictographs, possibly because of the bleed-through from the other side of the page.
Then I wondered how much OCR had improved since I last studied it.
OCR (The hard test)
I gave it a copy of the famous book on Chinese etymology by 康殷 Kang Yin, 漢字源流淺說 “Han zi yuan liu qian shuo”.
This is an older book printed by photolithographic offset from pages handwritten in Chinese by Kang Yin himself.
The pages are hard for humans to read because, although he is a calligrapher, he is one of those calligraphers with bad handwriting that no one can read. Well, OK, you can read it, but it is very difficult.
The OCR was able to separate the characters in most cases. But unlike humans, who will not make a guess unless they are quite sure of the character, the OCR guessed at almost all the characters and got almost all of them wrong.
This is not to blame the OCR. OCR is not capable of reading handwritten text yet.
Kang Yin is a well known Chinese character etymologist, but he was also a Chinese calligrapher.
This is a page from an old Chinese book where the author wrote the text by hand and then it was photographed, not typeset.
Chinese calligraphers have a bad habit I do not like. It goes like this.
You will notice on the publication page, his name is 康殷, which are real Chinese characters that everyone can read.
Notice on the title page his name appears as 康 followed by an obscure form of 殷. What kind of crap is this?
On the inside page his name appears as 康？
The ？ mark indicates that this character cannot be found in any Chinese dictionary.
This is a stupid thing that Chinese calligraphers do that makes Chinese unreadable.
But back to the CZUR scanner.
Many of us will want to scan our libraries and other documents and store them as digital libraries. This is a very good machine if you have a lot of stuff to scan. The cheap version is affordable for individuals. The expensive version will be a bit too expensive for most people. Anyway, it will be very useful for me. Now I can convert my entire library to digital.
Uncle Hanzi in Tunxi 2017
Note: Hanzi means Chinese character.