Re: OCR

Posted by HH56 On 2022/8/23 9:35:33
I am one of several who did some of the early scans that are on the website. At the time the website was started, OCR for a home system was not very accurate, yet it was still fairly expensive. It is a good idea though, and I know Kev had thought about it then and again a few years ago. I am not sure whether it was the time needed to run the PDFs through again, inaccuracy, or the large file size that kept it from going forward.

I am not familiar with the program you used, but one concern is that the huge file size could be an issue for some, either when just viewing or, worse, when trying to download, and also for the storage space needed on the server. File size is something he tried to keep at a compromise level early on because of server bandwidth and the lack of high-speed internet in many places, which is still an issue for some. The aim was a file not so small that image quality suffered, yet not so large that you could blow the images up to the size of a newspaper and not see any pixellation. As I recall, at the time the website was started Kev tried to keep ordinary images at around 1200 pixels in width because the average CRT monitor was not really capable of displaying much more. PDFs were larger, but he still tried to keep them to a relatively low size.

A question now would be: after running the PDFs through OCR, does the Cisderm program have a file reduction capability, or could they be run through another program for file reduction without completely destroying the OCR functionality? I realize the OCR info has to be stored somewhere, but so does image detail and PDF format info. PDFs do seem to have a lot of extraneous info. When I did the 54 sales brochure Kev posted a few months ago, taking high-res photos with the camera software and then compiling and organizing the PDF for display in another program resulted in the native file being huge -- many, many GB. Running it through file reduction in PDF Expert brought it down to less than 20 MB, and the images and page format were still excellent. Maybe you could experiment and see what would happen with an OCR'd file.
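
For a rough sense of why the image data, rather than the OCR text, is likely to dominate file size, here is a back-of-the-envelope sketch in Python. Every number in it is an assumption for illustration only (a letter-size page scanned at 300 dpi in 24-bit color, about 3000 characters of text per page, and a guessed 20 bytes per character to cover the text plus its position info); none of these figures come from the actual scans on the site, and real PDFs compress their images, so the true gap will be smaller.

```python
def raw_image_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size in bytes of one scanned page image."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


def text_layer_bytes(chars_per_page, bytes_per_char=20):
    """Very rough size of a hidden OCR text layer for one page.

    bytes_per_char is an assumed allowance for each character
    plus its positioning data, not a measured value.
    """
    return chars_per_page * bytes_per_char


# Assumed example page: 8.5 x 11 inches at 300 dpi, 24-bit color.
page_image = raw_image_bytes(8.5, 11, 300)
# Assumed example text: about 3000 characters on the page.
page_text = text_layer_bytes(3000)

print(f"image (uncompressed): {page_image / 1e6:.1f} MB")
print(f"OCR text layer (est.): {page_text / 1e3:.1f} KB")
print(f"image is roughly {page_image / page_text:.0f}x larger")
```

If estimates like these are anywhere near right, most of the savings from a file reduction pass would come from recompressing or downsampling the page images. Whether a given program keeps the text layer intact while doing that is something only a test on an actual OCR'd file would show.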

This Post was from: https://packardinfo.com/xoops/html/modules/newbb/viewtopic.php?post_id=247320