Re: Search Pages

Posted by PackardV8 On 2010/11/2 16:00:18
ok. Suppose someone sends me a PDF or I download a PDF to my 'puter. Further suppose the PDF contains just one page of the 55-56 Packard Parts Catalogue.
What canned utility (if any exists) do I use to, say, strip off (capture) the leftmost 10 bytes of each line on the page? Those 10 bytes would be the part numbers.
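To make the question concrete, here is a minimal Python sketch of the "leftmost 10 bytes" capture. It assumes the page's text has already been pulled out of the PDF into plain lines (for example with a text extractor such as pdftotext), which is a separate step not shown here. The function name `leftmost_field` and the sample lines are made up for illustration; they are not real catalogue entries.

```python
def leftmost_field(lines, width=10):
    """Return the first `width` characters of each non-blank line, trimmed."""
    fields = []
    for line in lines:
        field = line[:width].strip()  # columns 1..width, padding removed
        if field:                     # skip blank lines
            fields.append(field)
    return fields


# Hypothetical sample page: part number in columns 1-10, description after.
sample_page = [
    "440123    GASKET, cylinder head",
    "G123456   BOLT, hex head",
    "",
]

part_numbers = leftmost_field(sample_page)
print(part_numbers)
```

This only works if the extracted text keeps the part number in a fixed leading column; if the extractor shifts columns, the width would have to be adjusted or the line split on whitespace instead.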

What's not clear to me is how File/Record handling is accomplished in the PC world.

To me the ENTIRE parts catalogue is a FILE. Each printed line on any given page is a single RECORD of, say, 80 bytes (old card image format).
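For comparison, a rough Python sketch of that file/record view: a stream of bytes cut into fixed 80-byte records. PC text files are normally variable-length lines ended by a newline rather than fixed-length records, so this is only an illustration of the card-image model. The names `RECORD_LENGTH` and `read_records` and the sample data are assumptions made up for the example.

```python
RECORD_LENGTH = 80  # card-image record size, as described above


def read_records(data, record_length=RECORD_LENGTH):
    """Split a bytes object into fixed-length records.

    A short final record is padded with spaces to the full length.
    """
    records = []
    for start in range(0, len(data), record_length):
        record = data[start:start + record_length]
        records.append(record.ljust(record_length))  # pad with b" "
    return records


# Hypothetical "file": two full records and one short trailing record.
card_file = b"A".ljust(80) + b"B".ljust(80) + b"C"

records = read_records(card_file)
print(len(records), [len(r) for r in records])
```

With records in this shape, the leftmost 10 bytes of each one is simply `record[:10]`.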

It is my understanding that MSFT has a COBOL application or package that can run on a PC, although very slowly, I'm sure.

Accomplishing the 2-file match is child's play in COBOL using any of the old IBM or Honeywell/GE file structures with COBOL formatting.
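For readers who have not seen one, a sketch of a two-file match in Python, assuming it means the classic sequential match: both files sorted on the key, keys unique within each file, and one pass that advances whichever side is lower. The function name `match_files` and the sample keys are hypothetical.

```python
def match_files(master_keys, transaction_keys):
    """Match two sorted lists of unique keys in a single pass.

    Returns (matched, master_only, transaction_only).
    """
    matched, master_only, transaction_only = [], [], []
    i = j = 0
    while i < len(master_keys) and j < len(transaction_keys):
        if master_keys[i] == transaction_keys[j]:
            matched.append(master_keys[i])
            i += 1
            j += 1
        elif master_keys[i] < transaction_keys[j]:
            master_only.append(master_keys[i])  # no partner in transactions
            i += 1
        else:
            transaction_only.append(transaction_keys[j])  # no partner in master
            j += 1
    # Whatever is left on either side after the loop is unmatched.
    master_only.extend(master_keys[i:])
    transaction_only.extend(transaction_keys[j:])
    return matched, master_only, transaction_only


# Hypothetical part-number keys, already sorted.
catalogue_keys = ["1001", "1002", "1005"]
other_keys = ["1002", "1003", "1005", "1009"]

matched, catalogue_only, other_only = match_files(catalogue_keys, other_keys)
print(matched, catalogue_only, other_only)
```

The keys are compared as strings here, so they need to be the same width (or zero-padded) for the sort order to behave the way a fixed-width key field would.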

My problem is that I don't know how PDFs are structured in terms of what is a record, record length, ASCII or ANSI character sets, etc.

We're not dealing with any graphics here. Only byte-by-byte text characters.

This Post was from: https://packardinfo.com/xoops/html/modules/newbb/viewtopic.php?post_id=63334