Re: Search Pages
#21
Home away from home

PackardV8
OK. Suppose someone sends me a PDF or I download a PDF to my 'puter. Further suppose the PDF contains just one page of the 55-56 Packard Parts Catalogue.
What canned utility (if any exists) do I use to, say, strip off (capture) the leftmost 10 bytes of each line on the page? Those 10 bytes would be the part numbers.

What's not clear to me is how File/Record handling is accomplished in the PC world.

To me the ENTIRE parts catalogue is a FILE. Each printed line on any given page is a single RECORD of, say, 80 bytes (old card-image format).
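A minimal Python sketch of the record model described above, assuming the catalogue text already exists as plain characters (as the next reply points out, the scanned PDFs on this site do not contain text). On a PC, a text file is normally a stream of variable-length lines separated by line breaks rather than fixed 80-byte card images, so the sketch handles both layouts. The 80-byte record length, the 10-character part-number width, and the sample records are assumptions for illustration.

```python
from typing import Iterable, List

RECORD_LEN = 80     # old card-image record length (assumed)
PART_NO_WIDTH = 10  # leftmost columns holding the part number (assumed)


def records_from_fixed(data: bytes, record_len: int = RECORD_LEN) -> List[bytes]:
    """Split a byte stream that has no line breaks into fixed-length records."""
    return [data[i:i + record_len] for i in range(0, len(data), record_len)]


def records_from_lines(text: str) -> List[str]:
    """Split a typical PC text file into records: one per line, CR/LF or LF endings."""
    return text.splitlines()


def part_numbers(records: Iterable[str], width: int = PART_NO_WIDTH) -> List[str]:
    """Capture the leftmost `width` characters of each non-blank record, trailing blanks trimmed."""
    return [rec[:width].rstrip() for rec in records if rec.strip()]


if __name__ == "__main__":
    # Newline-delimited text, the usual PC layout (hypothetical sample lines).
    text = "123456    HYPOTHETICAL PART A\n234567    HYPOTHETICAL PART B\n"
    print(part_numbers(records_from_lines(text)))  # ['123456', '234567']

    # Two 80-byte card images with no line breaks between them.
    cards = b"123456    PART A".ljust(80) + b"234567    PART B".ljust(80)
    print(part_numbers(r.decode("ascii") for r in records_from_fixed(cards)))  # ['123456', '234567']
```

The slice `rec[:10]` plays roughly the role of a 10-character field defined at the start of a COBOL record layout.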

It is my understanding that MSFT has a COBOL application or package that can run on a PC, although very slowly, I'm sure.

Accomplishing the 2-file match is child's play in COBOL using any of the old IBM or Honeywell/GE file structures with COBOL formatting.
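For comparison, here is a sketch of the classic sequential two-file match in Python: both files sorted ascending on the key and read in step, with each key landing in matched, master-only, or transaction-only. The thread does not say what the two files are, so the inputs below are hypothetical part-number lists; duplicate keys are not handled specially.

```python
from typing import List, Tuple


def two_file_match(master: List[str], trans: List[str]) -> Tuple[List[str], List[str], List[str]]:
    """Sequential match of two key-sorted files, in the style of a COBOL match/merge.

    Returns (matched, master_only, trans_only). Both inputs must be sorted
    ascending on the key.
    """
    matched: List[str] = []
    master_only: List[str] = []
    trans_only: List[str] = []
    i = j = 0
    while i < len(master) and j < len(trans):
        if master[i] == trans[j]:
            matched.append(master[i])
            i += 1
            j += 1
        elif master[i] < trans[j]:
            master_only.append(master[i])  # key is only on the master file
            i += 1
        else:
            trans_only.append(trans[j])    # key is only on the transaction file
            j += 1
    # Whatever remains after the other file hits end-of-file is unmatched.
    master_only.extend(master[i:])
    trans_only.extend(trans[j:])
    return matched, master_only, trans_only


if __name__ == "__main__":
    m = sorted(["100200", "100300", "100500"])
    t = sorted(["100300", "100400", "100500"])
    print(two_file_match(m, t))  # (['100300', '100500'], ['100200'], ['100400'])
```

When both files fit in memory, Python's `set` operations (`&`, `-`) give the same three groups without sorting; the sequential approach is what scales to files too large to hold at once, though this sketch takes lists for brevity.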

My problem is that I don't know how PDFs are structured in terms of what is a record, record length, ASCII or ANSI character sets, etc.

We're not dealing with any graphics here, only byte-by-byte text characters.

Posted on: 2010/11/2 16:00
VAPOR LOCK demystified: See paragraph SEVEN of PMCC documentation as listed in post #11 of the following thread:
packardinfo.com/xoops/html/modules/newbb/viewtopic.php?topic_id=7245


Re: Search Pages
#22
Webmaster

BigKev
Keith,

I don't think you are understanding. PDFs can contain both textual data and images. Most PDFs contain both: the text as text, and the images as images. All the PDF content on the website is made from scanned pages, so each page is an image, not text. Basically, it's like taking a photograph of each page. So you can't simply extract the part numbers from the pages, as they are not data anymore. It's a picture of the data.

The only way to turn a scanned image back into data is to run it through OCR software. OCR stands for optical character recognition. This is specialized software that looks at blobs of pixels in the image and tries to detect whether they are letters and numerals. To make the OCR work even halfway decently, you really have to start with a very high-quality scan. The PDF files here on the website were not scanned for that purpose, but for web downloading. So a significant amount of compression is applied to them to make the files small. This degrades the pixel data and makes it almost impossible to get an accurate OCR from the pages.

I know all about this as this is the type of software I write for a living.

So there are only two options for getting the Parts Manuals into a database format: hand-keying all the data into a spreadsheet and then doing a mass import into the database, or rescanning the Parts Manuals from clean sources in high-quality mode and then OCRing the pages into an Excel spreadsheet. That would then have to be double-checked for OCR errors. Either way, this would take a long, long time. If someone wants to volunteer, I would be more than happy for them to do it, and I will build all the database backend to support it.
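A sketch of the "hand-key into a spreadsheet, then mass import" route, assuming the spreadsheet is saved as CSV with hypothetical `part_number` and `description` columns. SQLite (Python's built-in `sqlite3`) stands in for whatever database the site actually uses, purely to keep the example self-contained.

```python
import csv
import io
import sqlite3
from typing import List, TextIO, Tuple


def import_parts(conn: sqlite3.Connection, csv_file: TextIO) -> int:
    """Bulk-load hand-keyed rows (spreadsheet saved as CSV) into a parts table.

    Assumes a header row with hypothetical columns part_number and description.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parts (part_number TEXT PRIMARY KEY, description TEXT)"
    )
    rows: List[Tuple[str, str]] = []
    for r in csv.DictReader(csv_file):
        part = (r.get("part_number") or "").strip()
        if not part:
            continue  # skip rows with no part number keyed in
        rows.append((part, (r.get("description") or "").strip()))
    with conn:  # one transaction: commit on success, roll back on any error
        conn.executemany("INSERT INTO parts VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    keyed = io.StringIO(
        "part_number,description\n"
        "123456,Hypothetical part A\n"
        "234567,Hypothetical part B\n"
    )
    db = sqlite3.connect(":memory:")
    print(import_parts(db, keyed))  # 2
```

Wrapping the insert in one transaction means a bad row, such as a duplicate part number violating the primary key, rolls back the whole batch instead of leaving a half-loaded table.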

Just as a point of reference, it takes me about 8 hours to hand-key all the information from each of the Packard Directories (lists of dealers) into the Dealership List. I have already done 4 and have 3 more to do.

Posted on: 2010/11/2 18:17
-BigKev


1954 Packard Clipper Deluxe Touring Sedan -> Registry | Project Blog

1937 Packard 115-C Convertible Coupe -> Registry | Project Blog


Re: Search Pages
#23
Home away from home

JWL
Kev, scanned photos can be in .jpg and .pdf formats, correct?

(o{I}o)

Posted on: 2010/11/3 10:58
We move toward
And make happen
What occupies our mind... (W. Scherer)


Re: Search Pages
#24
Webmaster

BigKev
Photos should be in JPG format. PDFs should be for documents and anything that has multiple pages.

The Photo Archive here on the website only takes JPGs.
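How the Photo Archive enforces its JPG-only rule isn't described in the thread. One common approach, sketched here as a hypothetical, is to check a file's leading "magic" bytes rather than trusting its extension: JPEG files begin with the bytes FF D8 FF, and PDF files normally begin with the text `%PDF-`.

```python
from pathlib import Path

JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG start-of-image marker (FF D8) plus the next marker's FF
PDF_MAGIC = b"%PDF-"          # PDF header, e.g. "%PDF-1.4"


def sniff_type(header: bytes) -> str:
    """Classify a file from its first few bytes: 'jpeg', 'pdf', or 'other'."""
    if header.startswith(JPEG_MAGIC):
        return "jpeg"
    if header.startswith(PDF_MAGIC):
        return "pdf"
    return "other"


def accept_for_photo_archive(path: Path) -> bool:
    """Hypothetical upload rule: accept JPEGs only, judged by content, not file name."""
    with path.open("rb") as f:
        return sniff_type(f.read(8)) == "jpeg"


if __name__ == "__main__":
    print(sniff_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # jpeg
    print(sniff_type(b"%PDF-1.4\n"))                    # pdf
```

The check is deliberately strict; some PDF readers tolerate stray bytes before the `%PDF-` header, which this sketch would classify as 'other'.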

Posted on: 2010/11/3 11:35
-BigKev




Re: Search Pages
#25
Webmaster

BigKev
The advanced search box is now fixed, and the filters at the bottom are now honored. So if you want to search only a specific area, now you can.

Thanks,

Posted on: 2010/11/3 11:50
-BigKev


Copyright 2006-2024, PackardInfo.com All Rights Reserved