Conventionally, there are only three types of PDF files, namely:

PDF Image Only

PDF Image+Hidden Text and

PDF Normal




For ease of understanding, we have subdivided these into the following types:

PDF Image Only
This is an exact replica of the TIFF file, and is always better than the original document because it undergoes the first seven steps of file preparation. PDF Image Only files are used for archiving historical documents and are not in regular use. They are not searchable, and this is the cheapest of all PDF types, with limited usage.

PDF Image+Hidden Text
  • Level 1

    In this PDF type, the image of the document overlays the OCRed text. In one way, this is a better version of the PDF Image Only type, with the addition of limited searchability.

    The first six steps remain the same; the TIFF file is then passed through an OCR engine, and the PDF files are produced using step 13. There is little manual intervention after OCR, so the textual accuracy is only as good as the quality of the input document and the choice of OCR engine allow. Step 14 adds a bonus: it makes the PDF file more compact.

    This is the most popular type of PDF, as it provides the image along with limited searchability. Normally the textual accuracy is limited to 70-80%, depending on the quality of the source document. This type is also cost-effective, as there is little manual intervention.
  • Level 2

    In this PDF type, the image of the document overlays the OCRed text, and the OCRed text is near 100% accurate. This is the best version of PDF Image+Hidden Text, as it gives full-text searchability.

    The first seven steps remain the same; spell checking, 100% proofreading and corrections then make these files textually near perfect. This is a labor-intensive PDF type: every character and every word has to be checked to get near 100% accurate text. This makes it an expensive PDF type, and it is most useful where textual accuracy matters more than cost and display of the original document is equally important.
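
The difference between Level 1 and Level 2 comes down to how closely the hidden text matches the original document. As a rough illustration only, the sketch below compares OCR output against a proofread reference using Python's standard `difflib` module. The function names and the 0.99 threshold are assumptions made for this example, and the similarity ratio is only a proxy for textual accuracy, not the measure used in production.

```python
from difflib import SequenceMatcher


def text_accuracy(ocr_text: str, reference_text: str) -> float:
    """Return a similarity ratio in [0, 1] between OCR output and a proofread reference.

    This is a rough proxy for textual accuracy, not an exact character error rate.
    """
    if not ocr_text and not reference_text:
        return 1.0  # two empty texts are treated as identical
    matcher = SequenceMatcher(None, ocr_text, reference_text, autojunk=False)
    return matcher.ratio()


def suggested_level(accuracy: float, threshold: float = 0.99) -> str:
    """Hypothetical rule: near-perfect text counts as Level 2, anything else as Level 1."""
    return "Level 2" if accuracy >= threshold else "Level 1"


if __name__ == "__main__":
    ocr = "Tbe quick brovvn fox"
    reference = "The quick brown fox"
    score = text_accuracy(ocr, reference)
    print(f"similarity: {score:.2f} -> {suggested_level(score)}")
```

A raw OCR result would typically land below the threshold and stay at Level 1 until proofreading and corrections bring the text close to the reference.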
PDF Normal
This is the royal PDF type, with no page image and no hidden text. Whatever is seen on the original document is reproduced as PDF. Depending on the contents of the document, all 14 steps mentioned above need to be executed.
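
The three types can be told apart by two properties: whether the page shows a scanned image of the original, and whether it carries machine-readable text. The following is a minimal sketch of that distinction. The `PageContent` class and its field names are hypothetical, introduced only for illustration, and the rule is a simplification that ignores ordinary figures embedded in a PDF Normal.

```python
from dataclasses import dataclass


@dataclass
class PageContent:
    """Hypothetical summary of what a single PDF page contains."""
    has_scanned_image: bool  # a full-page scan (for example from a TIFF) is displayed
    has_text_layer: bool     # machine-readable text exists on the page


def pdf_type(page: PageContent) -> str:
    """Map the two properties onto the three conventional PDF types."""
    if page.has_scanned_image and page.has_text_layer:
        return "PDF Image+Hidden Text"  # scan on top, OCRed text hidden underneath
    if page.has_scanned_image:
        return "PDF Image Only"         # scan only, not searchable
    if page.has_text_layer:
        return "PDF Normal"             # real text, no page scan
    return "unknown"                    # an empty page fits none of the types


if __name__ == "__main__":
    print(pdf_type(PageContent(has_scanned_image=True, has_text_layer=False)))
```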

If Adobe Capture is used as the tool to produce PDF Normal, getting images is very easy. However, some OCR engines distort images, which then need re-insertion from the TIFF image. Most OCR engines cannot recognize tables, TOCs and indexes properly, so these invariably require manual rebuilding. This is a purely manual exercise and adds to the cost of production. Some OCR engines cannot recognize small fonts and keep the text as a bitmap image. In such cases a lot of text needs to be added manually.

The best part of PDF Normal is its clean appearance and compact size. A good text-only PDF Normal can be as small as 8 KB, which makes this file format the most preferred type for web publishing. It is said that, on average, a PDF Normal should be around 11 KB.

We at PDF India have mastered the art of producing PDF Normal from any document, in any language, of any quality. We assure good appearance, near 100% accuracy and a compact PDF Normal.

Copyright © 2000, 2001 PDFIndia.com. All rights reserved.