A digital PDF converts to Word with near-perfect accuracy. A scanned PDF requires OCR, and the result depends largely on scan quality. Here's how to tell which type you have and what to expect.
You convert two PDFs to Word. The first comes out nearly perfect: the text, formatting, and even some layout structure survive the conversion. The second comes out as a jumbled mess of misrecognized characters and random line breaks. Same tool, completely different results. The difference: one was a digital PDF, the other a scanned PDF. They look the same on screen and share the same .pdf extension, but what is stored inside them is fundamentally different.
Our PDF to Word converter handles both, but the quality ceiling for scanned PDFs is lower, and it is set mainly by the scan quality rather than by the conversion tool. Here is how to tell which type you have and what results to expect.
Open your PDF. Try to select and copy a sentence of text with your mouse. If you can select the text, it is a digital PDF — the text is stored as actual characters in the file. Conversion extracts these characters directly. Accuracy should be 99%+.
If you cannot select text — your cursor just draws a box, and nothing highlights — it is a scanned PDF. The "text" is actually an image of text. Conversion requires OCR (Optical Character Recognition) to identify the shapes as letters. Accuracy depends on scan quality.
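The select-and-copy test can also be automated for a batch of files. The following is a minimal sketch, not how our converter works internally: it assumes the third-party pypdf package for reading each page's text layer, and the function names and the 20-character threshold are illustrative choices.

```python
from typing import List


def classify_pdf(page_texts: List[str], min_chars_per_page: int = 20) -> str:
    """Classify a PDF from the text extracted from each of its pages.

    Returns 'digital' if every page has a text layer, 'scanned' if none do,
    'mixed' if only some do, and 'unknown' for an empty page list.
    The 20-character threshold is an arbitrary illustrative cutoff.
    """
    if not page_texts:
        return "unknown"
    has_text = [len(text.strip()) >= min_chars_per_page for text in page_texts]
    if all(has_text):
        return "digital"
    if not any(has_text):
        return "scanned"
    return "mixed"


def extract_page_texts(path: str) -> List[str]:
    """Read the text layer of every page (assumes pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so classify_pdf works without it

    reader = PdfReader(path)
    return [(page.extract_text() or "") for page in reader.pages]
```

Usage would look like `classify_pdf(extract_page_texts("report.pdf"))`, where the file name is a placeholder. One caveat: a scan that has already been run through OCR carries a hidden text layer, so this check (like the mouse test) will report it as digital.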
Excellent (300+ DPI, clean, well-lit): a document scanned on a modern flatbed scanner at 300 DPI or higher, with even lighting, no skew, and dark text on a white background. OCR accuracy: 98-99%. The result will have occasional errors — "rn" read as "m," "cl" read as "d" — but is otherwise clean. Expect 1-2 errors per page.
Good (200-300 DPI, slight imperfections): a document from a typical office scanner, slightly skewed, with some background texture or faint shadows at the page edges. OCR accuracy: 95-98%. Expect 3-5 errors per page. The text is usable but needs proofreading.
Fair (150-200 DPI, noticeable issues): a document scanned with a phone app, with uneven lighting (darker near the spine of a book), some blur, or low contrast. OCR accuracy: 90-95%. Expect 5-10 errors per page. The text is readable in bulk but individual sentences may be garbled.
Poor (under 150 DPI, significant issues): a fax-quality scan, a photo of a document taken at an angle, or a scan with heavy shadows, stains, or handwritten annotations. OCR accuracy: 70-90%. The result is a starting point for manual retyping, not a finished document.
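The four tiers above can be written down as a small lookup for planning purposes. This is a rough sketch with hypothetical names; it keys on DPI alone, so treat the result as a best case, since lighting, skew, and contrast can pull a high-DPI scan into a lower tier. Where the DPI ranges share a boundary (200 and 300), the sketch assumes the higher tier.

```python
from typing import NamedTuple, Optional, Tuple


class ScanTier(NamedTuple):
    name: str
    accuracy_pct: Tuple[int, int]                # expected OCR accuracy range
    errors_per_page: Optional[Tuple[int, int]]   # None where the article gives no figure


def tier_for_dpi(dpi: int) -> ScanTier:
    """Map scan resolution to the best-case tier described in the article."""
    if dpi >= 300:
        return ScanTier("excellent", (98, 99), (1, 2))
    if dpi >= 200:
        return ScanTier("good", (95, 98), (3, 5))
    if dpi >= 150:
        return ScanTier("fair", (90, 95), (5, 10))
    # Poor scans: a starting point for manual retyping, no per-page figure given.
    return ScanTier("poor", (70, 90), None)


if __name__ == "__main__":
    for dpi in (100, 150, 250, 300):
        print(dpi, tier_for_dpi(dpi))
```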
Scan at 300 DPI minimum. Going from 150 DPI to 300 DPI doubles the pixel count in each dimension — four times the data for the OCR engine to work with. The accuracy jump is significant. Going above 300 DPI (to 600 DPI) helps for very small text (under 8pt) but provides diminishing returns for normal text.
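The arithmetic behind the 300 DPI advice is easy to check. The sketch below assumes a US Letter page (8.5 x 11 inches) and uses the standard definition of a point as 1/72 inch; the function names are illustrative.

```python
from typing import Tuple


def page_pixels(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> Tuple[int, int]:
    """Pixel dimensions of a scanned page (defaults assume US Letter)."""
    return round(width_in * dpi), round(height_in * dpi)


def text_height_px(point_size: float, dpi: int) -> float:
    """Nominal height of text in pixels: 1 point = 1/72 inch."""
    return point_size / 72 * dpi


if __name__ == "__main__":
    w150, h150 = page_pixels(150)   # 1275 x 1650
    w300, h300 = page_pixels(300)   # 2550 x 3300
    print((w300 * h300) / (w150 * h150))  # 4.0: four times the pixels

    # Nominal pixel height of 8 pt text at each resolution.
    for dpi in (150, 300, 600):
        print(dpi, round(text_height_px(8, dpi), 1))
```

At 150 DPI, 8 pt text is nominally under 17 pixels tall, and the actual letter bodies are smaller than that, which is why small print benefits most from higher resolution.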
Use grayscale, not pure black and white. Pure black-and-white scanning (1-bit) removes anti-aliasing at character edges, making letters look jagged. OCR engines are trained on smooth-edged text. Grayscale scanning preserves the edge smoothness.
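To see what 1-bit scanning discards, consider a single row of pixels crossing the edge of a letter. The sketch below uses made-up pixel values and a fixed threshold of 128 as a simplifying assumption; real scanner software may use adaptive thresholding or dithering instead.

```python
from typing import List


def to_one_bit(pixels: List[int], threshold: int = 128) -> List[int]:
    """Convert 8-bit grayscale values (0 = black, 255 = white) to pure black or white."""
    return [255 if p >= threshold else 0 for p in pixels]


if __name__ == "__main__":
    # Illustrative values: white paper fading into a black stroke.
    edge = [255, 230, 170, 90, 30, 0]
    binary = to_one_bit(edge)

    print(binary)                            # [255, 255, 255, 0, 0, 0]
    print(len(set(edge)), len(set(binary)))  # 6 2
```

The gradual transition across the edge collapses into one hard step, which is the jagged look described above.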
Flatten the page. If scanning a book, press the page flat against the scanner glass. The curvature near the spine creates shadows and distorted text that OCR handles poorly.
Check for skew. If the text is rotated even 1-2 degrees, OCR accuracy drops. Most scanning software has an auto-deskew option — use it.
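One way to see why a degree or two matters is to compute how far a line of text drifts vertically from one end to the other. The sketch below assumes a 6.5-inch line of text scanned at 300 DPI; both figures and the function name are illustrative.

```python
import math


def skew_drift_px(angle_deg: float, line_width_in: float, dpi: int) -> float:
    """Vertical drift, in pixels, across one line of text rotated by angle_deg."""
    return math.tan(math.radians(angle_deg)) * line_width_in * dpi


if __name__ == "__main__":
    line_height_px = 12 / 72 * 300  # nominal height of 12 pt text at 300 DPI: 50 px
    for angle in (0.5, 1.0, 2.0):
        drift = skew_drift_px(angle, line_width_in=6.5, dpi=300)
        print(f"{angle} deg: {drift:.1f} px drift vs {line_height_px:.0f} px text height")
```

Under these assumptions, a 1-degree skew moves the end of a line about 34 pixels and a 2-degree skew about 68 pixels, which is more than the nominal height of 12 pt text. A line that starts in one row of the image ends in what would be the next row, which plausibly makes it harder for an OCR engine to separate lines cleanly.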
Our PDF to Word converter uses Google Vision OCR, which handles moderate skew and lighting variation better than older OCR engines. But the fundamental rule remains: garbage scan in, garbage text out. For a guide to fixing formatting after conversion, see our PDF to Word formatting survival guide.