How to Extract Text from a Scanned PDF (OCR)
Convert a scanned, unsearchable PDF into selectable text with our browser-based Optical Character Recognition (OCR) engine.
The Problem with Scanned PDFs
When you scan a physical document using a traditional scanner, the resulting PDF is essentially just a photograph of the paper: it has no text layer, so you cannot select, copy, or search for text within it. To edit or quote from the document, you would have to retype it by hand.
Tuko's PDF OCR Solution
Tuko uses a WebAssembly port of the Tesseract OCR engine, which runs locally in your browser. To extract the text:
- Step 1: Upload the scanned PDF.
- Step 2: Tuko generates thumbnails for every page. Select the pages containing the text you need.
- Step 3: Choose the language of the document (e.g., English or Malay) to improve accuracy.
- Step 4: Click Extract. The text is recognized locally and shown in a text box you can copy from.
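For developers curious how Steps 2 to 4 might fit together, here is a minimal TypeScript sketch. It is not Tuko's actual code: every name in it (`PageImage`, `Recognizer`, `toLanguageCode`, `normalizeSelection`, `extractText`) is hypothetical. The OCR engine is passed in as a callback, so the sketch makes no assumptions about how the engine itself is wired up. The only outside fact it relies on is that `eng` and `msa` are Tesseract's language codes for English and Malay.

```typescript
// Hypothetical sketch of Steps 2-4; none of these names come from Tuko's code.

// One rendered PDF page, e.g. the bytes of a PNG produced by a PDF renderer.
type PageImage = Uint8Array;

// Any OCR backend: takes a page image and a language code, resolves to text.
type Recognizer = (image: PageImage, lang: string) => Promise<string>;

// Tesseract language codes for the two languages the guide mentions.
const LANGUAGE_CODES = new Map<string, string>([
  ["English", "eng"],
  ["Malay", "msa"],
]);

// Step 3: translate the user's language choice into an engine language code.
function toLanguageCode(language: string): string {
  const code = LANGUAGE_CODES.get(language);
  if (code === undefined) {
    throw new Error(`Unsupported language: ${language}`);
  }
  return code;
}

// Step 2: keep only valid, unique, 1-based page numbers in ascending order.
function normalizeSelection(selected: number[], pageCount: number): number[] {
  const valid = selected.filter(
    (n) => Number.isInteger(n) && n >= 1 && n <= pageCount
  );
  return Array.from(new Set(valid)).sort((a, b) => a - b);
}

// Step 4: run OCR on each selected page and join the results for display.
async function extractText(
  pages: PageImage[],
  selected: number[],
  language: string,
  recognize: Recognizer
): Promise<string> {
  const lang = toLanguageCode(language);
  const parts: string[] = [];
  for (const pageNumber of normalizeSelection(selected, pages.length)) {
    const text = await recognize(pages[pageNumber - 1], lang);
    parts.push(text.trim());
  }
  // Separate pages with a blank line so the output stays easy to copy.
  return parts.join("\n\n");
}
```

In a real page, the `recognize` callback would presumably wrap a Tesseract.js worker's `recognize` call. The exact worker setup depends on the Tesseract.js version, so it is left out here. Injecting the recognizer also keeps page selection and language handling testable without loading the OCR engine.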
Related guides
Keep going with nearby workflows that people usually need next.
How to Extract Text from an Image (Photo to Text)
Stop retyping data from screenshots. Use OCR to pull the text directly out of any image.
How to Extract Text from Malay Documents (OCR)
Quickly copy text from physical Malay letters, receipts, and forms using our specialized OCR engine.
How Tesseract.js Works in the Browser (Developer Guide)
An overview of how we compile Tesseract into WebAssembly to achieve lightning-fast OCR entirely on the client side.