How Tesseract.js Works in the Browser (Developer Guide)
An overview of how Tesseract, compiled to WebAssembly, delivers fast OCR entirely on the client side.
WebAssembly Magic
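Tesseract is a powerful open-source optical character recognition (OCR) engine written in C++. It was originally developed at Hewlett-Packard, was later developed with Google's sponsorship, and is now maintained by its open-source community. Historically, using it in a web application meant installing it on a server (usually Linux) and uploading images there to be processed.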
Bringing C++ to the Browser
Thanks to WebAssembly (WASM), Tesseract's C++ codebase has been compiled (using the Emscripten toolchain) into a binary format that modern web browsers such as Chrome, Safari, and Firefox can execute at near-native speed.
- When you use Tuko's OCR tool, your browser downloads a `.wasm` binary file (around 2-3MB).
- It also downloads a language data file containing the trained recognition model for that language (e.g., `eng.traineddata` for English).
- Once both files are loaded, the browser runs the OCR algorithms locally on your device. Because your images are never sent to a server, the tool keeps working even if you disconnect from the internet, and your files stay private.
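To make the "download a `.wasm` binary, then run it locally" flow more concrete, here is a minimal TypeScript sketch using the standard WebAssembly JavaScript API. It is an illustrative stand-in, not Tesseract.js's actual loader: the 41-byte module is hand-assembled and exports a single `add` function instead of Tesseract's OCR routines, and the names `wasmBytes` and `loadAdd` are hypothetical. The point it shows is that once the bytes are in memory, validation, compilation, and execution all happen on the client with no further network requests.

```typescript
// Hypothetical stand-in for a downloaded .wasm file: a hand-assembled
// 41-byte module exporting add(i32, i32) -> i32. A real binary such as
// Tesseract's core .wasm is megabytes of compiled C++.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic number "\0asm"
  0x01, 0x00, 0x00, 0x00, // binary format version 1
  // Type section: one function signature, (i32, i32) -> i32
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f,
  // Function section: one function, using type index 0
  0x03, 0x02, 0x01, 0x00,
  // Export section: export function 0 under the name "add"
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00,
  // Code section: local.get 0, local.get 1, i32.add, end
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b,
]);

// Validate, compile, and instantiate the module in-process. Nothing here
// touches the network: the compiled code runs on the local machine.
function loadAdd(bytes: BufferSource): (a: number, b: number) => number {
  if (!WebAssembly.validate(bytes)) {
    throw new Error("not a valid WebAssembly module");
  }
  const compiled = new WebAssembly.Module(bytes);
  const instance = new WebAssembly.Instance(compiled, {});
  return instance.exports.add as (a: number, b: number) => number;
}

const add = loadAdd(wasmBytes);
console.log(add(2, 3)); // 5
```

The sketch compiles synchronously to keep it short. For multi-megabyte binaries like Tesseract's, browsers favor asynchronous compilation, typically `WebAssembly.instantiateStreaming(fetch(url))`, which compiles the module while it downloads; some browsers also restrict synchronous compilation of large modules on the main thread.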