Optical Character Recognition (OCR) refers to technologies that allow images of text to be converted into actual text. OCR allows text content to be extracted from old documents, infographics, photos of signs, PDFs of scanned files and any other situation where text is embedded within an image.
Service Parameters
Text Transcript vs. Hidden Text
OCR tools offer a choice between outputting a text file and embedding a hidden layer of text on top of a PDF image. Adobe Acrobat is an example of a tool that creates hidden text.
We recommend the text transcript option in most cases because errors are easier to detect and correct. You can also add headings, table headers and lists as needed within the document.
Note: If you are creating a PDF with hidden text, you should copy and paste the text into a separate file to verify its accuracy.
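Once the text has been pasted into a separate file and proofread, a short script can list the places where the corrected copy differs from the raw OCR output, which gives a rough sense of how error-prone a tool is on a given document. The following is a minimal sketch using Python's standard-library difflib module; the function names are hypothetical, and comparing word by word (rather than character by character) is an assumption made to keep the output readable.

```python
import difflib
from typing import List, Tuple


def ocr_differences(ocr_text: str, corrected_text: str) -> List[Tuple[str, str]]:
    """Return (ocr, corrected) word runs wherever the proofread copy differs."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    changes = []
    # Opcodes describe how to turn the OCR words into the corrected words;
    # anything that is not "equal" is a spot the proofreader had to fix.
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


def word_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Rough share of proofread words that the OCR output already had right."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    if not fixed_words:
        return 1.0 if not ocr_words else 0.0
    matcher = difflib.SequenceMatcher(a=ocr_words, b=fixed_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(fixed_words)


# Usage: compare raw OCR output with a hand-corrected copy.
raw = "0ptical Character Recogniton converts images of text."
proofread = "Optical Character Recognition converts images of text."
print(ocr_differences(raw, proofread))  # [('0ptical', 'Optical'), ('Recogniton', 'Recognition')]
print(round(word_accuracy(raw, proofread), 2))  # 0.71
```

A script like this only measures errors against a copy someone has already proofread; it does not replace reading the text yourself.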
PDF vs. Image
Note that some services work with both images and scanned PDFs, while others work with scanned PDFs only. If you need to use a PDF-only service, you can print or export the image as a PDF.
Penn State Service Options
The following services are licensed for use by Penn State staff, students and instructors.
- Sensus Access
- Equatio (equations)
- Read and Write
  Note: Provides output as speech or in a Word file.
- Adobe Acrobat
  Note: Copy and paste text into a separate file to verify accuracy.
- Anthology Ally OCR PDF (Canvas)
  Note: Copy and paste text or convert to HTML to verify accuracy.
- Additional OCR Tools (University Libraries)
Image Management Tips
OCR tools depend on good image quality to provide optimal results. To improve your success rates, we recommend the following tips:
- Crop photos and images to include just text.
- Use rotation tools to ensure that text is perfectly horizontal.
- If text color does not meet contrast guidelines, use tools like Photoshop to make text darker.
- If text is relatively small, use zoom tools to make the image larger.
- If the file is an infographic with multiple sections, consider splitting it into separate files, one per section.
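In practice these tips are carried out in an image editor such as Photoshop. The Python sketch below only illustrates what each tip does to the pixels. It assumes a simplified, hypothetical image model (a grayscale grid of integers from 0 for black to 255 for white), and the helper names are hypothetical; it is not how Photoshop or any of the services above work internally. Only 90-degree rotation is shown, since straightening a slightly tilted photo needs a real editing tool.

```python
from typing import List

# Hypothetical model: a grayscale image is a list of rows, and each
# pixel is an int from 0 (black) to 255 (white).
Image = List[List[int]]


def crop(img: Image, left: int, top: int, right: int, bottom: int) -> Image:
    """Keep only columns [left, right) and rows [top, bottom), e.g. just the text."""
    return [row[left:right] for row in img[top:bottom]]


def rotate_90_clockwise(img: Image) -> Image:
    """Turn a sideways photo upright by rotating it a quarter turn clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def boost_contrast(img: Image, threshold: int = 128) -> Image:
    """Simple thresholding: pixels darker than threshold become black, the rest white."""
    return [[0 if px < threshold else 255 for px in row] for row in img]


def upscale(img: Image, factor: int) -> Image:
    """Enlarge small text by repeating each pixel factor x factor times."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def split_rows(img: Image, cuts: List[int]) -> List[Image]:
    """Split a tall infographic into horizontal sections at the given row indices."""
    bounds = [0, *cuts, len(img)]
    return [img[start:end] for start, end in zip(bounds, bounds[1:])]


# Usage: a 4 x 4 "photo" with a 2 x 2 patch of faint text in the middle.
photo = [
    [255, 255, 255, 255],
    [255, 40, 200, 255],
    [255, 90, 30, 255],
    [255, 255, 255, 255],
]
text_only = crop(photo, 1, 1, 3, 3)  # [[40, 200], [90, 30]]
clean = boost_contrast(text_only)    # [[0, 255], [0, 0]]
bigger = upscale(clean, 2)           # 4 x 4 pixels
```

The order mirrors the list above: crop first so later steps only touch the text, then fix orientation and contrast, and enlarge last so the extra pixels are not processed needlessly.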
Additional project tips are available from the University Libraries.
Last Update: October 11, 2023