Table of Contents
- Introduction
- Prerequisites
- Setup
- Overview
- Step 1: Image Preparation
- Step 2: Text Extraction
- Step 3: Optical Character Recognition
- Conclusion
Introduction
Welcome to the tutorial on “Python for Computer Vision: Optical Character Recognition”! In this tutorial, we will learn how to perform Optical Character Recognition (OCR) using Python. OCR is the process of converting images containing text into machine-readable text. By the end of this tutorial, you will be able to extract text from images and convert it into an editable format using Python.
Prerequisites
Before starting this tutorial, you should have a basic understanding of Python programming. Familiarity with image processing concepts and libraries like OpenCV will be helpful but not mandatory.
Setup
To follow this tutorial, you will need to have the following Python libraries installed:
- OpenCV
- Pytesseract
You can install these libraries using pip:
```bash
pip install opencv-python
pip install pytesseract
```
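Additionally, you will need to install the Tesseract OCR engine itself, because pytesseract is only a wrapper that runs the `tesseract` executable. You can download it via the official Tesseract GitHub page and follow the installation instructions for your operating system. After installing, make sure the `tesseract` executable is on your PATH.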
Overview
Here’s an overview of the steps we will cover in this tutorial:
- Image Preparation: We will load and preprocess the input image using OpenCV.
- Text Extraction: We will extract the textual regions from the preprocessed image.
- Optical Character Recognition: We will perform OCR on the extracted text regions using the Pytesseract library.
Now, let’s dive into each step in detail.
Step 1: Image Preparation
In this step, we will prepare the input image for OCR by applying pre-processing techniques. These techniques help remove noise and enhance the text regions in the image.
First, let’s import the required libraries:
```python
import cv2
import numpy as np
```
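Next, we need to read the input image. Keep in mind that `cv2.imread()` does not raise an error when the file is missing or unreadable. It returns `None` instead, so check the result before continuing:
```python
image = cv2.imread('input_image.jpg')
assert image is not None, "Could not read 'input_image.jpg'"
```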
To improve the accuracy of OCR, we can apply the following pre-processing techniques:
- Grayscale Conversion: Convert the image to grayscale using the `cv2.cvtColor()` function.
- Noise Removal: Apply Gaussian blur using `cv2.GaussianBlur()` to reduce noise.
- Thresholding: Convert the blurred image to binary using `cv2.threshold()`.
Here’s the code to perform image pre-processing:
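```python
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # BGR colour image -> single grayscale channel
blur = cv2.GaussianBlur(gray, (5, 5), 0)  # 5x5 kernel; sigma 0 means "derive from kernel size"
# With THRESH_OTSU the threshold is chosen automatically (the 0 passed here is ignored);
# THRESH_BINARY_INV turns dark text white on a black background, as findContours() expects
_, threshold = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
```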
Now, we have the pre-processed image ready for text extraction.
Step 2: Text Extraction
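In this step, we will extract the text regions from the pre-processed image. OpenCV's `cv2.findContours()` traces the outlines (contours) of connected white regions in a binary image. Some of these regions are text, typically single characters or groups of touching characters, while others are noise, lines, or graphics.
Let’s define a function that finds the contours and filters out regions whose size and shape make them unlikely to be text: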
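```python
def extract_text_regions(image):
    """Return (x, y, w, h) bounding boxes of likely text regions in a binary image."""
    # RETR_EXTERNAL keeps only outermost contours; OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    text_regions = []
    for contour in contours:
        x, y, w, h = cv2.boundingRect(contour)
        area = w * h
        # Drop tiny specks and extremely thin or wide shapes; adjust these for your image
        if area > 500 and 0.1 < w / h < 10:
            text_regions.append((x, y, w, h))
    return text_regions
```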
Now, let’s call the function and extract the text regions from the thresholded image:
```python
text_regions = extract_text_regions(threshold)
```
At this point, we have identified the regions in the image that are likely to contain text.
Step 3: Optical Character Recognition
Now that we have extracted the text regions, it’s time to perform OCR on these regions using the Pytesseract library.
First, let’s import the required library:
```python
import pytesseract
```
Next, we will iterate over the text regions, extract the corresponding ROI (Region of Interest), and pass it to the Tesseract OCR engine for recognition:
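```python
# Contours are not returned in reading order, so sort boxes roughly top-to-bottom, then left-to-right
for x, y, w, h in sorted(text_regions, key=lambda r: (r[1], r[0])):
    # Invert back to dark text on a light background, which Tesseract 4+ handles best
    roi = cv2.bitwise_not(threshold[y:y + h, x:x + w])
    text = pytesseract.image_to_string(roi)
    print(text.strip())
```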
The `image_to_string()` function of Pytesseract takes the ROI (a NumPy array or a PIL image) as input and returns the recognized text as a string.
And that’s it! You have successfully performed OCR on the image.
Conclusion
In this tutorial, we learned how to perform Optical Character Recognition (OCR) using Python. We covered image preparation, text extraction, and OCR using the OpenCV and Pytesseract libraries. By applying these techniques, you can now extract text from images and use it for further processing or analysis. Remember to experiment with different pre-processing techniques and parameters to optimize the OCR results.
I hope you found this tutorial helpful. Happy coding!