Improving Arabic language Optical Character Recognition

Summary

For People with Disabilities, primarily those with visual disabilities, accessible digital documents are the gateway to a world of information. Unlike printed documents, they are compatible with both Assistive Technologies and mainstream digital tools. For example, an accessible digital document can be read on a Braille notetaker, or on an iPad, which can be easier for someone with a physical disability.

For content in the English language, converting print documents to digital ones is relatively straightforward, so long as the printed text is clear. This is done through Optical Character Recognition (OCR), a technology that converts printed text into editable, formattable digital text. OCR has also been applied to handwriting, with varying degrees of success.

When it comes to the Arabic language, the accuracy rate for OCR is very low, making the technology effectively unusable on a wide scale. Instead, digital content in the Arabic language is either typed in digital format from the outset or retyped by hand, because scanning printed Arabic material with OCR is ineffective. For People with Disabilities, particularly people with visual disabilities, this means that little accessible digital content is available in the Arabic language, and that they cannot use OCR to create such content themselves.

Not Initiated

Target Users

User Journey

Khalid is a 40-year-old receptionist. He is blind.

1. Khalid wants to convert a printed Arabic document into a digital document using OCR so that he can read it on his Braille notetaker.

2. Khalid uses the newly developed Arabic OCR app to convert the printed document into an editable, formattable digital document.

3. Khalid reads the content of the newly created document on his Braille notetaker.

Potential Service Features

  • Smartphone camera-enabled OCR
  • Text-to-speech
  • Social and email sharing

Touch Points

Issue Statement

Because Arabic language OCR is still immature and inaccurate, there is a real shortage of accessible digital documents for Arabic speakers. As a result, People with Disabilities, particularly people who are blind, cannot use their Assistive Technology to read this content.

Expected Key Benefits

Greater access to digital documents for Arabic-speaking People with Disabilities

Higher accuracy for Arabic OCR that can be applied across multiple use cases

Implementation Analysis

Implementation Timeline

Medium (scale: Low, Medium, High)

Technology Commercial Viability

Short (scale: Available Now, Viable in Short Term, Viable in Long Term)

Investment Requirements

Medium (scale: Low, Medium, High)

Key Implementation Considerations

1. Accuracy of OCR

2. Data collection policies

3. Intuitive Graphical User Interface