OCR is an old technology. Back in my early days of owning a computer, I bought a handheld scanner that featured OCR capabilities, and I experimented with handwritten notes to see if it could recognise my hideous scribbling.
As far as I recall, it wasn't that great.
I'm currently building a tiny app for taking notes within podcasts, only using screenshots of the lock screen. I primarily listen to podcasts in my car on my commute, so I can't grab my phone and open Evernote to jot down what was said.
But it's easy enough to take a screenshot of the lock screen, which I often do:
The problem is that I often forget, and I end up deleting the photos weeks later.
As you can see on the lock screen photo above, there's a lot of information present there already:
- Podcast name
- Episode title
- Episode date
- Elapsed time
- Remaining time
You just need to parse the photo into bits of text using OCR.
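Once the text has been recognised, the two time fields still have to be turned into numbers before they are useful as a note. The following is a minimal sketch in current Swift syntax, assuming the lock screen shows times such as 12:34 or -45:10. `PodcastNote` and `parseClock` are hypothetical names for illustration, not part of the app.

```swift
import Foundation

/// The lock screen fields as plain values.
/// Hypothetical type; the field names are assumptions for illustration.
struct PodcastNote {
    let podcast: String
    let episode: String
    let episodeDate: String
    let elapsedSeconds: Int
    let remainingSeconds: Int
}

/// Converts a clock string such as "12:34", "1:02:03" or "-45:10" to seconds.
/// Returns nil when the text is not a clock value, for example when the
/// OCR engine misread a digit as a letter.
func parseClock(_ text: String) -> Int? {
    // Recognised text may carry stray spaces or newlines.
    var cleaned = text.trimmingCharacters(in: .whitespacesAndNewlines)

    // Remaining time may be shown with a leading minus sign; drop it.
    if let first = cleaned.first, first == "-" || first == "\u{2212}" {
        cleaned.removeFirst()
    }

    let parts = cleaned.split(separator: ":", omittingEmptySubsequences: false)
    guard (2...3).contains(parts.count) else { return nil }

    // Fold hours/minutes/seconds into a single second count.
    var total = 0
    for part in parts {
        guard let value = Int(part), value >= 0 else { return nil }
        total = total * 60 + value
    }
    return total
}
```

Returning nil instead of a default value makes a misread time easy to spot, so a bad screenshot can be skipped rather than stored with a wrong timestamp.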
Tesseract is an open source OCR engine sponsored by Google since 2006. It was originally developed by HP between 1985 and 1994.
When you've installed the pod and added language files, it is quite simple to begin recognising text - example below in Swift.
    // Load the screenshot into the OCR engine
    self.tesseract.image = UIImage(data: imageData)!

    // Elapsed time
    self.tesseract.rect = CGRectMake(24, 72, 100, 40)
    self.tesseract.recognize()
    let elapsed = self.tesseract.recognizedText

    // Remaining time
    self.tesseract.rect = CGRectMake(630, 72, 100, 40)
    self.tesseract.recognize()
    let remaining = self.tesseract.recognizedText

    // Episode title
    self.tesseract.rect = CGRectMake(0, 112, 750, 50)
    self.tesseract.recognize()
    let title = self.tesseract.recognizedText

    // Podcast name
    self.tesseract.rect = CGRectMake(0, 160, 750, 40)
    self.tesseract.recognize()
    let name = self.tesseract.recognizedText

    // Show the recognised fields in an alert
    let alertVC = UIAlertController(title: "Tesseract", message: "Podcast: \(name)\nEpisode: \(title)\nElapsed: \(elapsed)\nRemaining: \(remaining)", preferredStyle: .Alert)
    let cancelAction = UIAlertAction(title: "OK", style: .Cancel, handler: nil)
    alertVC.addAction(cancelAction)
    self.presentViewController(alertVC, animated: true, completion: nil)
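The snippet repeats the same three steps for every field: set the rect, recognise, read the text. One way to tidy that up is a table of named regions and a single loop. This is only a sketch in current Swift syntax: `Region`, `lockScreenRegions` and `extractFields` are hypothetical names, and the `recognize` closure stands in for the three Tesseract steps so that the logic can be tried without the pod.

```swift
import Foundation

/// A named rectangle in screenshot coordinates (hypothetical helper type).
struct Region {
    let field: String
    let x, y, width, height: Int
}

/// The same rectangles as in the snippet above.
/// They assume a screenshot that is 750 units wide.
let lockScreenRegions = [
    Region(field: "elapsed",   x: 24,  y: 72,  width: 100, height: 40),
    Region(field: "remaining", x: 630, y: 72,  width: 100, height: 40),
    Region(field: "title",     x: 0,   y: 112, width: 750, height: 50),
    Region(field: "name",      x: 0,   y: 160, width: 750, height: 40),
]

/// Runs `recognize` once per region and collects the trimmed results,
/// keyed by field name. A nil result is stored as an empty string.
func extractFields(from regions: [Region],
                   recognize: (Region) -> String?) -> [String: String] {
    var fields: [String: String] = [:]
    for region in regions {
        let raw = recognize(region) ?? ""
        fields[region.field] = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    }
    return fields
}
```

In the app, the closure would set the rect on the Tesseract instance, call recognize() and return recognizedText. Supporting another field or another screen layout then means editing the table, not the recognition code.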
imageData is an NSData representation of an image. I use Photos.framework to query all images on the device and keep only those whose size matches the exact device screen size. The rects that I keep changing are the regions of the image where the desired information is located, as shown by the masks applied to the screenshot.
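The size filter can be sketched without touching the photo library at all. The code below is an assumption about how such a filter might look, not the app's actual implementation, and it uses current Swift syntax. `PixelSize` and `screenshots(in:matching:size:)` are hypothetical names. On a device, the dimensions could come from a PHAsset's pixelWidth and pixelHeight, and the screen size from UIScreen's nativeBounds.

```swift
/// Pixel dimensions of a photo or a screen (hypothetical helper type).
struct PixelSize: Equatable {
    let width: Int
    let height: Int
}

/// Keeps only the assets whose pixel size equals the screen's pixel size.
/// `size` extracts the dimensions from whatever asset type is in use,
/// so the function works with plain test values as well as photo assets.
func screenshots<Asset>(in assets: [Asset],
                        matching screen: PixelSize,
                        size: (Asset) -> PixelSize) -> [Asset] {
    return assets.filter { size($0) == screen }
}
```

An exact match is a cheap heuristic: any image that happens to have the screen's dimensions passes the filter, so a failed recognition on the rects above is still worth handling.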
The result of the text recognition above couldn't be more accurate:
Want an invite via TestFlight?
I still have some way to go on this before submitting to the App Store, but I'd love to get some feedback. If you'd like to test the app, sign up here and I'll e-mail you the details when it's ready for testing.