One day on IRC, someone asked: “How do I differentiate hand-written vs. printed text?” Having a small amount of computer vision experience, I decided to try to answer that question.
I dusted off some old code from xk3d and started modifying it to work with OpenCV 3. It turns out that printed text and hand-written text are different enough that it’s possible to tell them apart with heuristics: using a combination of gradient magnitude and corner detection, I was able to come up with some simple rules for differentiating them.
At a high level, the process looked something like this (a rough code sketch follows the list):
- run adaptive thresholding on the image
- blur the image in the X direction
- run canny edge detection on the blurred image
- use contour detection to locate all contours in the edge detected image
- bucket the contours by their height (their extent in the Y direction)
- compute the gradient direction and magnitude for each contour, filtering out any contours where they are too small
- remove any contours that are outliers in terms of width, area, or height
- starting from the top of the page, merge contours that are near each other on the X and Y axes
- throw away any lone contours that are too short in width
About two months after giving them the code, I received the following message:
> your algo was superb. it was doing almost no errors out of the box. it brought our entire KYC system to about 95% accuracy. only a few remaining issues that we’re trying to iron out now - distorted/disoriented scans and stray marks
It turns out they were building a pipeline for their startup that automatically detects whether forms have signatures in the appropriate places. After playing with the code I sent them, they were able to raise their accuracy to a pretty reasonable rate with small adjustments to the tunable parameters. This is the algorithm I’m proudest of from 2018, though I think it’s likely that an ML approach using ConvNets or recurrent neural networks would out-perform these heuristics.