One day on IRC, someone asked: “How do I differentiate hand-written vs. printed text?” Having a small amount of computer vision experience, I decided to try to answer that question.
I dusted off some old code from xk3d and started modifying it to work with OpenCV 3. It turns out that printed text and hand-written text are different enough that it’s possible to tell them apart with heuristics: using a combination of gradient magnitude and corner detection, I was able to come up with some simple rules for differentiating them.
At a high level, the process looked something like this (a rough code sketch follows the list):
- run adaptive thresholding on the image
- blur the image in the X direction
- run canny edge detection on the blurred image
- use contour detection to locate all contours in the edge detected image
- bucket the contours by their height (their extent in the Y direction)
- compute the gradient direction and magnitude for each contour, filtering out any contours where they are too small
- remove any contours that are outliers in terms of width, area, or height
- starting from the top of the page, merge contours that are near each other on the X and Y axes
- throw away any lone contours that are too short in width
About two months after giving them the code, I received the following message:
> your algo was superb. it was doing almost no errors out of the box. it brought our entire KYC system to about 95% accuracy. only a few remaining issues that we’re trying to iron out now - distorted/disoriented scans and stray marks
It turns out they were building a pipeline for their startup that automatically detects whether forms have signatures in the appropriate places. After playing with the code I sent them, they were able to raise their accuracy to a pretty reasonable rate with small adjustments to the tunable parameters. This is the algorithm I’m proudest of from 2018, though I think it’s likely that an ML approach using ConvNets or recurrent neural networks would out-perform these heuristics.