One day on IRC, someone asked: “How do I differentiate hand-written vs. printed text?” Having a small amount of computer vision experience, I decided to try to answer that question.

I dusted off some old code from xk3d and started modifying it to work with OpenCV 3. It turns out that printed text and hand-written text are different enough that it's possible to use heuristics to tell them apart. Using a combination of gradient magnitude and corner detection, I was able to come up with some simple rules for differentiating the two.
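
To give a sense of those two signals, here's a minimal sketch (not the original code; the function name and every parameter value are placeholders picked for illustration) of how gradient magnitude and corner count could be measured for a grayscale region using OpenCV's Python bindings:

```python
# Illustrative only: measure the two signals mentioned above for a grayscale
# text region. The parameter values are placeholders, not the ones used in
# the original code.
import cv2

def region_signals(gray_region):
    """Return (mean gradient magnitude, corner count) for a grayscale patch."""
    # Gradient magnitude via Sobel derivatives in X and Y.
    gx = cv2.Sobel(gray_region, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_region, cv2.CV_32F, 0, 1, ksize=3)
    mean_gradient = cv2.magnitude(gx, gy).mean()

    # Shi-Tomasi corner detection; hand-written strokes tend to produce a
    # different corner profile than printed glyphs.
    corners = cv2.goodFeaturesToTrack(
        gray_region, maxCorners=1000, qualityLevel=0.01, minDistance=3)
    corner_count = 0 if corners is None else len(corners)

    return mean_gradient, corner_count
```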

At a high level, the process looked something like this (a rough code sketch follows the list):

  • run adaptive thresholding on the image
  • blur the image in the X direction
  • run Canny edge detection on the blurred image
  • use contour detection to locate all contours in the edge-detected image
  • bucket the contours by their heights in the Y direction
  • compute the gradient direction and magnitude for each contour, filtering out any contours where either value is too small
  • remove any contours that are outliers in terms of width, area, or height
  • starting from the top of the page, merge contours that are near each other along the X and Y axes
  • throw away any lone contours that are too narrow
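
For concreteness, here's a condensed sketch of that pipeline using OpenCV's Python API. It's illustrative rather than the original code: the function names and every numeric parameter (kernel sizes, thresholds, merge distances) are placeholders, and the per-contour filter below only uses gradient magnitude where the original also looked at gradient direction. It expects a grayscale image, e.g. from cv2.imread(path, cv2.IMREAD_GRAYSCALE).

```python
# A condensed, illustrative version of the pipeline above using OpenCV 3+
# Python bindings. All numeric parameters are placeholders, not the tuned
# values from the original code.
import cv2
import numpy as np

def candidate_text_boxes(gray):
    # 1. Adaptive thresholding to cope with uneven lighting on scans.
    thresh = cv2.adaptiveThreshold(
        gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
        cv2.THRESH_BINARY_INV, 15, 10)

    # 2. Blur only in the X direction so characters on a line smear together.
    blurred = cv2.blur(thresh, (25, 1))

    # 3. Canny edge detection on the blurred image.
    edges = cv2.Canny(blurred, 100, 200)

    # 4. Contour detection ([-2] keeps this working on both OpenCV 3 and 4,
    #    which return different numbers of values).
    contours = cv2.findContours(
        edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]

    # 5/6. Bucket contours by bounding-box height, dropping contours whose
    #      mean gradient magnitude is too small (magnitude only here).
    buckets = {}
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        patch = gray[y:y + h, x:x + w]
        gx = cv2.Sobel(patch, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(patch, cv2.CV_32F, 0, 1, ksize=3)
        if cv2.magnitude(gx, gy).mean() < 20:
            continue
        buckets.setdefault(h // 10, []).append((x, y, w, h))

    boxes = [b for bucket in buckets.values() for b in bucket]
    if not boxes:
        return []

    # 7. Drop outliers in height and area with a crude median cut.
    med_h = np.median([b[3] for b in boxes])
    med_area = np.median([b[2] * b[3] for b in boxes])
    boxes = [b for b in boxes
             if 0.3 * med_h < b[3] < 3 * med_h
             and b[2] * b[3] < 10 * med_area]

    # 8. Scan from the top of the page and merge boxes that are close
    #    to each other along the X and Y axes.
    boxes.sort(key=lambda b: (b[1], b[0]))
    merged = []
    for box in boxes:
        if merged and near(merged[-1], box, max_gap=20):
            merged[-1] = union(merged[-1], box)
        else:
            merged.append(box)

    # 9. Throw away any lone boxes that are too narrow to be text.
    return [b for b in merged if b[2] > 40]

def near(a, b, max_gap):
    """True if b is within max_gap of a's right edge and roughly on its line."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return bx - (ax + aw) < max_gap and abs(by - ay) < max_gap

def union(a, b):
    """Smallest bounding box covering both a and b."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = min(ax, bx), min(ay, by)
    x2, y2 = max(ax + aw, bx + bw), max(ay + ah, by + bh)
    return (x1, y1, x2 - x1, y2 - y1)
```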

About 2 months after giving them the code, I received the following message:

your algo was superb. it was doing almost no errors out of the box. it brought our entire KYC system to about 95% accuracy. only a few remaining issues that we’re trying to iron out now - distorted/disoriented scans and stray marks

It turns out they were building a pipeline for their startup that automatically detects whether forms have signatures in the appropriate places. After playing with the code I sent them, they were able to raise their accuracy to around 95% with some small adjustments to the tunable parameters. This is the algorithm I'm proudest of from 2018, though it's likely that an ML approach using ConvNets or recurrent neural networks would outperform these heuristics.