Scannable Chess Scoresheets with a Convolutional Neural Network and OpenCV

A case study in innovation

Innovation starts with solving personal problems. Every chess player writes down their moves on a chess scoresheet during a tournament game to analyze soon after. The annoying part is you have to record your moves once during the game, and then again into a computer to analyze the moves you played.

On top of this, it’s super easy to keep highly disorganized computer files (keeping track of moves in our minds, unfortunately, doesn’t translate well to keeping track of papers in real life). Instead, we store loose scoresheets in the musty depths of a backpack — crumpled up to oblivion.

One year ago, my friend Alex and I were in the final round of a team chess tournament. With five minutes on the clock, humongous cameras from a local TV station flooded my surroundings. After a blitzkrieg of moves, I had him! Check… mate?

Turns out, I had sacrificed a rook for nothing, and the match was lost. Alex said, “Looks like another night in a dark room with Stockfish.” (A common chess engine).

I didn’t have the will to open my laptop, type those moves in, and relive my darkest moments. But my drooping eyes suddenly jolted up: why not make a system that scans scoresheets and digitizes them? Alex was equally excited, and the quest began.

Introducing Reine

Today, we present Reine: an open-source chess scoresheet scanner that can easily be adapted for other handwriting scanning purposes. Chess players currently have two options: one is to write moves down while playing a game, then manually type them into a computer.

This manual entry can take up to 30 minutes for a whole tournament, or 4–5 hours for a tournament director to enter all the games. The other option is to purchase a $500 digital chess recorder that has been approved by the US Chess Federation (MonRoi or Plycounter). Now, Reine allows you to scan a single game in less than 10 seconds.

The goal of this article is to teach how a complete, ready-to-ship machine learning and computer vision project can be put together. Also, while the program currently works in one specific domain for one custom scoresheet, our hope is that it will be clear how you can generalize this for any kind of high-accuracy automated form processing.

Preliminary Planning

Starting out, we asked ourselves a few key questions: Why hasn’t our idea been done before? What’s difficult about it? What can we do to mitigate those issues?

With scannable scoresheets for chess, the problem is accuracy. We didn’t want to go through the trouble of recognizing connected handwriting scrawled over the constipated boxes of conventional scoresheets like this one:

That would be difficult to build software for — and it’s just one variant of a conventional scoresheet. Creating software for multiple types would be nearly impossible. For these reasons, we decided to split up the moves into characters in a custom scoresheet, also allowing more vertical space for each character and some QR-esque boxes that will later be used for alignment. This is what it looks like:

This way, characters are written individually (and mostly) within the gridlines. A clean copy of our scoresheet is available here.

Building the Model

If you hadn’t guessed already, we’ll be using a CNN trained on EMNIST to recognize characters. But how can we maximize accuracy? With 62 classes, high accuracy is very difficult to achieve!

How did we decide that five characters per ply* were enough? A database of 20,000 Lichess games was analyzed for the number of characters per move, and we came up with this distribution:

*A ply is one move for one side, or half a move (e.g. “d4”)

We’re able to get 99.9% of moves with a maximum of five characters. We don’t want to ignore the longer moves, but this can be solved by looking at the general possible move formats for each number of characters:

One very useful pattern arises: if we ignore checks, the last two characters of any move MUST notate a square on the chessboard — a lowercase letter, followed by a number. Also, moves that are 6 characters long become only 5 characters long. Thus, we decided: do not write a “+” for checks on a Reine scoresheet!

Also, 99% of pawn promotions are to queens, so we assume that any pawn reaching either end of the board is promoted to a queen. This brings 7 character moves down to 4 character moves — and no move takes more than 5 characters to notate on a Reine scoresheet. With these adjustments, the longer moves won’t be ignored.

This is one example of the difference between copy pasting and new work. Think about new techniques and optimizations from a broader perspective of your end goal, with the whole project in mind, then get help from StackOverflow as you run into execution problems. There will be more examples of this kind of thinking throughout in the steps below.

Let’s get from a smartphone picture to a list of predicted characters first — the important part.

Step 1: Align Image

We want to use the “ArUco” markers in the four corners here to align this image, so that all we see are the moves. Let’s import everything we’ll need for the whole project:

#for all tasks
import cv2 as cv
from cv2 import aruco  # from opencv-contrib-python
import numpy as np
import chess.pgn
import itertools
import io
#for machine learning only
import pandas as pd
import numpy as np
import matplotlib.image as mpimg
import seaborn as sns
%matplotlib inline
import matplotlib.pyplot as plt


from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
import itertools

from keras.utils.np_utils import to_categorical # convert to one-hot-encoding
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.optimizers import RMSprop
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau

We use the ArUco markers for pose estimation (finding the direction the paper is pointing in 3D space), and then correct for those values to make the scoresheet look face-on. We got some help with alignment from here, an attempt at a similar project. This is our two-step process:

  1. Find ArUco markers with a built-in function, 4×4 markers in this case. This function returns multiple values per marker (coordinates and marker ID), so the function get_data() gives us just coordinates.
  2. The OpenCV docs describe the get_transform() function: “For perspective transformation, you need a 3×3 transformation matrix…. To find this transformation matrix, you need 4 points on the input image and corresponding points on the output image.” The function cv.getPerspectiveTransform() finds the matrix, and cv.warpPerspective() uses it to adjust the image perspective.
# detect marker
def find_markers(img):
    aruco_dict = aruco.getPredefinedDictionary(aruco.DICT_4X4_50)
    # res[0] is the coordinates, res[1] is the marker ids
    res = aruco.detectMarkers(img, aruco_dict)
    return res

# finds the coordinates on the image that will become the edges of the transformed image (outermost marker corners)
def get_data(marker_pos, marker_id, aruco_res):
    # np.where() returns a tuple with two elem is only one column
    data_point = aruco_res[0][np.where(aruco_res[1] == [marker_id])[0][0]][0][marker_pos]
    return data_point

def get_transform(numpy, marker_numpy):
    rect = np.zeros((4, 2), dtype='float32')
    for marker in range(4):
        # marker used twice because we use markers # 0, 1, 2, and 3
        rect[marker] = (get_data(marker, marker, find_markers(marker_numpy)))
    # top-left, top-right, bottom-right, bottom-left
    (tl, tr, br, bl) = rect

    # width of new image = max distance between br/bl and tr/tl
    width_1 = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
    width_2 = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
    max_width = max(int(width_1), int(width_2))

    # height, same calculation
    height_1 = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
    height_2 = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
    max_height = max(int(height_1), int(height_2))

    dimensions = np.array([
        [0, 0],
        [max_width - 1, 0],  # minus one to make sure it all fits
        [max_width - 1, max_height - 1],
        [0, max_height - 1]], dtype='float32')

    transform_mat = cv.getPerspectiveTransform(rect, dimensions)
    aligned_image = cv.warpPerspective(numpy, transform_mat, (max_width, max_height))
    return aligned_image

Step 2: Remove Shadows

Now, we’ll use OpenCV to remove shadows from the image. We make a copy image and from this copy subtract only the pixels we want to keep (i.e. not shadows), and then subtract this copy image from the original image to get a de-shadowed image.

  1. Dilate & median blur the image to get rid of thin lines (text, handwriting, gridlines).
  2. Subtract this new image from the original image, essentially subtracting out shadows.
  3. Step 1 isn’t perfect, so everything will be lighter than the original image. Normalize to increase contrast.

Here is the code we used.

This is what we have now:

Step 3: Cut Up Boxes

Thankfully, others have come up with a method to detect gridlines and find contours after. Here is an overview and the edits we had to make.

1. Define horizontal and vertical kernels that you’ll essentially run over the whole image, looking for horizontal and vertical lines.

2. Create a new image out of those kernels combined.

3. Dilate and then Erode (i.e. close) all the lines. Contours that have bumpy borders or missing pixels won’t be detected properly by the findContours() function otherwise. This is our addition — because we cannot read what we cannot see, it’s imperative that ALL 500 boxes are accounted for (the number of possible characters on our scoresheet). We add these lines to our image before the findContours() function is used.

  # closing the lines so all contours are detected
    vert_close = cv.morphologyEx(img_final_bin, cv.MORPH_OPEN, vertical_kernel)
    hori_close = cv.morphologyEx(vert_close, cv.MORPH_OPEN, hori_kernel)
# only contours of about the size of one box, etc. not the whole scoresheet
    def get_true_contours(pre_contours):
        post_contours = []
        for pre_contour in pre_contours:
            the_x, the_y, the_w, the_h = cv.boundingRect(pre_contour)
            # scoresheet-specific but we resize so all good
            if 20 < the_w < 55 and 30 < the_h < 65 and the_w < the_h: 
        return post_contours

4. Finally, sort the boxes left-to-right, top-to-bottom. We sort the left and right halves of the scoresheet separately, of course. This is a replacement for the given get_contour_precedence() function in the Medium article, which only sorts top-to-bottom.

5. Sorting: The top-to-bottom sorting is done first with the function shown directly below. The variable row_y has 25 y-values (for 25 moves/rows) — the y-values of the contours will be close to exactly one of these values. From there, sorting is done left-to-right using boundaries given by cv.boundingRect() and separately for each half of the sheet.

# sort the contours from left to right, top to bottom (left column before right column)
def get_contour_precedence(contour, row_y, half):
    # completely dependent on the fact that the contouring (besides sorting) is perfect
    x, y, w, h = cv.boundingRect(contour)
    row_num = None
    for row in row_y:
        if row - h / 3 < y < row + h / 3:
            row_num = row_y.index(row)

    # the boxes in the second half of the page will all be sorted after the first half
    if x + round(w / 2) < half:
        return 10000 * (row_num + 1) + x
        return 1000000 * (row_num + 1) + x

And now, we’re here:

But EMNIST looks like this!

Step 4: Preprocessing Our Data

We’re lucky — the paper that summarizes the creation of the EMNIST dataset from the NIST government dataset (which contains natural characters similar to the ones we had cut up) explains how to preprocess each character:

We do the following:

  1. Resize characters to 138×138 Because boxes are taller than they are wide, we add horizontal whitespace to maintain character proportion.
  2. Invert the image.
  3. Threshold the input image, making it only black and white.
  4. Apply a Gaussian blur to reduce bumpiness and noise.
  5. Cut off some lines on the outside so that any of the gridlines aren’t included in the input to the model. Now we have a 128×128 image.
  6. Center the image by mass.
  7. Repeatedly delete one pixel from each side as long as all border pixels are black. Note: because less whitespace is removed for larger characters, they will have thinner strokes after we resize to 28×28, and vice versa. This is the reason for step 8.
  8. We add one step not done in the EMNIST paper: after removing black border pixels, we erode smaller images and dilate larger images so that stroke width is proportional to the size of the image. (This significantly improved results.)
  9. Add 2px border padding, as per the EMNIST paper.
  10. Resize the image to 28×28. Use the “bicubic interpolation” method to create variance in whites (i.e. the edge of a stroke is lighter than its center). This is important in character recognition because information about the edge of a stroke tells the model where the character ends.

This chunk of code contains all of the preprocessing, along with comments:

Step 5: Recognition

With 62 classes, the best accuracies for EMNIST we found were under 90%, even for the validation set. Real handwriting would be worse. Looking at the generalized possibilities ignoring checks, we can tackle this problem by limiting the classes to only the characters that could actually appear at each position in a chess move:

Here are the models we need, and the improvements from the original 4% random classification accuracy you would have with 25 classes:

Each model now has a heavily reduced number of classes — and therefore, higher accuracy. (Note: the “Letter” model does not include “x,” and the “number” model includes “O” to account for castling.)

Now we have characters that look like the EMNIST dataset, and we know exactly what classes to use for our CNN. We just have to train them. You can use any model you’d like: an ensemble, a single CNN, or experiment with different numbers of epochs.

Two aspects were particularly important for us, though: choosing only the necessary characters of EMNIST to train, and data augmentation. We used this Kaggle kernel, but with the following necessary adjustments:

1. Remove unnecessary characters from the training data. Recall that we need five different nets: Letters, Numbers, Pieces, Pieces or Letters, and Letters or X’s or Numbers. We illustrate how you might make the Letters or X’s or Numbers net only, since it’s the most complicated of the required models.

#set a counter to indicate when a chunk has been loaded
trainchunks = pd.read_csv('emnist-byclass-train.csv', sep='s*,s*', header=0, delimiter=",", engine='c', na_filter=False, dtype=np.int64, chunksize=50000, names=hdr)
for chk in trainchunks:
  #define which indices of characters we want to keep from the mapping
  #if a character is not in this list of keeps, delete it
  for i in range(0,63):
    if i not in keeps:
      chk=chk[chk.labelz !=i]
  #because the to_categorical function must be 0-indexed and without leaps, 
  #we have to make sure after all this the indexes go in order starting from 0.
  for i in range(0,len(chk["labelz"])):
    if chk.iloc[i]["labelz"]==33 or chk.iloc[i]["labelz"]==61:
    if 35<chk.iloc[i]["labelz"]<44:

2. Data augmentation. Since people will write erratically during a chess game, we don’t know that the postprocessing will give us a perfect EMNIST-type fit. This is especially true if people write at angles or heavily weight a character to one side (messing with the mass-centering in preprocessing). So, we want augmentation to look something like this:

datagen = ImageDataGenerator(
        zoom_range = 0.3,  

Tweak these numbers to figure out what amount of augmentation works best for your project.

Our code with heavy commenting can be found here, but other than the above changes it’s not much different from a standard EMNIST+CNN procedure.

Step 6: Postprocessing (last one!)

Our postprocessing is best explained by example:

The model gives us a confidence for each possible character in each box. It might say, N-90%, K-10%, B/R/Q-0% for one of the “piece” boxes. We check each possible move composed by characters to which our model assigns a high enough confidence. (We check at least the top 2 characters, regardless of confidence). For example, consider the string of moves 1. d4 d2. Black can’t play d2 on move 1! But we look at the next most likely predictions, perhaps d3 for white and d5 for black. The correct move should be one of the following:

1.d4 d2

1.d4 d5

1.d3 d2

1.d3 d5

Only 1.d4 d5 and 1.d3 d5 are valid, so we ignore the other two and continue to the next move.

Now, let’s examine the code, which can be found here:

Postprocessing consists of two main steps:

  1. Translate the list of character probabilities into real moves.
  2. Determine which combinations of moves form a valid game.

Step 1 is fairly intuitive. We start by filtering out unlikely characters for each box (as explained previously). We then use the itertools.product() function to find all the possible combinations of these characters (meaning, the moves).

Before we start checking whether these moves are valid, we further decrease the number of moves we have to check: we used five distinct models due to restrictions on which characters can be in a given position for a move, and there are additional restrictions based on the other characters in the move. (While “Bbc5” and “bxc5” are valid moves, “bbc5” is not!) The following function eliminates such illegal moves:

# eliminate 'moves' that cannot be legal
def legal(h_move):
    nums = ['1', '2', '3', '4', '5', '6', '7', '8']
    pawns = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
    caps = ['B', 'K', 'N', 'Q', 'R']

    if len(h_move) == 2:
        if h_move[0] in pawns and h_move[1] in nums:
            return True
    elif len(h_move) == 3:
        if (h_move[0] in caps and h_move[1] in pawns and h_move[2] in nums) or 
                h_move == 'O-O':
            return True
    elif len(h_move) == 4:
        # when the first char is a piece
        if h_move[0] in caps and (h_move[1] == 'x' or h_move[1] in nums or 
                h_move[1] in pawns) and h_move[2] in pawns and h_move[3] in nums:
            return True
        # when the first char is a pawn
        elif h_move[0] in pawns and h_move[1] == 'x' and h_move[2] in pawns and 
                h_move[3] in nums and 
                abs(pawns.index(h_move[0]) - pawns.index(h_move[2])) == 1:
            return True
        if (h_move[0] in caps or h_move[0] in pawns) and (h_move[1] in nums or 
                h_move[1] in pawns) and h_move[2] == 'x' and h_move[3] in pawns and 
                h_move[4] in nums:
            return True
    return False

Next is step 2: finding all possible legal games. The Python Chess library [LINK] is useful here, and we take advantage of its Board objects, which represent a single chess position (chess players know these as FENs). We had mercy on our RAM (but sacrificed a little speed) by repeatedly calling a function to update global variables on a move-by-move basis.

In the following code, boards contains lists of “moves”, which are lists of all the boards that are valid for that move number (boards[0][0] was initialized to the starting position). The global pgn is structured similarly but contains the lists of moves made to arrive at those boards (Board objects store only position data).

def check_iterative(move_num):
    # check some moves even if we don't find a valid pgn spanning the whole game
    global best_guess 
    global game_result
    global highest_checked  # the highest move num for which a valid pgn was found

    for _move in possible_moves[move_num]:
        # boards are an object from python chess library representing an FEN
        for board_num in range(len(boards[move_num])):
            new_board = boards[move_num][board_num].copy()
                # Python Chess function that makes this move on the board
                # if the move is illegal a ValueError will occur here

                # if there is no error
                boards[move_num + 1].append(new_board)
                if len(pgn[move_num]) > 0:
                    game = pgn[move_num][board_num] + [_move]
                else:  # only occurs on move 1
                    game = [_move]
                pgn[move_num + 1].append(game)

                # we return '*' if not all games have same result
                if game_result != '*' and move_num == len(possible_moves) - 1:
                    if game_result == '':
                        game_result = get_game_result(new_board, move_num + 1)
                    elif not game_result == get_game_result(new_board, move_num + 1):
                        game_result = '*'

            except ValueError:
                if move_num > highest_checked:
                    best_guess = pgn.copy()
                    highest_checked = move_num

    pgn[move_num] = []  # clear variables that are no longer necessary to save memory
    boards[move_num] = []

    return  # no return value needed because the modifies globals

Run check_iterative() once for each ply, and the result is your PGN. All that’s left is the formality of reformatting the moves into a nice PGN string, and finally a .pgn file, which looks like this:

It turned out we had two mistakes on this scoresheet, giving us over 96% accuracy. Fix a few moves, open in a chess engine GUI, and…



Thanks for reading! Email us at [email protected] or [email protected] if you’d like to get involved or have any questions. Our company email is [email protected], if you have any business inquiries. Good luck with your next revolutionary idea!

Here’s the website for the project:

You can find the open-source code here: blog post:

Hacker News post (#1 on Show HN!):

~ Rithwik Sudharsan & Alex Fung

Avatar photo


Our team has been at the forefront of Artificial Intelligence and Machine Learning research for more than 15 years and we're using our collective intelligence to help others learn, understand and grow using these new technologies in ethical and sustainable ways.

Comments 0 Responses

Leave a Reply

Your email address will not be published. Required fields are marked *