I’m a software engineer by trade. But I seem to keep dipping my toes into the machine learning and data science communities. There are some awesome libraries out there, but I admit I’ve had a difficult time wrapping my head around many of them.
Surprisingly, it’s not the math. It’s actually the code itself that I find myself getting stuck on. It’s understandable. Tensor algebra is hard enough without needing to worry about style guides and code reviews.
The other day I was reading some code from Apple that infers shapes from Core ML neural network files. It’s great to see Apple getting their hands dirty in the open source community. But I got stuck trying to understand this one function and decided to document my process of figuring it out.
def _crop(layer, shape_dict):
    params = layer.crop
    Seq, Batch, Cin, Hin, Win = shape_dict[layer.input[0]]
    l = r = t = b = 0
    if len(layer.input) == 1:
        if len(params.cropAmounts.borderAmounts) != 0:
            t = params.cropAmounts.borderAmounts[0].startEdgeSize
            b = params.cropAmounts.borderAmounts[0].endEdgeSize
            l = params.cropAmounts.borderAmounts[1].startEdgeSize
            r = params.cropAmounts.borderAmounts[1].endEdgeSize
        Hout = Hin - t - b
        Wout = Win - l - r
    else:
        Hout = shape_dict[layer.input[1]][3]
        Wout = shape_dict[layer.input[1]][4]
    shape_dict[layer.output[0]] = (Seq, Batch, Cin, int(Hout), int(Wout))
Can you read this and understand what’s happening? If so, you are smarter than me. This code doesn’t really provide much context for anything. Here were my thoughts reading through this function the first few times:
def _crop(layer, shape_dict):
    # Okay. Cool. We have some function Crop. I guess it takes a layer and a shape_dict.
    # Wonder what _crop will do. Let's. find. out.
    params = layer.crop
    # cool. there are some crop params. I wonder what they're used for? I guess we'll
    # find out eventually.
    Seq, Batch, Cin, Hin, Win = shape_dict[layer.input[0]]
    # What are Cin, Hin, Win? Are these all classes because they're capitalized?
    l = r = t = b = 0
    # l, r, t, b. What is this? ah, left right top bottom would make sense.
    # What do they refer to? The size of the input layer image? Why are they
    # all zero initially?
    if len(layer.input) == 1:
        # Okay, so if there's only one input blob we do this. Is this
        # important in some way? Why would there only be one input blob?
        if len(params.cropAmounts.borderAmounts) != 0:
            # ah, here's the params settings. I guess the layer params tell us about
            # how the layer is cropped.
            t = params.cropAmounts.borderAmounts[0].startEdgeSize
            b = params.cropAmounts.borderAmounts[0].endEdgeSize
            l = params.cropAmounts.borderAmounts[1].startEdgeSize
            r = params.cropAmounts.borderAmounts[1].endEdgeSize
        Hout = Hin - t - b
        Wout = Win - l - r
        # Ahh so here's what we're trying to compute. The Hout and Wout (which I will
        # assume is output height and width).
    else:
        # Okay "else", wait, what's my initial condition again? Ah if there is one input layer.
        # I guess there can be more than one.
        Hout = shape_dict[layer.input[1]][3]
        Wout = shape_dict[layer.input[1]][4]
        # Since we're only accessing the first index of the input layer, I guess there are only
        # two possible? I wonder if there are more than two if that's a problem. /shrug
    shape_dict[layer.output[0]] = (Seq, Batch, Cin, int(Hout), int(Wout))
    # ahh so now I see that we're modifying what looks like the first output layer with the new
    # values we computed. Interesting that we're casting them as an int. I wonder if they're
    # ever not integers? Do we lose something because of that?
By the very end I was able to figure out what this function is trying to do: compute and update the output shape of a crop layer. To solve the mystery, I dusted off my deductive reasoning skills and put them to work.
I read code nearby to see how and where this function is used. I had to search for similarly named variables in other code to give myself more context to what they mean. Ultimately, I had to comprehend way more than just this function to figure out these 12 lines of code. This experience is very common. To understand even a small part of the pie, you need to know how the whole pie was baked.
So, what does this code really do? This wonderful snippet in the Core ML documentation gives us a clue:
Aha! There are two functional modes with different behaviors for each mode. The _crop function is beginning to come into clearer view. Let’s rewrite the function in a way that is much more declarative about the world it lives in:
def _compute_crop_layer_output_shape(layer, shape_dict):
    """Compute the output shape of a crop layer from its configuration.

    There are two functional modes of the crop layer. When it has 1 input
    blob, it crops the input blob based on the 4 parameters
    [left, right, top, bottom]. When it has 2 input blobs, it crops the
    first input blob based on the dimension of the second blob with an offset.

    Args:
        layer: Crop layer
        shape_dict: dictionary of model layer shapes.

    Returns:
        Tuple containing output dimensions for Crop Layer.
    """
    params = layer.crop
    seq, batch, input_channel, input_height, input_width = (
        shape_dict[layer.input[0]]
    )

    if len(layer.input) > 2:
        raise ValueError('Crop does not accept more than two inputs.')

    if len(layer.input) == 2:
        # When it has 2 input blobs, it crops the first input blob based
        # on the dimension of the second blob.
        second_input_shape = shape_dict[layer.input[1]]
        output_height = second_input_shape[3]
        output_width = second_input_shape[4]
        return (seq, batch, input_channel, output_height, output_width)

    crop_amounts = params.cropAmounts.borderAmounts
    if not crop_amounts:
        # If there are no border adjustments, return the original layer shape.
        return shape_dict[layer.input[0]]

    top, bottom = crop_amounts[0].startEdgeSize, crop_amounts[0].endEdgeSize
    left, right = crop_amounts[1].startEdgeSize, crop_amounts[1].endEdgeSize
    output_height = int(input_height - (top + bottom))
    output_width = int(input_width - (left + right))
    return (seq, batch, input_channel, output_height, output_width)
Reading this, hopefully the intent of the function is obvious. It’s definitely not perfect, and we can spend hours going back and forth on its construction; however, I would be willing to bet that you feel more comfortable explaining what this function does and maybe more empowered to make a change to this code.
No single-letter variable names
While single-letter variables are acceptable in some cases (indexes in an array, for instance), their proliferation makes code extremely difficult to read. A single-letter variable forces the reader to trace the code back to its definition and make sure it wasn’t changed along the way.
Variable names should state what they represent, not how or why. Single-letter names cannot possibly convey what they represent.
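The same idea applies to the magic indexes `[3]` and `[4]` that both versions of the crop function still use. Here is a small sketch; `LayerShape` is a hypothetical helper, not part of Core ML or coremltools, and it assumes the `(seq, batch, channels, height, width)` ordering that the crop code unpacks.

```python
from typing import NamedTuple


class LayerShape(NamedTuple):
    """Hypothetical 5-D blob shape, in the order the crop code unpacks it."""

    sequence: int
    batch: int
    channels: int
    height: int
    width: int


shape = LayerShape(sequence=1, batch=1, channels=3, height=224, width=224)

# Positional access: the reader must already know that index 3 is height.
height_by_index = shape[3]

# Named access: the attribute says what the value represents.
height_by_name = shape.height

# A NamedTuple is still a tuple, so existing unpacking code keeps working.
seq, batch, channels, height, width = shape

print(height_by_index == height_by_name == height)  # True
```

Because a `NamedTuple` is still a tuple, existing code that unpacks or compares shapes keeps working, while new code can say `shape.height` instead of `shape[3]`.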
Be explicit rather than implicit
The first version only implies that the crop layer has two modes. I had to put my deductive reasoning skills to work to figure it out. Making the details of the crop layer explicit in the code makes it clear what the function does and does not do. If you find yourself deep inside nested if-statements, ask yourself, “what conditions do I know about that the reader might not know?” By stating explicitly what is handled, rather than what is not, you will make your code easier to understand.
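Here is a minimal sketch of that question applied to the crop layer’s mode check. Both functions are hypothetical and only return a label for the mode: the implicit one mirrors the original `if`/`else` structure, while the explicit one names each supported case and rejects everything else.

```python
def crop_mode_implicit(num_inputs: int) -> str:
    # Mirrors the original structure: anything other than one input
    # silently falls into the "else" branch, including 0 or 3 inputs.
    if num_inputs == 1:
        return "border"
    else:
        return "reference"


def crop_mode_explicit(num_inputs: int) -> str:
    # States up front which cases exist and rejects everything else.
    if num_inputs == 1:
        return "border"  # crop by [left, right, top, bottom]
    if num_inputs == 2:
        return "reference"  # crop to the second input's height and width
    raise ValueError(f"Crop accepts 1 or 2 inputs, got {num_inputs}")


print(crop_mode_implicit(3))  # prints "reference", with no error

try:
    crop_mode_explicit(3)
except ValueError as error:
    print(error)  # Crop accepts 1 or 2 inputs, got 3
```

The explicit version is barely longer, but a reader no longer has to work out what happens with an unexpected number of inputs.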
“Functions should do one thing, and do one thing well” -Craig Lancaster -Me
My mentor, Craig, said this to me many, many times, and I think it cannot be repeated enough. If your function truly does one thing well, it should be possible to communicate what that one thing is in the function name.
The function name _compute_crop_layer_output_shape communicates the intent without hiding any surprises inside. If you have a difficult time succinctly describing the intent of a function, it’s probably time to split it up into two (or more) functions that each do one thing well.
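As one possible way to apply this to the crop layer, the sketch below splits the two modes into their own helpers and leaves the public function to decide which one applies. The helper names are hypothetical, and the `SimpleNamespace` objects are stand-ins for the Core ML protobuf messages, modeling only the fields the article’s code reads (`input`, `crop.cropAmounts.borderAmounts`, `startEdgeSize`, `endEdgeSize`).

```python
from types import SimpleNamespace


def _crop_to_reference_shape(input_shape, reference_shape):
    """Two-input mode: keep the height and width of the reference blob."""
    seq, batch, channels, _, _ = input_shape
    return (seq, batch, channels, reference_shape[3], reference_shape[4])


def _crop_by_border_amounts(input_shape, border_amounts):
    """One-input mode: subtract the [top, bottom] and [left, right] borders."""
    seq, batch, channels, height, width = input_shape
    if not border_amounts:
        return tuple(input_shape)
    top, bottom = border_amounts[0].startEdgeSize, border_amounts[0].endEdgeSize
    left, right = border_amounts[1].startEdgeSize, border_amounts[1].endEdgeSize
    return (seq, batch, channels, height - (top + bottom), width - (left + right))


def compute_crop_layer_output_shape(layer, shape_dict):
    """Pick the mode, then delegate to the helper that handles only that mode."""
    input_shape = shape_dict[layer.input[0]]
    if len(layer.input) == 2:
        return _crop_to_reference_shape(input_shape, shape_dict[layer.input[1]])
    if len(layer.input) == 1:
        return _crop_by_border_amounts(
            input_shape, layer.crop.cropAmounts.borderAmounts
        )
    raise ValueError("Crop accepts one or two inputs.")


# Stand-ins for the Core ML protobuf messages (assumption: only the
# attributes read above are modeled).
def border(start, end):
    return SimpleNamespace(startEdgeSize=start, endEdgeSize=end)


one_input_layer = SimpleNamespace(
    input=["image"],
    crop=SimpleNamespace(
        cropAmounts=SimpleNamespace(borderAmounts=[border(2, 3), border(4, 1)])
    ),
)
two_input_layer = SimpleNamespace(input=["image", "reference"])
shapes = {"image": (1, 1, 3, 100, 80), "reference": (1, 1, 3, 64, 48)}

print(compute_crop_layer_output_shape(one_input_layer, shapes))  # (1, 1, 3, 95, 75)
print(compute_crop_layer_output_shape(two_input_layer, shapes))  # (1, 1, 3, 64, 48)
```

Each helper can now be read, tested, and changed without keeping the other mode in your head.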
The data science community is exploding right now. The popularity of AI and ML software is growing exponentially. Machine learning models are making their way into everyday products. As machine learning becomes more mainstream, strong communication is key. Communicating code effectively to other engineers is a great place to start and will help the community grow even faster.
Readable code makes bugs easier to spot. Readable code makes it easier for others to get involved. Readable code lets us spend our precious time solving interesting problems.
As the data science community grows and machine learning is used more to create fresh experiences, comprehensible code is necessary to increase the velocity of the community.