If I gave you 30 minutes and a reasonably powerful computer, you could train a machine learning model to recognize dozens of breeds of dogs with higher accuracy than most humans.
This is partly thanks to great tools and frameworks like TensorFlow and PyTorch, but it’s also thanks to the availability of data. At some point, generous researchers took the time to label a few thousand images of pets and release that data publicly.
Try to train a similar model for species of fish, and you’ll spend weeks or thousands of dollars sourcing and labeling the data yourself.