Machine learning (ML) began its ascent into the medical industry when it acquired the ability to detect visual patterns in images—a skill doctors and technicians take years to master.
Specifically, ML models for computer vision tasks in the medical field train on datasets of labeled images, learning to recognize the similarities and differences between classes.
In 2016, ML was used to classify T-lymphocytes against colon cancer epithelial cells with high accuracy (Chen, C. L. et al.). Given the success of this cell classification work, along with other earlier efforts, ML is expected to significantly accelerate the process of disease identification.
Malignant melanoma (a form of skin cancer) present in skin growths often goes unnoticed beyond its earliest stage, whether due to patient hesitation or errors in diagnosis made by healthcare providers — a problem our mobile app aims to solve. Especially with the COVID-19 pandemic, people are more hesitant to go to hospitals or clinics, and an app one could download on their phone can ease anxiety about a concerning mole.
Essentially, we’re aiming to bring medical attention to people without requiring an immediate, in-person visit to a doctor.
To incorporate machine learning into our app, we used Apple’s Create ML model-building framework to train an ML model to detect differences between malignant and benign moles, and then embedded that model into an iOS app.
We chose Create ML and Xcode because, in our experience, models generated with Create ML were more accurate than those from the other frameworks we tried. Currently, however, Create ML models can only be incorporated into Xcode-based apps.
Our Xcode App: Mole Scan
Our app, called Mole Scan, assesses a user-inputted image of a mole or abnormal skin growth and gives a benign or malignant classification. The app can (in theory) be used by a regular person to get a quick and anxiety-free identification of their mole without needing to consult a doctor.
This streamlining of the screening process offers financial and emotional benefits for the user, who would not need to pay for an unnecessary dermatologist visit unless they feel the app’s diagnosis is inaccurate or they receive a malignant reading.
It would also save time for people who are busy or don’t have the resources to visit a dermatologist, potentially saving the user from emotional distress connected to waiting for and undergoing a doctor’s examination.
Currently, there is only one free app in the App Store that can assess a mole and attempt to classify it as cancerous or otherwise; our model is able to outperform this lone competitor, with an accuracy rate of 98%. The accuracy rate is calculated as the proportion of images the program correctly classifies, measured on a test set of 400 images and a validation set of 2,400 images. Our accuracy rate is high due to the large number of images we trained our model on.
Our app (Mole Scan) is free and accurate, so people of all backgrounds can access honest medical attention at whatever time suits them. Mole Scan is coming soon to the App Store—January 2021.
Creating a Custom Dataset
In the next few sections, we’ll walk you through how to create an app like ours using Create ML and Xcode, as well as discuss our specific process for making the app.
When we create the model in Create ML, we need a dataset ready to upload for training. Before gathering images for the dataset, you need to make a decision: what will the categories or labels of the images be, i.e., the classes the model will attempt to predict?
If you don’t have a plan for what images you want to use, or if you just want to make a sample project, here’s a list of available medical datasets:
Depending on how the dataset downloads, you may not need to do the following steps.
For demonstration purposes, we’ll be building a simple model that classifies an input image as one of two animals: a lion or a tiger. This will help us demonstrate the basic process by which we created our mole classification model. But before we jump to the demo animal classifier, we wanted to share a bit more about what it was like to work with a big, custom dataset on a unique task.
For our model, we used 8,000 photos per class of benign and malignant moles. We started with 2,000 malignant images — from the ISIC dataset and a partnership with dermatologists — and 8,000 benign images. There was an inevitable disparity between the number of benign and malignant images we could collect, since there are roughly 10 times more images of benign moles in the ISIC database.
The ISIC dataset is intended for doctors to learn from and provides the user with a plethora of skin growth images. It includes images from multiple datasets within it, such as the HAM10000 and SONIC skin cancer datasets. There are filters that can be turned on or off so that the dataset only shows certain types of skin cancer images. We used the filters to show malignant or benign images of moles and went through each image, selecting them based on the following criteria:
- Cannot be a microscopic image. Many images on the ISIC website are moles placed under a microscope, which is useless to us since a user will not input a microscopic image of their own mole using a phone camera. If the picture has a black ring around it, that indicates it’s a microscopic image. Some of the images have been cropped so that the black ring does not show, but it can still be a microscopic image if you can see tiny water droplets on the mole or the hairs are overly detailed.
- Has to be above 299 x 299 pixels. Create ML processes images larger than 299 x 299 pixels best. Results generated by the trained model may be compromised if images are too small to process reliably.
- Cannot be a duplicate. Some images are duplicated in the dataset, especially in the malignant category. We need to make sure to only select one of the duplicates to avoid having the same image twice in our training set.
You may have noticed that there is a total of 23,906 images in the dataset, as shown in the above screenshot, but only about 3,000 of those are malignant. Of those malignant images, about 10% are duplicates and more than a quarter are unusable because they don’t fit the above criteria. We obtained about 500 additional anonymized images of malignant moles from connections at dermatology departments.
We were able to produce 8,000 images from the initial 2,000 malignant images by using data augmentation. Our augmentation process was simple: we rotated the initial 2,000 images to the left three times, saving each rotated set separately for a total of 8,000.
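For illustration, here’s a sketch of how that rotation step could be scripted with Core Image on macOS (the folder paths and file-name suffixes here are hypothetical, and this assumes JPEG inputs):

```swift
import Foundation
import CoreImage

// Hypothetical input/output folders; adjust to your own dataset layout.
let inputDir = URL(fileURLWithPath: "malignant")
let outputDir = URL(fileURLWithPath: "malignant-augmented")

let context = CIContext()
// Three left rotations: 90, 180, and 270 degrees.
let rotations: [(CGImagePropertyOrientation, String)] = [
    (.left, "rot90"), (.down, "rot180"), (.right, "rot270")
]

let files = try FileManager.default.contentsOfDirectory(at: inputDir,
                                                        includingPropertiesForKeys: nil)
for file in files where file.pathExtension.lowercased() == "jpg" {
    guard let image = CIImage(contentsOf: file) else { continue }
    for (orientation, suffix) in rotations {
        let rotated = image.oriented(orientation)
        let name = file.deletingPathExtension().lastPathComponent
        let outURL = outputDir.appendingPathComponent("\(name)_\(suffix).jpg")
        // Write each rotated copy out as a separate file.
        try context.writeJPEGRepresentation(of: rotated, to: outURL,
                                            colorSpace: CGColorSpace(name: CGColorSpace.sRGB)!)
    }
}
```

Combined with the originals, each source image yields four images in total.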
The benign class did not require augmentation since the ISIC dataset is saturated with large non-microscopic images of benign moles.
We used 6,600 images per class to train the model, leaving 1,200 per class for validation and 200 per class for testing, for a total dataset of 16,000 images.
- Training: 6,600 benign — 6,600 malignant
- Validation: 1,200 benign — 1,200 malignant
- Testing: 200 benign — 200 malignant
The training accuracy rate for our model was 99%, the validation 97%, and the testing 98%.
Now that we’ve explored what our custom datasets looked like, let’s actually build a simple demo classification model with Create ML, so you can get a better sense of how to work with the framework. To do this, we’ll use a simple use case with lions vs tigers. Generally, the more photos you input as training data, the higher the accuracy of the model once it’s trained.
For differentiating between less complex images, use at least 20 per class. According to the Apple Developer website, 10 is sufficient, but the extra images per class make the model much more accurate.
The above screenshot shows how your folders should be arranged, with two separate folders for training and testing. The names of the folders serve as the labels for the images, since the image classification template in Create ML infers labels from folder names rather than requiring hand-labeled images. Testing data will be used to check the accuracy of the model (i.e., to see how well it generalizes to unseen data) after it’s trained on the images in the training folder.
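The same layout in text form looks like this (using our lion/tiger demo; the folder names are our choices, and each class folder’s name becomes its label):

```
Lions and Tigers/
├── Training Data/
│   ├── Lion/
│   └── Tiger/
└── Testing Data/
    ├── Lion/
    └── Tiger/
```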
We used 20 images each of lions and tigers in the training data folder (since this is just a demo, accuracy does not matter as much to us, but it should for any legitimate project) and 5 each in the testing folder. As a rule of thumb, at least 80% of your images should be in the train folder.
If you put training images in the test folder, the results will be flawed: the model will appear more accurate than it really is, since it’s being tested on images it has already seen (which is misleading, to say the least, especially in medical applications).
How to Make A Create ML Model for Xcode
As of now, there’s unfortunately no simple way to run Xcode on Linux or Windows; if you don’t have access to macOS, you’ll need to find a way to run it before following along.
Xcode will take anywhere from an hour to a day to fully download, and it takes up about 10 GB of storage on your machine. After downloading Xcode 11.6 (the latest version at the time of writing) from the Mac App Store, create an Xcode project by selecting the new project option. For our purposes, we selected a single-view app as our template, but you can choose any of the templates provided, depending on how you plan to use it.
We’re naming this new project Medium Test, but any name is fine since it does not impact the functionality of the app. Mole Scan was created using SwiftUI, but we’re using Storyboards for this tutorial.
Refer to the above image for direction. Be sure to make a new folder and place it somewhere it can be easily found—otherwise, it will be difficult later on when we create the folders of images for the ML model. Now that the Xcode project has been created, go to the Xcode menu in the top nav bar, select “Open Developer Tool”, and choose the Create ML option.
Choosing the right Create ML template depends on what kind of project you’re working on:
- Image Classifier is for projects in which classes of objects are related but not the same (like moles that are benign and malignant; or cars that are Cadillac or Toyota), and you want to classify one object in the image. This is the template that we used to make our model.
- Object Detector is useful for projects where you’re trying to identify multiple, different objects in one image.
- The other templates and model types are not related to the purpose of our project, but can be really powerful for other use cases (i.e. working with audio, text, etc.).
Make sure to place this ML model file in the same Xcode folder you made earlier.
Once the model has been created, you can drag and drop your dataset folders of images into the software where it says training data and testing data, or choose the folders from your device. Be sure not to keep these folders or images on a USB or external hard drive, because Create ML won’t be able to locate them.
Once you’ve inputted your data, your screen should look similar to ours below — and you can press the train button when you’re ready:
Once the model is trained, click on the testing option to view the model statistics. You should get a statistics page which looks like this:
Don’t be worried if the precision rate is less than 100%, especially if the object in question has many complex characteristics, or you didn’t use thousands of images. Once the model has been generated, drag the little blue icon that says “MLModel” under the Output label onto your desktop. This model can be used in your Xcode project as you wish.
Creating a Basic Xcode App
To make a basic app interface that works with the trained model, you’re going to use two interface elements: a UIImageView to display photos, and a UIButton for taking a photo with the camera.
First, you need to create an outlet for the UIImageView by control-dragging the image view into the assistant editor, right under the class declaration. A blue line should appear between the image view and the spot where you’re inserting the outlet as you control-drag. We selected the “weak” storage option for ours.
Make an action for the “take photo” button by control-dragging into the assistant editor under the image view outlet. A little box should pop up asking you to specify details about the connection. Next to “Connection,” choose “Action.” You can name this variable anything, but make sure to stay consistent and use the same name when you access it later. Also, ensure you’re editing the view controller that contains your buttons by checking above the coding space—it should show the name of a .swift file.
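After both connections are made, the top of your view controller should contain stubs along these lines (the names imageView and takePhoto are our choices; yours may differ):

```swift
class ViewController: UIViewController {

    @IBOutlet weak var imageView: UIImageView!

    @IBAction func takePhoto(_ sender: UIButton) {
        // We'll add the camera logic here shortly.
    }
}
```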
Here’s where we need to add our own code to gain functionality from the buttons and images. Create a variable called imagePicker and insert it above the viewDidLoad() function with the following line of code:
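A sketch of that property declaration (the name imagePicker matches what we use later):

```swift
// Handles both taking a photo and choosing one from the library.
let imagePicker = UIImagePickerController()
```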
Then create a function called imagePickerController with a declaration that looks like this:
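As a sketch, the declaration looks like this. Note that this is a UIImagePickerControllerDelegate method, so the view controller must conform to both UIImagePickerControllerDelegate and UINavigationControllerDelegate:

```swift
func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    // We'll handle the picked image here, then dismiss the picker.
    imagePicker.dismiss(animated: true, completion: nil)
}
```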
This code should be placed after the viewDidLoad() function. Inside of the take photo button function, place the following four lines of code, which determine the delegate of that action and the source.
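A sketch of that action, assuming the button action is named takePhoto:

```swift
@IBAction func takePhoto(_ sender: UIButton) {
    imagePicker.delegate = self       // this view controller handles the result
    imagePicker.sourceType = .camera  // the source is the device camera
    imagePicker.allowsEditing = false
    present(imagePicker, animated: true, completion: nil)
}
```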
In total, your code should look like this if you completed the above steps correctly:
We later created a choose photo option to go with the take photo option by creating another function called chooseImage(), similar to how we made the takePhoto() function:
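A sketch of that second action, which differs from takePhoto() only in its source type:

```swift
@IBAction func chooseImage(_ sender: UIButton) {
    imagePicker.delegate = self
    imagePicker.sourceType = .photoLibrary  // pick from the gallery instead of the camera
    imagePicker.allowsEditing = false
    present(imagePicker, animated: true, completion: nil)
}
```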
Once your code looks similar to ours above, you can save the photo taken by the user and run it through the model which was dragged in earlier by following the steps below.
Making the Model Compatible with the Image View
Now that a model has been created and take picture/choose picture functions have been created, we need to connect the model with an image view so that the model will process a user-inputted image and give a classification for it.
First, we need to put a UITextView box into the storyboard view controller that has the UIImageView box on it. The text view will display the model’s classification of the image currently shown in the image view.
Control drag this text box into the top of the class so that it makes a variable declaration:
@IBOutlet weak var textView: UITextView! //variable declaration
To connect the model with the uploaded image, you need to add the following detect() function to the end of the view controller code anywhere after the viewDidLoad() function.
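Create ML classifiers are typically invoked through Apple’s Vision framework. Here’s a sketch of such a detect() method, placed inside the view controller — it assumes a generated model class named mediumtestmodel_1 and the textView outlet from earlier, and requires `import CoreML` and `import Vision` at the top of the file:

```swift
func detect(image: CIImage) {
    // Wrap the Create ML model for use with Vision.
    // mediumtestmodel_1 is the class Xcode generates for our .mlmodel file;
    // substitute the class name of your own model.
    guard let model = try? VNCoreMLModel(for: mediumtestmodel_1().model) else {
        fatalError("Could not load the Core ML model")
    }
    let request = VNCoreMLRequest(model: model) { request, _ in
        guard let results = request.results as? [VNClassificationObservation],
              let topResult = results.first else { return }
        DispatchQueue.main.async {
            // Show the top label and its confidence in the text view.
            self.textView.text = "\(topResult.identifier): \(Int(topResult.confidence * 100))%"
        }
    }
    let handler = VNImageRequestHandler(ciImage: image)
    do {
        try handler.perform([request])
    } catch {
        print(error)
    }
}
```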
Make sure to change the instance of mediumtestmodel_1().model to the name of the model you saved. In addition to the above code, you need to add this conditional statement at the bottom of the imagePickerController() function.
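Shown in the context of the delegate method, that conditional might look like this (imageView and detect(image:) come from the earlier steps):

```swift
func imagePickerController(_ picker: UIImagePickerController,
                           didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    if let userPickedImage = info[.originalImage] as? UIImage {
        // Display the picked photo, then hand it to the model.
        imageView.image = userPickedImage
        guard let ciImage = CIImage(image: userPickedImage) else {
            fatalError("Could not convert UIImage to CIImage")
        }
        detect(image: ciImage)
    }
    imagePicker.dismiss(animated: true, completion: nil)
}
```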
Add the following lines to the viewDidLoad() function after the super.viewDidLoad() line. This allows the program to know what to do after loading the view controller.
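A sketch of the resulting viewDidLoad():

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    imagePicker.delegate = self     // route picker callbacks to this view controller
    imagePicker.allowsEditing = false
}
```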
Here’s how the model should work when you run it after editing the code to make the model compatible with the image view:
Here’s the full code for incorporating the lion and tiger model into the app:
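Putting the pieces together, a complete view controller could look like the following sketch (the outlet, action, and model class names are our own choices; substitute yours):

```swift
import UIKit
import CoreML
import Vision

class ViewController: UIViewController,
                      UIImagePickerControllerDelegate, UINavigationControllerDelegate {

    @IBOutlet weak var imageView: UIImageView!
    @IBOutlet weak var textView: UITextView!

    let imagePicker = UIImagePickerController()

    override func viewDidLoad() {
        super.viewDidLoad()
        imagePicker.delegate = self
        imagePicker.allowsEditing = false
    }

    @IBAction func takePhoto(_ sender: UIButton) {
        imagePicker.sourceType = .camera
        present(imagePicker, animated: true, completion: nil)
    }

    @IBAction func chooseImage(_ sender: UIButton) {
        imagePicker.sourceType = .photoLibrary
        present(imagePicker, animated: true, completion: nil)
    }

    func imagePickerController(_ picker: UIImagePickerController,
                               didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
        if let userPickedImage = info[.originalImage] as? UIImage {
            imageView.image = userPickedImage
            guard let ciImage = CIImage(image: userPickedImage) else {
                fatalError("Could not convert UIImage to CIImage")
            }
            detect(image: ciImage)
        }
        imagePicker.dismiss(animated: true, completion: nil)
    }

    func detect(image: CIImage) {
        // Substitute the generated class name of your own .mlmodel file.
        guard let model = try? VNCoreMLModel(for: mediumtestmodel_1().model) else {
            fatalError("Could not load the Core ML model")
        }
        let request = VNCoreMLRequest(model: model) { request, _ in
            guard let results = request.results as? [VNClassificationObservation],
                  let topResult = results.first else { return }
            DispatchQueue.main.async {
                self.textView.text = "\(topResult.identifier): \(Int(topResult.confidence * 100))%"
            }
        }
        let handler = VNImageRequestHandler(ciImage: image)
        try? handler.perform([request])
    }
}
```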
When differentiating between lions and tigers, the demo model finds patterns in the classes and compares the patterns to identify the images correctly. Our model Mole Scan creates the same types of patterns, except with images of malignant and benign moles. Because the mole images can look so similar, we used thousands of images for the model to train with. Lions and tigers look much more distinct, which is why the model didn’t require quite so many images.
The above screenshots and video show how our app interface works so far. It can save images of moles on different locations of the body; when users come back to the app, they can select the dots on their body that signify the moles and see the pictures they took earlier. In the future, we hope to add more features and make the interface more user-friendly by adding more images and directions. We’re also actively working on a way for the user to contact their doctor or put an appointment in their calendar directly from the app, as well as export any photos and classifications from the app to their camera roll.
And one final reminder: Mole Scan will be published in the App Store by January 2021!