Semantic and Instance Segmentation on iOS Using a Flask API — DeepLabV3+ and Mask R-CNN

Build an API that performs image segmentation and consume it with an iOS application


Introduction

Let’s say you have an image and you want to distinguish objects of interest, or in other words, find suitable local characteristics that set them apart from other objects or from the background. This is called image segmentation, and semantic segmentation is one common form of it.

When we segment a target object, we know which pixel belongs to which object. The image is divided into regions and the discontinuities serve as borders between the regions. One can also analyze the shape of objects using various morphological operators.

To put it simply, segmentation consists of dividing a given image into regions having homogeneity according to a predefined criterion (gray level, colors, edges, classification, etc.). The objective of segmentation is to establish a compact description representative of an image’s content.

This involves extracting visual cues that are relevant and sufficiently correlated with the entities that make up the scene from which the image is taken.

For example, we know the image above contains two distinct “objects”: a dog in green and a background in red. I can then isolate and extract a portion of the image (i.e., remove the background), or I can blur the background (i.e., portrait mode).
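To make the idea concrete, here is a minimal sketch in plain Python. It assumes a tiny grayscale image stored as nested lists and a hypothetical binary mask where 1 marks the object and 0 the background. The names `image`, `mask`, and `apply_mask` are illustrative only and don’t come from any library.

```python
# Toy 3x4 "image": each value is a gray level (0-255).
image = [
    [10, 200, 210, 12],
    [11, 220, 230, 13],
    [14, 15, 16, 17],
]

# Hypothetical segmentation mask: 1 = object of interest, 0 = background.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]


def apply_mask(image, mask, background=0):
    """Keep pixels where the mask is 1 and replace the rest with `background`."""
    return [
        [pixel if keep else background for pixel, keep in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# "Remove the background" by painting every background pixel black.
foreground_only = apply_mask(image, mask)
for row in foreground_only:
    print(row)
```

Blurring or swapping the background follows the same pattern: the mask decides, pixel by pixel, which source the output value comes from.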

In this article, I’ll create an API that will process an image and produce a pixel-level segmentation mask using instance and semantic segmentation. Then, I’ll create an iOS application that will consume the API.

Overview:

  1. Why segmentation?
  2. Create the Flask API
  3. Create the iOS application
  4. Handle the API callbacks in the iOS application
  5. Test and evaluate
  6. Conclusion

I have included code in this article where it’s most instructive. Full code and data can be found on my GitHub page. Let’s get started.

1. Why segmentation?

Let’s say you have a portrait image and you don’t particularly like the background, or you want to add something specific to the image—say some fun digital goggles, for example. In order to do so, you need to identify those regions and classify them, and segmentation does just that. Multiple techniques exist to tackle these kinds of problems, but the most common one these days is semantic segmentation.

Once you can identify the background region, you can do whatever you want with it—in this case, we’ll edit the portrait by changing the photo’s background 😅.

For more details on image segmentation’s applications, I wrote an article a few weeks back on Heartbeat:

2. Create the Flask API

Flask is a Python web application micro-framework built on Werkzeug, a WSGI library. Flask may be “micro”, but it’s ready for production use for a variety of needs.

The “micro” in micro-framework means that Flask aims to keep the core simple but extensible. Flask won’t make many decisions for you, like which database to use, and the decisions it does make are easy to change. Everything else is up to you, so Flask can be everything you need and nothing you don’t.

I prefer to use a library called Flask-RESTful made by Twilio that encourages best practices when it comes to APIs.

Segmentation inference:

The segmentation API will use PixelLib, Ayoola Olafenwa’s newly published Python package. The package is simple and straightforward, and it currently supports two types of segmentation:

  • Semantic segmentation: Classify each and every pixel and assign it to a specific class of objects. Let’s say you have multiple cars in a given image—they will all be classified as the same object and will share the same colormap.
  • Instance segmentation: Contrary to semantic segmentation, all objects are treated separately, even if they are of the same class.
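The difference between the two outputs can be sketched with a toy example. The code below is not how Mask R-CNN or DeepLabV3+ work; it only assumes a small hand-written semantic map and splits it into instances with a simple connected-components pass, which works here because the two objects don’t touch. The names `semantic_map` and `split_into_instances` are made up for illustration.

```python
from collections import deque

# Toy semantic map: 0 = background, 1 = "car". Two separate cars share label 1.
semantic_map = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 1, 1],
]


def split_into_instances(semantic_map, target_class):
    """Give each 4-connected blob of `target_class` its own instance id (1, 2, ...)."""
    height, width = len(semantic_map), len(semantic_map[0])
    instance_map = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if semantic_map[r][c] != target_class or instance_map[r][c]:
                continue
            # Found an unlabeled pixel of the class: flood-fill its blob.
            next_id += 1
            instance_map[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and semantic_map[ny][nx] == target_class
                            and not instance_map[ny][nx]):
                        instance_map[ny][nx] = next_id
                        queue.append((ny, nx))
    return instance_map, next_id


instance_map, car_count = split_into_instances(semantic_map, target_class=1)
for row in instance_map:
    print(row)
```

In the semantic map both cars carry the same label, while in the instance map each car gets its own id, which is what lets an application color or count them separately.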

For more details, Ayoola Olafenwa wrote an excellent article on the matter.

Set up a Python environment:

Create a Python environment using your favorite method, and install the following packages:

You’ll also need to download the models with preloaded weights using bash commands. They’re all Keras models, so you should expect .h5 files:

Build the API:

Install Flask and Flask-RESTful packages:

We first need to create a set of constants, which will contain the model’s path, as well as the path where the images will be saved:

SEMANTIC_MODEL = "./models/deeplabv3_xception_tf_dim_ordering_tf_kernels.h5"
INSTANCE_MODEL = "./models/mask_rcnn_coco.h5"

INPUT_IMAGE = "./images/input.jpg"
OUTPUT_IMAGE = "./output_images/output.jpg"
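One practical detail: writing to these paths is likely to fail if the `images` and `output_images` directories don’t exist yet. A small helper along these lines can create them at startup. `ensure_parent_dirs` is a hypothetical name for this sketch, not part of Flask or PixelLib.

```python
from pathlib import Path

# Same relative paths as the constants above.
INPUT_IMAGE = "./images/input.jpg"
OUTPUT_IMAGE = "./output_images/output.jpg"


def ensure_parent_dirs(*file_paths):
    """Create the parent directory of each path if it doesn't exist yet."""
    for file_path in file_paths:
        Path(file_path).parent.mkdir(parents=True, exist_ok=True)


# Call once at startup, before the first request is handled:
# ensure_parent_dirs(INPUT_IMAGE, OUTPUT_IMAGE)
```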

Then we need to create an instance of the Flask app and create an Api object:

from flask import Flask, request
from flask_restful import Api, Resource

app = Flask(__name__)
api = Api(app)

Create our classes and POST methods:

Semantic Segmentation

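import base64
from io import BytesIO

from PIL import Image
from pixellib.semantic import semantic_segmentation


class SemanticSegmentation(Resource):

    def post(self):
        # Reject requests that don't carry a JSON body with an "image" field
        if not request.json or 'image' not in request.json:
            return {"error": "expected a JSON body with an 'image' field"}, 400

        # Decode the base64 string and save the image to disk
        image_string = base64.b64decode(request.json['image'])
        img = Image.open(BytesIO(image_string))
        img.save(INPUT_IMAGE)

        # Load the model and run the segmentation
        semantic_segment_image = semantic_segmentation()
        semantic_segment_image.load_pascalvoc_model(SEMANTIC_MODEL)
        semantic_segment_image.segmentAsPascalvoc(INPUT_IMAGE, output_image_name=OUTPUT_IMAGE)

        # Encode the segmented image as base64 for the JSON response
        with open(OUTPUT_IMAGE, "rb") as img_file:
            final_base64_image_string = base64.b64encode(img_file.read()).decode('utf-8')
        return {"output_image": final_base64_image_string}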
  • Create a class SemanticSegmentation
  • Create a method post
  • Parse the request (from the iOS application) and extract the base64 image string
  • Decode the base64 string using Python’s built-in base64 module and save the image to disk
  • Instantiate an object of type semantic_segmentation() using pixellib
  • Load the model
  • Perform segmentation and save the segmented image
  • Encode the image to a base64 format
  • Send a dictionary with a final base64 image string
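The decode and encode steps above can be tried in isolation, without Flask or a model. This is a minimal sketch using only the standard library. The helper names `decode_image_field` and `encode_image_response` are assumptions for illustration, and the bytes are a stand-in rather than a real JPEG.

```python
import base64


def decode_image_field(payload):
    """Return the raw image bytes from a {"image": "<base64>"} payload."""
    return base64.b64decode(payload["image"])


def encode_image_response(image_bytes):
    """Wrap raw image bytes in the {"output_image": "<base64>"} response shape."""
    return {"output_image": base64.b64encode(image_bytes).decode("utf-8")}


# Stand-in bytes; a real request would carry JPEG data here.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image \xff\xd9"
request_payload = {"image": base64.b64encode(fake_jpeg).decode("utf-8")}

decoded = decode_image_field(request_payload)   # what the API saves to disk
response = encode_image_response(decoded)       # what the API sends back
```

The `.decode("utf-8")` call matters because `b64encode` returns bytes, and a JSON response needs a string.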

Instance Segmentation

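from pixellib.instance import instance_segmentation


class InstanceSegmentation(Resource):

    def post(self):
        # Reject requests that don't carry a JSON body with an "image" field
        if not request.json or 'image' not in request.json:
            return {"error": "expected a JSON body with an 'image' field"}, 400

        # Decode the base64 string and save the image to disk
        image_string = base64.b64decode(request.json['image'])
        img = Image.open(BytesIO(image_string))
        img.save(INPUT_IMAGE)

        # Load the model and run the segmentation, drawing bounding boxes
        instance_segment_image = instance_segmentation()
        instance_segment_image.load_model(INSTANCE_MODEL)
        instance_segment_image.segmentImage(INPUT_IMAGE, output_image_name=OUTPUT_IMAGE, show_bboxes=True)

        # Encode the segmented image as base64 for the JSON response
        with open(OUTPUT_IMAGE, "rb") as img_file:
            final_base64_image_string = base64.b64encode(img_file.read()).decode('utf-8')
        return {"output_image": final_base64_image_string}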
  • Create a class InstanceSegmentation
  • Create a method post
  • Parse the request (from the iOS application) and extract the base64 image string
  • Decode the base64 string using Python’s built-in base64 module and save the image to disk
  • Instantiate an object of type instance_segmentation() using pixellib
  • Load the model
  • Perform segmentation and save the segmented image
  • Encode the image to a base64 format
  • Send a dictionary with final base64 image string

Run the Flask API

Finally, you need to register the classes as resources on the Api object and set the endpoint for each class like so:

api.add_resource(SemanticSegmentation, '/semantic')
api.add_resource(InstanceSegmentation, '/instance')

Flask’s default port is 5000, which means you can call the POST methods using the following URLs:

  • Semantic: http://127.0.0.1:5000/semantic
  • Instance: http://127.0.0.1:5000/instance
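Before wiring up the iOS app, it can help to call the endpoints from a small script. The sketch below builds such a request with the standard library only. `build_segmentation_request` is a hypothetical helper, and the image bytes are a stand-in. The actual network call is left commented out because it needs the Flask app running and a real image.

```python
import base64
import json
import urllib.request

SEMANTIC_URL = "http://127.0.0.1:5000/semantic"


def build_segmentation_request(url, image_bytes):
    """Build (but don't send) a JSON POST request carrying a base64 image."""
    body = json.dumps({"image": base64.b64encode(image_bytes).decode("utf-8")})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_segmentation_request(SEMANTIC_URL, b"stand-in image bytes")

# To actually call the API (with the Flask app running and a real image):
# with urllib.request.urlopen(req) as resp:
#     output_b64 = json.loads(resp.read())["output_image"]
```

Swapping `SEMANTIC_URL` for the `/instance` URL targets the other endpoint; the payload shape is the same.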

3. Create the iOS application

Create a new “Single View Application” and make sure you choose Storyboard as User Interface.

Now we have our project ready to go. I don’t like using storyboards myself, so the app in this tutorial is built programmatically, which means no buttons or switches to toggle — just pure code.

To follow this method, you’ll have to delete the Main.storyboard file and set up your SceneDelegate.swift file (Xcode 11 only).

With Xcode 11, you’ll have to change the Info.plist file like so:

You need to delete the “Storyboard Name” entry in the file, and that’s about it.

Update the SceneDelegate with the following code:

var window: UIWindow?

func scene(_ scene: UIScene, willConnectTo session: UISceneSession, options connectionOptions: UIScene.ConnectionOptions) {
    guard let windowScene = (scene as? UIWindowScene) else { return }
    window = UIWindow(frame: windowScene.coordinateSpace.bounds)
    window?.windowScene = windowScene
    window?.rootViewController = ViewController()
    window?.makeKeyAndVisible()
}

Create View Controllers

We need two ViewControllers:

  • ViewController():

This is where we’ll set our application entry point and set the buttons that lead to the appropriate segmentation.

  • OutputViewController():

This controller will be used to select the image in order to send it to the API and receive the API callback as well.

Set up ViewController():

Label

lazy var name: UILabel = {
    let text = UILabel()
    text.translatesAutoresizingMaskIntoConstraints = false
    text.font = UIFont(name: "Avenir-Heavy", size: 35)
    text.text = "Segmentation API"
    text.textColor = #colorLiteral(red: 0.4980392157, green: 0.05882352941, blue: 0.07843137255, alpha: 1)
    return text
}()
  • Instantiate a UILabel object
  • Set the font type (Avenir-Heavy) and size
  • Set the label string. I chose to call the application “Segmentation API”, but feel free to change it
  • Set the text color

Logo

lazy var logo: UIImageView = {
    let image = UIImageView(image: #imageLiteral(resourceName: "profile"))
    image.translatesAutoresizingMaskIntoConstraints = false
    return image
}()
  • Instantiate a UIImageView and pick the image using the “image literal” function
  • Enable auto layout

Buttons

ViewController() has two buttons, one for “Semantic segmentation” and the other one for “Instance segmentation”. I also created a custom Button called MyButton() to increase code reusability (available in the GitHub repository).

lazy var semanticBtn: MyButton = {
    let btn = MyButton()
    btn.translatesAutoresizingMaskIntoConstraints = false
    btn.addTarget(self, action: #selector(buttonToSemanticSegmentation(_:)), for: .touchUpInside)
    btn.setTitle("Semantic segmentation", for: .normal)
    let icon = UIImage(systemName: "map")?.resized(newSize: CGSize(width: 35, height: 35))
    let finalIcon = icon?.withTintColor(#colorLiteral(red: 0.5, green: 0.06049922854, blue: 0.07871029526, alpha: 1))
    btn.setImage(finalIcon, for: .normal)
    btn.imageEdgeInsets = UIEdgeInsets(top: 0, left: 15, bottom: 0, right: 100)
    btn.layoutIfNeeded()
    return btn
}()

The instance button (instanceBtn) is identical apart from its title and target. In both cases, the steps are:

  • Instantiate a MyButton() object
  • Enable auto layout
  • Set the title string
  • Add an icon using SF Symbols
  • Add a target that will lead to OutputViewController()

Set up the layout

fileprivate func addElementsToSubview() {
    view.addSubview(name)
    view.addSubview(logo)
    view.addSubview(semanticBtn)
    view.addSubview(instanceBtn)
}

fileprivate func setupView() {
    
    logo.centerXAnchor.constraint(equalTo: self.view.centerXAnchor).isActive = true
    logo.topAnchor.constraint(equalTo: self.view.topAnchor, constant: 50).isActive = true
    logo.widthAnchor.constraint(equalToConstant: 200).isActive = true
    logo.heightAnchor.constraint(equalToConstant: 200).isActive = true
    
    instanceBtn.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
    instanceBtn.widthAnchor.constraint(equalToConstant: view.frame.width - 40).isActive = true
    instanceBtn.heightAnchor.constraint(equalToConstant: 65).isActive = true
    instanceBtn.bottomAnchor.constraint(equalTo: semanticBtn.topAnchor, constant: -40).isActive = true
    
    semanticBtn.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
    semanticBtn.widthAnchor.constraint(equalToConstant: view.frame.width - 40).isActive = true
    semanticBtn.heightAnchor.constraint(equalToConstant: 65).isActive = true
    semanticBtn.bottomAnchor.constraint(equalTo: view.bottomAnchor, constant: -120).isActive = true
    
    name.topAnchor.constraint(equalTo: view.topAnchor, constant: 250).isActive = true
    name.heightAnchor.constraint(equalToConstant: 100).isActive = true
    name.centerXAnchor.constraint(equalTo: view.centerXAnchor).isActive = true
    name.numberOfLines = 1
}
  • Add all the elements to the ViewController’s subview
  • Set up constraints for each element

Set up OutputViewController():

  1. Output Image
lazy var outputImage: UIImageView = {
    let image = UIImageView()
    image.translatesAutoresizingMaskIntoConstraints = false
    image.contentMode = .scaleAspectFit
    image.layer.masksToBounds = true
    return image
}()
  • Instantiate an empty UIImageView object
  • Enable auto layout
  • Select the content mode—I chose .scaleAspectFit
  • Enable masksToBounds in order to clip any layer bit outside the view boundaries

2. Upload and camera button

lazy var cameraBtn: MyButton = {
    let btn = MyButton()
    btn.translatesAutoresizingMaskIntoConstraints = false
    btn.addTarget(self, action: #selector(buttonToCamera(_:)), for: .touchUpInside)
    btn.setTitle("Take an image      ", for: .normal)
    let icon = UIImage(systemName: "camera")?.resized(newSize: CGSize(width: 45, height: 35))
    let finalIcon = icon?.withTintColor(#colorLiteral(red: 0.5, green: 0.06049922854, blue: 0.07871029526, alpha: 1))
    btn.setImage(finalIcon, for: .normal)
    btn.imageEdgeInsets = UIEdgeInsets(top: 0, left: 15, bottom: 0, right: 100)
    btn.layoutIfNeeded()
    return btn
}()

The logic is the same as for the buttons in ViewController(); the only important change is the target function.

3. Set up the target function for the buttons

The function is triggered when we tap the button. It’s pretty simple, but you’ll need to add usage descriptions to the Info.plist file explaining to the user why you need the camera or the media library (for example, NSCameraUsageDescription for the camera). The application will crash if you don’t specify them.

@objc func buttonToUpload(_ sender: MyButton) {
    if UIImagePickerController.isSourceTypeAvailable(.photoLibrary) {
        let imagePicker = UIImagePickerController()
        imagePicker.delegate = self
        imagePicker.sourceType = .photoLibrary
        imagePicker.allowsEditing = true
        self.present(imagePicker, animated: true, completion: nil)
    }
}
  • Instantiate a UIImagePickerController()
  • Choose the source type—either the media library or the camera
  • Set the editing to true if you want to crop the image before sending it to the API
  • Present the UIImagePickerController() view

4. Set up the layout

Add the elements to the subview and set the constraints.

4. Handle the API callbacks in the iOS application

The API expects a dictionary of type [String: String], the key being “image” and the value being the image as a base64-encoded string.

I’m using the widely used Swift package called Alamofire, which is excellent for handling HTTP networking with Swift. Install the package using your preferred method (I used CocoaPods). The request is then sent from the image picker delegate:

func imagePickerController(_ picker: UIImagePickerController, didFinishPickingMediaWithInfo info: [UIImagePickerController.InfoKey : Any]) {
    // Dismiss the picker when this method returns, whichever path is taken
    defer { picker.dismiss(animated: true, completion: nil) }

    // Bail out instead of crashing if the image, its JPEG data, or the URL is missing
    guard let pickedImage = info[.editedImage] as? UIImage,
          let jpegData = pickedImage.jpegData(compressionQuality: 0.2),
          let url = URL(string: self.apiEntryPoint) else { return }

    self.outputImage.image = pickedImage.resized(newSize: CGSize(width: 350, height: 350))
    self.outputImage.showSpinner()

    // Convert the JPEG data to a base64 string
    let imageDataBase64 = jpegData.base64EncodedString(options: .lineLength64Characters)
    let parameters: Parameters = ["image": imageDataBase64]

    AF.request(url, method: .post, parameters: parameters, encoding: JSONEncoding.default).responseJSON { response in
        // Remove the spinner whether the request succeeded or failed
        self.outputImage.removeSpinner()

        switch response.result {
        case .success(let value):
            // Expect {"output_image": "<base64 string>"}
            if let json = value as? [String: Any],
               let base64StringOutput = json["output_image"] as? String,
               let newImageData = Data(base64Encoded: base64StringOutput),
               let outputImage = UIImage(data: newImageData) {
                self.outputImage.image = outputImage.resized(newSize: CGSize(width: 350, height: 350))
            }
        case .failure(let error):
            print(error)
        }
    }
}
    

Everything happens in the image picker delegate implementation. There is a lot going on here, so let’s break it down:

  • Read the picked image from the info dictionary as a UIImage object
  • Convert the UIImage to JPEG data with a compression quality of 0.2 (heavy compression), then encode it as base64. You don’t have to compress this much, but it does improve the performance of the API
  • Create the parameters dictionary that will be JSON-encoded into the body of the POST request
  • Perform the request using the Alamofire request method. Pass the API entry point, the type of method ( POST in our case), and the parameters
  • Handle the API response result. If successful, we’ll parse the API response as a JSON object and convert the image from a base64 format to a UIImage object
  • Update the UIImageView with the segmented image
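One detail worth checking on the server side: the .lineLength64Characters option asks the client to insert line endings into the base64 string every 64 characters. Python’s `base64.b64decode` tolerates this by default, as the sketch below suggests. The wrapping here is emulated in Python with CRLF line endings, which is an assumption about what the client sends, and the bytes are a stand-in for JPEG data.

```python
import base64

raw = bytes(range(256))  # stand-in for JPEG data

# Emulate a client that wraps its base64 output at 64 characters per line.
one_line = base64.b64encode(raw).decode("ascii")
wrapped = "\r\n".join(one_line[i:i + 64] for i in range(0, len(one_line), 64))

# With the default validate=False, b64decode discards characters outside
# the base64 alphabet (such as \r and \n) before decoding.
decoded = base64.b64decode(wrapped)
print(decoded == raw)
```

If the API ever switches to strict decoding with `validate=True`, the wrapped string would be rejected, so the client option and the server setting need to agree.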

5. Test and evaluate the results

When preparing to send the image to the API, we need to encode it as a base64 string, and we also need to choose a compression ratio. This is an important element of the application: you need to evaluate and understand the limits of the API, and consequently the limits of the model.
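Part of the reason compression matters is plain arithmetic: base64 turns every 3 bytes into 4 characters, so the request body is roughly a third larger than the JPEG itself. A quick sketch, with `base64_length` as a hypothetical helper and the sizes chosen arbitrarily:

```python
import base64


def base64_length(n_bytes):
    """Length of the unwrapped base64 encoding of n_bytes bytes of data."""
    return 4 * ((n_bytes + 2) // 3)


# Check the formula against the real encoder for a few payload sizes.
for size in (1, 3, 1_000, 250_000):
    encoded = base64.b64encode(b"\x00" * size)
    assert len(encoded) == base64_length(size)
    print(size, "bytes ->", len(encoded), "base64 characters")
```

Line breaks added by the client increase the size slightly further, so a smaller JPEG pays off twice: less data to upload and less to decode.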

In my small test sample, the compression ratio had little to no effect on the segmentation quality.

Considering that some of these images are not easy to segment (1 and 3), mainly because of the large variety of objects, shadows, and lighting conditions, the model did a good job of picking out the objects and creating most of the segmentation masks correctly.

Some accuracy issues appear when the compression rate is high. The figure above is also a very complex image with multiple objects of the same class. The model did a great job of segmenting most of the objects with relatively good accuracy, but a significant portion of the image was not detected. I even ran the inference a couple of times just to make sure it was a compression issue.

These issues are common in computer vision—you have to control your environment in order to get the best “expected” result, the word “expected” being very important. It can be easy to compromise accuracy for the sake of speed or latency. You have to make those choices and decide what suits your use case best.

As for semantic segmentation, the model did just OK at pretty much every tested level of compression. This is purely a limitation of the model. Unlike with instance segmentation, compression didn’t have a noticeable effect on the accuracy.

6. Conclusion

The library is great for fast and easy segmentation, but it has some important gaps and drawbacks:

  • No support for integrating custom models
  • No support for training and evaluating custom models
  • The inference time for both supported models is not optimal (to say the least)
  • In order to start segmenting, you have to manually download the models and add them to your project structure

Overall, the library is in its early days, and Ayoola Olafenwa did a great job of making it easy to get going and start segmenting. I’m hoping she will keep improving the library by adding new features like the ones mentioned above. In my opinion, custom models (training and inference) should be a priority.

Thank you for reading this article. If you have any questions, don’t hesitate to send me an email at [email protected].
