TensorFlow Object Detection (TFOD) API Setup

sunny savita Oct 09 2020 · 4 min read

In the computer vision field, the three most common tasks we perform are image classification, object detection and image segmentation. People are often confused by these three terms, so let's start by understanding what image classification, object detection and image segmentation are.

Image Classification : Image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. "Contextual" means the approach focuses on the relationship between nearby pixels, also called the neighbourhood.

Shown a picture of a dog or a cat, you will instantly recognize it. Take a step back and analyze how you came to this conclusion: you were shown an image and you classified the class it belonged to (a dog, in this instance, or a cat). And that, in a nutshell, is what image classification is all about.

Object Detection : Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class in digital images and videos. Well-researched domains of object detection include face detection and pedestrian detection.

As you saw, there's only one object here: a dog. We can easily use an image classification model to predict that there's a dog in the given image. But what if we have both a cat and a dog in a single image? That's where image localization comes into the picture: it helps us identify the location of a single object in the given image. When multiple objects are present, we rely on object detection, which predicts the location along with the class of each object.

Image Segmentation : We can divide or partition an image into various parts called segments. It's not a great idea to process the entire image at once, as there will be regions that do not contain any useful information. By dividing the image into segments, we can focus processing on the important segments. An image, as you know, is a collection of pixels; with image segmentation we group together the pixels that have similar attributes. That, in a nutshell, is how image segmentation works.

By applying object detection models, we can only draw a bounding box around each object in the image. This tells us nothing about the shape of the object, since bounding boxes are either rectangular or square. Image segmentation models, on the other hand, create a pixel-wise mask for each object in the image, which gives us a far more granular understanding of the object(s) in the image.

I hope you now have a clear understanding of what image classification, image localization, object detection and image segmentation are. Now let's move on to the TFOD API.

What is an API? Why do we need an API?

API stands for Application Programming Interface. An API provides developers a set of common operations so that they don’t have to write code from scratch.

TensorFlow Object Detection API :

The TensorFlow Object Detection API is a framework for creating deep learning networks that solve object detection problems.

The framework already ships with pretrained models, which they refer to as the Model Zoo. This includes a collection of models pretrained on the COCO dataset, the KITTI dataset, and the Open Images dataset. These models can be used directly for inference as long as we are only interested in the categories present in those datasets.

How to setup the TFOD framework?

Below is the step-by-step process to follow on your local system to get object detection up and running easily with the help of TFOD.

STEP-1 Download the following content

  • Download the model : Download the faster_rcnn_inception_v2_coco_2018_01_28 model from the model zoo or any other model of your choice from TensorFlow 1 Detection Model Zoo.
  • Clone the TensorFlow GitHub repo: http://github.com/tensorflow/models/tree/v1.13.0
  • Download the utils file : Download Dataset & utils.
  • Download the labelImg tool : Download the labelImg tool for labeling images.
  • Before extraction, you should have the following compressed files in a single folder.

    STEP-2 Extract all the above zip files into a tfod folder and remove the compressed files-

    After extracting all the zip files, you should have the following folders -

    STEP-3 Create a virtual env using conda-

    Commands

    for specific python version : conda create -n your_env_name python=3.6

    for latest python version : conda create -n your_env_name

    to activate the environment : conda activate your_env_name
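
    For example, to create and then activate an environment (the name tfod below is just an illustration), you might run:

    conda create -n tfod python=3.6
    conda activate tfod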

    STEP-4 Install the following packages in your new environment-

    for GPU

    pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow-gpu==1.14.0

    for CPU only

    pip install pillow lxml Cython contextlib2 jupyter matplotlib pandas opencv-python tensorflow==1.14.0
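
    To confirm that TensorFlow installed correctly inside the new environment, a quick check that simply prints the installed version is:

    python -c "import tensorflow as tf; print(tf.__version__)"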

    STEP-5 Install protobuf using conda package manager-

    conda install -c anaconda protobuf
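
    You can confirm that the protobuf compiler is now available on your PATH with:

    protoc --version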

    STEP-6 Convert the protobuf files to .py files-

    We convert the protobuf (.proto) files into Python files because the Python interpreter cannot use protobuf definitions directly. Most of the object detection API's model and training configuration messages are written as protobuf files, so we compile them into Python files.

    Open command prompt and cd to research folder.

    Now in the research folder run the following command

    For Linux or Mac

    protoc object_detection/protos/*.proto --python_out=.

    For Windows

    protoc object_detection/protos/*.proto --python_out=.
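
    Note: on some Windows setups protoc does not expand the *.proto wildcard. If that happens, one workaround (a sketch, assuming you run it from cmd inside the research folder) is to compile the files one by one in a loop:

    for /f %i in ('dir /b object_detection\protos\*.proto') do protoc object_detection\protos\%i --python_out=.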

    STEP-7 Install setup.py for object detection-

    Run the setup.py file that is available in your research folder. To do this, open your Anaconda prompt, change your directory to research, and run the command below:

    python setup.py install
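
    Once the installation finishes, a quick sanity check (run from the research folder) is to make sure the package imports cleanly:

    python -c "import object_detection"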

    STEP-8 Verify your object detection model-

    To verify your object detection model, you have to run the notebook that resides in your models/research/object_detection folder, i.e. object_detection_tutorial.ipynb
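
    One way to do this is to launch Jupyter from the research folder and open the notebook directly:

    jupyter notebook object_detection/object_detection_tutorial.ipynb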

    STEP-9 Paste all content present in utils into research folder-

    The following files and folders are present in the utils folder-

    STEP-10 Paste  faster_rcnn_inception_v2_coco_2018_01_28 model or any other model downloaded from model zoo into research folder-

    Now cd to the research folder and run the following python file-

    python xml_to_csv.py
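
    Assuming your xml_to_csv.py follows the common template, it reads the labelImg XML annotations from images/train and images/test and writes train_labels.csv and test_labels.csv into the images folder, so the research folder should roughly look like this before you run it:

    research/
      xml_to_csv.py
      generate_tfrecord.py
      images/
        train/   <- training images and their .xml annotations
        test/    <- test images and their .xml annotations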

    STEP-11 Run the following to generate train and test records-

    from the research folder-

    python generate_tfrecord.py --csv_input=images/train_labels.csv --image_dir=images/train --output_path=train.record

    python generate_tfrecord.py --csv_input=images/test_labels.csv --image_dir=images/test --output_path=test.record
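
    If your generate_tfrecord.py follows the commonly used template, it contains a small mapping function (often named class_text_to_int) that converts each label name into an integer id. It must list exactly the classes you annotated, with ids matching your labelmap.pbtxt. A hedged sketch with two hypothetical classes, cat and dog:

    def class_text_to_int(row_label):
        # ids must match the ids defined in training/labelmap.pbtxt
        if row_label == 'cat':
            return 1
        elif row_label == 'dog':
            return 2
        else:
            return None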

    STEP-12 Copy YOUR_MODEL.config file from research/object_detection/samples/configs/ into research/training-

    The sample config file shown below is for an SSDLite MobileNet v1 model (note its fine_tune_checkpoint). If you downloaded another model, such as faster_rcnn_inception_v2_coco_2018_01_28, copy the config file whose name matches YOUR_MODEL_NAME from samples/configs instead.

    Hence always verify YOUR_MODEL_NAME before using the config file.

    STEP-13 Update num_classes, fine_tune_checkpoint, and num_steps, plus update input_path and label_map_path for both train_input_reader and eval_input_reader-

    The changes to be made in the config file are the keys listed in the step title above. You must update the value of each of those keys to match your own dataset, checkpoint, and record/label map paths.

    # SSDLite with Mobilenet v1 configuration for MSCOCO Dataset.
    # Users should configure the fine_tune_checkpoint field in the train config as
    # well as the label_map_path and input_path fields in the train_input_reader and
    # eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
    # should be configured.
    
    model {
      ssd {
        num_classes: 6
        box_coder {
          faster_rcnn_box_coder {
            y_scale: 10.0
            x_scale: 10.0
            height_scale: 5.0
            width_scale: 5.0
          }
        }
        matcher {
          argmax_matcher {
            matched_threshold: 0.5
            unmatched_threshold: 0.5
            ignore_thresholds: false
            negatives_lower_than_unmatched: true
            force_match_for_each_row: true
          }
        }
        similarity_calculator {
          iou_similarity {
          }
        }
        anchor_generator {
          ssd_anchor_generator {
            num_layers: 6
            min_scale: 0.2
            max_scale: 0.95
            aspect_ratios: 1.0
            aspect_ratios: 2.0
            aspect_ratios: 0.5
            aspect_ratios: 3.0
            aspect_ratios: 0.3333
          }
        }
        image_resizer {
          fixed_shape_resizer {
            height: 300
            width: 300
          }
        }
        box_predictor {
          convolutional_box_predictor {
            min_depth: 0
            max_depth: 0
            num_layers_before_predictor: 0
            use_dropout: false
            dropout_keep_probability: 0.8
            kernel_size: 3
            use_depthwise: true
            box_code_size: 4
            apply_sigmoid_to_scores: false
            conv_hyperparams {
              activation: RELU_6,
              regularizer {
                l2_regularizer {
                  weight: 0.00004
                }
              }
              initializer {
                truncated_normal_initializer {
                  stddev: 0.03
                  mean: 0.0
                }
              }
              batch_norm {
                train: true,
                scale: true,
                center: true,
                decay: 0.9997,
                epsilon: 0.001,
              }
            }
          }
        }
        feature_extractor {
          type: 'ssd_mobilenet_v1'
          min_depth: 16
          depth_multiplier: 1.0
          use_depthwise: true
          conv_hyperparams {
            activation: RELU_6,
            regularizer {
              l2_regularizer {
                weight: 0.00004
              }
            }
            initializer {
              truncated_normal_initializer {
                stddev: 0.03
                mean: 0.0
              }
            }
            batch_norm {
              train: true,
              scale: true,
              center: true,
              decay: 0.9997,
              epsilon: 0.001,
            }
          }
        }
        loss {
          classification_loss {
            weighted_sigmoid {
            }
          }
          localization_loss {
            weighted_smooth_l1 {
            }
          }
          hard_example_miner {
            num_hard_examples: 3000
            iou_threshold: 0.99
            loss_type: CLASSIFICATION
            max_negatives_per_positive: 3
            min_negatives_per_image: 0
          }
          classification_weight: 1.0
          localization_weight: 1.0
        }
        normalize_loss_by_num_matches: true
        post_processing {
          batch_non_max_suppression {
            score_threshold: 1e-8
            iou_threshold: 0.6
            max_detections_per_class: 100
            max_total_detections: 100
          }
          score_converter: SIGMOID
        }
      }
    }
    
    train_config: {
      batch_size: 24
      optimizer {
        rms_prop_optimizer: {
          learning_rate: {
            exponential_decay_learning_rate {
              initial_learning_rate: 0.004
              decay_steps: 800720
              decay_factor: 0.95
            }
          }
          momentum_optimizer_value: 0.9
          decay: 0.9
          epsilon: 1.0
        }
      }
      fine_tune_checkpoint: "ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"
      from_detection_checkpoint: true
      # Note: The below line limits the training process to 20K steps. This
      # effectively bypasses the learning rate schedule (the learning rate will
      # never decay). Remove the below line to train indefinitely.
      num_steps: 20000
      data_augmentation_options {
        random_horizontal_flip {
        }
      }
      data_augmentation_options {
        ssd_random_crop {
        }
      }
    }
    
    train_input_reader: {
      tf_record_input_reader {
        input_path: "train.record"
      }
      label_map_path: "training/labelmap.pbtxt"
    }
    
    eval_config: {
      num_examples: 8000
      # Note: The below line limits the evaluation process to 10 evaluations.
      # Remove the below line to evaluate indefinitely.
      max_evals: 10
    }
    
    eval_input_reader: {
      tf_record_input_reader {
        input_path: "test.record"
      }
      label_map_path: "training/labelmap.pbtxt"
      shuffle: false
      num_readers: 1
    }
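
    Both input readers above point to training/labelmap.pbtxt. If you have not created it yet, it is a plain-text protobuf file listing your classes with the same ids used while generating the TFRecords. A minimal sketch with two hypothetical classes (cat and dog) looks like this:

    item {
      id: 1
      name: 'cat'
    }
    item {
      id: 2
      name: 'dog'
    }

    The num_classes value in the model section of the config must match the number of item entries in this file.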

    STEP-14 From research/object_detection/legacy/ copy train.py  to research folder-

    legacy folder contains train.py as shown below -

    STEP-15 Copy deployment and nets folder from research/slim into the research folder-

    slim folder contains the following folders -

    STEP-16 Now run the following command from the research folder. This will start the training on your local system-

    Copy the command, replace YOUR_MODEL.config with the name of the config file you copied into research/training in STEP-12, and then run it in the command prompt or terminal. Make sure you are in the research folder.

    python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/YOUR_MODEL.config

    Note : Always run all the commands in the research folder.
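
    While training runs, you can optionally monitor the loss from a second terminal with TensorBoard, pointing it at the same training directory:

    tensorboard --logdir=training/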
