
Fast and Accurate Feature-based Region Identification


ABSTRACT  

There have been several improvements in object detection and semantic segmentation results in recent years. The baseline systems driving these advances are Fast/Faster R-CNN, the Fully Convolutional Network (FCN), and, more recently, Mask R-CNN and its variant with a weight transfer function. Mask R-CNN is the current state of the art.

This research extends the application of the state of the art in object detection and semantic segmentation to drone-based datasets. An existing drone dataset was used to learn semantic segmentation on drone images with Mask R-CNN.

TABLE OF CONTENTS

1 Introduction
1.1 Introduction
2 Theoretical Aspects of Image Classification
2.1 Computer Vision Tasks
2.1.1 Image Classification
2.1.2 Object Detection
2.1.3 Semantic Segmentation
2.1.4 Instance Segmentation
2.2 CNN for Object Detection and Segmentation
2.2.1 Convolutional Neural Networks (CNN)
2.2.2 Mask R-CNN
2.3 Drone-Based Dataset
3 Technologies Used
3.1 Python
3.2 TensorFlow
3.3 Keras
3.4 Scikit-Image
4 Results and Analysis: Implementation
4.1 Dataset
4.2 Mask R-CNN Library
4.2.1 Config.py
4.2.2 Model.py
4.2.3 Utils.py
4.2.4 Drone.py
4.2.5 Drone-detect.py
5 Conclusion
A Principal Program Codes

INTRODUCTION  

Images and videos are collected every day from many different sources. Recognizing objects, and localizing, segmenting, and classifying them, has long been a major area of interest in computer vision. Significant progress has been made, starting from the use of low-level image features, such as the scale-invariant feature transform (SIFT) [Lowe, 2004] and histograms of oriented gradients (HOG) [Dalal and Triggs, 2005], in sophisticated machine learning frameworks, to the use of multi-layer convolutional networks that compute highly discriminative and invariant features [Girshick et al., 2015].

SIFT and HOG are feature descriptors based on semi-local orientation histograms: they count occurrences of gradient orientations in localized portions of an image. Just as the Convolutional Neural Network (CNN) can be traced back to Fukushima's "neocognitron", a hierarchical and shift-invariant model for pattern recognition, the use of CNNs for region-based identification (R-CNN) [Girshick et al., 2015] can be traced back to the same source.
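To make the idea behind HOG concrete, the following sketch computes per-cell histograms of gradient orientations in plain NumPy. It is a simplified illustration, not the Dalal-Triggs implementation: it omits block normalization and uses hard (unweighted) orientation binning.

```python
import numpy as np

def hog_sketch(image, cell=8, bins=9):
    """Simplified HOG: per-cell histograms of gradient orientation,
    weighted by gradient magnitude (no block normalization)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees, as in Dalal-Triggs
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    h, w = image.shape
    ch, cw = h // cell, w // cell
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            m = mag[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            a = ang[i * cell:(i + 1) * cell, j * cell:(j + 1) * cell]
            idx = (a / (180.0 / bins)).astype(int) % bins
            for b in range(bins):
                hist[i, j, b] = m[idx == b].sum()
    return hist.ravel()  # concatenated cell histograms

rng = np.random.default_rng(0)
desc = hog_sketch(rng.random((32, 32)))
print(desc.shape)  # (144,) -- 4 x 4 cells x 9 bins
```

A full implementation, with block normalization and interpolated binning, is available as `skimage.feature.hog` in the Scikit-Image library discussed in chapter 3.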

After CNNs fell out of favor in the 1990s with the rise of the support vector machine (SVM), they were revitalized in 2012 by [Krizhevsky et al., 2012], who demonstrated a substantial improvement in image classification accuracy on the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [Deng et al., 2012] and introduced new mechanisms into CNNs, such as the rectified linear unit (ReLU) and dropout regularization. To perform object detection with a CNN, and in an attempt to bridge the gap between image classification and object detection, two issues were addressed by [Girshick et al., 2015].
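Both mechanisms are simple to state. The sketch below is an illustrative NumPy version, not the original AlexNet code; note that it uses "inverted" dropout (rescaling at training time), the convention of modern frameworks, whereas [Krizhevsky et al., 2012] rescaled at test time instead.

```python
import numpy as np

def relu(x):
    # ReLU: elementwise max(0, x); negatives are zeroed out
    return np.maximum(0.0, x)

def dropout(x, p=0.5, rng=None, training=True):
    """Inverted dropout: during training, zero each unit with
    probability p and rescale by 1/(1-p) so the expected activation
    is unchanged; at test time the layer is the identity."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p
    return x * mask / (1.0 - p)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))  # negatives become 0, positives pass through
print(dropout(relu(x), p=0.5, rng=np.random.default_rng(0)))
```

In the Keras library used later in this work, the same operations are available as the `ReLU` activation and the `Dropout` layer.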

The first was localizing objects with a deep network; the second was training a high-capacity model with only a small quantity of annotated detection data. A sliding-window detector was considered for localization but rejected: a single fixed window works well only when every object of interest shares a common scale and aspect ratio, which does not hold when detecting multiple objects in an image. Instead, the localization problem was solved by operating within the "recognition using regions" paradigm. Fast R-CNN was introduced in 2015 by Girshick [Ross, 2015].
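The contrast can be sketched in a few lines of NumPy (illustrative only, not the R-CNN code): a sliding window enumerates boxes of one fixed size and aspect ratio, whereas the region paradigm takes an arbitrary proposed box and warps it to the fixed input size a CNN classifier expects.

```python
import numpy as np

def sliding_windows(h, w, win=64, stride=32):
    """All boxes of one fixed size and aspect ratio -- the
    limitation of a single-scale sliding-window detector."""
    return [(y, x, win, win)
            for y in range(0, h - win + 1, stride)
            for x in range(0, w - win + 1, stride)]

def warp_region(image, box, size=32):
    """Crop an arbitrary box and warp it (nearest-neighbor resize)
    to the fixed square input a CNN classifier expects."""
    y, x, bh, bw = box
    crop = image[y:y + bh, x:x + bw]
    ys = np.arange(size) * bh // size
    xs = np.arange(size) * bw // size
    return crop[np.ix_(ys, xs)]

img = np.random.default_rng(0).random((128, 128))
print(len(sliding_windows(128, 128)))       # 9 boxes, all 64x64
patch = warp_region(img, (10, 20, 50, 90))  # a 50x90 proposal...
print(patch.shape)                          # (32, 32) -- any ratio fits
```

In practice the proposals come from an external algorithm (e.g. selective search in R-CNN) or, in Faster/Mask R-CNN, from a learned region proposal network.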

BIBLIOGRAPHY

N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. 2005. URL https://lear.inrialpes.fr/people/triggs/pubs/Dalal-cvpr05.pdf.

J. Deng, A. Berg, S. Satheesh, H. Su, A. Khosla, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge 2012 (ILSVRC2012). 2012. URL http://www.image-net.org/challenges/LSVRC/2012.

R. Girshick, J. Donahue, T. Darrell, and J. Malik. Region-based convolutional networks for accurate object detection and segmentation. 2015. URL http://islab.ulsan.ac.kr/files/announcement/513/rcnn_pami.pdf.

R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2524, 2013. URL http://arxiv.org/abs/1311.2524.

X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Y. W. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, pages 249–256, Chia Laguna Resort, Sardinia, Italy, 13–15 May 2010. PMLR. URL http://proceedings.mlr.press/v9/glorot10a.html.

K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR, abs/1406.4729, 2014. URL http://arxiv.org/abs/1406.4729.

K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. URL http://arxiv.org/abs/1512.03385.

K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. CoRR, abs/1502.01852, 2015. URL http://arxiv.org/abs/1502.01852.

K. He, G. Gkioxari, P. Dollár, and R. B. Girshick. Mask R-CNN. CoRR, abs/1703.06870, 2017. URL http://arxiv.org/abs/1703.06870.

S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. URL http://arxiv.org/abs/1502.03167.
