Monday, April 18, 2016

Wed, April 20 - LSDA

LSDA: Large Scale Detection Through Adaptation. Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko. 2014.
arXiv

Varun will spend some time discussion this paper first:
Rich feature hierarchies for accurate object detection and semantic segmentation. Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik. 2014.
arXiv

12 comments:

  1. summary:
    This paper presents a way of detection leveraging the learned CNN classification model and adapting those classifiers as detectors on contrary to the approach we've seen only using bounding boxes as training data and train from scratch. Authors remove the last layers in the alexNet with their object labels and fine-tune the model, then using regional proposal algorithms like selective search, they fine-tune the set B (set of objects with only the bounding box annotations) and also add background class to contributes to the final output. To adapt learned bounding box to set A( set of objects only learned through classification), they use KNN in fc8 to find the similar of objects between set B and set A.

    question:
    Given we have the faster rcnn with the regional proposal network(RPN), any of their recent works on adapting/leveraging classifier and the new detector architecture?

    ReplyDelete
  2. This paper presents a mechanism by which CNN-based image classifiers can be modified in such a way so that they can act as image detectors. This is accomplished by adopting an established existing, pretrained CNN architecture, ("AlexNet"), replacing the last layer with K linear classifiers (one for each desired category to be classified), which, when trained, produces a reasonable baseline classification performance. This CNN is then modified from a classification network to a detection network by following the R-CNN algorithm and available labelled detection data to collect region and background bounding box proposals which serves to fine tune the input to the CNN. Lastly, a category-specific transformation is learned that enables detections without region proposals.

    Questions/Discussion :
    1) Could you explain a bit why an accurate understanding of the background is necessary for a detector? By background do they just mean "not object x" where object x is what they are trying to detect?

    2) Could this work on a scene-trained net to provide scene-pertinent object detection (finding a bed in a bedroom)?

    ReplyDelete
  3. Abstract:

    The paper presents neural architecture for object detection learned for categories for which bounding box ground truth is not available. They use domain transfer for training detectors from classifiers. They remove the final layer of AlexNet and train K linear classifiers over each class using the known class bounding boxes. Then they use nearest neighbor to domain transfer the learned knowledge for remaining classes. Final performance shows improvement over baseline of pure classifier.

    Discussion:

    1) Are there any other semi-supervised learning approaches for this task?

    ReplyDelete
  4. Labels for image classification are cheaper to obtain than bounding boxes for detection. As a result the amount of classification labels are much greater than bounding boxes for recognition, making object detectors harder to train. The authors propose a possible solution, LSDA. LSDA allows for a net pretrained on image classification to be adapted into an object detector without requiring a large amount of bounding boxes. To do this, the final layer of the net (AlexNet was used) is randomized and resized to the amount of objects categories being detected, then finetuned for classification of the selected categories. Next, layers 1-7 are finetuned for detection of the categories with bounding boxes by using a method similar to R-CNN's finetuning method. For the output layer of the CNN, the categories that do not have bounding boxes take the average of the KNN categories with labels and use that as their finetuned output detection weight. The authors find that the method provides a significant performance boost over a baseline of using a net only trained for image classification.



    Question

    1) Has this approached been updated to use newer architectures? If so, how are the results?

    ReplyDelete
  5. These researchers want to combine models trained on just image tags with models trained on both image tags and segmented data to create a model that performs well on segmentation tasks without any segmentation training data. I.E. Transform an image classifier into an object detector. They tested their results on the ILSCVR2013 dataset. Their mAP for training without segmentation data is roughly half of what it is for training with segmentation data. They breakdown error modes into three categories: localization, confusion with background, and other. On classes with the most false positives, most of their errors are background errors.

    Discussion: Could you use a similar method to expedite training for a CNN where image net classifiers are available, but maybe not for a particular task you want to use them for? I.E. it seems like people are avoiding training networks from scratch, and if we could use something like this this to start networks at reasonable training points, we could eliminate the time required to train CNN’s.

    ReplyDelete
  6. In this paper authors present a method to do domain adaptation for classification based task to detection. They show a method of using a CNN trained for classification can be adapted to the task of detection using other finetuned categories. A nearest neighbor approach to find the change required for domain adaptation.

    Question: Is finetuning the classes doing most of the work.

    ReplyDelete
  7. Large Scale Detection through Adaptation (LSDA) attempts to turn an image classifier into an object detector. They use a pretrained CNN, AlexNet trained on ILSVRC2012, and modify it to discard the last weight layer and replace it with k linear classifiers, one for each category. In comparison with R-CNN object detector, the authros claim that LSDA algorithm requires bounding box labeled data for only a small subset of final detection categories. The also reduce the training time from 3 days to 5.5 hours by directly using the final score vector to produce the prediction scores.

    Discussion: How does this work compare to the fast R-CNN where they propose the RPNs which can reduce the training time by upto 50% when trained jointly with R-CNN?

    ReplyDelete
  8. The authors present Large Scale Detection through Adaptation (LSDA), an algorithm which learns the difference between the classification and object detection and uses the knowledge of classifiers trained on the image categorisation task to predict bounding boxes for the object detection task. The authors fine tune a CNN trained on the image classification task to detect bounding boxes for the various object categories using a technique similar to the one used for finetuning RCNNs.

    ReplyDelete
  9. Summary:
    The paper proposes an idea called LSDA of using a CNN (AlexNet as per the experiments with a network using 7 FCs) which is trained to perform strongly on object classification. They use very less data of object detector bounding boxes for the categories for fine-tuning the FC layers in the network so that the network learns the difference of classifier and detector and performs well on object detection as well.

    ReplyDelete
  10. This paper proposes Large Scale Detection through Adaptation (LSDA), which is an algorithm that learns to transform an image classifier into an object detector. To do this, they try to learn a transformation between a supervised classified and a supervised detector that is adaptable to other classifiers. Their approach is inspired primarily by R-CNN (Girshick et al.)

    Questions:
    Can the results of this detector somehow be used to more easily curate a dataset for a stronger supervised approach?

    ReplyDelete
  11. The paper presents LSDA, a method that attempts at transforming a classifier into a detector. This detector detects a large number of classes, and learns by extending the weights learnt on bounding boxes of certain categories, to similar categories for which bounding box data is not available. The authors use AlexNet, pretrained on ILSVRC 2012, and then replace the last weight layer with linear classifiers for each category. Detection scores is the difference between the class score and background score.

    Discussion:
    1) How exactly are these background bounding boxes obtained, and how many per image?

    2) How does this perform at detecting objects that would end up having a very low overlap with their potential bounding box?

    ReplyDelete
  12. Finished my Quals today :D

    Summary:
    The paper presents an approach that attempt to bridge the gap between classification and detection. To do so, they present a deep network architecture called LSDA that adapts a classifier into an object detector using domain transfer . This is a really important approach because it is extremely difficult to get good datasets that a re fully annotated with bounding boxes.

    1. it seems the next step would be segmentation, are there any approaches like this that not only localize the object but segment the object out of the image?
    2. How would this approach work o very small objects such as those in MS COCO?

    ReplyDelete