Advanced Computer Vision: Mon, Feb 29th

Friday, February 26, 2016

Mon, Feb 29th - Selective Search

Selective Search for Object Recognition. J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, A. W. M. Smeulders. IJCV 2013.

project page

20 comments:

sfenu3 (February 28, 2016 at 6:23 PM)
In image classification tasks, having initial ideas about the locations of potential objects tends to increase classification accuracy. This paper presents a way to selectively search images for potential object locations by using a variety of segmentation techniques to come up with potential object hypotheses. Notably, it has better recall on Pascal VOC 2012 than the objectness measure presented by Alexe et al.

Questions:

1) The authors vary color space of the images and the parameters of their segmentation algorithms to come up with a number of object hypotheses. Could additional or more accurate object hypotheses be learned by generating segmentations in gradient space/feature space rather than just looking in color space?

CJDS (February 28, 2016 at 8:48 PM)
This paper presents a way to search selectively through images for potential object locations. They do so by finding regions of interest and matching similar regions. They then group the similar regions until the whole image is a single group. They use a series of colour spaces with different invariance properties as diversification strategies to detect good objects. They measure similarity on colour, texture, and how well regions fit together.

Questions:
1. Could the algorithm be adapted to generate more complex selections than just boxes, such as a polygon rather than a box? I think it might be able to create better selections.
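
The grouping loop described in the comment above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation: regions are toy `(mean_intensity, size)` tuples, and the names `hierarchical_grouping`, `similarity` and `merge` are hypothetical. The real method works on image segments and uses the similarity cues discussed elsewhere in this thread.

```python
def hierarchical_grouping(regions, adjacency, similarity, merge):
    """Greedily merge the most similar neighbouring regions until one remains.

    regions:   dict mapping integer id -> region (any object)
    adjacency: iterable of (id_a, id_b) pairs of neighbouring regions
    Returns every region ever seen (initial and merged): the hypotheses.
    """
    regions = dict(regions)
    pairs = {tuple(sorted(p)) for p in adjacency}
    hypotheses = list(regions.values())  # initial regions are hypotheses too
    next_id = max(regions) + 1
    while pairs:
        # Pick the most similar pair of neighbouring regions.
        a, b = max(pairs, key=lambda p: similarity(regions[p[0]], regions[p[1]]))
        merged = merge(regions[a], regions[b])
        # The merged region inherits the neighbours of both of its parts.
        touching = {i for p in pairs if a in p or b in p for i in p} - {a, b}
        pairs = {p for p in pairs if a not in p and b not in p}
        pairs |= {(n, next_id) for n in touching}
        del regions[a], regions[b]
        regions[next_id] = merged
        hypotheses.append(merged)
        next_id += 1
    return hypotheses


# Toy stand-ins (assumptions): similarity is closeness of mean intensity,
# merging takes the size-weighted mean.
def similarity(r1, r2):
    return -abs(r1[0] - r2[0])


def merge(r1, r2):
    size = r1[1] + r2[1]
    return ((r1[0] * r1[1] + r2[0] * r2[1]) / size, size)


initial = {0: (10.0, 1), 1: (12.0, 1), 2: (50.0, 1), 3: (53.0, 1)}
neighbours = [(0, 1), (1, 2), (2, 3)]
hypotheses = hierarchical_grouping(initial, neighbours, similarity, merge)
print(len(hypotheses), hypotheses[-1])
```

With n connected starting regions the loop produces 2n - 1 hypotheses, from the smallest segments up to the whole image, which is how a single pass covers all scales.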

John Turner (February 28, 2016 at 9:08 PM)
This paper introduces an alternative method to exhaustive search for finding potential object locations, a selective search mechanism which uses segmentation to generate class independent object location hypotheses. Using a hierarchical grouping strategy to capture various scales of potential object locations, along with experimentally diversifying the color spaces, similarity measures used and starting search locations, they are able to build hypotheses of object locations, ideally arranged such that the most likely locations come first. Due to the computational efficiency of their methodology they are able to use bag-of-words features for object recognition instead of HOG features, which were considered the only computationally feasible choice at the time given an exhaustive, uninformed search space.

Discussion/Questions
1) Would this kind of tailored preprocessing of images for object recognition be useful with a CNN, perhaps to decrease the network's footprint by providing cues for configuration? Or possibly cues for actively turning off parts of the network relating to specific sections of an image to help during training?

Unknown (February 28, 2016 at 9:18 PM)
Abstract:

This paper presents a selective search method for object localization and recognition. The method can be seen as a strategy in between exhaustive search for object locations and segmentation. Selective search provides a computationally efficient way of getting multiple object location hypotheses. The approach takes initial regions produced by previous research methods and hierarchically merges them until the whole image is covered. They also diversify their approach by using multiple color spaces and similarity measures. They show that their final results were quite good on object localization (box-based, region-based) and object recognition compared to the state-of-the-art systems available at the time.

Discussion:

1) Can you please explain the training algorithm in figure 3?

enlite trader (February 28, 2016 at 9:42 PM)
Summary:
This paper introduces a selective search strategy that uses:
1. Hierarchical grouping: iteratively combining the most similar regions, based on a compound similarity metric, until the image has only a single region.
2. Diversification strategies that complement each other: different color spaces are used; similarity between regions is defined based on color, texture, size and fit; different starting regions are used.
3. Ordering of object location proposals with randomness, by calculating a position value with a randomized function of the given position.

Question:
I'm still not very clear on how the authors trade off quality and speed in selective search. What are the key techniques in selective search that speed up the search and are not in the previous work?
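
The compound similarity from the summary above (color, texture, size and fit) can be sketched as below. The formulas follow the paper as best I recall them: histogram intersection for color and texture, and size and fill terms normalised by the image size. The dictionary layout and the function names are assumptions made for illustration only.

```python
def histogram_intersection(h1, h2):
    """Similarity of two L1-normalised histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def box_area(box):
    x0, y0, x1, y1 = box  # (x0, y0) inclusive corner, (x1, y1) exclusive corner
    return (x1 - x0) * (y1 - y0)


def union_box(b1, b2):
    """Tight bounding box around two boxes."""
    return (min(b1[0], b2[0]), min(b1[1], b2[1]),
            max(b1[2], b2[2]), max(b1[3], b2[3]))


def region_similarity(ri, rj, image_size, weights=(1, 1, 1, 1)):
    """Weighted sum of the four cues; each weight switches a cue on (1) or off (0)."""
    s_colour = histogram_intersection(ri["colour"], rj["colour"])
    s_texture = histogram_intersection(ri["texture"], rj["texture"])
    # Size: encourages small regions to merge early.
    s_size = 1 - (ri["size"] + rj["size"]) / image_size
    # Fill: penalises pairs whose joint bounding box contains a lot of empty space.
    gap = box_area(union_box(ri["box"], rj["box"])) - ri["size"] - rj["size"]
    s_fill = 1 - gap / image_size
    a1, a2, a3, a4 = weights
    return a1 * s_colour + a2 * s_texture + a3 * s_size + a4 * s_fill


# Two hypothetical regions in an 8x8 image (64 pixels).
ri = {"colour": [0.5, 0.5, 0.0], "texture": [0.5, 0.5, 0.0],
      "size": 10, "box": (0, 0, 4, 4)}
rj = {"colour": [0.25, 0.25, 0.5], "texture": [0.25, 0.25, 0.5],
      "size": 6, "box": (2, 2, 6, 6)}
total = region_similarity(ri, rj, image_size=64)
print(total)
```

Switching individual weights off, for example `weights=(1, 0, 0, 0)` for color only, is one way to obtain the complementary grouping strategies the thread refers to.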

Jonathan (February 28, 2016 at 9:55 PM)
This paper seeks to propose object locations without using exhaustive search. They use selective search as their sampling technique. They use complementary grouping strategies to diversify the groupings for proposed object locations.

Q.
1. What is Figure 2 showing?
2. What colour space did they use? Section 3.2 says they use a single colour space throughout the algorithm. What colour space is that?
3. It would be interesting to pre-process the data using something like the authors' work and feed it into a deep network, if for no other reason than to see what is and isn't important.
4. It would be nice to get a primer on [13], Efficient Graph-Based Image Segmentation, because it is used a lot in this paper.

Sam Seifert (February 28, 2016 at 9:58 PM)
This paper covers search for object locations for use in object recognition. They combine a bunch of algorithms, rather than just using a single approach, and they combine them in a hierarchical structure. They order the combined object hypotheses based on the order in which the hypotheses were generated in each individual grouping strategy. That way the recognizer sees the most likely hypotheses before the less likely ones. They do recursive supervised learning with an SVM classifier to narrow down the hypothesis set. The structured ordering provides a speed-quality tradeoff. Their algorithm was 13 to 60 times faster than the state of the art when published. Using their object locator they can improve the accuracy of recognition algorithms.
Discussion:
Have people tried using CNNs to identify object boundaries? Something like: the last layer of the network would be a binary layer with the same size as the image that says whether or not each pixel is a boundary between objects. That would provide the top-down hierarchical structure they talk about in this paper. The network could deduce objects in its internal structure, and decide to pair sweater with head even though textures and colors mismatch! All hypothetical of course.
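
The "recursive supervised learning" in the summary above is usually called hard negative mining: train, collect the negatives the model wrongly accepts, add them to the training set, and retrain. The sketch below shows only that loop. A toy one-dimensional threshold classifier stands in for the SVM, and `train`, `score`, `rounds` and all other names are hypothetical.

```python
def mine_hard_negatives(train, score, positives, negatives, pool, rounds=2):
    """Retrain while the model still accepts known negatives from `pool`.

    train(positives, negatives) -> model
    score(model, x) -> float, where a value > 0 means "classified as object"
    pool: candidate locations known NOT to contain the object
    """
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(rounds):
        # False positives are the hard negatives.
        hard = [x for x in pool if score(model, x) > 0 and x not in negatives]
        if not hard:
            break
        negatives.extend(hard)
        model = train(positives, negatives)
    return model, negatives


# Toy stand-in for the SVM (assumption): a threshold halfway between the
# smallest positive and the largest negative feature value.
def train_threshold(positives, negatives):
    return (min(positives) + max(negatives)) / 2


def score_threshold(model, x):
    return x - model


model, negatives = mine_hard_negatives(
    train_threshold, score_threshold,
    positives=[10, 12], negatives=[0, 1], pool=[0, 1, 5, 7, 8],
)
print(model, negatives)
```

In the toy run the first model accepts 7 and 8, so they are added as hard negatives and the threshold tightens. The same loop applies when the classifier is an SVM and the pool is the set of proposed locations that do not overlap enough with any ground-truth object.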

Unknown (February 29, 2016 at 12:10 AM)
This paper introduces us to Selective Search, which combines ideas from both exhaustive search and segmentation to give bounding boxes or regions for potential objects in an image. To approximate the coverage of exhaustive search they apply bottom-up grouping using region-based features. A segment in an image can be defined in multiple ways: by color, by texture, or just as a grouping of smaller segments. Hence the authors diversify the segmentation by running the algorithm on various color spaces and similarity measures. The main goal of Selective Search is to get faster object proposals, and hence they train an SVM with a fast approximate classification strategy. For evaluation, a measure of overlap between the resulting bounding boxes and the ground truth is defined (called MABO). MABO is used to evaluate the various strategies proposed in this paper, such as hierarchical grouping, individual and combined diversification strategies, and box-based and region-based locations. This is then compared to other object search algorithms, and Selective Search's performance is similar or better. Evaluation of an object proposal technique is not complete without performing object recognition on the results. They use PASCAL VOC 2010/2012 for training and testing. Most of the results are better than other methods, especially for objects which are not rigid.

Question: Are the object proposals ranked in the end?
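
For reference, the overlap measure mentioned above (MABO, Mean Average Best Overlap) can be sketched as follows. This assumes axis-aligned boxes given as `(x0, y0, x1, y1)` and uses intersection over union as the overlap. To keep the example short, one shared proposal list is scored against all ground-truth boxes, whereas a real evaluation would match each object against the proposals from its own image. The function names are hypothetical.

```python
def area(box):
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])


def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def average_best_overlap(gt_boxes, proposals):
    """ABO for one class: mean over objects of the best overlap with any proposal."""
    best = [max((iou(gt, p) for p in proposals), default=0.0) for gt in gt_boxes]
    return sum(best) / len(best)


def mean_average_best_overlap(gt_by_class, proposals):
    """MABO: the mean of the per-class ABO scores."""
    scores = [average_best_overlap(boxes, proposals) for boxes in gt_by_class.values()]
    return sum(scores) / len(scores)


proposals = [(0, 0, 4, 4), (4, 0, 8, 2)]
ground_truth = {"cat": [(0, 0, 4, 4)], "dog": [(4, 0, 8, 4)]}
mabo = mean_average_best_overlap(ground_truth, proposals)
print(mabo)
```

Because only the best-overlapping proposal counts for each object, the measure rewards a proposal set for containing at least one good box per object and ignores how many poor boxes surround it.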

Unknown (February 29, 2016 at 12:21 AM)
This paper proposes a data driven approach for uncovering object locations in an image called Selective Search, using diverse sampling criteria to merge regions together in different representations. The authors give 4 criteria for evaluating the similarity of regions so that a hierarchical bottom-up image segmentation can be performed. They show the results of their approach based on empirical results on the PASCAL VOC dataset.

Discussion:
1. How do the authors select the various diversification strategies? The combination of all the different strategies could be exponential, so how do you decide which ones work better?

2. How would different sizes of the codebook affect results?

Unknown (February 29, 2016 at 3:48 AM)
This paper proposes a method of finding object locations before the object classification task. Their algorithm, Selective Search, combines techniques from exhaustive search, which looks for the object in the entire space but becomes computationally expensive, and from segmentation. They utilize different diversification strategies of color spaces, similarity measures and starting points to deal with different image conditions. They primarily report a high recall rate for their algorithm, though the error rate was large.

1) Would an option other than hierarchical grouping improve results?

Unknown (February 29, 2016 at 4:52 AM)
In this paper the authors introduce the idea of selective search, which greatly improves over exhaustive search for finding object locations in an image. They use the concept of hierarchical grouping, where they merge similar regions in an image until they cover the entire image. This makes sure that they can generate locations at all scales. To diversify their selective search they make use of a variety of color spaces with different invariance properties, use different similarity measures and also vary their starting regions. They then test their selective search strategy for object recognition using the Pascal VOC 2010 detection task.

Discussion:
Are there any techniques that can be used to add more spatial knowledge about the specific layout of the object so that the feature descriptors from this paper can perform better than HOG on rigid bodies?

Unknown (February 29, 2016 at 6:42 AM)
The paper presents a new method for selecting image locations used to search for an object. The potential object locations created by the algorithm are not tied to any particular object category. To search for possible locations, the authors start with regions created by previous work. Regions are combined using a similarity metric until the entire image is a single region. By varying the color space, region similarity metric and k threshold for region grouping they create a collection of possible object locations. Because this method is so fast, the authors were able to use the computationally slower bag-of-words model when identifying the proposed object locations.

Question:

Is there any way we can use selective search to improve the training process of a deep convolutional network?

Unknown (February 29, 2016 at 8:29 AM)
This paper presents a method of object localization that reduces the number of possible locations. The authors use hierarchical grouping to find potential locations and diversify the color space, similarity measures and starting regions. With this, they combine object location hypotheses with some randomness to reduce the bias towards large regions. The efficiency of the authors' method allowed them to use slower bag-of-words features for object recognition instead of HOG.

What other diversification strategies could be used to improve the result?
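
The randomised combination of hypotheses mentioned above can be sketched like this. As I understand the paper, each hypothesis gets a value equal to its position in its own hierarchy (1 for the whole-image region at the top) multiplied by a random number in [0, 1], and the merged list is sorted by that value with duplicates dropped. The function name and the `rand` parameter are assumptions for illustration.

```python
import random


def order_hypotheses(hierarchies, rand=random.random):
    """Merge hypotheses from several grouping strategies into one ordered list.

    hierarchies: list of lists, each ordered from the top of its hierarchy
                 (largest region first) down to the initial regions.
    rand:        callable returning a float in [0, 1]
    """
    scored = []
    for boxes in hierarchies:
        for rank, box in enumerate(boxes, start=1):
            # Randomising the rank keeps large regions from always coming first.
            scored.append((rand() * rank, box))
    scored.sort(key=lambda item: item[0])  # stable sort: ties keep input order
    seen, ordered = set(), []
    for _, box in scored:
        if box not in seen:  # keep only the best-placed copy of a duplicate
            seen.add(box)
            ordered.append(box)
    return ordered


strategy_a = [(0, 0, 8, 8), (0, 0, 4, 4), (0, 0, 2, 2)]
strategy_b = [(0, 0, 8, 8), (4, 4, 8, 8)]
ordered = order_hypotheses([strategy_a, strategy_b], rand=random.Random(0).random)
print(ordered)
```

Truncating the ordered list then gives a simple handle on the trade-off between the number of locations examined and the quality of the coverage.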

anusha (February 29, 2016 at 8:29 AM)
The paper introduces a method to adapt segmentation for selective search. The key idea is to use a diverse set of complementary hierarchical grouping strategies, which in turn makes the search stable, robust and independent of the object class. The proposed algorithm uses different diversification strategies of color spaces, similarity measures and starting points in order to tackle different image conditions. Hierarchical grouping is used to ensure locations are generated for different scales. The paper also demonstrates that selective search could be used to create a good Bag-of-Words based localisation and recognition system.

Question:
How does the threshold parameter k decide the starting positions?

Aditi Gupta (February 29, 2016 at 8:32 AM)
The paper presents a novel method for object recognition that combines the advantages of segmentation-based approaches and exhaustive search. Some of the key features of the proposed algorithm are:
Hierarchical grouping: The paper uses an algorithm to iteratively group similar regions and hence detect objects at different scales.
Diversification strategies: The paper experiments with different colour spaces, similarity measures and starting regions to come up with the best set of strategies for high accuracy and speed.
SIFT and the Bag of Words: The paper uses SIFT features and the Bag of Words model to encode the object windows and then uses an SVM classifier to classify them.

Discussion:
While comparing between multiple flat partitioning and hierarchical partitioning, the authors do not speak about the computation time. Can we assume that the partition times are comparable?
I am not clear what the threshold k indicates. I understand that it is a parameter used in a cited paper, “Efficient Graph-Based Image Segmentation”, and is a measure of the scale in some sense. But I’d like more clarity on it, since it is extensively used in the evaluation process.
The algorithm ultimately predicts bounding boxes that contain objects. How does this work in situations where we want to search for concepts such as “grass”?
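
On the question about k: in Efficient Graph-Based Image Segmentation (Felzenszwalb and Huttenlocher), k enters through a size-dependent threshold in the test that decides whether two components are merged. The notation below follows that paper as best I recall it, so treat it as a summary under that assumption rather than a quotation.

```latex
% Dif(C_1, C_2): smallest edge weight connecting components C_1 and C_2
% Int(C):        largest edge weight inside the minimum spanning tree of C
% The two components are merged when
\mathrm{Dif}(C_1, C_2) \le \mathrm{MInt}(C_1, C_2),
\qquad
\mathrm{MInt}(C_1, C_2) = \min\bigl(\mathrm{Int}(C_1) + \tau(C_1),\ \mathrm{Int}(C_2) + \tau(C_2)\bigr),
\qquad
\tau(C) = \frac{k}{|C|}
```

Since the threshold shrinks as a component grows, a larger k makes it easier for small components to merge and so tends to give fewer, larger initial regions. Running the segmentation with several values of k therefore gives several different sets of starting regions, which is the sense in which k is "a measure of the scale" here.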

Unknown (February 29, 2016 at 10:15 AM)
This paper describes various approaches for efficient and powerful object recognition by developing a hierarchical selective search model that combines the strengths of segmentation techniques, which delineate structure in the image, and exhaustive search, which captures all possible locations within the image. Various diversification strategies are explained that obtain the object locations, which are then fed to the detector stage. The object detection part uses various forms of SIFT. Different evaluation strategies are discussed, along with results on Pascal VOC and ImageNet and the computation time.

Questions-
1. Why is per-pixel SIFT used? Is it in order to obtain dense metrics similar to segmentation?
2. An intuition behind the thresholding parameter k is required.

Unknown (February 29, 2016 at 10:15 AM)
This comment has been removed by the author.

Unknown (February 29, 2016 at 10:35 AM)
While segmentation methods have been extensively used for object classification tasks, they usually rely on exhaustive search over the entire image space, which is computationally expensive and slow. This paper proposes a new approach, “selective search”, for generating object proposals for the classification problem. “Selective search” is inspired by bottom-up segmentation: because images are intrinsically hierarchical, selective search uses the image structure to sample class-independent object regions. Different complementary cues are explored (color, texture, and size). This approach reduces the search space and time, and it also reduces the computational power needed, enabling the use of stronger descriptors such as bag-of-words.

In Figure 5, are these the only object regions the method proposed? I’m interested to see how well it performs on other datasets such as MS COCO, where the scenes are crowded and a lot of objects are occluded.

Vasavi Gajarla (February 29, 2016 at 10:40 AM)
Summary:
This paper aims to achieve object recognition by using a method called data-driven “Selective Search” which is a combination of variations of exhaustive search and object segmentation. In order to combat the time complexity that exhaustive search demands yet include as many image conditions as possible, the authors use diversification techniques like varying the color spaces, locations, grouping criteria. For segmentation, they use hierarchical segmentation to make sure that most of the objects are accounted for.
Questions:
Are there other diversification methods which can be incorporated that might result in improvement of results and also take the technique closer to an exhaustive search scenario?

Unknown (March 1, 2016 at 12:50 AM)
Was at CSCW today

Summary:

This paper attempts to address the problem of object location proposals using selective search. The method uses bottom-up grouping of segmented regions within an image to create a hierarchical grouping algorithm (the grouping method utilizes a number of diversification strategies to overcome the pitfalls of examining only texture or color). The object hypotheses are then narrowed down by an iteratively trained SVM model. The results are quite impressive!

Questions:
1. Object proposals seem like a nice case where context should improve the results. It would be interesting to see how using where the proposals are in the image in relation to each other can affect the results. For example, you would expect a straw in a cup. Perhaps if you have proposals that are ‘cup’-like you should expect a ‘straw’-like object?

2. Can we use deep learning to generate bounding box proposal regions?