Wednesday, April 6, 2016

Fri, April 8 - Dense Semantic Correspondence

Dense Semantic Correspondence Where Every Pixel is a Classifier. Hilton Bristow, Jack Valmadre, Simon Lucey. ICCV 2015.

arXiv

17 comments:

  1. This paper proposes a mechanism for building semantic correspondences between two images with thematic similarities but with different geometry and dissimilar appearances. This is accomplished by learning a linear classifier for every pixel in a source image and then applying this classifier in a sliding window across a target image to produce match likelihoods. The obvious shortcoming of such a method (the potentially intractable task of analyzing the enormous set of possible negative samples) is alleviated using linear discriminant analysis, which summarizes the negative set by its mean and covariance. These statistics are collected from other images during a pretraining phase, so that a discriminative detector can be created with a single matrix-vector computation per pixel via an application of Bayes' rule.

    Questions/Discussion :
    1) What kind of conditions are necessary for a negative set to be summarizable purely by its mean and covariance (i.e. to be approximately Gaussian)? In other words, in what kinds of situations would LDA not perform well enough for this method to work?

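The single matrix-vector computation described in the summary above can be sketched in a few lines. This is an illustrative toy, not the authors' code: the feature dimension, random data, and function names are mine, and the negative-set statistics are assumed to be precomputed.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                # toy feature dimension (stand-in for a SIFT window)
mu_neg = rng.normal(size=d)           # negative-set mean, from pretraining
A = rng.normal(size=(d, d))
Sigma = A @ A.T + 1e-3 * np.eye(d)    # negative-set covariance, from pretraining

def lda_detector(x):
    """Exemplar-LDA weight for source-pixel feature x: w = Sigma^{-1} (x - mu_neg)."""
    return np.linalg.solve(Sigma, x - mu_neg)

def match_scores(w, target_features):
    """Sliding-window scoring: dot w with the feature at every target pixel."""
    return target_features @ w

x_source = rng.normal(size=d)               # feature at one source pixel
w = lda_detector(x_source)                  # one solve per pixel: cheap
scores = match_scores(w, rng.normal(size=(100, d)))
best = int(np.argmax(scores))               # most likely corresponding pixel
```

Because the expensive statistics (mu_neg, Sigma) are shared across all pixels, training a detector per pixel costs only one linear solve each, which is what makes the dense setting tractable.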
  2. summary:
    This paper presents using LDA with a Toeplitz covariance matrix to match features between an image pair, in contrast to the previous approach of an SVM with hard negative mining. The key observation justifying a Toeplitz covariance matrix is that the distribution of natural images is shift invariant. The features are 5x5 patches of dense SIFT with 128 channels, the Toeplitz covariance matrix is estimated from a subset of ImageNet, and they apply the same regulariser and algorithm as SIFT Flow.

    question:
    In the test images shown in the paper, there are clearly salient object(s) and background. How does this method apply to images with no apparent salient objects but with many similar objects?

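The shift-invariance argument in the summary above can be illustrated with a 1-D toy (my own sketch, not the paper's estimator): for a stationary signal, Cov(x_i, x_j) depends only on the offset i - j, so a single autocovariance function, estimated once, fills the entire Toeplitz covariance matrix for any window size.

```python
import numpy as np

rng = np.random.default_rng(1)
noise = rng.normal(size=10000)
signal = np.convolve(noise, np.ones(3) / 3, mode="valid")  # stationary 1-D "image"
signal -= signal.mean()

n = 5                                       # toy window size
m = len(signal) - n
# Autocovariance g(k) = E[x_i * x_{i+k}], estimated once from the data
g = np.array([np.mean(signal[:m] * signal[k:k + m]) for k in range(n)])

# Toeplitz covariance: entry (i, j) = g(|i - j|)
idx = np.arange(n)
Sigma = g[np.abs(idx[:, None] - idx[None, :])]
```

Note how the diagonal is constant and nearby offsets are more correlated than distant ones; the paper's 2-D case is analogous, with offsets taken in both spatial directions.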
  3. Abstract:

    The paper presents a fast algorithm for finding semantic correspondences between two images. They do it by training a classifier for each pixel in the image. They use linear discriminant analysis instead of an SVM because the outputs of LDA are comparable across classifiers. Their final results look better than the comparable SIFT Flow method.

    Discussion:

    They mention that they use a graphical-model-based framework for global alignment. Could you please explain that part?

  4. The authors propose an exemplar-classifier-based method to perform dense semantic correspondence across different images that may have no spatial or temporal relationship (properties that are exploited in stereo and optical flow).
    In this paper, each SIFT descriptor is used to learn a classifier, which then calculates the match likelihood of pixels in the target image in the feature space. The key challenge is training the large number of classifiers, since dense SIFT sampling is performed. The authors use Linear Discriminant Analysis (LDA) to learn these classifiers, exploiting two important properties of the negative set that allow them to estimate the large number of classifiers efficiently. The authors then create a hand-annotated dataset to evaluate their method against state-of-the-art techniques.

    Discussion:
    While the LDA framework is quite robust and mathematically elegant, does the dependency on the feature space have any effects on the technique? There are other keypoint sampling techniques such as SURF (though not as popular as SIFT) and I am curious to see how these perform in the same framework even though the authors claim no assumptions about the feature space.

  5. This paper attempts to build semantic correspondences between images with certain similarities but with different geometry and dissimilar visual appearance. A linear classifier is learnt for every pixel in a source image. The linear classifier is then applied via a sliding window across a target image to determine whether a correspondence exists.

    Q:
    1. w_A(i) is the linear classifier trained to predict correspondences to pixel i in I_A.
    a) First, correspondences from what to pixel i?
    b) what is the output of w_A(i)?
    c) What is g in eqn 7?

  6. This paper proposes to solve the correspondence problem, especially between images of different geometry and appearance that depict the same kind of object. The authors treat semantic correspondence as a constrained detection problem, which they solve with exemplar LDA. The authors discuss how the traditional correspondence difficulties of learning geometric dependencies, local descriptors, and computational complexity are overcome, and the advantages of using LDA over an SVM. They compare their results with other algorithms and with ground truth obtained from human annotators.

    Discuss -
    1) Could you explain, in simpler terms, what the complicating factors for correspondence in SIFT Flow are?
    2) Could you explain the formulation of the objective function as an inverse fitting problem?

  7. This paper proposes a classifier that builds a correspondence between two images with different geometry (of the same objects). The parameters are learned using LDA.

    Questions
    1. Why does LDA (Gaussian distributions with equal covariance) allow us to apply Bayes' rule?

  8. Semantic correspondence involves finding similar points in images relative to semantic meaning (a person's elbow, an elephant's tusk, etc.). It's a difficult task due to high variance in content (all elephants look different), projection (the same elephant looks very different when shot from different angles), and occlusion. This paper proposes using an LDA classifier over a traditional SVM classifier, because an SVM has difficulty representing all incorrect correspondences. They can even perform an LDA trick and just use the covariance matrix of the negative examples as the covariance matrix of all samples for each label (if the number of positive instances is small, the matrix hopefully won't change much). The detector weights then follow in closed form. They show that LDA does a better job of ignoring background and picking interesting points on the subject.

    Discussion:
    I feel like humans choose features in the center of the faces (both human and elephant) because they are lines of symmetry. For example, in Figure 5, there are five features on the nose of the gentleman wearing a suit. That region is very uniform, but it is interesting because it sits between two nose ridges. How is that learned in their model, i.e. regions with no visually distinguishing features that happen to lie between two more visually interesting regions?

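The math referred to in the summary above is standard LDA algebra (a textbook derivation, not copied from the paper): when both classes are Gaussian with a shared covariance, the quadratic terms cancel in the log-odds of Bayes' rule, leaving a linear classifier.

```latex
% With p(x \mid c) = \mathcal{N}(x; \mu_c, \Sigma) for c \in \{+,-\},
% the x^\top \Sigma^{-1} x terms cancel, so the log-odds is linear in x:
\log \frac{p(+ \mid x)}{p(- \mid x)}
  = \log \frac{p(x \mid +)\, p(+)}{p(x \mid -)\, p(-)}
  = \underbrace{(\mu_+ - \mu_-)^\top \Sigma^{-1}}_{w^\top}\, x + b,
\qquad
b = \tfrac{1}{2}\left(\mu_-^\top \Sigma^{-1} \mu_- - \mu_+^\top \Sigma^{-1} \mu_+\right)
    + \log \frac{p(+)}{p(-)}.
```

In the exemplar setting, the positive "mean" is a single source-pixel feature, which is why each per-pixel detector reduces to one matrix-vector computation against the shared negative statistics.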
  9. The paper proposes a method to establish pixel-wise semantic correspondence between objects and scenes belonging to the same category. The authors train an exemplar classifier for every pixel in the source image and apply it in a sliding window over the target image to find the most likely matches. The authors use the LDA classifier to train their model because it takes less time to train and returns posterior match probabilities that are comparable across classifiers.

    Question:
    Could you please explain Equation 2 in the paper? What do i and x_i indicate?

  10. This paper presents a novel approach for pixel-to-pixel matching that takes into account the semantic meaning of the region. The approach works by training a pixel-wise LDA classifier, and the results are better than the previous state of the art, SIFT Flow. This approach tries to overcome a shortcoming of previous SIFT-based methods, where points match other points that look visually similar rather than points that make semantic sense. The ideal output should instead match ear points to other ear points regardless of the variation in appearance within an object class.

    1. How would this method work across image classes? Can it match points on a lion to a house cat?
    2. Could this be adapted for video?

  11. The paper presents an algorithm for finding semantic correspondences across objects and scenes. For each pixel in a source image, an exemplar LDA classifier is trained. The classifiers are applied to the pixels in the target image. Using LDA over an SVM allows for fast learning of a large number of classifiers and gives estimates of match confidence that are comparable across all pixels. Sparse keypoint localization results show that the algorithm's performance is better than SIFT Flow's.

    Questions:
    What about the bike image pair makes it hard to produce correspondences?

    Can you explain equation 7?

  12. Obtaining semantic correspondences across images of objects and scenes is the primary objective of this paper. Semantic correspondences capture the inherent visual similarity within a class of objects, resulting in invariance to appearance and geometry. Here a discriminative detector is learnt for every pixel in the image, and pixel-level classification is carried out by thousands of LDA detectors in a sliding-window fashion. They advocate a unary function based on an exemplar method, where each exemplar detector is learnt on the fly by a simple matrix-vector computation in the transformed feature space. As LDA summarizes the distributions of the positive and negative sets by their means and covariance, it is much faster to train. A Mahalanobis distance metric is used for evaluation.

    Questions-

    1. How will this perform under inter-class variation, i.e. for objects from different classes but with similar visual traits?

    2. Have any recent computer vision papers made use of this method?

  13. This paper presents a method to perform dense semantic correspondence, which determines corresponding points between images. They discuss using a discriminative detector at every pixel in the image via a linear classifier: the approach is to learn a linear classifier and then apply it in a sliding-window manner to produce match likelihood estimates. They use LDA classifiers and show that they perform better than SVMs.

    Discussion:
    I don't quite understand how LDA summarizes the negative set by its mean and covariance. And why is it important to have classes modelled as Gaussians with equal covariance?

  14. This paper presents a method of building pixel-to-pixel correspondences between images from the same visual class. The authors train an LDA classifier, which offers smaller computation time and direct computation of posterior probabilities. Results on sparsely selected keypoints show that this classifier outperforms SIFT Flow, the previous state of the art.

    How does this algorithm perform on images with multiple objects?

  15. This work proposes a novel method for dense pixel-wise correspondence between images using a linear classifier with exemplar LDA. The method shows how LDA is beneficial over an SVM when the negative distribution is stationary.

    Questions: Most of the images are iconic; how will the method work with clutter?

    I could not understand the smoothness constraint over time. Could you please explain that?

  16. The authors present a method for finding semantic correspondences across objects and scenes. An exemplar LDA classifier is learned for each pixel and then applied to the pixels in the target image. LDA has benefits over an SVM, such as fast learning of a large number of classifiers and estimates of posterior probability.

    Discussion:
    1) What was the graphical framework used?
    2) Why are the classes modeled as Gaussians with equal covariance?

  17. Summary
    The paper proposes a method of dense semantic correspondence to find keypoint matches between images that share similar high-level structure even if their exact appearance and geometry differ. They learn LDA classifiers per pixel, which they ultimately use as generative models to infer posterior probabilities of the positive and negative classes. They conducted experiments with this method on a sample of images from ImageNet using a Mahalanobis distance metric.
    Query
    Could you please explain the displacement tensor function in equation 7?
