This paper presents the use of a Siamese CNN architecture for multitask learning, in order to obtain a good embedding of both in-situ and iconic product images. The tasks are learning an embedding of the images that minimizes a contrastive loss under a chosen distance metric, and predicting the category of a product image. The authors explore the utility of their approach in visual similarity search: finding product images given in-situ images, in-situ images given product images, and stylistically similar images. The training data is collected via crowdsourcing on AMT. Different architectures and distance metrics are evaluated and their results presented.
Discussion:
In some of the previous papers, we have seen that CNNs are capable of capturing color and texture information at a low level. Could this potentially be used to map between objects and "stuff"? For example, if my house is painted in a particular color, I would like furniture that goes well with that color palette, but walls are not exactly quantifiable objects.
Abstract:
The paper presents a deep learning architecture for finding visual similarity of product designs across their iconic and in-situ images. The main contributions are to collect a large dataset in this domain and to present the performance of various architectures on it. The dataset was collected from the Houzz website and tagged using Mechanical Turk. The best-performing architecture is a Siamese network trained with both contrastive and categorical losses. Both the quantitative and qualitative results with this architecture look great and open new opportunities in the field.
Discussion:
1) How can this be extended to context queries with multiple objects, where you provide several furniture items you currently own and query for the missing chair that goes well with them?
Summary:
This paper presents a method for querying images using a trained Siamese network.
A crowdsourcing pipeline is developed to collect pairings between in-situ images and their corresponding product images. The loss function consists of a penalty for similar images that lie far apart in the embedding and a penalty for dissimilar images that are nevertheless nearby. The visualized embedding shows that the trained CNN groups items mostly by visual similarity, without capturing some of the semantics of the objects.
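For readers who want the loss concretely, here is a minimal sketch of the contrastive loss this summary describes, in the standard Hadsell et al. margin form; the tensor names and margin value are illustrative assumptions, not taken from the paper.

```python
# Sketch of the contrastive loss described above (standard margin form).
# Shapes and the margin value are illustrative, not the paper's settings.
import torch

def contrastive_loss(e_insitu, e_iconic, is_match, margin=1.0):
    # Euclidean distance between each (in-situ, iconic) embedding pair: (N, D) -> (N,)
    d = torch.norm(e_insitu - e_iconic, dim=1)
    pos = is_match * d.pow(2)                                       # similar pairs far apart are penalized
    neg = (1.0 - is_match) * torch.clamp(margin - d, min=0).pow(2)  # dissimilar pairs that are too close
    return (pos + neg).mean()
```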
Question:
In the discussion section, the paper mentions style similarity vs. image similarity: image similarity implies style similarity, but not the other way around. Are there more recent results on this? I understand that "style" is hard to define. Perhaps one could use skip connections that let low-level visual cues cast a vote, combined with the deep network's vote, as in Fully Convolutional Networks for Semantic Segmentation.
This paper uses labeled data from Houzz (a site similar to Pinterest where users ask for and offer help identifying products in images). The goal is to match iconic (white-background, photoshopped) images of an object with tagged objects in real images. The positive training data consists of all the matches in the dataset, and the negative training data consists of random pairings of two objects (one iconic and the other not) with different labels. The scoring metric is recall: when the algorithm returns a single item, the average class recall is below 10%, and when it returns the top 20 item labels, the average class recall is around 25%. They outperform the baseline they set, and the system can return more than one item label to a potential user.
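As a reading aid, a small sketch of how the average class recall@k quoted above could be computed; the array layout and names are my assumptions, not the authors' evaluation code.

```python
# Hypothetical recall@k evaluation: a query is a hit if the true product id
# appears among its top-k retrievals; hit rates are averaged per category.
import numpy as np

def average_class_recall(ranked_ids, true_ids, query_classes, k):
    """ranked_ids: (Q, R) retrieved ids per query, best first;
    true_ids: (Q,) correct id; query_classes: (Q,) category per query."""
    hits = np.array([t in row[:k] for row, t in zip(ranked_ids, true_ids)])
    return float(np.mean([hits[query_classes == c].mean()
                          for c in np.unique(query_classes)]))
```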
Discussion: They state this but never back it up; why does padding help? Quoting the paper: "Padding. At test time, we experimented with adding different amounts of padding to the input query. We find that a modest amount of padding, 16 pixels, is optimal. Since all algorithms benefit from padding, we only show the effect on the best algorithm."
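For concreteness, a hedged sketch of what test-time padding likely amounts to: enlarging the query bounding box by a fixed number of pixels before cropping, so the network sees a little surrounding context. The 16-pixel figure is from the paper; the helper itself is hypothetical. One plausible reason it helps is that a padded crop better matches loosely cropped regions seen during training.

```python
# Hypothetical test-time padding: expand the query box by `pad` pixels per
# side (clamped to the image bounds) before cropping. 16 px is the paper's
# reported optimum; everything else here is an assumption.
from PIL import Image

def crop_with_padding(img: Image.Image, box, pad=16):
    x0, y0, x1, y1 = box
    w, h = img.size
    return img.crop((max(0, x0 - pad), max(0, y0 - pad),
                     min(w, x1 + pad), min(h, y1 + pad)))
```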
This paper presents a method to find objects of a given type in a database that are most similar to a queried object. It leverages crowdsourced annotated data from Houzz.com. Positive training data are pairs of in-situ and iconic objects, and negative examples are random pairs from the same and different categories. A contrastive loss function pulls positive pairs toward each other and pushes negative examples apart to form an embedding.
Discussion:
They talk about training for style similarity in the future. Wouldn't this mess with the embedding?
I'm sure this has been done before, but I am curious how this kind of CNN would perform on pure recognition tasks: post-detection, compute the embedding of the bounding box, then cluster the D-dimensional space to classify the object.
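A minimal sketch of the recognition-by-embedding idea this comment proposes: embed each detected crop, then assign it the label of the nearest class centroid in the D-dimensional space. This is purely illustrative of the commenter's suggestion, not anything from the paper.

```python
# Nearest-class-centroid classification in the learned embedding space.
# `embeddings`, `labels`, and the centroid rule are illustrative assumptions.
import numpy as np

def class_centroids(embeddings, labels):
    # Mean embedding per class: {class_id: (D,) centroid}
    return {c: embeddings[labels == c].mean(axis=0) for c in np.unique(labels)}

def classify_crop(centroids, query_embedding):
    # Assign the crop to the closest centroid.
    return min(centroids, key=lambda c: np.linalg.norm(centroids[c] - query_embedding))
```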
This paper presents an approach for matching in-situ images with iconic product images. First, a dataset was created by crowdsourcing with Mechanical Turk: data that already had some product tags was pulled from Houzz, then annotators were asked to draw bounding boxes around in-situ items and fix invalid tags. Ultimately, the authors found that a Siamese GoogLeNet architecture gave the best results.
Question:
Has this approach been used in industry? The product search seems like it would be useful for large shopping sites.
This paper presents a mechanism for calculating a distance metric between objects in environmental settings (in-situ) in images and their iconic counterparts. This is useful in situations where users want to find similar examples of a particular object, say a chair, in some real-world setting, or to find an iconic example of an object within an image in order to investigate it further, or perhaps buy it. This is difficult because objects "in the wild" appear in very different orientations and scales, in varied environments, against different backgrounds, with unpredictable occlusions and lighting conditions, all of which must be accounted for.
To address this challenge, the authors first built training data by crowdsourcing a large image set to generate iconic and in-situ image pairs, trained a Siamese CNN on these pairs to learn an embedding for each image, and then demonstrated the efficacy of this embedding in one-sided image search, where one image, either iconic or in-situ, is provided and a candidate for the other is proposed.
Questions/Discussion:
1) This paper's approach seems like it might succeed at learning and classifying MS-COCO. Has this been attempted?
2) Could this be used as a generative tool, possibly to fill in occluded elements of an object in an in-situ image?
This paper uses a Siamese network to learn a distance metric between in-situ and iconic products. The authors first built training data using Mechanical Turk to generate pairs of in-situ and iconic images of products.
Q. Why would padding help? (from Sect. 6.2)
Summary
This paper proposes a visual search algorithm that can match in-situ images with corresponding iconic product images. They collected data using Mechanical Turk, trained a Siamese CNN to learn embedding vectors for the training data, and finally used this network to run queries such as finding a product, finding designer scenes in which a product might appear, and finding visually similar products across categories.
Questions
What other kinds of learning could benefit from training networks on a different domain task (transfer learning)?
The paper presents a Siamese network to calculate a distance function between objects in iconic representations and the same objects in their natural settings. The resulting distance function is then used to embed the images into a low-dimensional space, which can be projected to two dimensions for easy visual browsing.
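To make the search step concrete, here is a minimal nearest-neighbor lookup over a precomputed table of embeddings; `embed` stands in for the trained network and is an assumption, not the paper's code.

```python
# Hypothetical visual search: embed every iconic product once, then answer an
# in-situ query by ranking products by Euclidean distance in embedding space.
import numpy as np

def build_index(product_images, embed):
    return np.stack([embed(img) for img in product_images])  # (N, D)

def search(index, query_image, embed, k=5):
    q = embed(query_image)                     # (D,) query embedding
    dists = np.linalg.norm(index - q, axis=1)  # distance to every product
    return np.argsort(dists)[:k]               # indices of the k nearest products
```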
Questions:
Is it common practice to try to learn embeddings like this for generative purposes (i.e., to generate an image that is "between" two known ones)?
The paper introduces a visual search algorithm to match in-situ images with iconic product images. A crowdsourced pipeline is used to collect pairings between in-situ images and their corresponding product images. This data is used to train a Siamese CNN, which is then applied to image search tasks. The paper evaluates different training methodologies, such as training with a contrastive loss, training with an object classification softmax loss, and the effect of normalizing the embedding vector.
Question: What other applications use a Siamese CNN architecture?
This paper proposes a way to match iconic images of objects from a catalog to their usage in home settings, and shows how such an embedding can be learned using a CNN. The main contributions of this paper are:
- Introducing a large dataset with human-annotated bounding boxes pairing non-iconic and iconic images for learning an embedding.
- Training a CNN with a Siamese architecture and a loss function that maximizes the similarity of matching pairs.
- Showing extensive results on using these embeddings to find nearest neighbors within categories.
Questions:
I don't understand how testing would take place in a case like the first figure, where there are two images of different objects. Is this just a misleading image?
Could this be pushed toward incremental learning, where classes keep being added and the network grows its embedding space?
The paper presents a Siamese network to retrieve similar images from a dataset of objects. They use a CNN to map the similarity of objects. The resulting distance function is then used for visual search.
Questions:
Could this be used generatively? For example, for scene completion?
This paper describes a method for learning an embedding for visual search, finding similar objects across in-situ and iconic images. The authors build a deep Siamese network trained on images obtained from Houzz, with bounding boxes drawn by Mechanical Turkers.
As mentioned in the conclusion, how do the authors plan to train a model to understand visual style in products?
This paper presents a method that uses a Siamese architecture and multitask learning to find visually similar objects in images. The authors built a database from Houzz and used crowdsourcing to get pairs of in-situ and iconic images. They use a Siamese network to learn a single embedding for each pair, and test by performing searches that return either product or in-situ images.
The authors mention not using the predicted object categories. Has anyone tried using them to see if it improves the results?
This paper presents the use of twin CNNs to find visual similarity in the designs of products from an iconic image dataset of those products. The main task is to match the iconic images of objects with their tagged, in-situ images. Data was obtained from Houzz.com, and the training data was built using AMT, with workers annotating bounding boxes around the in-situ objects. The authors then train the CNNs in their Siamese architecture with a loss function that minimizes matching errors.
Discussion:
How would this pipeline perform on pure object detection tasks and scene matching problems?
This paper proposes a Siamese CNN model that finds visually similar objects across different scenes or domains. First, the paper introduces a new pipeline for collecting training data using AMT. Using this dataset, two copies of a CNN are trained, one on images of objects in scenes and the other on iconic images of the same objects; the Siamese network learns an embedding that places similar objects close to each other. The paper demonstrates this work on different visual search tasks.
Figure 1 shows two inputs to the network, the first being the input scene plus a box, but it seems the network is trained only on the object box, without learning much about the scene?
The paper presents a method to learn a high-quality embedding of product images in two different domains: products cropped from natural images and products in their iconic form. The authors develop a crowdsourced method to collect in-situ images and their corresponding iconic product images. They then train a Siamese CNN architecture using different training techniques, including a contrastive loss and an object classification softmax loss. Finally, the authors apply the algorithm to image search applications and demonstrate the results.
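A sketch of the two-tower setup this summary describes, assuming a PyTorch-style model: both inputs share one CNN, and training combines a contrastive loss on the pair of embeddings with a softmax classification loss on the category. The backbone and layer sizes are placeholders, not the paper's exact GoogLeNet configuration.

```python
# Hypothetical Siamese model: one shared tower embeds both domains; a second
# head predicts the object category for the multitask softmax loss.
import torch.nn as nn
import torch.nn.functional as F

class SiameseNet(nn.Module):
    def __init__(self, backbone: nn.Module, feat_dim: int, embed_dim: int, n_classes: int):
        super().__init__()
        self.backbone = backbone                         # shared CNN tower
        self.embed = nn.Linear(feat_dim, embed_dim)      # embedding head
        self.classify = nn.Linear(embed_dim, n_classes)  # category head

    def embed_one(self, x):
        # Shared weights: the same tower embeds in-situ crops and iconic images.
        return F.normalize(self.embed(self.backbone(x)), dim=1)

    def forward(self, x_insitu, x_iconic):
        e1, e2 = self.embed_one(x_insitu), self.embed_one(x_iconic)
        return e1, e2, self.classify(e1)  # embeddings + category logits
```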
Question:
Toward the end of the paper, the authors discuss the difference between style and visual similarity. Could you please elaborate on that?