This paper presents a mechanism for taking simple labelled user sketches and converting them into composite images. The labels are used to search online and find potential candidates for all component objects of the desired result. These images are analysed for saliency, segmented via grab-cut, shape-matched to any contours provided by the user, and then clustered by consistency. A novel hybrid-blending algorithm is introduced, intended to combat the perceived shortcomings of alpha and Poisson blending. Sets of components are composed into optimal (as determined by texture and color similarity) images and presented to the user.
Question: Can the system be improved by learning from past users' choices?
This paper presents a system to convert rough user-input sketches into photo-realistic image compositions. The system works as follows: first, the user inputs a simplistic sketch with associated text labels and a horizon placement. Then, candidate images are retrieved from the Internet for both the background and each scene item specified by the user. Results are ranked by how consistent the object images are with the background image, how similar the horizon placements are, and how "algorithm friendly" the object images are (this metric favors uncluttered, salient images with clean borders around the object). Object images are then blended using a custom algorithm that combines alpha and Poisson blending. Final results are presented to the user, who can then adjust the object placement and correct any segmentation artifacts. Results are impressively realistic!
Questions: 1. Could a model be trained to recognize human-drawn shapes (given that the input is a nicely labeled image)?
2. How well does the contour filtering work in practice given that many of the shapes look nothing like their photo counterparts?
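To make the contour question concrete, here is a minimal sketch of contour filtering in the spirit of the paper (not its exact metric): compare the user's drawn outline to each segmented candidate with Hu-moment shape matching, which is scale- and rotation-invariant, so crude sketches can still retrieve plausibly shaped objects. The file names and the idea of thresholding the score are assumptions; this assumes OpenCV 4.

```python
import cv2

# Minimal sketch (assumed inputs, not the paper's exact metric): compare the
# user-drawn contour against each segmented candidate via Hu-moment matching.

def largest_contour(mask):
    """Largest external contour of a binary (0/255) mask."""
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    return max(contours, key=cv2.contourArea)

sketch_mask = cv2.imread("user_sketch_mask.png", cv2.IMREAD_GRAYSCALE)      # hypothetical
object_mask = cv2.imread("candidate_object_mask.png", cv2.IMREAD_GRAYSCALE) # hypothetical

# matchShapes compares Hu moments of the two contours; lower means more similar.
score = cv2.matchShapes(largest_contour(sketch_mask),
                        largest_contour(object_mask),
                        cv2.CONTOURS_MATCH_I1, 0.0)
print("dissimilarity:", score)   # keep candidates below some acceptance threshold
```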
This paper describes a method for taking a user-made sketch and producing a realistic composite image. The pipeline is described in detail. First, suitable background images are selected: images are found from Internet queries using the user's labels as search terms, and the results are rated by how cluttered the scene is (more uniform scenes are preferred) and by content consistency (how similar the image is to the other images in the query). After selecting a background image, scene item images are searched for. Results are sorted by a score that depends on how simple the image background is, how well the segmented object fits the provided contour, and how consistent the image is with the other results. The authors use a new blending method to combine the set of selected images. After the composite image is complete, the user can choose areas to adjust.
Questions:
1) How would the process handle more realistic drawings of scene items?
2) What happens if a scene item is drawn inside another scene item?
This paper presents a method for converting basic human-drawn sketches into images. The output image is composed by taking segmented sub-images corresponding to the different parts drawn in the sketch and then blending those segments together. The database images are obtained from the Internet and filtered at each stage based on content consistency and alignment. Saliency features and the grab-cut algorithm are used for segmentation. Blending is done by combining Poisson blending and alpha blending, and interactive refinement further improves the quality of the generated image. Finally, the authors present the results of their system in terms of average rated score and generation time.
Question: How does segmentation work in cases where the required region is non-convex?
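On the non-convex point: grab-cut labels pixels via a graph cut over Gaussian-mixture color models, so nothing restricts the result to convex regions. A minimal sketch follows (hypothetical inputs; the paper seeds grab-cut from a dilated saliency segmentation rather than the plain rectangle used here):

```python
import cv2
import numpy as np

img = cv2.imread("candidate.jpg")           # hypothetical candidate image
mask = np.zeros(img.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)   # internal GMM state required by OpenCV
fgd_model = np.zeros((1, 65), np.float64)

rect = (50, 50, 300, 400)                   # (x, y, w, h) around the object, hypothetical
cv2.grabCut(img, mask, rect, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_RECT)

# Definite + probable foreground pixels form the object segment, which can be
# arbitrarily non-convex.
object_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                       255, 0).astype(np.uint8)
segment = cv2.bitwise_and(img, img, mask=object_mask)
```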
This paper outlines a process to generate an image from a sketch using Google Images or another search engine as an image database. Users specify a background by keyword ("meadow," "mountain," "beach"). The authors then cluster the resulting images based on color features. The biggest cluster is chosen as the consensus (having the best probability of matching the user's search intent). Images are then selected from this group and presented to the user in ranked order. The authors look for iconic images by ignoring those with lots of clutter (a saliency-based test). They rank objects in the sketch in a similar manner, but also consider the user-provided contour as a rankable feature. The authors then offer an improved blend-after-cut strategy.
Discussion: Could they have added multiple objects from the same image? You'd be less likely to find an image with two query objects with matching contours, but who knows! They could save some of the blending work. Same with the object background: I'm sure they could find lots of meadows with a cow in them, and then the only thing left to do is add a sheep. I would love to see an image search algorithm with their interface, where you can sketch objects and layouts and it spits back the closest real neighbor it can find. I understand that too-specific queries might return results with low scores. Do the user-drawn contours classify just on the outer edges or on the inner edges as well?
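As a rough illustration of the consensus-cluster idea described above (the feature, file names, and automatic bandwidth are assumptions, not the paper's exact choices), one could cluster color-histogram features with mean shift and keep the largest cluster:

```python
import cv2
import numpy as np
from sklearn.cluster import MeanShift

def color_histogram(path, bins=8):
    """Normalized 3-D color histogram, flattened to a feature vector."""
    img = cv2.imread(path)
    hist = cv2.calcHist([img], [0, 1, 2], None, [bins] * 3,
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

paths = ["result_%03d.jpg" % i for i in range(100)]   # hypothetical search results
features = np.array([color_histogram(p) for p in paths])

labels = MeanShift().fit(features).labels_            # bandwidth estimated automatically
consensus = np.bincount(labels).argmax()              # biggest cluster = consensus content
kept = [p for p, l in zip(paths, labels) if l == consensus]
```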
This paper details a pipeline for generating photo-realistic images from hand-drawn, text-labelled sketches. The labels are used to perform image searches for the background and each scene item to get a set of candidate images, which are then filtered based on saliency segmentation (which aids blending more than actual filtering), contour consistency of the segmentation, and content consistency (which uses clustering to find similar images based on appearance features). The major novelty of the paper is its hybrid blending approach, which works on superpixels classified as either M1 or M2 based on texture and color consistency. The categories are used to calculate the optimal set of superpixels that will form the blending boundary; improved Poisson blending is performed on pixels labelled M1 and alpha-matte blending on pixels labelled M2. The blending cost is also used to rank the images by composition quality before they are presented to the user.
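A rough sketch of the hybrid idea (not the paper's superpixel boundary optimization): compute a Poisson composite and an alpha composite, then pick between them per pixel with an M1/M2-style mask. The file names are hypothetical, the object plate and background are assumed to be the same size with the object already positioned, and the M1 mask here is a placeholder for what the paper derives from texture/color consistency.

```python
import cv2
import numpy as np

src = cv2.imread("object_layer.jpg")                 # object plate, hypothetical
dst = cv2.imread("background.jpg")                   # same size as src, assumed
obj_mask = cv2.imread("object_mask.png", cv2.IMREAD_GRAYSCALE)

# Keep the object where the user placed it: seamlessClone centers the mask's
# bounding box at the given point in dst.
x, y, w, h = cv2.boundingRect(obj_mask)
center = (x + w // 2, y + h // 2)

# 1) Poisson (gradient-domain) composite: seamless, but colors can bleed
#    where source and destination textures disagree.
poisson = cv2.seamlessClone(src, dst, obj_mask, center, cv2.NORMAL_CLONE)

# 2) Alpha composite with a feathered matte: keeps true colors, visible seams.
alpha = (cv2.GaussianBlur(obj_mask, (21, 21), 0).astype(np.float32) / 255.0)[..., None]
alpha_blend = (alpha * src + (1.0 - alpha) * dst).astype(np.uint8)

# 3) Hybrid: Poisson on texture-consistent (M1) pixels, alpha elsewhere (M2).
m1 = (cv2.imread("m1_mask.png", cv2.IMREAD_GRAYSCALE) > 0)[..., None]  # placeholder mask
hybrid = np.where(m1, poisson, alpha_blend)
```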
Discussion: 1. The authors impose many constraints when doing background image search. Wouldn't they benefit from the method in Lalonde et al. in the case of cluttered backgrounds? They could have tried adapting the method in Lalonde et al. to their pipeline and reported the results.
2. Their algorithm makes the assumption that scene items do not overlap "often". How would they handle overlapping of, suppose, multiple people in a crowded scene?
3. Why are the authors moving away from a completely automated system to one that requires user sketching and labelling, plus interactive refinement? Doesn't that defeat the purpose of the motivations provided, in a way?
This project develops photo-realistic image synthesis from a simple sketch using a data-driven approach. The user sketches a scene with a label for the background and outlines with labels for the objects to appear in the image. Relevant images and image segments are picked from a vast pool of Internet images, filtered to fit the sketch, and then stitched/blended to form a photo-realistic image.
The background images obtained from the Internet are filtered based on horizon position and angle, and on the amount of clutter (foreground objects) in the image. Foreground object images are filtered based on saliency and shape matching with the sketch. The foreground objects are then segmented using the grab-cut algorithm.
The segmented objects are blended into the background, and a cost is calculated for blending each image segment. The synthetic images with the lowest costs are then chosen and presented to the user, who selects whichever looks most photo-realistic.
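A toy version of such a cost (purely illustrative, and much simpler than the paper's blending cost): measure how strongly the object plate disagrees with the background in a thin ring just outside the segmentation, and rank composites by that number.

```python
import cv2
import numpy as np

def boundary_cost(obj, bg, mask, ring=5):
    """Mean color difference between object plate and background in a thin
    ring just outside the object mask (obj and bg same size; mask is 0/255)."""
    kernel = np.ones((2 * ring + 1, 2 * ring + 1), np.uint8)
    ring_mask = cv2.dilate(mask, kernel) & cv2.bitwise_not(mask)
    diff = np.abs(obj.astype(np.float32) - bg.astype(np.float32)).sum(axis=2)
    return float(diff[ring_mask > 0].mean())

# composites = [(obj_img, bg_img, obj_mask), ...]   # hypothetical candidates
# ranked = sorted(composites, key=lambda c: boundary_cost(*c))  # lowest cost first
```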
Discussion: 1. Unlike the 'Photo Clip Art' project, matching the illumination of background and foreground does not appear to be handled; for example, the shadows of foreground objects are sometimes incorrect. 2. Does the pose of the object (size and angle) rely completely on the user sketch? For example, in the 'cheetah chasing bike' images, we can see that the pose of the cheetah doesn't fit the background.
The paper presents an algorithm for generating realistic photo images from simple labelled sketches. As input, a sketch with labelled entities is provided. The photo image is created using images from Flickr, Google, etc., which are filtered by the relevance of their content. The appropriate image segments are selected from the dataset using a suitable segmentation algorithm. Image blending is then done using a hybrid combination of Poisson and alpha blending. The images with the lowest blending costs are provided as output, and refinement is done interactively.
Discussion - How is the exact placement of each entity in the image determined relative to the other entities?
The paper describes how to go from a user sketch to a composite image. First, images are found via the Internet using the user's labels. The search results for the background and each scene item are filtered (e.g., by content consistency, contour consistency, and more) and then blended. The user also plays a role in interactively improving the result, such as refining the segmentation.
Q: How do you know when the automatic analysis is unreliable?
This paper describes how a user-interactive application can be developed for generating photo-realistic synthesized images. A user can draw an object on the canvas with a suitable label attached to it, and the system generates 10 ranked composite images as output. The paper describes a method of reducing irrelevant results retrieved from the search query by using content consistency filtering and uncluttered region filtering for the (uncluttered landscape) backgrounds, while object images require closer attention and further methods. One of the major issues highlighted is effective blending: blending is sensitive to lighting conditions and to texture or color differences. Alpha matting is prone to lighting mismatches, while Poisson blending performs poorly on images with varying textures, so the authors describe a novel combination of both techniques. Lastly, they describe the shortcomings of this technique for images with more than 2-3 objects, pose issues, and light occlusions.
Questions -
1. How could they have leveraged the pose estimation technique employed in the "Photo Clip Art" paper?
Discussion-
1. In today's context, could another approach be used for searching objects, with MS COCO providing high-quality segmented images? Though it would be a limited dataset, I think it would be a faster approach, not only given today's processing power but also because it eliminates the downloading step.
Summary: This paper presents a system to generate images from a simple sketch with text labels by utilizing the huge number of images online. To get image candidates from the text labels, background candidates are clustered by mean shift over histograms in LUV space, and a horizon line is estimated to find the best match with the sketch. Scene items are first selected based on saliency detection, discarding images with more than 10 segments within the 30-pixel band around the item; the grab-cut algorithm is then seeded by dilating the item's segmentation, and finally a morphological operator is used to select the most contour-consistent item. To blend all the found items, the blending boundary is first expanded and then optimized with respect to texture, color, and illumination, applying improved Poisson blending on and near the scene item and alpha blending mostly on the background.
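For reference, here is roughly what the saliency step consumes, using OpenCV's spectral-residual detector as a stand-in (requires opencv-contrib-python; the paper's detector and its 10-segment / 30-pixel rules sit on top of a map like this):

```python
import cv2

img = cv2.imread("candidate.jpg")                       # hypothetical candidate image
detector = cv2.saliency.StaticSaliencySpectralResidual_create()
ok, saliency_map = detector.computeSaliency(img)        # float map in [0, 1]

# Threshold to a binary map; an "algorithm friendly" candidate shows one
# compact salient blob on an otherwise clean background.
binary = (saliency_map * 255).astype("uint8")
_, binary = cv2.threshold(binary, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
```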
Question: From the examples in the paper, illumination matching seems to be a problem even for some of the good results (Figures 8 and 10). What are some state-of-the-art techniques to address this kind of problem?
Summary: This paper presents a system for generating realistic images resembling a labelled sketch, using the vast number of images on the Internet to choose backgrounds and scene items. It generates multiple images by composing labelled background and scene item segments taken from various Internet images and blending them seamlessly. The novelty of the paper lies in its candidate-image filtering algorithm and the hybrid optimization technique it uses. The paper reiterates that more semantic information in input sketches results in better content-aware image synthesis.
Questions: 1) The paper mentions taking histograms of images in LUV color space as image features during the content filtering phase for the background. What exactly are the histograms made of? And how are these histograms good representations of image content?
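One plausible reading (an assumption; the paper only says "histograms in LUV space") is a joint 3-D histogram over the L, u, v channels, which summarizes an image's global color distribution and therefore tends to group images of the same scene type:

```python
import cv2

img = cv2.imread("background_candidate.jpg")        # hypothetical file
luv = cv2.cvtColor(img, cv2.COLOR_BGR2Luv)          # 8-bit Luv, channels in [0, 255]
hist = cv2.calcHist([luv], [0, 1, 2], None, [8, 8, 8],
                    [0, 256, 0, 256, 0, 256])
feature = cv2.normalize(hist, hist).flatten()       # 512-D descriptor per image
```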
The paper introduces a system that uses a freehand sketch annotated with text labels to compose a picture. The key steps in the process are background image selection (based on the user's input/sketch), scene item selection, and hybrid image blending. While choosing the background image, the system ensures the background is a clean landscape. Scene items are filtered such that they can be easily segmented; contour consistency filtering and content filtering help refine the set of selected images further. The paper describes a novel hybrid blending approach, a combination of alpha matting and Poisson blending.
Questions: 1. The paper mentions that background filtering works for landscape images and mostly fails for indoor scenes because indoor scenes are more cluttered. How could we extend the algorithm to make it work for cluttered environments as well? 2. The overall processing time for generating the results shown in the paper is quoted as 15n + 5 minutes, where n is the number of scene items, which is quite slow; processing the background alone takes 3-4 minutes. How could the proposed algorithm be further optimized to reduce the overall processing time?
The paper presents a novel method to convert user-drawn sketches into realistic images, focusing on choosing the most "algorithm friendly" images to achieve this. The algorithm takes as input a rough sketch by the user indicating the objects in the scene, labels corresponding to the objects, and the desired background. It then selects relevant background images by performing content consistency filtering and uncluttered region filtering. Similarly, relevant scene item images are retrieved by performing saliency filtering followed by scene item segmentation, contour consistency filtering, and content consistency filtering. The results are then integrated using a hybrid blending technique to create the final image.
1. How can we make the interactions between the different items in the image more realistic? For example, in the man-and-dog image, many results depict the man pointing in a different direction than the dog's motion; in the bear-and-salmon example, the position of the salmon with respect to the bear is awkward.
This paper presents a system for users to generate an image from an annotated sketch. The user can also adjust the horizon line, which is at mid-height by default. Appropriate and suitable images are found using the sketch and text annotations, and then blended together. The result is presented to the user, who can then make any adjustments as necessary.
Could the technique from Photo Clip Art of choosing images with similar properties be used in this system to improve poor results from differences in illumination?
What techniques can be used to address the issue of occlusions from objects being placed over other objects from the background image?
The paper presents a way to generate completely new images from annotated sketch images. The method is completely data-driven: the annotated tags are used for querying, and images are filtered based on the sketch and the image background. The paper shows impressive results in creating photo-realistic scenes.
Discussion:
Why is the background generally assumed to be a landscape? Would other, more cluttered backgrounds cause blending to fail?
Data-driven approaches have the problem of only handling labels present in the dataset. Can we extend this to nearby labels in a more generative way?
This paper presents a system for users to transform their sketches into composite images. The paper focuses on how to choose the "correct" images and on blending them together. The user specifies the horizon line, the sketch, and annotations; the system then chooses appropriate images from Google search results.
Questions: 1. Could the researchers use a database like MS COCO instead? 2. Could the technique be further improved by adding more text or a better drawing? Which would make a bigger difference?
The paper describes a method of creating photo-realistic pictures from a hand-drawn sketch labeled with text. It presents filtering schemes to select viable background images and scene image candidates. A novel hybrid blending technique is also used, which provides improved image composition results.
Discussion: Indoor backgrounds are stated to be a problem here. Can't we exploit saliency maps of indoor images to figure out what could be a good candidate for a background image?
Synthetic image composition today usually requires experience with Photoshop or some other image editing tool. This paper presents an easy and intuitive way to compose synthetic images: a user sketches contours of the desired objects (with annotations indicating what each contour is); the system then matches the contours with image candidates and blends them together in an attempt to generate a realistic synthetic image.
Questions:
1) In future work, could providing more labels or annotations about object attributes yield more realistic images?
Sketch2Photo is a system that attempts to combine sketching and photomontage for realistic image synthesis. The sketches are hand-drawn and tagged with text labels by the user. These labels are used to fetch relevant background images by searching an image database. The fetched images are then ranked by the uniformity of the background using a standard segmentation algorithm, along with saliency and contour consistency filtering. The authors then introduce a hybrid image blending algorithm that uses a combination of Poisson and alpha blending. The system produces optimal images by blending together various components and presents them to the user.
Questions: 1. Of the candidate images pulled from an image database, how are they arranged and composited to form a scene background? 2. Was any subsequent work done to fix the problem of indoor scenes? The authors mention clutter as a reason for failure.
This paper presents a system that takes as input a sketched scene drawn and labeled by a person, tries to find the best set of images for each object, and then generates different photo-realistic scenes that best match the input sketch. The paper describes the main pipeline of the system: starting by querying the Internet, then applying a set of filters to select the best background image and scene items. Finally, the paper presents a novel image blending technique that optimizes the blending boundary per pixel, then uses a combination of Poisson and alpha blending to compute the final result.
1. How can this work be extended to accept more descriptive input labels, adding adjectives, actions, or interactions between the key entities in the scene?