Advanced Computer Vision: Mon, April 18

Sunday, April 17, 2016

Mon, April 18 - Adversarial Networks

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 2015.

project page, arXiv

21 comments:

Sam Seifert, April 18, 2016 at 12:18 AM
These guys train part of a CNN on unlabeled data (ImageNet, LSUN, and a faces dataset generated for this paper). They use the output of their network as the input to a linear classifier to measure how well the unsupervised feature extraction performed. They show that the CNN trained on unlabeled bedroom scenes extracts features that activate around beds and windows (bedroom-like objects). They also show that their representations can be used for some ugly-looking feature arithmetic on their faces dataset (they had to average 3 inputs instead of using just one).

Discussion: “No data augmentation was applied to the images.” Why not? It seems like a good way to prevent the network from simply memorizing the inputs.
Unknown, April 18, 2016 at 12:56 AM
The authors use CNNs within the Generative Adversarial Network framework to learn feature representations in an unsupervised fashion. They explore and analyze various models, providing guidelines and metrics for the best-performing ones. They then apply the learned latent representations to supervised tasks and report state-of-the-art performance that trails only purely supervised techniques.

Discussion:
One major critique is that the authors do not explain the loss function being used or how the discriminator CNN works, instead referring us to the Goodfellow paper. Could you elaborate on how the discriminator CNN knows what to discriminate? This would help answer questions about the unsupervised nature of the task.
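Because the paper defers the loss to Goodfellow et al. (2014), a short illustration may help. In that framework the discriminator D is an ordinary binary classifier. It is trained to output 1 for training images and 0 for generator outputs. The generator G is trained to make D output 1 on its samples. The original minimax objective is min_G max_D E_x[log D(x)] + E_z[log(1 - D(G(z)))]. Below is a minimal NumPy sketch of the two losses. It assumes D already outputs probabilities, the numbers are made up, and it is not the authors' code.

```python
import numpy as np

def bce(pred, target, eps=1e-12):
    """Mean binary cross-entropy; `pred` holds probabilities that an image is real."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def discriminator_loss(d_real, d_fake):
    # Real images get target 1 and generated images get target 0. These targets
    # come from where each image came from, so no human labels are needed.
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake):
    # Non-saturating generator loss, -log D(G(z)), as suggested by Goodfellow et al.:
    # the generator is rewarded when D classifies its samples as real.
    return bce(d_fake, np.ones_like(d_fake))

# Made-up discriminator outputs for three real and three generated images.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.3, 0.2])
print("D loss:", discriminator_loss(d_real, d_fake))
print("G loss:", generator_loss(d_fake))
```

The "supervision" here is only the real-versus-generated label. That is why the overall setup still counts as unsupervised with respect to image content.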
Unknown, April 18, 2016 at 1:45 AM
This paper proposes CNN-based GANs (DCGANs) for clustering/unsupervised learning. The network is trained on 3 datasets: LSUN (3M bedroom images), ImageNet, and Faces. It is then evaluated as a feature extractor for different classification tasks (such as CIFAR-10 and SVHN digits). The authors also demonstrate vector arithmetic on Faces using the DCGAN's input (Z) vectors. In general, they demonstrate that DCGANs are a comparatively stable GAN architecture and can learn features that are useful for classification/detection tasks.

Question:
1. When building the Faces dataset, the authors mention that they used the OpenCV face detector. Doesn't the network (DCGAN) become biased toward the face detection algorithm during training?
2. I couldn't understand the significance of removing the fully connected hidden layers or of using LeakyReLU from the Model Architecture section. Could you explain?
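On the LeakyReLU part: LeakyReLU differs from ReLU only for negative inputs. Instead of outputting zero, it outputs a small multiple of the input. The paper sets this leak slope to 0.2. Below is a minimal NumPy sketch comparing the two activations and their gradients; the function names are illustrative. One commonly offered intuition, which is not a claim made in the paper, is that the nonzero negative-side gradient keeps gradient flowing through the discriminator back to the generator even when units are inactive.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def leaky_relu(x, slope=0.2):
    # Negative inputs are scaled by `slope` instead of being set to zero.
    return np.where(x > 0, x, slope * x)

def relu_grad(x):
    # Zero gradient for every non-positive input.
    return (x > 0).astype(float)

def leaky_relu_grad(x, slope=0.2):
    # The gradient is never exactly zero, so units with negative inputs still learn.
    return np.where(x > 0, 1.0, slope)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), leaky_relu(x))
print(relu_grad(x), leaky_relu_grad(x))
```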
Unknown, April 18, 2016 at 7:23 AM
This paper evaluates different constraints on top of a convolutional GAN. The authors find a set of constraints that seems to lead to stable training and name this architecture DCGAN. The main architectural changes they find help stabilize GANs are:
- replacing pooling layers with strided convolutions in the discriminator and fractionally-strided convolutions in the generator;
- using batchnorm;
- removing fully connected layers;
- using ReLU as the activation function in the generator layers (except the output layer, which uses Tanh);
- using LeakyReLU as the activation function in the discriminator layers.

The authors use the architecture to train generators for both bedrooms and faces. They also train it on ImageNet-1k and use it as a feature representation for classifying CIFAR-10. The results are comparable to other approaches.
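To see how strided and fractionally-strided convolutions take over the jobs of pooling and upsampling, here is the spatial-size arithmetic for a 64 × 64 DCGAN. It starts from the 4 × 4 map drawn in the paper's Figure 1. The sketch assumes 5 × 5 kernels, stride 2, padding 2, and an output padding of 1 in the transposed direction. These padding values are assumptions chosen so that sizes exactly halve or double; they are not settings quoted from the paper. The two formulas are the standard ones used, for example, by PyTorch's Conv2d and ConvTranspose2d (with dilation 1).

```python
def conv_out(size, kernel=5, stride=2, padding=2):
    """Output size of a strided convolution (discriminator direction)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel=5, stride=2, padding=2, output_padding=1):
    """Output size of a fractionally-strided (transposed) convolution (generator direction)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding

# Generator: z is projected and reshaped to a 4x4 map, then upsampled four times.
g_sizes = [4]
for _ in range(4):
    g_sizes.append(conv_transpose_out(g_sizes[-1]))

# Discriminator: a 64x64 image is downsampled four times, with no pooling layers.
d_sizes = [64]
for _ in range(4):
    d_sizes.append(conv_out(d_sizes[-1]))

print("generator:", g_sizes)      # generator: [4, 8, 16, 32, 64]
print("discriminator:", d_sizes)  # discriminator: [64, 32, 16, 8, 4]
```

Because the stride does the resizing, the network learns its own down- and upsampling instead of relying on fixed max pooling.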

Questions:

1) Why did the authors choose to test classification on CIFAR-10 over ImageNet?

2) In figure 1's caption the authors say "A series of four fractionally-strided convolutions (in some recent papers, these are wrongly called deconvolutions) then convert this high level representation into a 64 × 64 pixel image."
Can you explain the difference between fractionally-strided convolutions and deconvolutions?
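One way to see the distinction: in signal processing, deconvolution means inverting a convolution, that is, recovering x from y = k * x. A fractionally-strided (transposed) convolution instead multiplies by the transpose of the strided convolution's matrix. It restores the input's shape but not its values, and the network simply learns kernels that make the upsampled output useful. That is the usual explanation of why "deconvolution" is considered a misnomer here. The 1-D NumPy sketch below checks both claims numerically. The kernel, signal length, and "valid" padding are arbitrary choices for illustration.

```python
import numpy as np

def conv_matrix(n_in, kernel, stride):
    """Matrix C such that C @ x is a strided 1-D 'valid' convolution
    (cross-correlation, as in CNN libraries) of a length-n_in signal x."""
    k = len(kernel)
    n_out = (n_in - k) // stride + 1
    C = np.zeros((n_out, n_in))
    for i in range(n_out):
        C[i, i * stride:i * stride + k] = kernel
    return C

def transposed_conv(y, kernel, stride, n_in):
    """Fractionally-strided convolution: insert (stride - 1) zeros between the
    samples of y, zero-pad by k - 1 on both ends, then slide the flipped kernel
    with stride 1. The result is zero-padded at the end to length n_in."""
    k = len(kernel)
    up = np.zeros((len(y) - 1) * stride + 1)
    up[::stride] = y
    up = np.pad(up, (k - 1, k - 1))
    flipped = kernel[::-1]
    full = np.array([up[j:j + k] @ flipped for j in range(len(up) - k + 1)])
    out = np.zeros(n_in)
    out[:len(full)] = full
    return out

rng = np.random.default_rng(0)
kernel = np.array([1.0, 2.0, -1.0])  # made-up 3-tap kernel
x = rng.standard_normal(9)
C = conv_matrix(len(x), kernel, stride=2)
y = C @ x                                                 # downsample: 9 -> 4 samples
x_up = transposed_conv(y, kernel, stride=2, n_in=len(x))  # upsample: 4 -> 9 samples

print(np.allclose(x_up, C.T @ y))  # True: it is multiplication by C transposed
print(np.allclose(x_up, x))        # False: it does not undo the convolution
```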
CJDS, April 18, 2016 at 8:56 AM
The authors designed a generative adversarial net that generates images of bedrooms, among other things. The input to the generator is a vector Z sampled from a distribution over a latent space, and the output is an image (for example, a bedroom) determined by Z. They tested their model on the LSUN and CIFAR datasets.

Question:
1. What is the difference between a fractionally-strided convolution and a deconvolution? Why do they say some papers wrongly use the word deconvolution?
2. The Exemplar CNN seems to outperform their model in Table 1. Any insight as to why?
enlite trader, April 18, 2016 at 9:03 AM
summary:
This paper presents a technique for learning representations of image data with a generative adversarial network. The key modifications in the DCGAN are: all layers are convolutional (no fully connected layers); batch normalization is used in all layers except the generator output layer and the discriminator input layer; and leaky rectified activation is used in all discriminator layers. A deduplication process, based on a dropout-regularized ReLU autoencoder trained on downsampled training images, reduces the chance that the network memorizes the input examples. The input vectors to the model support arithmetic operations, with the semantic results shown in the paper.
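The deduplication step amounts to semantic hashing. As the paper describes it, a 3072-128-3072 dropout-regularized ReLU autoencoder is fit on 32x32 downsampled center crops. Its code-layer activations are then binarized by thresholding, and identical bit patterns flag near-duplicates in linear time. In this NumPy sketch, random vectors stand in for the 128-dimensional codes and the encoder itself is not implemented. The threshold of zero and the helper names are assumptions.

```python
import numpy as np
from collections import defaultdict

def hash_codes(codes, threshold=0.0):
    """Binarize each code vector and pack the bits into a hashable key."""
    bits = codes > threshold
    return [np.packbits(row).tobytes() for row in bits]

def find_duplicates(codes):
    """Group example indices whose binarized codes are identical (one pass over the data)."""
    buckets = defaultdict(list)
    for idx, key in enumerate(hash_codes(codes)):
        buckets[key].append(idx)
    return [group for group in buckets.values() if len(group) > 1]

rng = np.random.default_rng(0)
# Stand-ins for ReLU code activations of the autoencoder (hypothetical data).
codes = np.maximum(rng.standard_normal((5, 128)), 0.0)
codes[3] = codes[1] * 1.05  # a near-duplicate of example 1: same active units
print(find_duplicates(codes))  # [[1, 3]]
```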

question:
The authors say they found "leaky rectified activation (Maas et al., 2013) (Xu et al., 2015) to work well, especially for higher resolution modeling. This is in contrast to the original GAN paper, which used the maxout activation (Goodfellow et al., 2013)", and they use LeakyReLU activation in all layers of the discriminator.

Can you give any intuition or explanation for why leaky rectified activation works better for higher-resolution modeling?
Unknown, April 18, 2016 at 9:59 AM
The authors of this paper take a step toward unsupervised deep learning by creating deep convolutional generative adversarial networks. The motivation is to learn unsupervised feature representations, which are important for novel image generation (also the application discussed in the paper). One major contribution of the DCGAN architecture is that the learned filters draw specific objects, which again helps when applying the network to novel image generation. The authors also show that their learned representations support 'feature arithmetic'.

1. Overall, I was really impressed by the results, but found the face results a bit disappointing and inhuman. How might we improve them? Would a better face detection algorithm allow for a better training set? Would providing stronger facial constraints, such as the geometry of a human face, give better results?

2. The authors also mention possible future work in audio or video. Have any approaches been tried in these domains yet?
prateek, April 18, 2016 at 10:10 AM
In this paper, a novel adversarial generator network combined with convolutional layers is presented. The authors propose changes to the network that make the generative model easier and more stable to train.

The proposed changes are learned upsampling via strided convolutions, removal of fully connected layers, and the addition of batch normalization. They also discuss hyperparameters that were important in training, such as momentum and learning rate.

They show results on a variety of tasks, such as bedroom generation, vector arithmetic, and pose transformation of faces.

Questions:

What is the discriminator network in this paper, and what role does it play in the CIFAR classification experiments?

Could you also elaborate on the window removal method shown?
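As the paper describes the window removal experiment, the authors first hand-drew window bounding boxes on a set of generated bedroom samples. They fit a logistic regression on the second-highest convolutional layer's activations to predict whether an activation lies inside a window box. They then dropped every feature map with a positive weight, at all spatial locations, before generating again. A NumPy sketch of those two steps on synthetic data; the activations, labels, learning rate, and helper names are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    # The tanh form of the logistic function avoids overflow warnings for large |z|.
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def fit_logistic_regression(X, y, lr=0.1, steps=2000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = sigmoid(X @ w + b) - y
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

def drop_feature_maps(features, weights):
    """Zero each feature map (channel) whose weight is positive, at every spatial
    location. `features` has shape (channels, height, width)."""
    keep = (weights <= 0).astype(features.dtype)
    return features * keep[:, None, None]

rng = np.random.default_rng(0)
n_maps = 8
# Hypothetical training data: each row is the activation vector at one spatial
# position; label 1 means the position lies inside a hand-drawn window box.
labels = (rng.random(200) < 0.5).astype(float)
acts = rng.random((200, n_maps))
acts[labels == 1, :2] += 2.0  # maps 0 and 1 respond to "windows" in this toy data
w, b = fit_logistic_regression(acts, labels)

feature_maps = rng.random((n_maps, 4, 4))  # one layer's output for a new sample
edited = drop_feature_maps(feature_maps, w)
print("dropped maps:", np.flatnonzero(w > 0))
```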

Unknown, April 18, 2016 at 10:27 AM
This paper presents Deep Convolutional Generative Adversarial Nets, a deep learning approach that combines an adversarial two-network setup (a generator trained against a discriminator) with deep convolutional nets to obtain a network that actively generates new images. Because this architecture can be extremely fickle to train, the paper mainly offers engineering-focused advice on building these networks. The authors conclude that the results are interesting, despite the remaining instability of their model.

Questions/clarifications:
1. How can one evaluate the effectiveness of the generative network? I feel that a roadblock for this kind of work is the difficulty of knowing whether the generative network is actually succeeding.
2. What other generative approaches exist, and how does this approach compare?
3. Does the advice on building DCGANs in this paper extend to other adaptations of adversarial networks (such as an adversarial autoencoder)?
anusha, April 18, 2016 at 10:49 AM
The paper introduces a CNN-based generative adversarial net. It visualizes the filters learned by GANs and shows how specific filters learn to draw specific objects. It also lists architectural guidelines that help stabilize GANs: eliminating fully connected layers on top of convolutional features; using batch normalization in both the generator and the discriminator; replacing pooling layers with strided convolutions (discriminator) and fractionally-strided convolutions (generator); using ReLU in the generator except for the output layer; and using LeakyReLU activation in all discriminator layers. The network is trained on LSUN and ImageNet and evaluated on classification tasks.
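Batch normalization, one of the guidelines listed above, normalizes each channel over the mini-batch and then applies a learned per-channel scale and shift. Below is a minimal NumPy sketch of the training-time forward pass for (batch, channels, height, width) inputs. The epsilon value is a common default assumed here, and the running averages used at inference time are omitted for brevity.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for NCHW tensors: per-channel statistics over N, H, W."""
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (shift) are learned per-channel parameters.
    return gamma[None, :, None, None] * x_hat + beta[None, :, None, None]

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=5.0, size=(16, 4, 8, 8))
out = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(out.mean(axis=(0, 2, 3)))  # per-channel means, close to 0
print(out.std(axis=(0, 2, 3)))   # per-channel standard deviations, close to 1
```

Per the paper, applying this to every layer caused sample oscillation and instability. The authors therefore skip it on the generator output layer and the discriminator input layer.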

Questions:
1. Why does leaky rectified activation work better than maxout activation for higher-resolution modeling?
Unknown, April 18, 2016 at 10:53 AM
This paper presents a method that uses a Deep Convolutional Generative Adversarial Network to learn image representations. The generator takes a 100-dimensional vector as input and generates the image that the vector represents. The main changes in the DCGAN are that all pooling layers are replaced with strided convolutions in the discriminator and fractionally-strided convolutions in the generator.

Discussions:

1) The vector arithmetic shows poor-ish results and may require averaging over several examples of each class. Doesn't this show that the network still fails to generalize within a class and may still rely on memorizing data?

2) What is the difference between fractionally-strided convolutions and deconvolutions?
Unknown, April 18, 2016 at 11:03 AM
Abstract:

This paper presents a new architecture for generative adversarial networks. The authors use CNNs inside the GAN's generator and discriminator. Earlier attempts of this kind had limited success, so they adopt several recent changes to CNN architectures: a) using strided convolutions instead of spatial pooling, b) not using any FC layers, and c) using batch normalization. With these three changes, they train their network on the LSUN, ImageNet, and Faces datasets. They evaluate their method both for generating images and for classification using the learned feature maps. Their network compares favorably with other unsupervised methods.

Discussion:

1) Could you please talk about the background of generative adversarial networks, i.e., what the generator and discriminator are?
Unknown, April 18, 2016 at 11:21 AM
This paper tries to bridge the gap between the success of CNNs on supervised tasks and unsupervised learning. The authors present an approach for stable training of Generative Adversarial Networks (GANs) using CNNs, called Deep Convolutional Generative Adversarial Networks, based on a set of architectural constraints. They report state-of-the-art performance on various datasets by extracting the features learned by the discriminator and feeding them to a supervised classifier.

Questions-

1. How are leaky ReLUs more effective?
2. I need more intuition about the discriminator, which the paper does not explain in depth.
Unknown, April 18, 2016 at 11:29 AM
This paper introduces DCGANs, a class of CNN-based adversarial networks for unsupervised learning. DCGANs were evaluated with a set of constraints that make them more stable to train. The trained discriminators were tested as feature extractors for different image classification tasks and showed an ability to learn a hierarchy of representations.

Can you elaborate more on the face sampling procedure? The authors claim that averaging the Z vectors of three exemplars gave consistent and stable generations, but the results in Figure 7 look really odd and make you question how consistent or stable this generation procedure is.
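The averaging procedure from Figure 7 works as follows. For each concept (for example, "smiling woman"), average the Z vectors of three exemplar samples. Do the arithmetic on those means, then feed the result to the generator. In this NumPy sketch the exemplar Z vectors are random stand-ins and `generator` is a hypothetical placeholder. The uniform [-1, 1] range for the 100-dimensional Z is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
z_dim = 100

def sample_z(n):
    """Draw n latent vectors from a uniform distribution (range assumed to be [-1, 1])."""
    return rng.uniform(-1.0, 1.0, size=(n, z_dim))

# Hypothetical exemplars: in the paper these are Z vectors of generated samples
# picked as showing each concept; here they are random stand-ins.
smiling_woman = sample_z(3)
neutral_woman = sample_z(3)
neutral_man = sample_z(3)

def concept_vector(exemplars):
    """Average the exemplar Z vectors for one concept."""
    return exemplars.mean(axis=0)

y = (concept_vector(smiling_woman)
     - concept_vector(neutral_woman)
     + concept_vector(neutral_man))
print(y.shape)  # (100,)
# image = generator(y)  # 'generator' would be a trained DCGAN generator (hypothetical)
```

Averaging cancels some of the sample-specific variation in each Z. With a single exemplar, all of that variation is carried into the result. That is one plausible reason single-sample arithmetic was less stable.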
Aditi Gupta, April 18, 2016 at 11:33 AM
The paper presents a deep convolutional generative adversarial network (DCGAN) that is capable of unsupervised learning. The authors impose several constraints on their network to make it stable and scalable. They eliminate the fully connected layers of a conventional CNN and instead connect the highest convolutional features directly to the input of the generator and the output of the discriminator. The authors also employ batch normalisation to stabilise learning.
The authors run several experiments to demonstrate that their network learns meaningful information. Using it as a feature extractor, they achieve 82% accuracy on CIFAR-10 and a 22.4% error rate on SVHN.
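For the CIFAR-10 experiment, the paper takes the discriminator's convolutional features from every layer. It max-pools each layer's output to a 4 × 4 spatial grid, then flattens and concatenates the results. A regularized linear L2-SVM is trained on the concatenated vector. Below is a NumPy sketch of the pooling-and-concatenation step. The layer shapes are hypothetical random stand-ins, and the SVM itself is omitted.

```python
import numpy as np

def max_pool_to_grid(fmap, grid=4):
    """Max-pool a (channels, H, W) feature map down to (channels, grid, grid).
    Assumes H and W are divisible by grid."""
    c, h, w = fmap.shape
    blocks = fmap.reshape(c, grid, h // grid, grid, w // grid)
    return blocks.max(axis=(2, 4))

def extract_features(layer_outputs, grid=4):
    """Pool every layer's output to a grid x grid map, flatten, and concatenate."""
    return np.concatenate([max_pool_to_grid(f, grid).ravel() for f in layer_outputs])

rng = np.random.default_rng(0)
# Hypothetical discriminator activations for one image, as (channels, H, W).
layers = [rng.random((64, 16, 16)), rng.random((128, 8, 8)), rng.random((256, 4, 4))]
features = extract_features(layers)
print(features.shape)  # (7168,)
```

Pooling to a fixed grid gives every layer the same spatial footprint. That way both low-level and high-level features reach the linear classifier.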

Questions:
What is the intuition behind doing away with the fully connected layers of a traditional CNN?
Why is the nonlinear activation of the output layer modeled as a Tanh function?
Could you please explain the concept behind “Walking in the latent space”?
Unknown, April 18, 2016 at 11:37 AM
In this paper the authors introduce deep convolutional generative adversarial networks (DCGANs) in an attempt to provide a CNN architecture robust enough for unsupervised training. They trained DCGANs on three datasets (LSUN, Imagenet-1k, and Faces) using SGD for all models. The representations learned by these models are then used to show that the features provide a good representation of images for supervised learning and generative modeling. They also use these models to classify CIFAR-10 and SVHN digits and compare the results with other established methods.

Discussion:
1. Why would they use LeakyReLU as the activation function?
2. What is a fractionally-strided convolution?
Vasavi Gajarla, April 18, 2016 at 11:46 AM
Summary
This paper introduces an adversarial network called deep convolutional generative adversarial networks (DCGAN), whose architecture is shown to perform well on both unsupervised and supervised learning. By training this network on datasets like LSUN and ImageNet, the authors show generative results such as bedrooms. They also show results where images generated along a path in the latent space transition smoothly from one to the next.
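The smooth transitions come from "walking in the latent space": pick two Z vectors, interpolate between them, and generate an image at each step. Below is a NumPy sketch of the linear interpolation. `generator` is a hypothetical stand-in, and the step count and uniform [-1, 1] sampling are assumptions.

```python
import numpy as np

def interpolate(z_start, z_end, steps):
    """Return `steps` latent vectors evenly spaced on the line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - t) * z_start + t * z_end

rng = np.random.default_rng(0)
z_a, z_b = rng.uniform(-1.0, 1.0, size=(2, 100))
path = interpolate(z_a, z_b, steps=9)
print(path.shape)  # (9, 100)
# frames = [generator(z) for z in path]  # 'generator' is hypothetical
```

The paper reads gradual, semantically meaningful changes along such a path as a sign that the model learned useful representations. Sharp jumps would instead suggest memorization.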
Questions
1) How are LeakyReLUs different from normal ReLUs?
2) Could you explain a bit about the Adam optimizer?
3) Could you talk about the level of features present in the Z-space representation of faces? Can vector arithmetic in Z space be used to separate the style and content of an image?
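On Adam: it keeps exponential moving averages of the gradient (the momentum term, β1) and of the squared gradient (β2). It corrects their startup bias and scales each parameter's step by the ratio of the two. The paper reports using a learning rate of 0.0002 and reducing β1 from the suggested 0.9 to 0.5 to stabilize training. The values β2 = 0.999 and ε = 1e-8 below are the usual defaults from Kingma & Ba, assumed here. Below is a NumPy sketch of a single update, plus a toy loop on a quadratic that overrides the learning rate so it converges quickly.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count. Returns new (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad       # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(w) = sum(w ** 2), whose gradient is 2 * w.
w = np.array([1.0, -3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 3001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)
```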
Unknown, April 18, 2016 at 11:47 AM
The authors introduce a type of CNN called deep convolutional generative adversarial networks to better facilitate unsupervised learning. They impose a number of constraints to keep the network stable while training on three different datasets: LSUN, ImageNet, and Faces. The discriminator network was used to extract features for image classification.

Discuss-
1) Could you explain fractionally-strided convolutions?
