This class will be somewhat unusual in that we won't discuss a particular paper. We'll try to make sure that we understand deep learning well enough to follow the papers for the rest of the semester.
CVPR 2014 Tutorial on Deep Learning. Graham Taylor, Marc'Aurelio Ranzato, and Honglak Lee.
Read only the first two sets, labeled Introduction and Supervised learning.
One important motivation behind the growing adoption of deep learning in computer vision is the replacement of hand-crafted features with learned ones, at multiple levels of representation. Learned features not only improve algorithm performance but can also be applied to problems where hand-crafted features are unavailable or arcane. These tutorials introduce the motivations behind deep learning and present example applications in different realms of ML. The supervised learning set also gives a more in-depth examination of the back-propagation optimization algorithm, along with the architecture of convolutional nets and how they work. Of particular interest, at least to me, was the last half, which gives numerous general troubleshooting tips and intuitions, links to code libraries, and references to existing nets and applications.
Questions/discussion:
Would it be feasible to warm-start the training of a CNN by initializing the weights to look like existing hand-crafted features (i.e., oriented DoG filters)?
Kind of reminiscent of the Minsky fable about randomly wired nets. It seems to me that on the one hand you'd have more information about the network's biases, but on the other it's unclear what guarantees you can make that the warm start you choose is always better than random weights.
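As a sketch of what such a warm start might look like (the filter bank and every parameter choice below are illustrative, not from the tutorial): build a bank of oriented derivative-of-Gaussian filters and copy them into the first conv layer's weights.

```python
import numpy as np

def oriented_dog_bank(n_orientations=8, size=7, sigma=1.0):
    """Bank of oriented derivative-of-Gaussian (DoG-like) filters that
    could seed a first conv layer. Illustrative, not from the tutorial."""
    ys, xs = np.mgrid[-(size // 2):size // 2 + 1, -(size // 2):size // 2 + 1]
    bank = []
    for k in range(n_orientations):
        theta = np.pi * k / n_orientations
        # rotate coordinates, then take a first derivative of a Gaussian
        u = xs * np.cos(theta) + ys * np.sin(theta)
        g = np.exp(-(xs ** 2 + ys ** 2) / (2 * sigma ** 2))
        f = -u / sigma ** 2 * g        # oriented edge detector
        f -= f.mean()                  # zero-mean, like a learned edge filter
        bank.append(f / np.linalg.norm(f))
    return np.stack(bank)              # shape: (n_orientations, size, size)

filters = oriented_dog_bank()
print(filters.shape)  # (8, 7, 7)
```

Copying such a bank into the first-layer weights before training is the warm start in question; whether it beats random initialization is exactly the open issue above.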
The deep learning tutorial focuses on representation learning: learning features automatically instead of hand-tuning them for various computer vision tasks. The slides also give a brief overview of the back-propagation algorithm, which can train any neural network provided the non-linearity used is differentiable. The tutorial then explains the importance of structure in convolutional neural networks and how they exploit spatial locality and reduce parameters through weight sharing. The talk ends with various tricks and suggestions, like dropout and visualizing hidden layers, for training DNNs efficiently.
Question:
1) On slide 80 of chapter 2, after the first convolution shouldn't the size be 16@92*92? Also, after the second convolution layer, how is the size 24@18*18? Shouldn't it be 96@18*18?
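The sizes on those slides can be checked against the standard output-size formula; the 96x96 input and 5x5 kernel below are illustrative numbers, not taken from the slide.

```python
def conv_output_size(in_size, kernel, stride=1, pad=0):
    # standard formula for convolution/pooling output size
    return (in_size + 2 * pad - kernel) // stride + 1

# e.g. a 96x96 input through a 5x5 kernel, stride 1, no padding:
print(conv_output_size(96, 5))            # 92
# followed by 2x2 pooling with stride 2:
print(conv_output_size(92, 2, stride=2))  # 46
```

The number of output maps is set by the number of kernels (and any map-connection table), independently of the spatial size computed here.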
Discussion:
1) On slide 62, L2 pooling and L2 pooling over features are mentioned. Does the latter mean pooling over input feature maps, and if so, when do we use it?
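My reading of "L2 pooling over features" (an assumption on my part, not confirmed by the slides) is pooling across groups of feature maps at each spatial location, rather than over spatial windows within one map. In toy numpy:

```python
import numpy as np

def l2_pool_spatial(fmap, k=2):
    """L2 pooling over non-overlapping k x k spatial windows of one map."""
    h, w = fmap.shape
    v = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return np.sqrt((v ** 2).sum(axis=(1, 3)))

def l2_pool_over_features(fmaps, g=2):
    """L2 pooling across groups of g feature maps at each location."""
    c, h, w = fmaps.shape
    v = fmaps[:c - c % g].reshape(c // g, g, h, w)
    return np.sqrt((v ** 2).sum(axis=1))

x = np.random.randn(4, 8, 8)
print(l2_pool_spatial(x[0]).shape)     # (4, 4): spatial size shrinks
print(l2_pool_over_features(x).shape)  # (2, 8, 8): channel count shrinks
```

The spatial version buys translation tolerance; the over-features version would make the response invariant to which filter in a group fired.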
Deep learning replaces hand-crafted features such as SIFT/HOG. Convolutional Neural Networks (CNNs) have had great success in many applications. The tutorial goes on to explain the ins and outs of training CNNs, like normalization, maxout, etc., as well as backprop. The tutorial also talks about how to fix things if they go awry.
Question:
1. What are "degenerate" solutions? And what is the "pull-up" term?
This tutorial gives a nice intuitive overview of how deep learning works. The supervised learning tutorial also discusses the basic architecture and the result of using each layer within it. The main advantages of a deep learning approach are that it avoids some of the problems with the traditional sliding-window approach and that it doesn't rely on hand-crafted features.
Discussion Questions:
1) What does the future deep network architecture look like? Even deeper networks, even wider networks, clever layer choices?
2) What are the current limitations towards using a deep learning approach? When would older approaches be more appropriate?
Summary:
Deep learning is a layered architecture that learns important features from a given dataset without the need for manual feature engineering. It is especially useful in applications where features are difficult to hand-engineer, such as video or Kinect data. The network learns increasingly high-level features as depth increases by learning weights and biases at each layer. Deep learning techniques work well when a large amount of data is available; applications include face recognition, reading traffic signs, and recognizing handwritten digits.
Questions:
1) Slide 24, Set 1 – How exactly can the various dimensions of layer outputs be understood from the slide?
2) Slide 31, Set 2 - Why is it significant that the output becomes a regression of cosine once training is done?
The deep learning presentation gives an overview of the basics of learned representations and points out the differences between learned and hand-crafted representations. Convolutional Neural Nets are described in detail: earlier layers capture low-level features such as edges, while later layers capture higher-level features. Normalization, pooling, and the use of non-linear functions are explained. Applications of deep learning are shown, including many iconic computer vision problems: object detection, scene parsing, and facial recognition. The tutorial ends with a set of suggestions for verifying that a deep net has been trained correctly.
Question:
1) How do we decide on an activation function? ReLU seems to be the standard, but do different functions lead to better performance for different tasks?
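For reference, the three activations usually compared (a minimal sketch, nothing here is from the slides):

```python
import numpy as np

# Three common activations. ReLU is the usual default, but the best
# choice can be task-dependent, which is exactly the question above.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x))  # squashes to (0, 1)
print(tanh(x))     # squashes to (-1, 1), zero-centered
print(relu(x))     # [0. 0. 2.]: cheap, non-saturating for x > 0
```

Sigmoid and tanh saturate at both ends, while ReLU is linear on the positive side, which is one reason it trains faster in deep stacks.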
Summary:
Before deep learning, computer vision features were hand-crafted: the data fed into the vision system had no hierarchy at the feature level. Deep learning introduced a new architecture that learns not only the classifier on top of extracted features but also the features themselves, at multiple levels. This leads to more accurate feature representations of the input data, and therefore to better classification in supervised learning with softmax. Unsupervised learning can also benefit from deep architectures, because "better" features learned from data result in clusters with better representations.
Questions:
1. How do we design the architecture of a deep neural network so that it works well for computer vision tasks other than those already mentioned in the slides?
2. What are the lessons learned in designing CNN architectures?
3. In CNNs and auto-encoders, why was softmax chosen over an SVM or another classifier?
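On question 3, one common (though not the only) justification: softmax produces normalized probabilities whose cross-entropy loss has a clean gradient for back-propagation; hinge-loss (SVM-style) output layers also work, so it is largely a convention. A minimal softmax sketch:

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability before exponentiating
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # raw class scores from the last layer
p = softmax(scores)
print(p.sum())                      # 1.0: a proper probability distribution
loss = -np.log(p[0])                # cross-entropy if class 0 is the true label
```

Because the outputs sum to 1, they can be read as class probabilities, which an SVM margin score does not directly provide.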
This tutorial explains the fundamentals of deep learning and how its application has advanced difficult computer vision problems, object recognition being just one. The tutorial goes in depth to touch every aspect of the supervised and unsupervised deep learning architectures that are commonly used, all rooted in neural networks. Due to the availability of large-scale data, supervised deep learning architectures have achieved great improvements over traditional approaches. Convolutional Neural Networks (CNNs), one of the more popular approaches, which effectively exploit the relationship of a pixel with its neighbors, are explained in great detail. Many questions regarding the complexity of each layer and better optimization approaches are answered.
Questions:
1. Deep learning architectures are very task-specific. Has there been an attempt to make them more adaptive? By adaptive I mean: rather than training explicitly on different classes, if you repeatedly show a new class of images to an architecture trained on specific classes, can the architecture adapt to learn to classify that class as well?
Questions:
1. In the 70s and 80s we saw great breakthroughs in signal processing when DSP chips became commonplace. Now that linear algebra can be done more efficiently on specialized hardware (GPUs), we've figured out how to train classifiers with it, and that has led us down a long path to deep learning. What kinds of breakthroughs might occur if, say, analog chips for computing L-BFGS or for linear constraint satisfaction became common? What are the current computational bottlenecks that could more easily be solved with specialized hardware?
The introduction gives background information on the concept of "deep learning," or using learned feature representations instead of hand-crafted ones. Then the deep learning presentation provides an overview of the architecture of deep networks and the different layers as well as tips on designing the architecture and possible explanations for unsuccessful networks. Examples of applying the deep learning techniques to common computer vision tasks are also shown.
When would traditional approaches perform better than these deep networks?
The tutorial walks you through how deep learning works. Deep learning performs better in complex classification or pattern-recognition applications mainly because deep nets break complex patterns into a series of simpler patterns. For instance, if a net had to decide whether or not an image contains a face, it would first use edges to detect different parts of the face and would then combine those results to form the face. This use of simpler patterns as building blocks for detecting complex patterns is what gives deep nets their strength. The tutorial discusses the architecture of ConvNets, how they are trained, optimization methods, and how to improve generalization and avoid overfitting.
Question:
In the slide that explains the taxonomy of feature-learning methods, supervised deep learning can be done using different nets, like recurrent neural nets, convolutional neural nets, and deep neural nets. How would you choose which kind to use for a particular application?
~
Anusha
The introduction to deep learning section was a good refresher of all the topics discussed in CS 6476. The supervised deep learning tutorial provides a layer-by-layer examination of how a convolutional architecture works and why it is well suited for images and computer vision.
Questions:
1. A lot of the parameter tuning in deep learning has been termed "black magic" and requires some insightful understanding of how to design architectures. What are some of the key intuitions one can take away for designing architectures, especially given that deep learning generalizes well?
2. I have tried training a deep network from scratch in the past with little success. What would be your advice on various pitfalls when either training from scratch or augmenting an existing network?
The tutorials give a brief description of deep learning methods. The performance drawbacks of traditional CV methods are overcome by deep learning methods, which automatically find better features. A brief overview of the different activation layers is given, followed by an explanation of pooling methods. As shown in the eye-detection example, max pooling gives robustness to spatial location. The applications of deep learning have been diverse, ranging from pedestrian detection to scene understanding. Deep nets have also been used for face detection and have outperformed humans at traffic-sign recognition. Different basic optimization tricks are described; uncorrelated feature maps signify good learning of the parameters.
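The max-pooling robustness mentioned above can be seen in a few lines (a toy sketch, not from the tutorial): a small spatial shift of an activation within a pooling window leaves the pooled output unchanged.

```python
import numpy as np

def max_pool(fmap, k=2):
    """Max pooling over non-overlapping k x k windows."""
    h, w = fmap.shape
    v = fmap[:h - h % k, :w - w % k].reshape(h // k, k, w // k, k)
    return v.max(axis=(1, 3))

x = np.zeros((4, 4)); x[1, 1] = 1.0              # a feature response
x_shift = np.zeros((4, 4)); x_shift[0, 0] = 1.0  # same response, shifted one pixel
# both activations land in the same 2x2 window, so pooling output is identical
print(np.array_equal(max_pool(x), max_pool(x_shift)))  # True
```

Shifts that cross a window boundary do change the output, so the invariance is only local, which is why pooling is applied repeatedly across layers.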
Is there an easy visualization method to observe what each layer of the network is focusing on in a particular image?
Hey Shantanu, you can check out the DIGITS framework by NVIDIA. It provides you with a visualization of each and every layer and what that layer is looking for in the image. I hope that helps.
This was introductory material covering deep learning. We were told a similar tale in class: how computer vision as a field relied on hand-crafted features, and now the state of the art is learned features. These slides were a good introduction to deep learning for me, as we did not cover it when I took Computer Vision (with Bobick). However, I do look forward to talking about it in class tomorrow to clarify some of my more trivial questions.
As for discussion topics:
One of the nice things about SIFT features was that they were rotation-, intensity-, and position-invariant, meaning that (in theory) a feature in the image would receive a similar SIFT descriptor even if it was darker or brighter, rotated, or in a different section of the image.
In these slides, I saw how learned features could be made more intensity-invariant by normalizing contrast on a local scale, and how the sliding-window technique gives positional invariance. I didn't see anything that would lead me to believe that learned features handle rotation well. Can they? If so, why and how?
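For what it's worth, standard conv/pool layers only buy translation tolerance; rotation invariance is usually approximated by training on rotated copies of the data. A toy augmentation sketch (illustrative only, not from the slides):

```python
import numpy as np

def rot_augment(img):
    """Train-time augmentation: present rotated copies so the network can
    learn approximate rotation invariance rather than having it built in."""
    return [np.rot90(img, k) for k in range(4)]  # 0, 90, 180, 270 degrees

img = np.arange(9).reshape(3, 3)
print(len(rot_augment(img)))  # 4 training samples from one image
```

Real pipelines use small continuous rotations (plus flips, crops, and color jitter), but the principle is the same: the invariance lives in the training data, not the architecture.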
These deep learning tutorials start off by revisiting the 'old' ways of devising elaborate hand-crafted features and contrast them with DL-based techniques that focus not just on providing a simple mapping from a representation of the data to output features but also on learning the representations themselves. The tutorials covered many topics that were discussed toward the end of the Computer Vision class last semester, and also went into detail about the key ideas and some of the math behind each layer.
Discussion:
One thing that I don't quite understand is how the cost and loss functions are chosen. There are cases where cross-entropy (softmax regression) is used and others where it's mean squared error.
A key requirement of supervised learning is lots of labelled data, which may be a cause for concern when handling, say, video data. Also, how do supervised DL techniques perform with very large datasets (e.g., video feeds), given how much time is spent on feature computation?
A comment above mentioned something about easy visualization techniques for DL. I've come across projects like DeepVis and ConvNetJS by Karpathy that provide visualization frameworks for DL.
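On the loss-function question above, a common rule of thumb (sketched with made-up numbers): cross-entropy pairs naturally with softmax outputs for classification, while mean squared error fits real-valued regression targets.

```python
import numpy as np

# classification: one-hot label vs. softmax output -> cross-entropy
p_true = np.array([0.0, 1.0, 0.0])   # one-hot class label
p_pred = np.array([0.1, 0.8, 0.1])   # hypothetical softmax output
cross_entropy = -(p_true * np.log(p_pred)).sum()

# regression: real-valued target vs. prediction -> squared error
y_true, y_pred = 2.5, 2.0
mse = (y_true - y_pred) ** 2

print(round(cross_entropy, 3), mse)  # 0.223 0.25
```

Cross-entropy heavily penalizes confident wrong answers (the log blows up as the true class probability goes to 0), which is one reason it is preferred over MSE for classification.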
The tutorial describes how and why deep learning methods work as well as they do. In deep learning the traditional hand crafted features are replaced by features which are learnt by the network iteratively. Each subsequent layer of the network captures a higher level of useful features of the image. The tutorial also talks about the different types of learning methods including supervised and unsupervised learning and how the parameters of the hidden layers are learnt using back-propagation.
Discussion:
1. Slide 26, Set 2 mentions that back-propagation works for any non-linear function that is differentiable. But in my understanding, the rectified linear function is not differentiable at 0.
2. I actually had this question from the project on deep learning we did last Fall in our Computer Vision class. While visualizing the learned filters of the first layer, some of the filters corresponded to parallel edges. However, we also obtained some filters that seemed to encode some kind of color information. What features do those filters correspond to?
http://www.cc.gatech.edu/~hays/compvision/results/proj6/agupta448/index.html
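On the first discussion point: in practice, the non-differentiability of ReLU at exactly 0 is handled by picking a subgradient there (typically 0, sometimes 1), and back-propagation works fine. A sketch of the usual convention:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    # ReLU is not differentiable at exactly 0; frameworks pick a
    # subgradient there (0 here) and backprop proceeds as usual
    return (x > 0).astype(float)

x = np.array([-1.0, 0.0, 3.0])
print(relu_grad(x))  # [0. 0. 1.]
```

Since an activation is exactly 0.0 with probability essentially zero for real-valued inputs, the choice at that single point has no practical effect on training.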
This is a general introduction to deep learning for recognition systems. It starts by explaining the general pipeline of conventional recognition systems: extract a set of hand-engineered features, then pass them to a trained classifier for the final decision. Deep learning models learn a more complex, hierarchical representation of the input data, resulting in a rich intermediate representation. The tutorial presents CNNs, one of the most important deep learning models, and discusses their architecture.
Research Question:
1. Last year in ICCV, the ultra-deep Residual Learning model was introduced, consisting of 152 layers. The model won all major computer vision challenges last year. My question is: when tackling any research problem in computer vision, will the first approach always be to try deeper and bigger models?
The tutorial is a general overview of and introduction to deep learning in vision. Its aim is to teach working knowledge through a hands-on approach and workable theory. The authors motivate the tutorial by explaining how hand-crafted features like HOG have been outperformed by features learnt through deep learning.
They further go on to explore deep learning architectures and explain their components in detail. I particularly liked the amount of math they included, which allowed me to intuitively understand what each component did.
Questions:
When, or for which problems, would deep learning not work?
If deep learning based approaches are learning dataset biases, would that make them harder to generalize across datasets in areas like robotics?
This tutorial serves as an introduction to deep learning systems and their architectures. It talks about how each layer learns its own features instead of having them carefully hand-crafted. It also describes convolutional neural networks and their components, like normalization, pooling, and the use of non-linear functions.
Discussion:
1) This may have been asked but is there any other benefit of using ReLUs over tanh/sigmoid other than training times?
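One often-cited benefit beyond training speed (a toy illustration, not from the slides): the sigmoid gradient saturates toward 0 for large |x|, which contributes to vanishing gradients in deep stacks, while the ReLU gradient stays 1 for all positive inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, 0.0, 10.0])
sig_grad = sigmoid(x) * (1 - sigmoid(x))  # saturates: ~0 for large |x|
relu_grad = (x > 0).astype(float)         # stays 1 for positive inputs

print(sig_grad.round(4))  # [0.     0.25   0.    ]
print(relu_grad)          # [0. 0. 1.]
```

Multiplying many near-zero sigmoid gradients through backprop shrinks the signal exponentially with depth; ReLU avoids that on its active side, though it can "die" if a unit's input stays negative.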