I gave a talk on Deep Learning trends

I gave a not-so-casual talk on trends in Convolutional Neural Networks.

全脳アーキテクチャ若手の会 (WBAFL) Casual Talk

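I spoke at an event called the 全脳アーキテクチャ若手の会 (WBAFL) Casual Talk.
I could not quite tell who the audience was, but it seemed to be mostly IT engineers.
(Maybe about four students...? And nearly all of them were people I know.)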

My slides have been uploaded to SlideShare.

Trends in Convolutional Neural Networks @WBAFL Casual Talk #2, from Daiki Shimada

On top of that, a video of the talk was uploaded too...

Post-talk notes

The mood in the room was not one where I could admit that I had actually planned to make this a 128-paper drill of Convolutional Neural Network (CNN) papers...

Personally, I feel that research on image generation and caption generation is moving very fast.
I would have liked to dig a little deeper into the Visual Turing Test.

The news that DeepMind had DQN playing a 3D first-person game came out just before my talk, so I introduced it along with AlphaGo. I think that area is completely dominated by Google.

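Research has been moving at such a pace lately that I felt I had to stop and organize it at some point, so I put together my own summary.
Doing a survey takes a lot of effort unless you go about it well, and it is easy to miss important work, so I hope material like this can be the seed of someone else's survey.
(I hope everyone compiles their own, and that material like this keeps growing.)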

List of references introduced

Evolution of CNN architectures / optimization methods

CNN architectures

Neocognitron
- K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics 36, 1980.
LeNet
- Y LeCun, L Bottou, Y Bengio, P Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE 86, 1998.
Ave./Max Pooling, Local Contrast Normalization
- K. Jarrett, K. Kavukcuoglu, M. Ranzato, Y. LeCun. What is the best multi-stage architecture for object recognition?. CVPR, 2009.
ReLU
- X. Glorot, A. Bordes, Y. Bengio. Deep Sparse Rectifier Neural Networks. AISTATS 11, 2011.
Dropout
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. R. Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv: 1207.0580, 2012.
AlexNet
- A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks. NIPS, 2012.
Network in Network, global ave. pooling
- M. Lin, Q. Chen, S. Yan. Network In Network. arXiv: 1312.4400, 2013.
VGG-Net
- K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv: 1409.1556, 2014.
GoogLeNet / Inception
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going deeper with convolutions. arXiv: 1409.4842, 2014.
- C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna. Rethinking the Inception Architecture for Computer Vision. arXiv: 1512.00567, 2015.
SPP-Net
- K. He, X. Zhang, S. Ren, J. Sun. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv: 1406.4729, 2014.
All Convolutional Net, guided BackPropagation
- J. T. Springenberg, A. Dosovitskiy, T. Brox, M. Riedmiller. Striving for Simplicity: The All Convolutional Net. arXiv: 1412.6806, 2014.
Exemplar CNN
- A. Dosovitskiy, P. Fischer, J. T. Springenberg, M. Riedmiller, T. Brox. Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks. arXiv: 1406.6909, 2014.
Triplet network
- E. Hoffer, N. Ailon. Deep metric learning using Triplet network. arXiv: 1412.6622, 2014.
Batch Normalization
- S. Ioffe, C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv: 1502.03167, 2015.
Residual Network; ResNet
- K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. arXiv: 1512.03385, 2015.

Learning-rate adaptation methods for stochastic gradient descent

AdaGrad
- J. Duchi, E. Hazan, Y. Singer. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12, 2011.
RMSProp
- T. Tieleman, G. Hinton. Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2012.
AdaDelta
- M. D. Zeiler. ADADELTA: An Adaptive Learning Rate Method. arXiv: 1212.5701, 2012.
Adam
- D. Kingma, J. Ba. Adam: A Method for Stochastic Optimization. arXiv: 1412.6980, 2014.

Feature analysis / visualization

DeconvNet for visualizing
- M. D. Zeiler, R. Fergus. Visualizing and Understanding Convolutional Networks. arXiv: 1311.2901, 2013.
Optimizing the input image
- A. Mahendran, A. Vedaldi. Understanding Deep Image Representations by Inverting Them. arXiv: 1412.0035, 2014.
Fooling CNNs
- C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. J. Goodfellow, R. Fergus. Intriguing properties of neural networks. arXiv: 1312.6199, 2013.
- I. J. Goodfellow, J. Shlens, C. Szegedy. Explaining and Harnessing Adversarial Examples. arXiv: 1412.6572, 2014.
- A. Nguyen, J. Yosinski, J. Clune. Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images. arXiv: 1412.1897, 2014.

Object detection and segmentation

R-CNN
- R. Girshick, J. Donahue, T. Darrell, J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. arXiv:1311.2524, 2013.
Fast R-CNN
- R. Girshick. Fast R-CNN. arXiv:1504.08083, 2015.
Faster R-CNN
- S. Ren, K. He, R. Girshick, J. Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. arXiv:1506.01497, 2015.
Fully Convolutional Networks (FCN)
- J. Long, E. Shelhamer, T. Darrell. Fully Convolutional Networks for Semantic Segmentation. arXiv: 1411.4038, 2014.
SegNet
- V. Badrinarayanan, A. Handa, R. Cipolla. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Robust Semantic Pixel-Wise Labelling. arXiv: 1505.07293, 2015.
CNN + conditional random fields (CRF)
- S. Zheng, S. Jayasumana, B. R. Paredes, V. Vineet, Z. Su, D. Du, C. Huang, P. H. S. Torr. Conditional Random Fields as Recurrent Neural Networks. arXiv: 1502.03240, 2015.
Deep Mask
- P. O. Pinheiro, R. Collobert, P. Dollar. Learning to Segment Object Candidates. arXiv: 1506.06204, 2015.
Deep Face
- Y. Taigman, M. Yang, M. A. Ranzato and L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. CVPR, 2014.
Spatial Transformer Networks
- M. Jaderberg, K. Simonyan, A. Zisserman, K. Kavukcuoglu. Spatial Transformer Networks. arXiv: 1506.02025, 2015.

Image generation and super-resolution

Deep Dream
- Inceptionism: Going Deeper into Neural Networks.
- K. Simonyan, A. Vedaldi, A. Zisserman. Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps. arXiv: 1312.6034, 2013.
Morphing
- A. Dosovitskiy, J. T. Springenberg, M. Tatarchenko, T. Brox. Learning to Generate Chairs, Tables and Cars with Convolutional Networks. arXiv: 1411.5928, 2014.
Style transfer
- L. A. Gatys, A. S. Ecker, M. Bethge. A Neural Algorithm of Artistic Style. arXiv: 1508.06576, 2015.
- C. Li, M. Wand. Combining Markov Random Fields and Convolutional Neural Networks for Image Synthesis. arXiv:1601.04589, 2016.
Image generation and vector arithmetic
- A. Radford, L. Metz, S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv:1511.06434, 2015.
Super-resolution
- C. Dong, C. C. Loy, K. He, X. Tang. Image Super-Resolution Using Deep Convolutional Networks. arXiv:1501.00092, 2015.
- waifu2x.
Deblurring (motion blur removal)
- J. Sun, W. Cao, Z. Xu, J. Ponce. Learning a Convolutional Neural Network for Non-uniform Motion Blur Removal. arXiv:1503.00593, 2015.
Automatic colorization
- Automatic Colorization.
- B. Hariharan, P. Arbeláez, R. Girshick, J. Malik. Hypercolumns for Object Segmentation and Fine-grained Localization. arXiv: 1411.5752, 2014.

Toward 3D tasks

Deep Stereo
- J. Flynn, I. Neulander, J. Philbin, N. Snavely. DeepStereo: Learning to Predict New Views from the World's Imagery. arXiv:1506.06825, 2015.
- DeepStereo: Learning to Predict New Views from the World’s Imagery - YouTube.
Stereo matching
- J. Žbontar, Y. LeCun. Stereo Matching by Training a Convolutional Neural Network to Compare Image Patches. arXiv: 1510.05970, 2015.
3D tasks from a single image
- D. Eigen, R. Fergus. Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture. arXiv: 1411.4734, 2014.

Taking on video

Sports video classification
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, F. Li. Large-scale Video Classification with Convolutional Neural Networks. CVPR, 2014.

Toward more “human-like” machine perception

MemNet: CNN for Memorability
- LaMem
- A. Khosla, A. S. Raju, A. Torralba and A. Oliva. Understanding and Predicting Image Memorability at a Large Scale. ICCV, 2015.

Multimodal applications

Caption generation
- O. Vinyals, A. Toshev, S. Bengio, D. Erhan. Show and Tell: A Neural Image Caption Generator. arXiv: 1411.4555, 2014.
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, T. Darrell. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. arXiv: 1411.4389, 2014.
Answering questions about images
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. arXiv: 1505.05612, 2015.
- M. Malinowski, M. Rohrbach, M. Fritz. Ask Your Neurons: A Neural-Based Approach to Answering Questions About Images. ICCV, 2015.
Multimodal representations
- R. Kiros, R. Salakhutdinov, R. S. Zemel. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. arXiv: 1411.2539, 2014.

CNNs and reinforcement learning

Playing Atari
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, M. Riedmiller. Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602, 2013.
- V. Mnih, et al. Human-level control through deep reinforcement learning. Nature, 2015.
AlphaGo
- D. Silver, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 2016.
Facebook's Go AI
- Y. Tian, Y. Zhu. Better Computer Go Player with Neural Network and Long-term Prediction. arXiv: 1511.06410, 2015.
First-person 3D games
- V. Mnih, A.P. Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, K. Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. arXiv:1602.01783, 2016.

What’s Next?

Visual Genome
- Visual Genome