⦁G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg, "Baby talk: Understanding and generating image descriptions," CVPR, 2011.
⦁O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," CVPR, pp. 3156-3164, 2015.
⦁A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth, "Every picture tells a story: Generating sentences from images," ECCV, pp. 15-29, 2010.
⦁I. Sutskever, O. Vinyals, and Q. V. Le, "Sequence to sequence learning with neural networks," NIPS, pp. 3104-3112, 2014.
⦁D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly learning to align and translate," ICLR, 2014.
⦁K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," EMNLP, 2014.
⦁A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," NIPS, pp. 1097-1105, 2012.
⦁K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," CVPR, pp. 770-778, 2016.
⦁J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, "Deep captioning with multimodal recurrent neural networks," arXiv:1412.6632, 2014.
⦁A. Karpathy and L. Fei-Fei, "Deep visual-semantic alignments for generating image descriptions," CVPR, pp. 3128-3137, 2015.
⦁J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description," CVPR, pp. 2626-2634, 2015.
⦁K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention," ICML, pp. 2048-2057, 2015.
⦁J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille, "Learning like a child: Fast novel visual concept learning from sentence descriptions of images," ICCV, 2015.
⦁K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," ICLR, 2015.
⦁S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
⦁J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," CVPR, pp. 248-255, 2009.
⦁L. Chen, H. Zhang, J. Xiao, L. Nie, J. Shao, W. Liu, and T. Chua, "SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning," CVPR, pp. 5659-5667, 2017.
⦁J. Lu, C. Xiong, D. Parikh, and R. Socher, "Knowing when to look: Adaptive attention via a visual sentinel for image captioning," CVPR, pp. 375-383, 2017.
⦁P. Anderson, X. He, C. Buehler, D. Teney, M. Johnson, S. Gould, and L. Zhang, "Bottom-up and top-down attention for image captioning and visual question answering," CVPR, pp. 6077-6086, 2018.
⦁A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention is all you need," NIPS, pp. 5998-6008, 2017.
⦁T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common objects in context," ECCV, pp. 740-755, 2014.
⦁P. Kuznetsova, V. Ordonez, A. Berg, T. Berg, and Y. Choi, "Collective generation of natural image descriptions," ACL, pp. 359-368, 2012.
⦁S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko, "Translating videos to natural language using deep recurrent neural networks," arXiv:1412.4729, 2014.
⦁"Day 14: Recurrent Neural Network (RNN)." Retrieved from: https://ithelp.ithome.com.tw/articles/10193469
⦁"Understanding LSTM Networks." Retrieved from: https://colah.github.io/posts/2015-08-Understanding-LSTMs/