[1] S. Ji, W. Xu, M. Yang, and K. Yu, "3D Convolutional Neural Networks for Human Action Recognition," IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), Vol. 35, No. 1, pp. 221-231, 2013.
[2] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning Realistic Human Actions from Movies," CVPR, 2008.
[3] J. Y. H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond Short Snippets: Deep Networks for Video Classification," CVPR, 2015.
[4] K. Simonyan and A. Zisserman, "Two-Stream Convolutional Networks for Action Recognition in Videos," NIPS, pp. 568-576, 2014.
[5] H. Wang and C. Schmid, "Action Recognition with Improved Trajectories," ICCV, 2013.
[6] J. Zheng, Z. Jiang, and R. Chellappa, "Cross-View Action Recognition via Transferable Dictionary Learning," IEEE Transactions on Image Processing, Vol. 25, No. 6, pp. 2542-2556, 2016.
[7] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell, "Long-Term Recurrent Convolutional Networks for Visual Recognition and Description," IEEE Trans. on PAMI, 2017.
[8] J. Carreira and A. Zisserman, "Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset," CVPR, pp. 4724-4733, 2017.
[9] J. Y. H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, "Beyond Short Snippets: Deep Networks for Video Classification," CVPR, 2015.
[10] Z. Qiu, T. Yao, and T. Mei, "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks," ICCV, pp. 5534-5542, 2017.
[11] J. Liu, A. Shahroudy, D. Xu, and G. Wang, "Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition," ECCV, pp. 816-833, 2016.
[12] A. Diba, M. Fayyaz, V. Sharma, A. H. Karami, M. M. Arzani, R. Yousefzadeh, and L. Van Gool, "Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification," arXiv:1711.08200, 2017.
[13] C. Feichtenhofer, A. Pinz, and A. Zisserman, "Convolutional Two-Stream Network Fusion for Video Action Recognition," CVPR, 2016.
[14] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. Van Gool, "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition," ECCV, pp. 20-36, 2016.
[15] H. Wang and C. Schmid, "Action Recognition with Improved Trajectories," ICCV, 2013.
[16] I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld, "Learning Realistic Human Actions from Movies," CVPR, pp. 1-8, 2008.
[17] J. Zheng, Z. Jiang, and R. Chellappa, "Cross-View Action Recognition via Transferable Dictionary Learning," IEEE Trans. on Image Processing, Vol. 25, No. 6, pp. 2542-2556, 2016.
[18] P. Weinzaepfel, Z. Harchaoui, and C. Schmid, "Learning to Track for Spatio-Temporal Action Localization," ICCV, 2015.
[19] G. Yu and J. Yuan, "Fast Action Proposals for Human Action Detection and Search," CVPR, 2015.
[20] A. Gaidon, Z. Harchaoui, and C. Schmid, "Temporal Localization of Actions with Actoms," IEEE Trans. on PAMI, Vol. 35, No. 11, pp. 2782-2795, 2013.
[21] Z. Shu, K. Yun, and D. Samaras, "Action Detection with Improved Dense Trajectories and Sliding Window," ECCV, pp. 541-551, 2014.
[22] S. Karaman, L. Seidenari, and A. D. Bimbo, "Fast Saliency Based Pooling of Fisher Encoded Dense Trajectories," ECCV THUMOS Workshop, 2014.
[23] D. Oneata, J. Verbeek, and C. Schmid, "The LEAR Submission at THUMOS 2014," ECCV THUMOS Workshop, 2014.
[24] L. Wang, Y. Qiao, and X. Tang, "Action Recognition and Detection by Combining Motion and Appearance Features," ECCV THUMOS Workshop, Vol. 1, 2014.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," CVPR, 2014.
[26] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," NIPS, pp. 91-99, 2015.
[27] R. Hou, C. Chen, and M. Shah, "Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos," ICCV, 2017.
[28] V. Escorcia, F. Heilbron, J. Niebles, and B. Ghanem, "DAPs: Deep Action Proposals for Action Understanding," ECCV, pp. 768-784, 2016.
[29] A. Montes, A. Salvador, and X. Nieto, "Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks," arXiv:1608.08128, 2016.
[30] B. Singh, T. Marks, M. Jones, O. Tuzel, and M. Shao, "A Multi-Stream Bi-Directional Recurrent Neural Network for Fine-Grained Action Detection," CVPR, 2016.
[31] S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei, "End-to-End Learning of Action Detection from Frame Glimpses in Videos," CVPR, pp. 2678-2687, 2016.
[32] S. Ma, L. Sigal, and S. Sclaroff, "Learning Activity Progression in LSTMs for Activity Detection and Early Detection," CVPR, pp. 1942-1950, 2016.
[33] Z. Shou, J. Chan, A. Zareian, K. Miyazawa, and S.-F. Chang, "CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos," CVPR, 2017.
[34] Y. Zhao, Y. Xiong, L. Wang, Z. Wu, X. Tang, and D. Lin, "Temporal Action Detection with Structured Segment Networks," ICCV, 2017.
[35] F. Heilbron, V. Escorcia, B. Ghanem, and J. Niebles, "ActivityNet: A Large-Scale Video Benchmark for Human Activity Understanding," CVPR, 2015.
[36] K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," CVPR, 2016.
[37] H. Xu, A. Das, and K. Saenko, "R-C3D: Region Convolutional 3D Network for Temporal Activity Detection," ICCV, 2017.
[38] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv:1804.02767, 2018.
[39] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," CVPR, 2016.
[40] D. P. Kingma and J. Ba, "Adam: A Method for Stochastic Optimization," arXiv:1412.6980, 2014.
[41] PyTorch official website, https://pytorch.org/
[42] YouTube homepage, https://www.youtube.com/