[1] R. Bommasani et al., "On the opportunities and risks of foundation models," arXiv preprint arXiv:2108.07258, 2021.
[2] OpenAI, "GPT-4 technical report," arXiv preprint arXiv:2303.08774, 2023.
[3] H. Touvron et al., "LLaMA: Open and efficient foundation language models," arXiv preprint arXiv:2302.13971, 2023.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," arXiv preprint arXiv:1810.04805, 2018.
[5] X. Jia et al., "Highly scalable deep learning training system with mixed-precision: Training ImageNet in four minutes," arXiv preprint arXiv:1807.11205, 2018.
[6] H. Mikami, H. Suganuma, Y. Tanaka, and Y. Kageyama, "ImageNet/ResNet-50 training in 224 seconds," arXiv preprint arXiv:1811.05233, 2018.
[7] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
[8] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81-106, 1986.
[9] J. R. Quinlan, C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[10] C. Cortes and V. Vapnik, "Support-vector networks," Machine Learning, vol. 20, pp. 273-297, 1995.
[11] J. A. Xu and K. Araki, "A SVM-based personal recommendation system for TV programs," in 12th International Multi-Media Modelling Conference, IEEE, 2006, 4 pp.
[12] R. Choudhry and K. Garg, "A hybrid machine learning system for stock market forecasting," International Journal of Computer and Information Engineering, vol. 2, no. 3, pp. 689-692, 2008.
[13] J. A. Anderson, An Introduction to Neural Networks. MIT Press, 1995.
[14] B. Mahesh, "Machine learning algorithms - a review," International Journal of Science and Research (IJSR), vol. 9, no. 1, pp. 381-386, 2020.
[15] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436-444, 2015.
[16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[17] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[18] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," in Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), Part III, Springer, 2015, pp. 234-241.
[19] C.-Y. Wang, A. Bochkovskiy, and H.-Y. M. Liao, "YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 7464-7475.
[20] W. Yin, K. Kann, M. Yu, and H. Schütze, "Comparative study of CNN and RNN for natural language processing," arXiv preprint arXiv:1702.01923, 2017.
[21] D. Weimer, B. Scholz-Reiter, and M. Shpitalni, "Design of deep convolutional neural network architectures for automated feature extraction in industrial inspection," CIRP Annals, vol. 65, no. 1, pp. 417-420, 2016.
[22] G. Sperlí, "A deep learning based chatbot for cultural heritage," in Proceedings of the 35th Annual ACM Symposium on Applied Computing, 2020, pp. 935-937.
[23] A. Brutzkus and A. Globerson, "Why do larger models generalize better? A theoretical perspective via the XOR problem," in International Conference on Machine Learning, PMLR, 2019, pp. 822-830.
[24] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[25] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770-778.
[26] C. Szegedy et al., "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 1-9.
[27] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning transferable architectures for scalable image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697-8710.
[28] A. Vaswani et al., "Attention is all you need," Advances in Neural Information Processing Systems, vol. 30, 2017.
[29] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, "Improving language understanding by generative pre-training," 2018.
[30] Y. Zhang, A. Warstadt, H.-S. Li, and S. R. Bowman, "When do you need billions of words of pretraining data?," arXiv preprint arXiv:2011.04946, 2020.
[31] A. Srivastava et al., "Beyond the imitation game: Quantifying and extrapolating the capabilities of language models," arXiv preprint arXiv:2206.04615, 2022.
[32] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," arXiv preprint arXiv:1710.10196, 2017.
[33] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684-10695.
[34] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
[35] K. Frans, L. Soros, and O. Witkowski, "CLIPDraw: Exploring text-to-drawing synthesis through language-image encoders," Advances in Neural Information Processing Systems, vol. 35, pp. 5207-5218, 2022.
[36] N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, "DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 22500-22510.
[37] E. J. Hu et al., "LoRA: Low-rank adaptation of large language models," arXiv preprint arXiv:2106.09685, 2021.
[38] Y. Wang, J. Wang, and X. Zhang, "YNU-HPCC at WASSA-2023 Shared Task 1: Large-scale language model with LoRA fine-tuning for empathy detection and emotion classification," in Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, & Social Media Analysis, 2023, pp. 526-530.
[39] Y. Shi, C. Xue, J. Pan, W. Zhang, V. Y. Tan, and S. Bai, "DragDiffusion: Harnessing diffusion models for interactive point-based image editing," arXiv preprint arXiv:2306.14435, 2023.
[40] S. Li et al., "PyTorch distributed: Experiences on accelerating data parallel training," arXiv preprint arXiv:2006.15704, 2020.
[41] M. Abadi et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
[42] A. Vishnu, C. Siegel, and J. Daily, "Distributed TensorFlow with MPI," arXiv preprint arXiv:1603.02339, 2016.
[43] Y. Huang et al., "GPipe: Efficient training of giant neural networks using pipeline parallelism," Advances in Neural Information Processing Systems, vol. 32, 2019.
[44] M. Shoeybi, M. Patwary, R. Puri, P. LeGresley, J. Casper, and B. Catanzaro, "Megatron-LM: Training multi-billion parameter language models using model parallelism," arXiv preprint arXiv:1909.08053, 2019.
[45] S. Rajbhandari, J. Rasley, O. Ruwase, and Y. He, "ZeRO: Memory optimizations toward training trillion parameter models," in SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE, 2020, pp. 1-16.
[46] A. Sergeev and M. Del Balso, "Horovod: Fast and easy distributed deep learning in TensorFlow," arXiv preprint arXiv:1802.05799, 2018.
[47] S. Gan et al., "BAGUA: Scaling up distributed learning with system relaxations," arXiv preprint arXiv:2107.01499, 2021.
[48] NVIDIA, "NVIDIA," https://www.nvidia.com/zh-tw/ (accessed August 2023).
[49] Hugging Face, "Hugging Face," https://huggingface.co (accessed August 2023).
[50] M. Li et al., "Scaling distributed machine learning with the parameter server," in 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), 2014, pp. 583-598.
[51] S. Zhang, A. E. Choromanska, and Y. LeCun, "Deep learning with elastic averaging SGD," Advances in Neural Information Processing Systems, vol. 28, 2015.
[52] W. Zhang, S. Gupta, X. Lian, and J. Liu, "Staleness-aware async-SGD for distributed deep learning," arXiv preprint arXiv:1511.05950, 2015.
[53] P. Patarasuk and X. Yuan, "Bandwidth optimal all-reduce algorithms for clusters of workstations," Journal of Parallel and Distributed Computing, vol. 69, no. 2, pp. 117-124, 2009.
[54] MPI Forum, "Message Passing Interface (MPI) Forum Home Page," https://www.mpi-forum.org (accessed August 2023).
[55] NVIDIA, "NVIDIA Collective Communications Library (NCCL)," https://developer.nvidia.com/nccl (accessed August 2023).
[56] G. Wang, S. Venkataraman, A. Phanishayee, N. Devanur, J. Thelin, and I. Stoica, "Blink: Fast and generic collectives for distributed ML," Proceedings of Machine Learning and Systems, vol. 2, pp. 172-186, 2020.
[57] R. Alvarez, R. Prabhavalkar, and A. Bakhtin, "On the efficient representation and execution of deep acoustic models," arXiv preprint arXiv:1607.04683, 2016.
[58] N. Ström, "Scalable distributed DNN training using commodity GPU cloud computing," 2015.
[59] N. Dryden, T. Moon, S. A. Jacobs, and B. Van Essen, "Communication quantization for data-parallel training of deep neural networks," in 2nd Workshop on Machine Learning in HPC Environments (MLHPC), IEEE, 2016, pp. 1-8.
[60] A. Krizhevsky, "CIFAR-10 and CIFAR-100 datasets," https://www.cs.toronto.edu/~kriz/cifar.html (accessed August 2023).
[61] P. Goyal et al., "Accurate, large minibatch SGD: Training ImageNet in 1 hour," arXiv preprint arXiv:1706.02677, 2017.
[62] "train_dreambooth_lora.py failed on two machines," https://github.com/huggingface/diffusers/issues/3363#issuecomment-1537907210 (accessed August 2023).
[63] S. Li et al., "PyTorch Data Parallel Best Practices on Google Cloud," https://medium.com/pytorch/pytorch-data-parallel-best-practices-on-google-cloud-6c8da2be180d (accessed 2023).