臺灣博碩士論文加值系統

English |FB 專頁 |Mobile

免費會員登入| 註冊

功能切換導覽列

(216.73.216.26) 您好！臺灣時間：2025/06/28 04:00

字體大小：

:::

詳目顯示

第 1 筆 / 共 1 筆

/1頁

論文基本資料
摘要
外文摘要
目次
參考文獻
電子全文
QR Code

本論文永久網址:

研究生:

黃郁凱

研究生(外文):

HUANG,YU-KAI

論文名稱:

基於Stable Diffusion使用Realistic Vision和 BLIP 進行圖像生成的視覺語言融合

論文名稱(外文):

Visual Language Fusion for Image Generation using Realistic Vision and BLIP based on Stable Diffusion

指導教授:

黃文楨

指導教授(外文):

HUANG,WEN-CHEN

口試委員:

殷堂凱、陳彥銘、黃文楨

口試委員(外文):

YIN,TANG-KAI、CHEN,YAN-MING、HUANG,WEN-CHEN

口試日期:

2024-01-03

學位類別:

碩士

校院名稱:

國立高雄科技大學

系所名稱:

資訊管理系

學門:

電算機學門

學類:

電算機一般學類

論文種類:

學術論文

論文出版年:

2024

畢業學年度:

112

語文別:

中文

論文頁數:

中文關鍵詞:

機器學習

外文關鍵詞:

Vision-Language fusion、Stable Diffusion、Realistic Vision、BLIP

相關次數:

被引用:0
點閱:475
評分:
下載:83
書目收藏:0

[1] Scott Reed, Zeynep Akata, Xinchen Yan, Lajanugen Logeswaran, Bernt Schiele, Honglak Lee (2016). Generative Adversarial Text to Image Synthesis. Proceedings of The 33rd International Conference on Machine Learning, in Proceedings of Machine Learning Research 48:1060-1069 Available from https://proceedings.mlr.press/v48/reed16.html.
[2] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
[3]Kingma, D. P., & Welling, M. (2013). Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114.
[4]Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434.
[5]Brock, A., Donahue, J., & Simonyan, K. (2019). Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations.
[6] Brown, T.B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T.J., Child, R., Ramesh, A., Ziegler, D.M., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., & Amodei, D. (2020). Language Models are Few-Shot Learners. ArXiv, abs/2005.14165.
[7]Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L, Polosukhin, I. (2017). Attention is all you need. In Advances in neural information processing systems (pp. 5998-6008).
[8]Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019, May). Self-attention generative adversarial networks. In International conference on machine learning (pp. 7354-7363). PMLR.
[9] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., Uszkoreit, J., & Houlsby, N. (2020). An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. ArXiv, abs/2010.11929
[10]Zhang, H., Xu, T., Li, H., Zhang, S., Wang, X., Huang, X., & Metaxas, D. N. (2018). StackGAN++: Realistic image synthesis with stacked generative adversarial networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 1947-1962.
[11]Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., & Hochreiter, S. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, 30.
[12]Chen, X., Duan, Y., Houthooft, R., Schulman, J., Sutskever, I., & Abbeel, P. (2016). Infogan: Interpretable representation learning by information maximizing generative adversarial nets. In Advances in neural information processing systems (pp. 2172-2180).
[13]Zhao, J., Mathieu, M., & LeCun, Y. (2016). Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126.
[14]Denton, E. L., Chintala, S., Szlam, A., & Fergus, R. (2015). Deep generative image models using a Laplacian pyramid of adversarial networks. In Advances in neural information processing systems (pp. 1486-1494).
[15]Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134).
[16]Kim, T., Cha, M., Kim, H., Lee, J. K., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1857-1865).
[17]Salimans, T., Goodfellow, I., Zaremba, W., Cheung, V., Radford, A., & Chen, X. (2016). Improved techniques for training GANs. In Advances in neural information processing systems (pp. 2234-2242).
[18]Reed, S., Akata, Z., Yan, X., Logeswaran, L., Schiele, B., & Lee, H. (2016). Generative adversarial text to image synthesis. arXiv preprint arXiv:1605.05396.
[19]Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2018). Self-attention generative adversarial networks. arXiv preprint arXiv:1805.08318.
[20]Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision (pp. 2223-2232).
[21]Ma, X., Li, Y., Zhang, Y., Zheng, H., & Sun, J. (2018). Shifting more attention to video salient object detection. In Proceedings of the European Conference on Computer Vision (ECCV) (pp. 30-46).
[22]Tulyakov, S., Liu, M. Y., Yang, X., Kautz, J., & Catanzaro, B. (2018). MoCoGAN: Decomposing motion and content for video generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1526-1535).
[23]Kim, T., Cha, M., Kim, H., Lee, J. K., & Kim, J. (2017). Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70 (pp. 1857-1865).
[24]Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078. [3]Soumith chintala, Luke metz, & Alec radford. (2015, November 19). Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. https://arxiv.org/abs/1511.06434
[25]Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E.L., Ghasemipour, S.K., Ayan, B.K., Mahdavi, S.S., Lopes, R.G., Salimans, T., Ho, J., Fleet, D.J., & Norouzi, M. (2022). Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. ArXiv, abs/2205.11487.

推文
網路書籤
推薦
評分
引用網址
轉寄

top

相關論文
相關期刊
熱門點閱論文

無相關論文

簡易查詢 | 進階查詢 | 熱門排行 | 我的研究室