
Guangdong University of Technology: Machine Learning course teaching resources (lecture slides), Lecture 22: Generative Network Models (Stable Diffusion)

Resource category: library document; format: PDF; pages: 18; file size: 1.33 MB

Stable Diffusion


Framework: Text-to-image Generator
Input text: "A cat in the snow"
1. Text Encoder
2. Generation Model, which outputs an "intermediate product": a compressed version of the image
3. Decoder
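The three-component framework above can be read as a simple function composition. The sketch below is only an illustration of that data flow: `text_to_image` and the three `toy_*` functions are hypothetical stand-ins invented here, not real models or any library's API.

```python
from typing import Callable, List

Embedding = List[float]
Grid = List[List[float]]


def text_to_image(prompt: str,
                  text_encoder: Callable[[str], Embedding],
                  generation_model: Callable[[Embedding], Grid],
                  decoder: Callable[[Grid], Grid]) -> Grid:
    """Chain the three components: text -> embedding -> intermediate -> image."""
    embedding = text_encoder(prompt)            # component 1
    intermediate = generation_model(embedding)  # component 2: compressed version
    return decoder(intermediate)                # component 3: full-size output


# Toy stand-ins (assumptions for illustration only).
def toy_text_encoder(prompt: str) -> Embedding:
    # One number per word: its length.
    return [float(len(word)) for word in prompt.split()]


def toy_generation_model(embedding: Embedding) -> Grid:
    # Produce a tiny 2x2 "compressed image" that depends on the text.
    total = sum(embedding)
    return [[total, total + 1.0], [total + 2.0, total + 3.0]]


def toy_decoder(intermediate: Grid) -> Grid:
    # Nearest-neighbour 2x upsampling: 2x2 -> 4x4.
    image: Grid = []
    for row in intermediate:
        wide = [value for value in row for _ in range(2)]
        image.append(wide)
        image.append(list(wide))
    return image


image = text_to_image("A cat in the snow",
                      toy_text_encoder, toy_generation_model, toy_decoder)
print(len(image), len(image[0]))  # height and width of the decoded grid
```

Passing the components as arguments mirrors the point of the slide: the three parts are separate modules that can be swapped or trained independently.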

Stable Diffusion
https://arxiv.org/abs/2112.10752
[Figure: architecture diagram annotated with framework components 1, 2, 3. Labels: Pixel Space, Latent Space, Diffusion Process, Denoising U-Net (T denoising steps), Conditioning (Semantic Map, Text, Representations, Images). Legend: cross-attention, switch, skip connection, concat.]

DALL-E series
https://arxiv.org/abs/2204.06125
https://arxiv.org/abs/2102.12092
[Figure: diagram annotated with framework components 1, 2, 3. Labels: CLIP objective, img encoder, text encoder, prior, decoder. Example text: "a corgi playing a flame throwing trumpet".]
Generation model: Autoregressive or Diffusion

Imagen
https://imagen.research.google/
https://arxiv.org/abs/2205.11487
Text: "A Golden Retriever dog wearing a blue checkered beret and red dotted turtleneck."
1. Frozen Text Encoder → Text Embedding
2. Text-to-Image Diffusion Model → 64×64 image
3. Super-Resolution Diffusion Model → 256×256 image → Super-Resolution Diffusion Model → 1024×1024 image

Framework: Text-to-image Generator
Input text: "A cat in the snow"
1. Text Encoder
2. Generation Model
3. Decoder

[Figure from https://arxiv.org/abs/2205.11487: FID-10K versus CLIP Score. (a) Impact of encoder size: T5-Small, T5-Large, T5-XL, T5-XXL. (b) Impact of U-Net size: 300M, 500M, 1B, 2B.]

Fréchet Inception Distance (FID)
https://arxiv.org/abs/1706.08500
Real images (red points) and generated images (blue points) are passed through a CNN, and their representations before the softmax layer are compared.
FID = Fréchet distance between the two Gaussians fitted to the two sets of points.
Smaller is better.
Many samples are needed.
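For two Gaussians with means μ_r, μ_g and covariances Σ_r, Σ_g, the Fréchet distance is ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}). The following is a minimal NumPy sketch of that formula, assuming the CNN features have already been extracted; the random arrays below are placeholders for such features, and the function name `frechet_distance` is chosen here for illustration.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    Each input has shape (num_samples, feature_dim) with feature_dim >= 2.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)

    # Tr((cov_r cov_g)^{1/2}) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g; clip tiny negatives from round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g)
                 - 2.0 * trace_sqrt)


# Placeholder "features": real FID uses features from an Inception network.
rng = np.random.default_rng(0)
real_feats = rng.normal(loc=0.0, size=(500, 4))
gen_feats = rng.normal(loc=0.5, size=(500, 4))
print(frechet_distance(real_feats, gen_feats))
```

The mean and covariance are estimated from samples, which is one reason many samples are needed: with few samples the covariance estimate in a high-dimensional feature space is unreliable.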

Contrastive Language-Image Pre-Training (CLIP)
https://arxiv.org/abs/2103.00020
Trained on 400 million image-text pairs.
A Text Encoder and an Image Encoder each map their input to a vector.
Matching text-image pair: the two vectors should be close.
Non-matching text-image pair: the two vectors should be far apart.
Example texts: "A cat in the snow", "A dog is running."
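The "close" versus "far" objective can be sketched as a symmetric cross-entropy over a similarity matrix, in the spirit of the pseudocode in the CLIP paper. This is a simplified illustration, not the paper's exact training code: the embeddings are made-up vectors standing in for encoder outputs, and the fixed `temperature=0.07` is an assumption.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def log_softmax(logits: np.ndarray) -> np.ndarray:
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def contrastive_loss(text_emb: np.ndarray, image_emb: np.ndarray,
                     temperature: float = 0.07) -> float:
    """Row i of text_emb and row i of image_emb form a matching pair."""
    t = l2_normalize(text_emb)
    v = l2_normalize(image_emb)
    logits = t @ v.T / temperature  # entry (i, j): text i vs. image j
    idx = np.arange(logits.shape[0])
    # Each text should pick out its own image, and each image its own text.
    loss_text_to_image = -log_softmax(logits)[idx, idx].mean()
    loss_image_to_text = -log_softmax(logits.T)[idx, idx].mean()
    return float((loss_text_to_image + loss_image_to_text) / 2.0)


# Made-up embeddings: identical rows mean matching pairs are already aligned.
texts = np.eye(4)
aligned_images = np.eye(4)
shuffled_images = np.roll(np.eye(4), 1, axis=0)  # every pair mismatched
print(contrastive_loss(texts, aligned_images))
print(contrastive_loss(texts, shuffled_images))
```

The loss is small when matching pairs have high cosine similarity and non-matching pairs have low similarity, and large in the shuffled case.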

Framework: Text-to-image Generator
Input text: "A cat in the snow"
1. Text Encoder
2. Generation Model
3. Decoder
The Decoder can be trained without labelled data.
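The slide does not say how the decoder is trained. One common way to train a decoder from images alone, assumed here purely for illustration, is an autoencoder: an encoder compresses each image, the decoder reconstructs it, and the reconstruction error is the training signal, so no text labels are involved. The sketch below uses a tiny linear autoencoder on random vectors standing in for images; `train_autoencoder` and all hyperparameters are hypothetical.

```python
import numpy as np


def train_autoencoder(images: np.ndarray, latent_dim: int = 2,
                      lr: float = 0.1, steps: int = 500, seed: int = 0):
    """Fit a linear encoder/decoder by gradient descent on reconstruction MSE.

    images has shape (num_images, num_pixels); no labels are used.
    """
    rng = np.random.default_rng(seed)
    _, num_pixels = images.shape
    w_enc = rng.normal(scale=0.1, size=(num_pixels, latent_dim))
    w_dec = rng.normal(scale=0.1, size=(latent_dim, num_pixels))
    losses = []
    for _ in range(steps):
        latent = images @ w_enc   # compressed version of each image
        recon = latent @ w_dec    # decoder output
        err = recon - images
        losses.append(float((err ** 2).mean()))
        # Gradients of the mean squared error w.r.t. both weight matrices.
        grad_recon = 2.0 * err / err.size
        grad_dec = latent.T @ grad_recon
        grad_enc = images.T @ (grad_recon @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses


# Random vectors stand in for unlabelled images.
rng = np.random.default_rng(0)
fake_images = rng.random((64, 8))
_, decoder_weights, losses = train_autoencoder(fake_images)
print(losses[0], losses[-1])  # reconstruction error before and after training
```

After training, only the decoder half is needed at generation time: it maps the generation model's compressed output back to an image.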
