Skipping the fine details (this code is honestly a mess!).
The transform pipeline is CenterCrop(160) > Resize() > ToTensor() > Normalize().
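As a rough sketch of that pipeline (the Resize target and the Normalize statistics aren't stated above, so the 64x64 size and the 0.5 mean/std below are my assumptions, not necessarily the repo's values):

```python
from torchvision import transforms

# Assumed values: 64x64 output (to match the (BN,3,64,64) shapes later)
# and the common 0.5 mean/std normalization.
transform = transforms.Compose([
    transforms.CenterCrop(160),
    transforms.Resize(64),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
```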
Both G and D use Adam as their optimizer.
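Something like the following (the learning rates and betas here are placeholders, not the repo's actual hyperparameters; SAGAN-style setups often give G a smaller lr than D):

```python
# Placeholder hyperparameters, purely illustrative (self.G / self.D are the
# generator and discriminator described below).
self.g_optimizer = torch.optim.Adam(self.G.parameters(), lr=1e-4, betas=(0.0, 0.9))
self.d_optimizer = torch.optim.Adam(self.D.parameters(), lr=4e-4, betas=(0.0, 0.9))
```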
Roughly, the loop is:
Use real images from the dataset plus fake images that the generator G produces from random noise to train the discriminator D.
The discriminator D in turn pushes the generator G to make its noise-generated fakes look closer and closer to the real images.
Discriminator D
real_images (BN,3,64,64)
real_images = tensor2var(real_images)  # move to CUDA
d_out_real, dr1, dr2 = self.D(real_images)
d_out_real (BN)  # just BN scalars, one score per sample
dr1 (BN,64,64)  # attention map from D's attn1
dr2 (BN,16,16)  # attention map from D's attn2
z = tensor2var(torch.randn(real_images.size(0), self.z_dim))
z (BN,128)
fake_images, gf1, gf2 = self.G(z)
fake_images (BN,3,64,64)
gf1 (BN,256,256)  # attention map from G's attn1
gf2 (BN,1024,1024)  # attention map from G's attn2
d_out_fake, df1, df2 = self.D(fake_images)
d_out_fake (BN)
df1 (BN,64,64)
df2 (BN,16,16)
d_loss_real = torch.nn.ReLU()(1.0 - d_out_real).mean()
d_loss_fake = torch.nn.ReLU()(1.0 + d_out_fake).mean()
d_loss = d_loss_real + d_loss_fake
self.reset_grad()
d_loss.backward()
self.d_optimizer.step()
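Those two ReLU terms are just the standard hinge loss for the discriminator (the loss SAGAN uses), written out:

$$\mathcal{L}_D = \mathbb{E}_{x \sim p_{\text{data}}}\big[\max(0,\; 1 - D(x))\big] + \mathbb{E}_{z}\big[\max(0,\; 1 + D(G(z)))\big]$$

D only receives gradient while it scores a real image below +1 or a fake above -1; past those margins the ReLU clips the loss to zero.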
Generator G
z = tensor2var(torch.randn(real_images.size(0), self.z_dim))
z (BN,128)  # literally just freshly sampled random noise
fake_images, _, _ = self.G(z)
fake_images (BN,3,64,64)
g_out_fake, _, _ = self.D(fake_images)
g_out_fake (BN)
g_loss_fake = - g_out_fake.mean()
self.reset_grad()
g_loss_fake.backward()
self.g_optimizer.step()
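The generator side is the matching hinge objective:

$$\mathcal{L}_G = -\,\mathbb{E}_{z}\big[D(G(z))\big]$$

i.e. G is simply trained to push the discriminator's score on its fakes as high as possible.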
What I mainly want to go over is the self-attention (SA) mechanism in here (ugh).

Roughly, each network breaks down into l1~l4, last, and attn1~attn2.
| Generator G | Discriminator D |
| --- | --- |
| (l1): Sequential( (0): SpectralNorm( (module): ConvTranspose2d(128, 512, kernel_size=(4, 4), stride=(1, 1)) ) (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() ) | (l1): Sequential( (0): SpectralNorm( (module): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): LeakyReLU(negative_slope=0.1) ) |
| (l2): Sequential( (0): SpectralNorm( (module): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() ) | (l2): Sequential( (0): SpectralNorm( (module): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): LeakyReLU(negative_slope=0.1) ) |
| (l3): Sequential( (0): SpectralNorm( (module): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() ) | (l3): Sequential( (0): SpectralNorm( (module): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): LeakyReLU(negative_slope=0.1) ) |
| (l4): Sequential( (0): SpectralNorm( (module): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True) (2): ReLU() ) | (l4): Sequential( (0): SpectralNorm( (module): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) ) (1): LeakyReLU(negative_slope=0.1) ) |
| (last): Sequential( (0): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1)) (1): Tanh() ) | (last): Sequential( (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1)) ) |
| (attn1): Self_Attn( (query_conv): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1)) (key_conv): Conv2d(128, 16, kernel_size=(1, 1), stride=(1, 1)) (value_conv): Conv2d(128, 128, kernel_size=(1, 1), stride=(1, 1)) (softmax): Softmax(dim=-1) ) | (attn1): Self_Attn( (query_conv): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1)) (key_conv): Conv2d(256, 32, kernel_size=(1, 1), stride=(1, 1)) (value_conv): Conv2d(256, 256, kernel_size=(1, 1), stride=(1, 1)) (softmax): Softmax(dim=-1) ) |
| (attn2): Self_Attn( (query_conv): Conv2d(64, 8, kernel_size=(1, 1), stride=(1, 1)) (key_conv): Conv2d(64, 8, kernel_size=(1, 1), stride=(1, 1)) (value_conv): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1)) (softmax): Softmax(dim=-1) ) | (attn2): Self_Attn( (query_conv): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1)) (key_conv): Conv2d(512, 64, kernel_size=(1, 1), stride=(1, 1)) (value_conv): Conv2d(512, 512, kernel_size=(1, 1), stride=(1, 1)) (softmax): Softmax(dim=-1) ) |
The forward pass is basically the same for both networks:
l1 > l2 > l3 > attn1 > l4 > attn2 > last
Along the way, each attn layer also returns its attention (relation) matrix; see the sketch of the generator forward below.
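Here is that sketch of the generator forward pass, reconstructed from the layer list and the shapes noted earlier (the reshape of z to (BN, z_dim, 1, 1) before l1 is my assumption; something like it is needed for the first ConvTranspose2d to accept a 4-D input):

```python
def G_forward(self, z):
    z = z.view(z.size(0), z.size(1), 1, 1)  # (BN, 128, 1, 1)
    out = self.l1(z)            # (BN, 512, 4, 4)
    out = self.l2(out)          # (BN, 256, 8, 8)
    out = self.l3(out)          # (BN, 128, 16, 16)
    out, p1 = self.attn1(out)   # attention over 16*16 = 256 positions -> gf1 is (BN, 256, 256)
    out = self.l4(out)          # (BN, 64, 32, 32)
    out, p2 = self.attn2(out)   # attention over 32*32 = 1024 positions -> gf2 is (BN, 1024, 1024)
    out = self.last(out)        # (BN, 3, 64, 64), squashed by Tanh
    return out, p1, p2
```

The discriminator is the mirror image: strided convs downsample 64 > 32 > 16 > 8 > 4, with attn1 on the 8x8 map (64 positions, hence dr1 (BN,64,64)) and attn2 on the 4x4 map (16 positions, hence dr2 (BN,16,16)).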
You can also see that both networks wrap their convolutions in SpectralNorm(), a.k.a. spectral normalization, which as far as I can tell the author implemented by hand rather than importing a ready-made version.
SpectralNorm()
At initialization:
- generate u, a vector of size (H,) (H = the conv's output channels), with entries drawn from a mean-0, variance-1 distribution
- generate v, a vector of size (W*K*K,) (input channels times the two kernel dims), drawn from the same distribution
- L2-normalize both, so that u and v each have 2-norm equal to 1
At every forward pass (one step of power iteration):
- reshape the weight to (H, W*K*K)
- transpose it to (W*K*K, H), take the matrix-vector product with u (H,), then normalize the result to get the new v (W*K*K,)
- take the (H, W*K*K) weight, do the matrix-vector product with v (W*K*K,), then normalize to get the new u (H,)
- W(H, W*K*K) * v(W*K*K) gives a vector of size (H,)
- sigma = u(H) dotted with W(H, W*K*K) * v(W*K*K), a scalar estimate of the largest singular value
- finally W = W / sigma (still shaped (H, W, K, K)), and that normalized weight is what the wrapped conv actually uses
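A minimal sketch of that power-iteration step (not the author's actual SpectralNorm class, which wraps the conv module and stashes u, v, and the raw weight on it as extra parameters; the function and variable names below are mine):

```python
import torch

def l2normalize(v, eps=1e-12):
    # rescale a vector to unit L2 norm
    return v / (v.norm() + eps)

def spectral_normalize(weight, u, power_iterations=1):
    """weight: conv weight of shape (H, W, K, K); u: running estimate of the
    left singular vector, shape (H,). Returns (weight / sigma, updated u)."""
    height = weight.size(0)
    w = weight.view(height, -1)                 # (H, W*K*K)
    for _ in range(power_iterations):
        v = l2normalize(torch.mv(w.t(), u))     # new v, shape (W*K*K,)
        u = l2normalize(torch.mv(w, v))         # new u, shape (H,)
    sigma = u.dot(torch.mv(w, v))               # estimate of the largest singular value
    return weight / sigma, u
```

With only one iteration per forward pass the estimate is crude, but because the weights change slowly between updates, u and v carry over from step to step and converge in practice.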
Now, about the attn module itself...
actually I'll skip the write-up; if you trace the code and draw the diagram yourself it comes out looking pretty much like the figure in the paper, so just look at theirs.
Concretely, see:
```python
def forward(self, x):
    """
    inputs :
        x : input feature maps (B x C x W x H)
    returns :
        out : self-attention value + input feature
        attention : B x N x N (N = Width * Height)
    """
    # note: x.size() is really (B, C, H, W); since the feature maps are square
    # the width/height naming mix-up is harmless
    m_batchsize, C, width, height = x.size()
    proj_query = self.query_conv(x).view(m_batchsize, -1, width * height).permute(0, 2, 1)  # B x N x C' (C' = reduced query/key channels)
    proj_key = self.key_conv(x).view(m_batchsize, -1, width * height)                       # B x C' x N
    energy = torch.bmm(proj_query, proj_key)      # B x N x N, similarity of every position with every other
    attention = self.softmax(energy)              # softmax over the last dim (the keys)
    proj_value = self.value_conv(x).view(m_batchsize, -1, width * height)                   # B x C x N
    out = torch.bmm(proj_value, attention.permute(0, 2, 1))  # B x C x N, attention-weighted sum of values
    out = out.view(m_batchsize, C, width, height)
    out = self.gamma * out + x                    # learnable residual: gamma starts at 0, so the block begins as an identity
    return out, attention
```
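For completeness, here is a sketch of the constructor as implied by the printed module structure above (the in_dim // 8 reduction matches the 128>16 / 64>8 / 256>32 / 512>64 query/key convs; gamma doesn't appear in the print because it's a bare nn.Parameter, but the forward above clearly relies on it):

```python
import torch
import torch.nn as nn

class Self_Attn(nn.Module):
    """Self-attention block, reconstructed from the module print-outs above."""
    def __init__(self, in_dim):
        super().__init__()
        # 1x1 convs; query/key channels are reduced to in_dim // 8
        self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
        self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)
        self.softmax = nn.Softmax(dim=-1)
        # learnable residual weight, starts at 0 so the block is initially an identity
        self.gamma = nn.Parameter(torch.zeros(1))

    # forward(...) is exactly the method quoted above

# e.g. G's attn1 sits on a 128-channel, 16x16 feature map:
#   attn = Self_Attn(128)
#   out, attention = attn(torch.randn(4, 128, 16, 16))
#   out is (4, 128, 16, 16), attention is (4, 256, 256)
```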
Since all I really want is to bolt the Attn module onto CycleGAN, I'll stop here. (The code already runs anyway, and I honestly don't want to keep studying this author's code: plenty of things that should simply be imported aren't, there's a pile of stuff that's declared but unimplemented, or more likely was never written at all, and the hand-rolled spectral norm pollutes the conv's built-in parameters. Personally I find it hard to accept a plain conv layer with a bunch of extra weights hanging off it for no apparent reason, especially when those weights are just random numbers drawn from whatever distribution you happen to need.)