A Deep-Learning Based Semi-Interactive Method for Re-colorization

Tengfei Zheng
PB20000296

July 12, 2023
Contents

Contents I
Abstract III
1 Introduction 1
1.1 Colorizing Grayscale Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 Image Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Two Views of Colorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 From Certain Color Style . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.2 From Given Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Style Transferring Methods 4
2.1 Pixel-wise LUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1.1 Description of Content and Style . . . . . . . . . . . . . . . . . . . . . 4
2.1.2 Look-Up Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.3 RGB Matching Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.4 YUV Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Wavelet Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Frequency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Wavelet Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Soften the Result . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Edge Detection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.1 Convolution Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.3.2 Transforming with Edge Orientation Information . . . . . . . . . . . . 14
3 Colorization by Optimizing 15
3.1 Continuity Preserving Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.1 Mask . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.1.2 RGB Optimizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.1.3 Poisson Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.2 Optimizing on YUV Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.1 Loss Function on YUV . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.2 YUV Optimization Results . . . . . . . . . . . . . . . . . . . . . . . . . 20
4 Introduction to CNN 22
4.1 Colorization Nets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.1 Basic Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
4.1.2 Plain Network Constructions . . . . . . . . . . . . . . . . . . . . . . . . 24
4.1.3 Colorize by GAN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2 VGG-19 and Gram Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.1 VGG-19 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
4.2.2 Representation of Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.2.3 Result Generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
4.3 Implementation of Transferring . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.1 Inserting Loss Layers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4.3.2 Some Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4.4 Results and Semi-Interactive Methods . . . . . . . . . . . . . . . . . . . . . . . 31
4.4.1 Content Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.4.2 Style Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.4.3 Semi-interactive Colorization . . . . . . . . . . . . . . . . . . . . . . . . 34
5 Conclusion and Discussion 35
5.1 More Details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
5.1.1 Large Datasets and Large Models . . . . . . . . . . . . . . . . . . . . . 35
5.1.2 Judging Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.1 Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2.2 Future Enhancements . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
References 38
Appendix 39
Abstract

Aiming at the problem of colorizing grayscale images, this paper surveys commonly used style-transferring algorithms and traditional colorization methods, identifying two different views of re-colorization: colorizing by a similar image or by given colored points, both of which give rise to an interactive method.

Ways to solve the style-transferring problem include pixel-wise color transforms, frequency methods and edge-detection methods. Given a grayscale image (the target image) and an RGB image (the source image), these methods transfer the color of the source onto the target. Alternatively, if the color of the target image is already known in some parts, we can introduce optimization to fill in the remaining parts, following the simple principle that points close in position and brightness should also be close in color. However, both views need prior information about the image, especially its semantic information. Convolutional Neural Networks (CNNs) excel at identifying such information, so using neural networks becomes a natural choice. By introducing pretrained deep learning models, this paper discusses the limitations of conventional methods while synthesizing both views into a full semi-interactive approach. To extend the method to automatic settings, this paper studies the structure of different colorization networks and applies some integrations and improvements to them, leading to more appealing results. In conclusion, this paper compares the features of the different views and methods, analyzing how to apply them automatically by modifying the network construction.
1 Introduction

1.1 Colorizing Grayscale Images

1.1.1 Background

Before color photographic technology became widespread, all photographs were in black and white. In restoring old photos, filling in color therefore forms an important part of the task. What is more, re-colorization technology can correct distortion caused by different lighting conditions, restoring the true color of the image. As figure 1 shows, the three columns are the original images, the grayscale images, and the images re-colorized by a certain algorithm.

Figure 1: Sample re-colorize results [1]

Nevertheless, colorization is a challenging problem, for we have to find a fixed procedure that can cover varying image conditions. Although scene semantics are usually helpful (for instance, grass is green and clouds are white), recognizing such high-level semantics is always difficult. Before taking a look at traditional or deep learning methods, we must first describe the problem in the language of mathematics and programming.

1.1.2 Image Representation

A grayscale image is often represented as a matrix, each pixel of which is stored as an integer value (called its intensity) between 0 and 255, indicating the brightness at that point; see figure 2.

Figure 2: Grayscale representation [2]
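As a minimal sketch added here for illustration (the file name flower.png is hypothetical, and rgb2gray comes from MATLAB's Image Processing Toolbox), such an intensity matrix can be produced and inspected directly:

% Read a color image and inspect its grayscale intensity matrix.
% rgb2gray applies the standard luminance weights
% 0.2989 R + 0.5870 G + 0.1140 B.
img  = imread('flower.png');   % H-by-W-by-3 uint8 array for a color image
gray = rgb2gray(img);          % H-by-W uint8 matrix, intensities in [0, 255]
disp(gray(1:3, 1:4))           % top-left intensity values, as in figure 2
imshow(gray)                   % 0 is displayed as black, 255 as white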
Higher intensity means brighter: a black pixel has intensity 0, while a white pixel has intensity 255. A pixel-wise operation usually means a transform applied entrywise to this matrix.

Representing a color image is a more complicated question. The simplest way is to separate the image into three channels: red (R), green (G) and blue (B). Physically, all light colors can be composed from RGB, so a color image can be represented as the combination of three grayscale images; see figure 3.

Figure 3: RGB representation [2]

Though the RGB representation is intuitive and easy for a computer to display, it does not fit our visual perception of colors. Our sensitivity to luminance is much stronger than our sensitivity to color intensity, so we apply a linear transform to the RGB space to extract luminance:

\[
\begin{pmatrix} Y \\ U \\ V \end{pmatrix}
=
\begin{pmatrix}
0.299 & 0.587 & 0.114 \\
-0.1687 & -0.3313 & 0.5 \\
0.5 & -0.4187 & -0.0813
\end{pmatrix}
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
+
\begin{pmatrix} 0 \\ 128 \\ 128 \end{pmatrix}
\tag{1}
\]

Y is the luminance of the image or, in a simpler word, its "lightness". Different weightings of R, G and B yield different representations of luminance. U and V are called "chrominance", the color of the pixel, still ranging over [0, 255]. This representation is called YUV.

By separating luminance from chrominance, it becomes easier to deal with problems involving grayscale. The linearity of the transform also makes it fast to restore the RGB image. Still, U and V do not conform to our perception; a better but much more complex representation is HSI: Hue, Saturation and Intensity.

\[
I = \frac{1}{3}(R+G+B), \qquad
S = 1 - \frac{3\min\{R,G,B\}}{R+G+B},
\]
\[
H = \begin{cases} \theta & B \le G \\ 2\pi - \theta & B > G \end{cases},
\qquad
\theta = \arccos\frac{(R-G)+(R-B)}{2\left[(R-G)^2+(R-B)(G-B)\right]^{1/2}}
\tag{2}
\]

Here the luminance is represented as the arithmetic average of R, G and B. Unlike the RGB or YUV forms, the HSI representation cannot easily be discretized into the uint8 data type (integers between 0 and 255).
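As a sketch of equation (1) in code (a hypothetical helper added for illustration; the offset addition relies on MATLAB's implicit expansion, available since R2016b):

function yuv = rgb2yuv(img)
    % Apply the linear transform of equation (1) to every pixel.
    A = [ 0.299    0.587    0.114;
         -0.1687  -0.3313   0.5;
          0.5     -0.4187  -0.0813];
    rgb = reshape(double(img), [], 3);   % one pixel per row, columns R, G, B
    yuv = rgb * A' + [0 128 128];        % matrix product plus chrominance offset
    yuv = reshape(yuv, size(img));       % back to H-by-W-by-3, channels Y, U, V
end

The inverse map back to RGB (equation (4) in section 2.1.2) can be implemented in exactly the same way.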
With the YUV or HSI representation, the re-colorization problem can be summarized as follows: given the matrix Y (or I), find the most likely matrices U and V (or S and H). Obviously this cannot be done without prior information, for we are trying to construct three dimensions from one. Different kinds of prior information lead to different views of the problem.

1.2 Two Views of Colorization

1.2.1 From Certain Color Style

A common situation is that we already know the color style of a grayscale image. For example, an image of a flower might be re-colorized to any color, but if we have a color image of the same flower, we can fill the grayscale image with similar colors; this is called transferring the color style. As figure 4 shows, if we know the light condition in the left image is the same as in the middle one, we can transfer it to the right one.

Figure 4: Style-transferring

Different ways of transferring lead to different effects, but the principle is clear: letting S be the source image that contains the color style and T the target grayscale image, style transferring is a problem of the form

\[
I = \operatorname*{arg\,min}_X \{\, \alpha\, d_c(X, T) + \beta\, d_s(X, S) \,\}
\]

Here d_c and d_s denote distances on content and style, while α and β are weight parameters. A simple thought is to let d_c(X, T) = ‖Y_X − Y_T‖_F², judging content distance by the Frobenius norm; for d_s, however, a proper form is harder to find: color style should be independent of location while still conveying the overall impression.

In the sections that follow, we discuss some empirical representations of the metrics d_c and d_s, together with the methods they elicit. It is worth noting that style transferring is not necessarily an explicit optimization problem; in many cases d_c and d_s are independent and can be minimized separately.

With style transferring and a large enough image gallery, we can form a straightforward semi-interactive method: compare the target image with every image in the gallery, find the one with the nearest content, and transfer its style to the grayscale image. Written as a formula, that is to choose S by (G denotes the gallery):

\[
S = \operatorname*{arg\,min}_{X \in G} d_c(T, X)
\]

Semi-interactive means the user need not designate a particular image, but the range of the gallery is still important.
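A minimal sketch of this gallery search, added for illustration (here gallery is a hypothetical cell array of color images, each assumed to be resized to the target's dimensions):

function S = nearestContent(T, gallery)
    % Choose the gallery image whose luminance is closest to the
    % grayscale target T under the Frobenius-norm content distance d_c.
    best = inf;
    for k = 1:numel(gallery)
        Yk = double(rgb2gray(gallery{k}));    % luminance of the k-th candidate
        d  = norm(double(T) - Yk, 'fro')^2;   % d_c(T, X) = ||Y_T - Y_X||_F^2
        if d < best
            best = d;
            S = gallery{k};                   % current nearest-content image
        end
    end
end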
1.2.2 From Given Points

Another view comes from cases where part of the image has already been colored. Using the same image as an example, figure 5 shows a partially colored image together with a mask indicating which pixels have been colored (the white ones).

Figure 5: Given mask and color

Re-colorization is thereby converted into the problem of coloring the pixels that are black in the mask so as to minimize a function measuring the conformity of the result. Because the white pixels in the mask may be distributed arbitrarily, no direct algorithm yields the result, which makes the use of optimization essential.

One attribute is particularly important in this kind of optimization: the gradient of the image. By spatial continuity, the matrix can be regarded as a uniform sampling of some smooth function, so its gradient can be estimated. Since the step size is 1, define the difference operators (here i and j cannot lie on the edge):

\[
\partial_x a_{ij} = \frac{a_{i+1,j} - a_{i-1,j}}{2}, \qquad
\partial_y a_{ij} = \frac{a_{i,j+1} - a_{i,j-1}}{2},
\]
\[
\partial_x^2 a_{ij} = a_{i+1,j} + a_{i-1,j} - 2a_{ij}, \qquad
\partial_y^2 a_{ij} = a_{i,j+1} + a_{i,j-1} - 2a_{ij},
\]
\[
\Delta a_{ij} = a_{i-1,j} + a_{i+1,j} + a_{i,j-1} + a_{i,j+1} - 4a_{ij}
\tag{3}
\]

These operators portray local intensity differences. Because they are all linear, combining them with the 2-norm turns the optimization into a least-squares problem, which can be solved exactly.
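Since each operator in equation (3) is a fixed stencil, it can be evaluated as a small 2-D convolution. A minimal sketch, added for illustration (gray is the intensity matrix of section 1.1.2; border values are not meaningful, matching the caveat that i and j cannot be on the edge):

A   = double(gray);                             % intensity matrix as doubles
dx  = conv2(A, [1; 0; -1] / 2, 'same');         % central difference along i
dy  = conv2(A, [1, 0, -1] / 2, 'same');         % central difference along j
lap = conv2(A, [0 1 0; 1 -4 1; 0 1 0], 'same'); % discrete Laplacian of eq. (3)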
2 Style Transferring Methods

2.1 Pixel-wise LUT

2.1.1 Description of Content and Style

Both RGB and YUV take discrete values, so the problem can be regarded as a combinatorial optimization problem. For two pixels (g, h), (i, j), we can define a content distance between two matrices of the same dimension:

\[
d_{gh,ij}(X, Y) =
\begin{cases}
1 & (x_{gh} - x_{ij})(y_{gh} - y_{ij}) \le 0 \ \text{and}\ y_{gh} \ne y_{ij} \\
0 & \text{otherwise}
\end{cases}
\]
\[
d_c(X, Y) = \sum_{(g,h),(i,j)} d_{gh,ij}(X, Y)
\]

This "distance" is not symmetric: for it to vanish, pixels that are equal in X need only be equal in Y, not the reverse. In addition, it requires X and Y to share the same partial-order relationship. Visually speaking, this distance says that we recognize content by the order of pixel intensities; as long as that order is preserved, we see the same content.

For style, the metric is described as follows (I_l denotes the indicator function, which is 1 when l is true and 0 otherwise):

\[
h(X) = (h_k(X)), \qquad
h_k(X) = \frac{1}{\operatorname{size}(X)} \sum_{ij} I_{x_{ij} \le k},
\qquad
d_s(X, Y) = \| h(X) - h(Y) \|
\]

Instead of comparing every pair of points as the content metric does, the style metric merely counts how many pixels lie at or below each intensity and compares the resulting intensity distributions. From a probabilistic view, h is just the distribution function of a pixel sampled uniformly from X, called the (cumulative) histogram of the matrix.

2.1.2 Look-Up Table

Letting α = ∞ and β = 1, the optimization problem turns into the case in which the content must be preserved. Since equal pixels in X must map to equal pixels in Y, we need a monotonically increasing function f with y_ij = f(x_ij). Because x and y take discrete values, f can be recorded as an array, called a look-up table. The table can be constructed by the following code:

% g: cumulative histogram of the source (style) image
% h: cumulative histogram of the target image
j = 0;
for i = 0:255
    % advance j to the smallest level at which the source distribution
    % reaches the target distribution at level i
    while (g(j+1) < h(i+1)) && (j < 255)
        j = j + 1;
    end
    LUT(i+1) = j;   % map target intensity i to style intensity j
end

Here g denotes the cumulative histogram of the source image and h that of the target image: g must be the distribution we map into, namely the style to be matched. Applying the resulting table to the target's pixels transfers the color style of the source image to the target image.

By transforming independently on R, G, B or on Y, U, V, we obtain three look-up tables. From equation (1), the inverse transform is also immediate:

\[
\begin{pmatrix} R \\ G \\ B \end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 1.402 \\
1 & -0.34414 & -0.71414 \\
1 & 1.772 & 0
\end{pmatrix}
\begin{pmatrix} Y \\ U - 128 \\ V - 128 \end{pmatrix}
\tag{4}
\]
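As a hypothetical end-to-end sketch (not the paper's exact implementation), the cumulative histogram h_k of section 2.1.1 and the LUT loop above combine into a single-channel matching routine:

function h = cumHist(X)
    % Cumulative histogram h_k(X) of section 2.1.1: fraction of pixels <= k.
    counts = histcounts(double(X(:)), 0:256);   % pixel counts per level 0..255
    h = cumsum(counts) / numel(X);
end

function out = matchChannel(target, source)
    % Remap 'target' so its cumulative histogram matches that of 'source'.
    g = cumHist(source);                        % style (source) distribution
    h = cumHist(target);                        % content (target) distribution
    LUT = zeros(1, 256);
    j = 0;
    for i = 0:255
        while (g(j+1) < h(i+1)) && (j < 255)
            j = j + 1;
        end
        LUT(i+1) = j;
    end
    out = uint8(LUT(double(target) + 1));       % look up every pixel at once
end

For the RGB matching of the next subsection, the routine is run once per channel, e.g. result(:,:,c) = matchChannel(gray, S(:,:,c)) for c = 1, 2, 3, with gray the target grayscale matrix and S the source color image.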
2.1.3 RGB Matching Results

The matching result produced by the algorithm above is shown in figure 6.

Figure 6: Pixel-wise LUT result

By checking the histograms in figure 7, we can see how the result's histogram has been reshaped to follow the source image.

Figure 7: Histogram of result and source image

Though the result colors the rising sun plausibly, two main problems remain. First, transforming on RGB can reduce the sharpness of the colors and make the image look "dirty"; second, the method entirely ignores spatial information, which is often useful when coloring similar images. To address the first problem, we next transform on the YUV space.