Implementation of Neural Style Transfer using Deep Learning

Neural style transfer is the process of creating a new image by combining two images. Since we like the artwork in the base image, we would like to transfer that style onto our own photographs. Naturally, we want to preserve the photograph's content as much as possible while changing it according to the style of the artwork. As an experiential AI development company, Oodles AI elaborates on the process of neural style transfer using deep learning algorithms.

Step 1: Capture

We have to figure out how to capture content and style image features so that we can combine them in a way that looks pleasing to the eye. Convolutional neural networks such as VGG-16 already capture these features, in a sense, because they can classify/recognize a very large variety of images (millions) with very high accuracy. We just need to look deeper into the network's layers and understand what they are doing.
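As a minimal sketch of this idea (not the exact pipeline used later in this article), the intermediate activations of a pretrained VGG-16 can be exposed with tf.keras. The layer names below are Keras's defaults, and the image size and path are placeholder assumptions:

# Minimal sketch: extract content/style features from a pretrained VGG-16.
import numpy as np
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights='imagenet', include_top=False)
vgg.trainable = False

# One deeper layer for "content", several shallower ones for "style".
content_layer = 'block4_conv2'
style_layers = ['block1_conv1', 'block2_conv1', 'block3_conv1',
                'block4_conv1', 'block5_conv1']

outputs = [vgg.get_layer(name).output for name in [content_layer] + style_layers]
feature_extractor = tf.keras.Model(inputs=vgg.input, outputs=outputs)

def extract_features(image_path):
    """Load an image, preprocess it for VGG-16, and return its activations."""
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(256, 256))
    x = tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...]
    x = tf.keras.applications.vgg16.preprocess_input(x)
    return feature_extractor(x)  # list of tensors: content activations first, then style activations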

Step 2: Layer

Suppose that, while training on images, we pick the first layer and start inspecting some of its units/neurons. Since we are only in the first layer, the units capture just a small patch of the images and rather low-level features, as shown below:

It seems that the first neuron is interested in diagonal lines, the third and fourth in vertical and diagonal lines, and the eighth clearly likes the color green. It is noticeable that all of these are tiny parts of the images, and that this layer is capturing rather low-level features.

Let's move a bit deeper and pick layer two:

In this layer, neurons begin to detect more features: the second detects thin vertical lines, the sixth and seventh start capturing round shapes, and the fourteenth is fixated on the color yellow.

This layer starts to detect even more interesting things: the sixth neuron is most activated by round shapes that look like tires, the tenth is not easy to interpret but prefers orange and round shapes, while the eleventh starts detecting people.

So the deeper we go, the more of the image the neurons detect, thus capturing high-level features of the image (the second neuron in the fifth layer is really into dogs), compared with the earlier layers, which capture rather small parts of the image.
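One simple way to probe this kind of behaviour (an illustrative sketch only; the layer name, channel index, and image paths are arbitrary placeholders) is to rank a set of images by how strongly they activate a single unit in a chosen VGG-16 layer:

# Illustrative sketch: which images most excite one unit (channel) in a VGG-16 layer?
import numpy as np
import tensorflow as tf

vgg = tf.keras.applications.VGG16(weights='imagenet', include_top=False)
probe = tf.keras.Model(inputs=vgg.input, outputs=vgg.get_layer('block3_conv1').output)

def unit_activation(image_path, channel=5):
    img = tf.keras.preprocessing.image.load_img(image_path, target_size=(224, 224))
    x = tf.keras.applications.vgg16.preprocess_input(
        tf.keras.preprocessing.image.img_to_array(img)[np.newaxis, ...])
    fmap = probe(x)[0, :, :, channel]   # spatial response map of this one unit
    return float(tf.reduce_max(fmap))   # strongest response anywhere in the image

paths = ['imgs/a.jpg', 'imgs/b.jpg', 'imgs/c.jpg']  # placeholder paths
print(sorted(paths, key=unit_activation, reverse=True))  # most activating images first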

This gives great insight into what deep convolutional layers are learning and, returning to our style transfer task, it also gives us an idea of how to generate art from one image while keeping the content of the other.

Step 3: Implement

We simply need to generate a new image that, when fed into the neural network as input, produces roughly the same activation values as the content image (the photograph) and the style image (the painting).
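In other words (a simplified summary of the objective the code below minimizes, omitting the exact normalization constants used there), the generated image is optimized against a weighted sum of a content loss, a Gram-matrix style loss, and a total-variation smoothness term:

$$\mathcal{L}_{total} \;=\; \alpha\,\mathcal{L}_{content} \;+\; \beta\,\mathcal{L}_{style} \;+\; \gamma\,\mathcal{L}_{tv}$$

$$\mathcal{L}_{content} \;=\; \sum_{i,j}\bigl(F^{\ell}_{ij} - P^{\ell}_{ij}\bigr)^2, \qquad \mathcal{L}_{style} \;=\; \sum_{\ell}\frac{1}{\lvert G^{\ell}\rvert}\sum_{i,j}\bigl(G^{\ell}_{ij} - A^{\ell}_{ij}\bigr)^2$$

Here F and P are the activations of the generated image and the content photograph at the content layer (relu4_2 in the code), G and A are the Gram matrices of the generated and style activations at the style layers (relu1_1 through relu5_1), and the weights correspond to content_weight, style_weight, and tv_weight below.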

Optimize.py

from __future__ import print_function
import functools
import os
import pdb
import time

import numpy as np
import tensorflow as tf

import transform
import vgg
from utils import get_img

STYLE_LAYERS = ('relu1_1', 'relu2_1', 'relu3_1', 'relu4_1', 'relu5_1')
CONTENT_LAYER = 'relu4_2'
DEVICES = 'CUDA_VISIBLE_DEVICES'


# Helper used below: number of elements in a tensor, excluding the batch dimension.
def _tensor_size(tensor):
    from operator import mul
    return functools.reduce(mul, (d.value for d in tensor.get_shape()[1:]), 1)


# content_targets: list of content image paths; style_target: style image as a numpy array
def optimize(content_targets, style_target, content_weight, style_weight,
             tv_weight, vgg_path, epochs=2, print_iterations=1000,
             batch_size=4, save_path='saver/fns.ckpt', slow=False,
             learning_rate=1e-3, debug=False):
    if slow:
        batch_size = 1
    # Trim the training set so it divides evenly into batches.
    mod = len(content_targets) % batch_size
    if mod > 0:
        print("Train set has been trimmed slightly..")
        content_targets = content_targets[:-mod]

    style_features = {}
    batch_shape = (batch_size, 256, 256, 3)
    style_shape = (1,) + style_target.shape
    print(style_shape)

    # Precompute the style features (Gram matrices) of the style image.
    with tf.Graph().as_default(), tf.device('/cpu:0'), tf.Session() as sess:
        style_image = tf.placeholder(tf.float32, shape=style_shape, name='style_image')
        style_image_pre = vgg.preprocess(style_image)
        net = vgg.net(vgg_path, style_image_pre)
        style_pre = np.array([style_target])
        for layer in STYLE_LAYERS:
            features = net[layer].eval(feed_dict={style_image: style_pre})
            features = np.reshape(features, (-1, features.shape[3]))
            gram = np.matmul(features.T, features) / features.size
            style_features[layer] = gram

    with tf.Graph().as_default(), tf.Session() as sess:
        X_content = tf.placeholder(tf.float32, shape=batch_shape, name="X_content")
        X_pre = vgg.preprocess(X_content)

        # Precompute the content features of the batch.
        content_features = {}
        content_net = vgg.net(vgg_path, X_pre)
        content_features[CONTENT_LAYER] = content_net[CONTENT_LAYER]

        if slow:
            preds = tf.Variable(tf.random_normal(X_content.get_shape()) * 0.256)
            preds_pre = preds
        else:
            preds = transform.net(X_content / 255.0)
            preds_pre = vgg.preprocess(preds)

        net = vgg.net(vgg_path, preds_pre)

        # Content loss: distance between generated and content activations.
        content_size = _tensor_size(content_features[CONTENT_LAYER]) * batch_size
        assert _tensor_size(content_features[CONTENT_LAYER]) == _tensor_size(net[CONTENT_LAYER])
        content_loss = content_weight * (2 * tf.nn.l2_loss(
            net[CONTENT_LAYER] - content_features[CONTENT_LAYER]) / content_size)

        # Style loss: distance between Gram matrices at each style layer.
        style_losses = []
        for style_layer in STYLE_LAYERS:
            layer = net[style_layer]
            bs, height, width, filters = map(lambda i: i.value, layer.get_shape())
            size = height * width * filters
            feats = tf.reshape(layer, (bs, height * width, filters))
            feats_T = tf.transpose(feats, perm=[0, 2, 1])
            grams = tf.matmul(feats_T, feats) / size
            style_gram = style_features[style_layer]
            style_losses.append(2 * tf.nn.l2_loss(grams - style_gram) / style_gram.size)
        style_loss = style_weight * functools.reduce(tf.add, style_losses) / batch_size

        # Total variation denoising, to keep the generated image smooth.
        tv_y_size = _tensor_size(preds[:, 1:, :, :])
        tv_x_size = _tensor_size(preds[:, :, 1:, :])
        y_tv = tf.nn.l2_loss(preds[:, 1:, :, :] - preds[:, :batch_shape[1] - 1, :, :])
        x_tv = tf.nn.l2_loss(preds[:, :, 1:, :] - preds[:, :, :batch_shape[2] - 1, :])
        tv_loss = tv_weight * 2 * (x_tv / tv_x_size + y_tv / tv_y_size) / batch_size

        # Overall loss.
        loss = content_loss + style_loss + tv_loss

        train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
        sess.run(tf.global_variables_initializer())
        import random
        uid = random.randint(1, 100)
        print("UID: %s" % uid)
        for epoch in range(epochs):
            num_examples = len(content_targets)
            iterations = 0
            while iterations * batch_size < num_examples:
                start_time = time.time()
                curr = iterations * batch_size
                step = curr + batch_size
                X_batch = np.zeros(batch_shape, dtype=np.float32)
                for j, img_p in enumerate(content_targets[curr:step]):
                    X_batch[j] = get_img(img_p, (256, 256, 3)).astype(np.float32)

                iterations += 1
                assert X_batch.shape[0] == batch_size

                feed_dict = {X_content: X_batch}
                train_step.run(feed_dict=feed_dict)

                end_time = time.time()
                delta_time = end_time - start_time
                if debug:
                    print("UID: %s, batch time: %s" % (uid, delta_time))

                is_print_iter = int(iterations) % print_iterations == 0
                if slow:
                    is_print_iter = epoch % print_iterations == 0
                is_last = epoch == epochs - 1 and iterations * batch_size >= num_examples
                should_print = is_print_iter or is_last
                if should_print:
                    to_get = [style_loss, content_loss, tv_loss, loss, preds]
                    test_feed_dict = {X_content: X_batch}
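For reference, a hypothetical way to invoke this training routine might look like the following. The file name, image paths, pretrained VGG weights location, and loss weights are all illustrative assumptions, not values stated above:

# Hypothetical invocation of the routine above (assumes the listing is saved as
# optimize.py, that utils.get_img loads an image as a numpy array, and that a
# pretrained VGG weights file exists at the path shown; all of these are assumptions).
from optimize import optimize
from utils import get_img

content_targets = ['data/train/img_0001.jpg', 'data/train/img_0002.jpg',
                   'data/train/img_0003.jpg', 'data/train/img_0004.jpg']  # example paths
style_target = get_img('styles/starry_night.jpg')  # style image as a numpy array

optimize(content_targets, style_target,
         content_weight=7.5, style_weight=100.0, tv_weight=200.0,  # illustrative weights
         vgg_path='data/imagenet-vgg-verydeep-19.mat',
         epochs=2, batch_size=4)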

