An Example of What You’ll Be Making:
TLDR:
I’ll be introducing the tools and principles we’ll be using for this tutorial, outlining the process at a high level, and then doing a more in-depth walkthrough. Lastly, I’ll go through the processing code in a bit more detail to explain how everything works. And if you’re up for it, I’ll have a little quiz / FAQ at the end. But if you just want to get things up and running quickly, just follow the steps in the Process section. You can get the whole thing running in just a few minutes!
But Reading is Hard…
If you prefer to watch rather than read things, you can check out the companion video to this article here:
Tools
StyleGAN
StyleGAN is a generative architecture for Generative Adversarial Networks (GANs). It has become popular for, among other things, its ability to generate endless variations of the human face that are nearly indistinguishable from photographs of real people.
Borrowing from the methodology established by style transfer, where one image is rendered in the style of another (say, your own photo rendered in the style of Picasso), StyleGAN exposes some very important functionality within its model: namely, access to its latent space, which is what allows us to create these smooth interpolations (animations) between one image and another.
For more on StyleGAN, check out this video:
RunwayML
RunwayML brings an intuitive visual interface to machine learning, a field that has traditionally been accessible only through complex code and the installation of a long list of dependencies, which creates a big barrier to entry for non-coders. In addition to the interface, Runway provides several capabilities that are crucial for a project like this:
- The ability to install a wide variety of ML models with the click of a button. We’ll be using StyleGAN, but in addition to numerous GANs, Runway also offers models for text-to-image generation, pose / skeleton tracking, image recognition and labeling, face detection, image colorization, and more.
- The ability to handle processing quickly on cloud-based architectures.
- The ability to both accept and send data across numerous protocols (OSC / HTTP / File download), meaning you can utilize what you create here in nearly any application (think Processing, MaxMSP, AbletonLive, your browser).
- The ability to output in numerous formats (JSON, CSV, Text, Video, Images).
Recently, Runway introduced the ability to train your own models and add them to Runway, meaning its capabilities are only limited by what you can output.
P5.js
P5.js is the web implementation of the popular creative coding framework, Processing. We’ll be using it to request images from Runway and manipulate that latent space in real-time, in order to generate image “frames” that are visually similar to their predecessors, but pseudo-random. Because of this, the number of landscape images we can create is practically endless.
Process:
- Download, install, and run Runway
- Download P5, P5 Dom, and ToxicLibs
- Create a workspace in Runway running StyleGAN
- In Runway under StyleGAN options, click Network, then click “Run Remotely”
- Clone or download this GitHub repo.
- Open the index.html file from the GitHub repo in your browser.
- To output a video from Runway, choose Export > Output > Video, give it a place to save, and select your desired frame rate.
- To output a video from Processing instead, uncomment the //save … line in the imageReady function of stylegan-transitions.js, then convert the image sequence the browser outputs into a video using a tool like Premiere / After Effects or another free online resource (I’ve noticed this currently produces a higher-quality video than Runway’s export).
Walkthrough
Creating a RunwayML Workspace with StyleGAN
Once you have Runway downloaded, go to the models tab, and add the StyleGAN model to a new workspace. Give it a name, or choose the default.
Running the StyleGAN Model in Runway ML
- For the Z value, to start, choose Vector
- Under options choose Inference
- Under checkpoint choose Landscapes
- Click Run Remotely to start the model so you can select a starting landscape image.
Choosing and Downloading an Initial Landscape
- Make sure the button at the bottom says Stop — this means your model is successfully running.
- Scrub through the image grid until you find a starting landscape image you like.
- To adjust the realism of the images, try adjusting the truncation slider.
- To adjust how similar each image is to its neighbors in the grid, try adjusting the Neighbor Similarity slider.
- Once you select an image thumbnail, click the download button to download the JSON representation of that image to your computer. By default, on a Mac, this should download to your Downloads folder.
Setting up the code repo
- Get the code to generate images by going to this GitHub Repo URL
- Click download zip (or clone the repo if you’re familiar with Github and prefer this). Unzip the download to your folder of choice on your hard drive.
Implementing our Starting Image (Optional)
Why is this step optional? Well, theoretically, if the images you’re requesting from Runway are random enough, it really won’t matter what image you start with — you’ll get a lot of different types of landscapes no matter what. But if you wanted, say, waterfalls, and you made your similarity settings very high, you could theoretically stay within the bounds of only waterfall images.
However — the proper way to ensure you get only a certain type of image would be to train and import your own model. That’s beyond the scope of this tutorial, but you can get a sense of how to do that here.
In fact, if you watch the inspiration for this tutorial, this Coding Train video on RunwayML by the excellent Daniel Shiffman (of the Processing Foundation), you’ll see he uses his own model to generate only images with rainbows.
- Within that folder, go into the data folder and open up the landscape.js file in a code editor. If you don’t have a code editor, I highly recommend downloading Visual Studio Code for Mac or PC, but you can use Notepad or TextEdit if you have to =)
- Open up the JSON file you downloaded from Runway in that code editor. Select everything, and copy the contents.
- Within the landscape.js file, delete the array (everything after the let a = code), and paste the contents of the JSON file from Runway. You’ll end up with something like:
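For reference, here’s roughly the shape of the file you should end up with (the numbers below are placeholders; your actual file will contain the 512 floats from the JSON you downloaded):
// data/landscape.js: the z vector for your starting image
let a = [0.6351, -0.2415, 1.0843, -0.0271, 0.4410, /* ...505 more floats... */ -0.8512, 0.1193];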
This represents the z value of your starting image. Change even one number within that great big array by even a little bit, and you’ll get a different image!
Setting up Runway to receive image requests from Processing
We initially chose to manually select an image vector in Runway so we had a starting point. Now we want to programmatically request images from Runway using Processing.
In order to send requests to Runway, we need to make a few changes within the interface:
- Instead of Inference, choose Network.
- Make sure the network is HTTP (the default).
- Note the POST route, http://localhost:8000/query
- Make sure your model is running (click Run Remotely).
Requesting images in the browser / Rendering Video
Now we’re finally ready to use the code we downloaded from the GitHub repo to request images from Runway. We’ll go over the code in detail in the next section, but for now, all you need to do is open up the index.html file in a browser window (File > Open, or drag the index file into the tab bar of your browser).
After a few seconds, you should see images appearing in the browser window. Now you can go back to Runway and hit that Export button.
You should also see the images cycling through in the Export Preview pane of Runway. When you feel you have a long enough video, stop the export in Runway, and stop the model as well so you don’t keep using up disk space and cloud processing time. You can also close your browser window.
That’s it! Go to where you chose to export your video file, and you should have a QuickTime video of generative animated landscapes!
Code Walkthrough
Here is the relevant code; as you can see, it’s relatively short and simple. The index.html file simply loads our dependencies (p5 for Processing-style drawing, p5.dom for canvas-related functions and saving, toxiclibs for randomness, and landscape.js for our starting vector).
stylegan-transitions.js is really where all the action is, and I’ll be going through that code, referring to the line numbers of the code as I go along.
<html>
<head>
<script src="js/lib/p5.js"></script>
<script src="js/lib/p5.dom.js"></script>
<script src="js/lib/toxiclibs.js"></script>
<script src="data/landscape.js"></script>
<script src="js/stylegan-transition.js"></script>
</head>
<body>
</body>
</html>
Declare Variables
We start at the top of the file by declaring a bunch of variables (L5–L15).
- outputImage will hold the actual image we get back from Runway.
- n is an array that will hold 512 separate NoiseLoop objects, one for each value in our latent space vector. We want each of those values to be random (but similar from frame to frame). n will help us do that, and we’ll explore how in a minute.
- imgSize is the pixel width and height of the image we’re requesting / creating. StyleGAN wants an image of this size, but certain models are capable of taking images of other sizes.
- We’ll be building each image’s latent vector value-by-value. count and angle help us keep track of how many frames we’ve generated and where we are on our noise loops (see the sketch just below).
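Roughly, those declarations look something like this (the names follow the walkthrough; the initial values, especially da, are my assumptions and may differ in the repo):
let outputImage;     // the image we get back from Runway
let n = [];          // will hold 512 NoiseLoop objects, one per latent value
let imgSize = 512;   // pixel width / height of the image we request
let count = 0;       // how many frames we've generated so far
let angle = 0;       // current position on the noise loops
let da = 0.01;       // assumed step size for angle; the repo's value may differ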
Setup
Our setup function (L15–L22) is automatically called by p5.js. Inside it, we run a loop over each of the 512 values that make up our latent vector, populating the n array with a new NoiseLoop object for each one.
We create each NoiseLoop with three parameters (NoiseLoop(20,-1,1)). The 20 is the diameter: it basically gives us a sample range for our random noise function, helping us make sure the values we pull are similar, but not too similar. -1 and 1 are the bounds we need to keep each number between so that Runway gets the data it needs in order to give us back an image.
Once all 512 noise loops are set up, we call our generateImage function to request our first image.
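A minimal sketch of that setup function, based on the description above (the createCanvas call is my assumption; the repo may handle the canvas differently):
function setup() {
  createCanvas(imgSize, imgSize);
  for (let i = 0; i < 512; i++) {
    // one noise loop per latent value: diameter 20, output kept between -1 and 1
    n[i] = new NoiseLoop(20, -1, 1);
  }
  generateImage(); // request the first frame from Runway
}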
Generating Randomness
Let’s look at NoiseLoop now (L62–L77). It’s a JavaScript class, and every class needs a constructor function (similar in spirit to p5’s setup function). Here, we assign the parameters we passed in our function call to properties of the class.
cx and cy weren’t passed in as parameters, but we declare them here as random numbers between 0 and 1000. They give us starting coordinates for the SimplexNoise we’re generating with the toxiclibs library.
value(a) is the function we call to get a value back from a NoiseLoop; it takes the variables from the class constructor and passes them to toxiclibs’ SimplexNoise function to get smoothly varying random values. You can see the code for that simplexNoise function here, if you want to dive deeper.
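Here’s a sketch of the NoiseLoop class as described (the toxi.math.noise.simplexNoise call is my assumption about toxiclibs.js’s namespace; the repo may reference it differently):
class NoiseLoop {
  constructor(diameter, min, max) {
    this.diameter = diameter; // how far we travel through the noise field
    this.min = min;           // lower bound of the values we return
    this.max = max;           // upper bound of the values we return
    this.cx = random(1000);   // random center x for this loop's circle in noise space
    this.cy = random(1000);   // random center y
  }

  value(a) {
    // walk around a circle in noise space so the values repeat smoothly every TWO_PI
    const xoff = map(cos(a), -1, 1, this.cx, this.cx + this.diameter);
    const yoff = map(sin(a), -1, 1, this.cy, this.cy + this.diameter);
    const r = toxi.math.noise.simplexNoise.noise(xoff, yoff); // assumed toxiclibs.js entry point
    return map(r, -1, 1, this.min, this.max);
  }
}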
Generating the Image
The generateImage function is where we communicate between p5.js / the browser and RunwayML. We declare a variable called path, and you’ll recognize its value, localhost:8000/query, as the POST route we saw in the Network tab of RunwayML.
Next we loop through all 512 values of our latent vector, and for each one we get a value from its noise loop in the n array. You might wonder what a is. It’s never declared in this JavaScript file (e.g. let a = []). That’s because we declared it in the data/landscape.js file. Remember, this was the variable we populated with the JSON downloaded from Runway for our original starting image. Here, we’re just replacing those values with new ones so that we can get a new image from Runway.
da is the amount by which we increment our angle variable, which makes sure each request samples a slightly different (but nearby) point on our noise loops, giving us values that are unique yet close to the previous frame’s. You’ll see in a moment that once our angle reaches TWO_PI, we’ve traveled all the way around the noise loops.
The data variable is the object we’re going to send to Runway, and it contains the two variables we saw within Runway’s HTTP Input Specification: z and truncation. z is our latent space vector, the thing that gives us the unique image from our machine learning model, and truncation controls the amount of realism / randomness of the image we request. The higher the value (on a 0 to 1 scale), the weirder our images should be.
So now that we have the data Runway needs, we send it:
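Putting those pieces together, here’s a sketch of what generateImage might look like (the truncation value, the placement of the angle increment, and the loop details are my assumptions; the repo may differ slightly):
function generateImage() {
  const path = 'http://localhost:8000/query'; // the POST route from Runway's Network tab

  // replace each entry of the starting vector (a, from data/landscape.js)
  // with a fresh value sampled from its noise loop
  for (let i = 0; i < a.length; i++) {
    a[i] = n[i].value(angle);
  }
  angle += da; // move a little further around the noise loops for the next frame

  // z and truncation match Runway's HTTP input specification
  const data = { z: a, truncation: 0.8 };

  // send the request; gotImage runs on success, gotError on failure
  httpPost(path, 'json', data, gotImage, gotError);
}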
httpPost is a Processing (p5) function that makes a POST request easy. The path parameter is where we’re posting to, the 'json' string is the data format we’re passing, gotImage is the function within our code that we want called once Runway sends back a successful response, and gotError is what we’ll call if things don’t work out as planned.
We can’t see what Runway is doing behind the scenes, but let’s look at what happens when we get data back from the software.
Receive / Render / Repeat
gotImage takes a parameter, result, which has everything we need from Runway to draw our new random image to the screen. In the output specification section of the Network tab in Runway, you can see what that response is: a base64 image.
What is base64? It’s an encoding scheme that lets us represent an image as a string of letters and numbers that’s safe to embed in HTML.
And now that we have it, we simply call Processing’s createImage function to assign it to our outputImage variable. That can take a few hundred milliseconds, but when it’s done, we call the imageReady function to draw it to the screen.
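Here’s a minimal sketch of that flow (the response field name result.image and the use of p5’s loadImage, which accepts a base64 data URI plus a success callback, are my assumptions; the repo’s exact call may differ):
function gotImage(result) {
  // hand the base64 image to p5 and call imageReady once it's decoded
  outputImage = loadImage(result.image, imageReady);
}

function gotError(error) {
  console.error(error); // assumed: just log anything that goes wrong with the request
}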
Inside imageReady, we keep track of whether we’re ready to generate a new image. You’ll see a line there that is commented out (//save(`outputImage${nf(count, 4)}`);). If you uncomment that line and turn off the export within Runway, you can use Processing to save out each individual image rather than doing all that processing in Runway.
I’ve noticed that this gives you images of better quality. But it means you’ll have to turn that big sequence of images into a video on your own (I use Adobe AfterEffects, but there are plenty of free tools online to do this).
Lastly, you’ll see a setTimeout call inside an if conditional. The conditional checks our angle against TWO_PI so we know whether we’ve traveled all the way around our noise loops, and setTimeout gives the browser a little bit of a delay to handle all that image processing before we make another generateImage call and repeat the whole process.
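Here’s a sketch of imageReady along those lines (the exact condition and the 250 ms delay are my assumptions; check the repo for the real values):
function imageReady() {
  image(outputImage, 0, 0, imgSize, imgSize); // draw the new frame to the canvas
  // save(`outputImage${nf(count, 4)}`);      // uncomment to save each frame from the browser
  count++;
  if (angle < TWO_PI) {
    // give the browser a moment to finish processing, then request the next frame
    setTimeout(generateImage, 250);
  }
}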
And that’s the code! The next section includes a few questions that might have come up as you read. If you have questions on something I didn’t cover, or corrections on anything I missed, let me know. Good luck, and if you make anything cool with this technique, definitely let me know as well.
Quizzes / Questions
Q: What is Z?
A: The Latent Space Vector for your StyleGAN image.
Q: What is the Latent Space?
A: The representation of all the variables that make your image unique from others in the model.
Q: Why bother with Processing? Can’t I do everything in Runway?
A: Processing allows us to request images from the latent space that are a logical progression of each other — random, but not dissimilar from our starting image. In this way, by slightly changing this variable each time we request an image from Runway, we can smoothly animate from one image to another.
Q: How do I know what to send Runway? What are my options?
A: In Runway, under HTTP, you can see an input specification field. This is a JavaScript object, and it’s what you send to Runway. We can see that it expects a variable z, which is an array of 512 floats (decimal numbers), and an optional truncation variable (which, if you remember, controls the realism or randomness of the images).
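For reference, here’s roughly the shape of that object in JavaScript (the numbers are placeholders, not real latent values):
const data = {
  z: [0.6351, -0.2415, 1.0843, /* ...509 more floats between -1 and 1... */],
  truncation: 0.8 // optional; higher values give weirder images
};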
Q: What’re those other Network options?
A: Protocols!
- Socket.io — If you know (or want to learn) about node.js, socket.io is a great way to implement Web Sockets with a JavaScript web server. It can be an improvement on HTTP because, #1, it allows easier bidirectional communication between the server (Runway) and the client, and #2, the request stays local, so it’s faster.
- OSC — Strangely, this stands for Open Sound Control. It was a protocol designed for sound (like MIDI), but it can be used to transmit any message. In fact, a lot of very cool applications like Ableton Live, MaxMSP, VDMX, Unity, MadMapper, and more all allow you to interface with OSC. So theoretically, instead of using Processing, you could use Runway to control all this other software.
- JavaScript — It’s a funny place for it within the Runway interface, but clicking on JavaScript just gives you a nice code example of how to make a request to Runway using JavaScript. The code repo you downloaded has its own example.
Q: How much does all this cost me? How does Runway make money?
A: This would be a good time to mention that you shouldn’t keep your model running indefinitely — the generous folks at Runway currently allow you a good portion of cloud processing time (50GB I think), but once that runs out, you need to pay up if you want to keep running models on a cloud server.
The alternative is to run the model on your computer’s own processor, but not every machine learning model allows for this in Runway, and not every computer has a processor capable of doing it efficiently. If you want to go this route, make sure your computer has a GPU compatible with Runway and that GPU mode is enabled; running a model on your CPU can be ~10x slower.