Procedural Texture Synthesis Using Stable Diffusion
By Jordy van den Top
Generate a texture using a text prompt. A TEXTure, if you will.
Introduction
I am, most likely just like you, a programmer. A programmer with little to no artistic knowledge. So whenever I'm working on a game and I need assets, I go online to find them. But online you'll quickly notice that many of them cost money. There are free assets out there, but the selection is limited.
How awesome would it be if you could create assets for your game (or other projects) on the fly, without leaving the (game) editor you are currently working in?
Well, you came to the right blog, because I created a project that does just that!
Contents
- The project
- Back-end
- REST web API
- Stable Diffusion
- Front-end
- Unity plugin
- Conclusion
- Future Improvements
- Sources
The project
First off, I have to mention that this project will (for now) only generate textures (as the title suggests). While the technology to generate 3D models from text does exist [1], I reckon it would take extra time to clean up the generated models, because they are probably not generated very efficiently. I was also limited in time for this project, as I had only around a month to work on it. That is why I chose to limit the scope to just textures. I was also planning to add normal map generation, but due to time constraints I was not able to add it.
Now with that out of the way, let's talk about what the project currently does!

The project currently consists of a REST web API that can be used to send texture generation requests. For example, you can request a 512 by 512 texture of tree bark by simply sending "tree bark" to the API. The server will generate the image in around ten seconds and send it back for you to use in anything you want; for tree bark it's probably going to end up on a tree mesh! And all of this for free! There is practically no limit to what kind of texture you can generate, as you can fill in anything your heart desires. I will go more in depth on the web API here.

To make it easier to work with the API from Unity, I made a plugin that handles all of the communication with the API automatically. You just have to fill in a text box with the desired texture description and press generate! I will go more in depth on the Unity plugin here.
Now I can see you thinking: how do I generate textures using just text? The answer is a cutting-edge diffusion model that generates images from noise. In my case I used Stable Diffusion, because it can be run locally for free. I will go more in depth on Stable Diffusion here.

Some examples
Using this project you can generate any kind of (seamless) texture for any kind of game or project. Here are a few examples of textures you can expect to generate:








Back-end: REST web API
Because running Stable Diffusion requires a powerful GPU and a decent amount of memory, I got the recommendation to run all the heavy code on a web server. This way, people with less powerful hardware can use it as well. All they have to do is send a request to the web server's URL containing the requested texture description and options, for example:
https://www.example.com/TextureGenerator?width=512&height=512&prompt=wooden planks texture&tileable=true&steps=30

Here we request a wooden planks texture using the prompt parameter, with a size of 512 by 512 pixels set via the width and height parameters. We also say we would like a texture that can be tiled using the tileable parameter. Lastly, we request the number of steps Stable Diffusion will take to generate the texture. More steps increase the amount of detail (up to a point) but require more time to generate. After about ten seconds the generated texture is returned, ready to be used in any project.
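If you would rather call the API from your own code than through the Unity plugin, a minimal C# sketch could look like this (the base URL and output file name are placeholders; error handling is omitted):

using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class TextureApiExample
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task Main()
    {
        // URL-encode the prompt, because it may contain spaces.
        var prompt = Uri.EscapeDataString("wooden planks texture");
        var url = $"https://www.example.com/TextureGenerator?width=512&height=512&prompt={prompt}&tileable=true&steps=30";

        // The API responds with the generated image bytes, here saved next to the executable.
        var imageBytes = await Client.GetByteArrayAsync(url);
        await File.WriteAllBytesAsync("wooden_planks.jpg", imageBytes);
    }
}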
API Input
Getting a bit more technical about how the API works: I am using a C# ASP.NET Core application for my web API, because I am already familiar with C# and have built a web API in this language before.
Whenever a texture request gets sent to the API, it enters this function:
public async Task<ActionResult> Get(string prompt, bool tileable = true, Enums.Format format = Enums.Format.Jpg, int width = 512, int height = 512, int steps = 20)
{
    // Reject empty or overly long prompts before doing any work.
    if (string.IsNullOrEmpty(prompt))
    {
        return BadRequest("Prompt cannot be empty.");
    }
    if (prompt.Length > MaxPromptLength)
    {
        return BadRequest($"Prompt cannot be longer than {MaxPromptLength} characters.");
    }

    // Clamp the resolution and step count to supported ranges.
    width = ValidateResolutionParameter(width);
    height = ValidateResolutionParameter(height);
    steps = Math.Clamp(steps, 1, MaxSteps);

    // Hand the request off to the Stable Diffusion back-end and return the resulting image.
    var imagePath = await StableDiffusionLoop.Generate(prompt, tileable, format, width, height, steps);
    var image = System.IO.File.OpenRead(imagePath);
    return File(image, $"image/{format.ToString().ToLower()}");
}
Many defaults are already set in the parameters, such as the resolution, the file format, whether the texture should be tileable, and the number of steps. This way you only have to send a prompt for it to work, while still being able to change the optional parameters.
I also validate the input to make sure the supplied parameters are in a reasonable range to work with. The steps use a simple clamp to keep them within a specific range. The resolution values, however, need to be in steps of 64. For this I use a LINQ query [2] over a fixed list of resolutions that selects the best possible option whenever the input resolution is not one of the allowed steps. In the code that is handled like this:
// Fill the resolutions array with values in steps of 64, up to 1024.
private static readonly int[] ResolutionOptions = Enumerable.Range(0, 16).Select(i => 64 * (i + 1)).ToArray();

private static int ValidateResolutionParameter(int value)
{
    // Anything below the smallest allowed resolution is clamped up to it.
    if (value < ResolutionOptions[0])
    {
        value = ResolutionOptions[0];
    }

    // Otherwise, snap down to the largest allowed resolution that does not exceed the input.
    value = ResolutionOptions.Where(i => i <= value).Max();
    return value;
}
Here I take the input resolution value and check whether it is lower than the lowest allowed resolution; if so, it is set to that minimum. I then use LINQ's Max() on all options that are lower than or equal to the input value to select the largest one from the generated list.
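To make the rounding behaviour concrete, here is a small standalone sketch of the same snapping logic with a few example inputs and the values they map to (this is just an illustration, not part of the API code):

using System;
using System.Linq;

public static class ResolutionSnapDemo
{
    // Same options as in the API: 64, 128, ..., 1024.
    private static readonly int[] ResolutionOptions = Enumerable.Range(1, 16).Select(i => 64 * i).ToArray();

    public static void Main()
    {
        foreach (var input in new[] { 32, 500, 512, 4096 })
        {
            // Clamp up to the minimum, then snap down to the nearest allowed step.
            var value = Math.Max(input, ResolutionOptions[0]);
            var snapped = ResolutionOptions.Where(i => i <= value).Max();
            Console.WriteLine($"{input} -> {snapped}"); // 32 -> 64, 500 -> 448, 512 -> 512, 4096 -> 1024
        }
    }
}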
The most important code from the full function is:
var imagePath = await StableDiffusionLoop.Generate(prompt, tileable, format, width, height, steps);
Here I pass all the parsed input to the Stable Diffusion back-end so it can start generating a texture.
Talking to Stable Diffusion
Stable Diffusion runs in Python, not in the C# that the API is written in. Because of this I had to run it as a separate program in parallel with the API. Whenever the API starts up, we launch the Stable Diffusion program as well using the C# Process class:
public async void InitializeProcess()
{
    _process = new Process
    {
        StartInfo = new ProcessStartInfo
        {
            FileName = "cmd.exe",
            RedirectStandardInput = true,
            UseShellExecute = false,
            RedirectStandardOutput = true,
            WorkingDirectory = Paths.StableDiffusionRepoPath,
            CreateNoWindow = true
        }
    };
    _process.Start();
    _streamWriter = _process.StandardInput;

    if (_streamWriter.BaseStream.CanWrite)
    {
        // Activate the Conda environment that contains Stable Diffusion's dependencies.
        await _streamWriter.WriteLineAsync(Paths.CondaActivate);
        await _streamWriter.WriteLineAsync("activate ldm");

        // Start the Stable Diffusion script with the output folder and sampler to use.
        var initCommand = $"python optimizedSD\\optimized_txt2img_loop.py --outdir \"{Paths.GeneratedImagesFolder}\" --sampler euler_a";
        await _streamWriter.WriteLineAsync(initCommand);
    }
    else
    {
        throw new Exception("Cannot write to input stream.");
    }
}
We start a new command prompt process and grab its input stream, which we will use to send commands. We then tell it to activate a Conda environment [3]; this makes sure all the dependencies Stable Diffusion needs are available. Once we are in the Conda environment, we start the Stable Diffusion Python script and tell it which output folder to use for the generated textures, together with which sampler to use. The sampler tells Stable Diffusion how to generate an image. There are multiple samplers available, but I found that euler_a works relatively well for texture synthesis.
When we get a request to generate a texture, we enter this function:
public async Task<string> Generate(string prompt, bool tileable, Enums.Format format, int width, int height, int steps)
{
    // Give this request a unique id so we can match it to the generated image later.
    var requestId = Guid.NewGuid().ToString();
    _imagePaths.Add(requestId, string.Empty);

    if (_streamWriter.BaseStream.CanWrite)
    {
        Console.WriteLine($"Sending Stable Diffusion request: {requestId} - {prompt}");
        // Write the prompt and settings to the Stable Diffusion process's input as a pipe-separated string.
        await _streamWriter.WriteAsync($"{requestId}|{prompt}|{_random.Next(0, 1000000)}|{tileable}|{format.ToString().ToLower()}|{width}|{height}|{steps}");
    }
    else
    {
        throw new Exception("Cannot write to input stream.");
    }

    // Poll until the image path for this request has been filled in by the output listener.
    while (string.IsNullOrEmpty(_imagePaths[requestId]))
    {
        await Task.Delay(100);
    }

    var filePath = _imagePaths[requestId];
    _imagePaths.Remove(requestId);
    Console.WriteLine($"Returning {filePath}");
    return filePath;
}
Since there could be multiple requests in flight at any given moment, we generate a unique request id that we later use to identify which texture was generated for which request. We send the prompt, its request id, and the settings to the Stable Diffusion process's input stream as a pipe-separated string. This means we place a pipe character "|" between every value we send, so that Python can easily split the values. This way we can send all the data required to generate a texture at once.
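For example, a request for a tileable 512 by 512 jpg texture of "tree bark" with 20 steps would be encoded roughly like this (illustrative values only; the seed is random):

// Field order: requestId | prompt | seed | tileable | format | width | height | steps
var message = $"{Guid.NewGuid()}|tree bark|483920|True|jpg|512|512|20";
// e.g. "8d54f2ab-1c3e-4b7f-9a2d-6e0c5b4a3f21|tree bark|483920|True|jpg|512|512|20"
// The Python side recovers the individual values with a single inputData.split("|").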
After we send the texture generation request to Stable Diffusion, we wait for the image path for our request id to be filled in, which means Stable Diffusion has finished generating our texture. To know when a texture is done generating, we need to listen to the Stable Diffusion process's output.
Listening to Stable Diffusion
During the development of this project I ran into an issue where I did not get any output from the standard output stream of the Stable Diffusion process. This was because IIS, the Windows web hosting platform [4], was launching the Stable Diffusion process as a different user, and for security reasons we cannot listen to output from processes we did not launch ourselves.
The solution I came up with for this problem is to have Stable Diffusion send a message to the API to let it know its current status.
import requests

def printToServer(value):
    try:
        print(f'Print to Server: {value}')
        url = 'https://localhost:44370/TextureGenerator'
        printData = {'Message': value}
        requests.post(url, json = printData)
    except:
        print("Error printing to server")

# Example call from the Stable Diffusion script, where ckpt holds the model checkpoint path.
printToServer(f"Loading model from {ckpt}")
I can call this function from the Stable Diffusion Python script whenever I want to send a status update to the web API. In this snippet I am sending "Loading model from {ckpt}" to the API via an HTTPS POST request, letting it know that the Stable Diffusion process is currently loading the model with the given name.
Back in the web API I am listening to these updates using a simple POST handler:
public ActionResult Post([FromBody] StableDiffusionPrint print)
{
    Console.WriteLine($"[Stable Diffusion] {print.Message}");
    StableDiffusionLoop.Instance.OutputReceived(print.Message);
    return Ok();
}
Here I print all incoming messages to the console with the tag "[Stable Diffusion]" so I can tell Stable Diffusion's output apart from the web API's own output. I then pass the message on to the Stable Diffusion class in the web API.
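The StableDiffusionPrint parameter is a small data class that ASP.NET Core's model binding fills from the JSON body. It is not shown in the snippets above, but a minimal sketch could look like this:

// Minimal sketch of the DTO the POST body binds to; the real class may contain more fields.
public class StableDiffusionPrint
{
    public string Message { get; set; }
}

In the Stable Diffusion class, the messages are then handled like this: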
public void OutputReceived(string message)
{
    if (message.StartsWith("Finished|"))
    {
        // Message format: Finished|<requestId>|<filePath>
        var messageSplit = message.Split('|');
        var finishedRequestId = messageSplit[1];
        var filePath = messageSplit[2];
        Console.WriteLine($"Stable Diffusion finished image request {finishedRequestId}: {filePath}");
        _imagePaths[finishedRequestId] = filePath;
    }
}
A status update starting with "Finished" tells the web API that a requested texture has finished generating. I again use a pipe-separated string to send both the request id and the generated image path in one message. The request id then tells us where to put the image path in the paths dictionary. Once this value is set, the texture generation method notices that the file path has been filled in and uses it to read the generated texture and send it back to whoever requested it.
Back-end: Stable Diffusion
All the texture generation is done via Stable Diffusion [5]. This is a state-of-the-art image generation model that uses a neural network to generate an image from noise and an input prompt. There are multiple image generation models available, each with their own ups and downs. Here are a few options I could choose from:
Model name | Good | Bad |
DALL·E 2 | Good image quality | Paid, cannot be run locally, closed source |
Midjourney | Good image quality | Paid, only available as a Discord bot, cannot be run locally, closed source |
Imagen | Good image quality | Not publicly available (yet) |
Stable Diffusion | Good image quality, open source, free, can be run locally | |
While all of these models can produce very impressive images, there are clear benefits to using Stable Diffusion, the biggest being that I can run it locally for free. For this project I could not think of any real disadvantages to using Stable Diffusion.
Continuous generation

To get Stable Diffusion working well with my project, I had to change its source code to allow continuous generation of images. Before the change it took around two minutes to generate an image, because for every generation the model had to load about four GB of data from the hard drive before it could start generating. After my changes it only takes around ten seconds, roughly twelve times faster, because we no longer need to reload the model data every time we want to generate a new image.
while True:
    # Wait for input from the web API.
    printToServer("Awaiting input")
    inputData = ""
    while not inputData:
        inputData = input()
    printToServer("Input received: " + inputData)
To get continuous generation to work, I added an infinite while loop that waits for input from the web API. Once an image is done generating, the script comes back to the top of the loop and waits for the next request. Before I added the loop, the script would run from top to bottom and then exit, clearing all the loaded models and data in the process.
Tileable textures
Games use a lot of tileable textures, but Stable Diffusion does not normally generate them. Luckily there was already a solution to this problem [6]:
import torch

def patch_conv(klass, mode):
    init = klass.__init__
    def __init__(self, *args, **kwargs):
        return init(self, *args, **kwargs, padding_mode=mode)
    klass.__init__ = __init__

# Patch the convolution layers before the model is loaded.
for klass in [torch.nn.Conv2d, torch.nn.ConvTranspose2d]:
    patch_conv(klass, 'circular')
By patching the network's convolution layers and setting their 'padding_mode' to 'circular' before the model is loaded, the network generates seamless textures. You can see the difference this makes here, using the same prompt and seed:


Parsing input data
Since every generated texture will be different, we get different data from the API with each texture generation request. I explained here how I encode the data that gets sent to the Python script into a pipe-separated string. Now it is time to decode this data so Stable Diffusion can use it.
inputData = inputData.split("|")
requestId = inputData[0]
prompt = inputData[1]
seed = int(inputData[2])
tileable = True if inputData[3] == "True" else False
format = inputData[4]
width = int(inputData[5])
height = int(inputData[6])
steps = int(inputData[7])
Decoding the data is as simple as splitting the input string on the pipe character "|". This returns a list of strings with all our values. We then parse them where needed and assign them to their own variables to be used later in the Stable Diffusion image generation code.
Fun fact: Python converts any non-empty string to True when casting it to a boolean. So when you try to parse the string "False" with bool(), Python will say the value is True, which is very counterintuitive. This is why I needed a little extra code to get boolean parsing to work.
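On the C# side this works out because bool.ToString() produces the strings "True" and "False" with a capital first letter, which is exactly what the Python comparison above checks against:

// C#'s bool.ToString() yields "True" / "False" with a capital first letter,
// which is why the Python side compares the received string against "True".
Console.WriteLine(true.ToString());  // prints True
Console.WriteLine(false.ToString()); // prints False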
Front-end: Unity plugin

Because the texture generation part of the project runs on a web API, it can be called from anything, so plugins can be created for any game engine or platform. I chose to make a plugin for the Unity game engine because I am familiar with the engine and could get the plugin up and running fairly easily.
The Unity plugin is fairly simple. It contains a text box where you enter the prompt for the texture you want to generate. With just that you can press the generate button to start generating the described texture. The plugin also contains a few optional settings that can be toggled on or off: the desired resolution of the texture, whether it should be tileable, the image format (png has better quality, while jpg is only about ten percent of the size and still retains good quality), and the number of steps Stable Diffusion will use to generate the texture.
After entering a prompt and adjusting any optional settings, you press generate to send the request to the web API. After about ten seconds the generated image is returned and displayed as a preview. The plugin also saves the image to the Unity project and creates a material from it. If a GameObject was selected, the generated material is assigned to that object automatically. This makes it faster to iterate over different textures without having to reassign them every time you generate a new one.


How it works behind the scenes
The plugin is made as a custom editor window in Unity. To open this window we use:
[MenuItem("Tools/Text to Texture")]
public static void ShowWindow()
{
    GetWindow(typeof(TextToTextureEditorWindow), false, "Text to Texture");
}
I use the MenuItem attribute to tell Unity that this function needs to be bound to a menu item in the editor. The plugin can now be opened from the "Tools" menu.
The UI code lives inside Unity's OnGUI function and uses the GUILayout and EditorGUILayout classes to add UI components:
GUILayout.Label("Text prompt", EditorStyles.boldLabel);
_prompt = EditorGUILayout.TextArea(_prompt, GUILayout.Height(50));
_optionalSettingsShown = EditorGUILayout.Toggle("Optional Settings", _optionalSettingsShown);
if (_optionalSettingsShown)
{
GUILayout.Label("Texture resolution X", EditorStyles.boldLabel);
_width = ResolutionSteps * (EditorGUILayout.IntSlider(_width, 64, 1024) / ResolutionSteps);
GUILayout.Label("Texture resolution Y", EditorStyles.boldLabel);
_height = ResolutionSteps * (EditorGUILayout.IntSlider(_height, 64, 1024) / ResolutionSteps);
GUILayout.Label("Tileable", EditorStyles.boldLabel);
_tileable = EditorGUILayout.Toggle(_tileable);
GUILayout.Label("Format", EditorStyles.boldLabel);
_format = (Format)EditorGUILayout.EnumPopup(_format);
GUILayout.Label("Steps", EditorStyles.boldLabel);
_steps = EditorGUILayout.IntSlider(_steps, 1, MaxSteps);
}
I start by adding a text area for the prompt, plus a toggle to hide or show the optional settings. When this toggle is off, the only visible UI is the text area and the generate button. Speaking of the generate button, it is displayed like so:
if (GUILayout.Button(buttonText, GUILayout.Height(50)))
{
    GetRequest($"https://www.example.com/texturegenerator?prompt={_prompt}&width={_width}&height={_height}&tileable={_tileable}&format={_format.ToString().ToLower()}&steps={_steps}");
}
Whenever the generate button is pressed, it sends all the filled-in parameters to the web API as an HTTPS GET request to begin generating a texture.
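One caveat with building the URL this way is that the prompt can contain spaces and other characters that are not URL-safe. Escaping it before sending the request is a sensible refinement (a small sketch using the standard Uri.EscapeDataString; the original plugin may handle this differently):

// Escape the prompt so spaces and special characters survive the query string.
var escapedPrompt = Uri.EscapeDataString(_prompt);
GetRequest($"https://www.example.com/texturegenerator?prompt={escapedPrompt}&width={_width}&height={_height}&tileable={_tileable}&format={_format.ToString().ToLower()}&steps={_steps}");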
Once the web API is done generating the image, a preview of the texture is shown in the editor window like this:

if (_texture != null)
{
    // Scale the preview to fit the window width while keeping the texture's aspect ratio.
    var width = Mathf.Min(position.width - 5, _texture.width);
    var height = _texture.height * width / _texture.width;
    GUILayout.Label(_texture, GUILayout.Width(width), GUILayout.Height(height));
}
I use position.width to get the width of the editor window and use it to calculate the width and height of the preview image, so the preview scales with the window.
Besides showing a preview, we also generate a material from the generated texture, so the texture can be applied to the meshes of game objects. This is done like so:
var path = $"{TextToTextureTextureFolder}/{fileNameSafePrompt}_{_width}_{_height}.{_format.ToString().ToLower()}";
await File.WriteAllBytesAsync(path, webRequest.downloadHandler.data);
_texture = AssetDatabase.LoadAssetAtPath<Texture2D>(path);
First we save the received texture to the assets folder of the Unity project. We then load the image as an asset, which we use to create a material like this:
var material = new Material(Shader.Find("Universal Render Pipeline/Lit"))
{
    mainTexture = _texture
};
path = $"{TextToTextureMaterialsFolder}/{fileNameSafePrompt}.mat";
AssetDatabase.CreateAsset(material, path);
Creating a material from the generated texture is pretty straightforward. We create a new material using the default Lit shader and set its main texture to the texture we generated. A path is built from the prompt that was used, and the material is saved to the assets folder. After the material has been created, we can apply it to a game object. Whenever an object is selected, this happens automatically with the help of the following code:
if (Selection.activeGameObject != null)
{
    if (Selection.activeGameObject.TryGetComponent(out Renderer renderer))
    {
        renderer.material = material;
    }
}
We simply check whether an object is currently selected and, if so, whether it has a Renderer component attached. If it does, we assign our newly generated material to that renderer.
Research Method

While I already knew the basics of creating a web API and a Unity plugin, I still got stuck a few times. When solving these problems I relied on desk research to find the answers.
Besides desk research I also had many peer reviews where I asked for feedback on the project. Because of the feedback I received about Stable Diffusion being hard to run locally, I moved all of the image generation logic to a web API.

Conclusion
Being able to create (seamless) textures on the fly, without the need for artistic skills, will really help developers realize their ideas and speed up the prototyping of virtual environments. Hosting the heavy texture generation code behind a web API allows plugins to be made for any platform: game engines like Unity and Unreal, but also video editing programs like Adobe Premiere Pro, and even mobile apps.

While I was not able to add everything I wanted to the project, I am still very happy with how it ended up. I will keep expanding and improving it now that the project period is over, because I think it has big potential for developers, or at least for myself while creating games.
Future Improvements
A big feature that is currently missing is the synthesis of normal maps based on the generated textures. This could be done using a neural network like Pix2Pix.


Sources
[1] Poole, B. et al. (2022) DreamFusion: Text-to-3D using 2D Diffusion, GitHub. Available at: https://dreamfusion3d.github.io/ (Accessed: November 2022).
[2] Wagner, B. (2022) Language-integrated query (LINQ) (C#), Language-Integrated Query (LINQ) (C#) | Microsoft Learn. Available at: https://learn.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/linq/ (Accessed: November 2022).
[3] (2017) Conda. Available at: https://docs.conda.io/projects/conda/en/latest/ (Accessed: November 2022).
[4] A flexible & easy-to-manage web server (2022) Home : The Official Microsoft IIS Site. Available at: https://www.iis.net/ (Accessed: November 2022).
[5] Mostaque, E. (2022) Stable diffusion public release, Stability.Ai. Stability.Ai. Available at: https://stability.ai/blog/stable-diffusion-public-release (Accessed: November 2022).
[6] Sygil-Dev (2022) Tileable texture from stable diffusion · discussion #224 · Sygil-dev/Sygil-webui, GitHub. Available at: https://github.com/Sygil-Dev/sygil-webui/discussions/224#discussioncomment-3494819 (Accessed: November 2022).