Self-Hosted AI Consistent Characters: Part 1 – Let's Build Robotics with Shadow

Good Morning from my Robotics Lab! This is Shadow_8472, and today I am on a quest to generate consistent characters using AI. Let’s get started!

It all started with wanting to learn how to make my own “consistent characters” I can summon with a keyword in my prompt. Before I can train the AI to make one, my subject needs consistent source pictures. One way to do that is to chop up a character sheet with multiple angles of the same character all generated at once. Expect all that in a future post. It sounded like a reasonable goal until I discovered just how many moving parts I needed just to approach.

In particular, my first goal is Ms. T, a Sonic OC (Original Character) by my sister, but once I figure out a successful workflow, it shouldn’t be too hard to make more.

A1111

Automatic1111 (A1111) is the go-to StableDiffusion (SD) image generation web interface for the vast majority of tutorials out there. While it’s not the easiest SD WebUI, A1111 is approachable by patient AI noobs and EasyDiffusion graduates alike. It exposes a bit many controls by default, but it packs enough power to keep a SD adept busy for a while. I also found Forge, an A1111 fork reportedly having extra features, bug fixes, grudging Linux support, and needs a virtual environment. At the top, I found ComfyUI, which lets you design and share custom workflows.

As a warm up exercise, I found a SonicDiffusion, a SD model geared for Sonic characters, generated a bunch of Ms. T portraits, and saved my favorites. Talking with my sister, I began cherry picking for details the model doesn’t control for, such as “cyclopse” designs where the eyes join at the whites vs. separate eyes (Hedgehogs are usually cyclopses, but not in the live action movies). SonicDiffusion –to my knowledge– lacks a keyword to force this distinction. Eventually, my expectations outpaced my ability to prompt, and I had to move on.

ControlNet

A major contribution to A1111’s versatility is its ecosystem of extensions. Of interest this week is ControlNet, a tool to include visual data in a StableDiffusion prompt for precision results. As of writing, I’m looking at 21 controller types – each needing a model to work. I downloaded the ones for Canny, Depth, and OpenPose to get started.

My first thought was to use an Xbox One Kinect (AKA Kinect v2) I bought from someone in my area a few Thanksgivings ago. If it works, I can easily pose for ControlNet. Long story short: I spent a couple days either last week or the week before tossing code back and forth with a self-hosted AI chatbot in SillyTavern with no dice. The open source Linux driver for the Kinect v2 just isn’t maintained for Ubuntu 22.04 and distros built on it. I couldn’t even get it to turn on its infrared LED’s (visible by my PinePhone’s camera) because of broken linkages in the header files or something. Pro tip: Don’t argue with a delusional LLM unless you can straighten it out in a reply or two. On the plus side, the AI did help me approach this job where I’d expect to have taken weeks to months without it. If/when I return, I expect to bodge it with Podman, but I may need to update the driver anyway if the kernel matters.

Even if I did get the Kinect to work, I doubt it would have been the miracle I was hoping for. Sonic style characters (Mobians) have different proportions than humans – most notably everything from the shoulders up. I ended up finding an embedding for making turnaround/character sheets, but it was again trained to make humans and I got inconsistent results compared to before. I did find a turnaround template for chibi characters that gave me OK-ish results running it through Canny, but Ms. T kept generating facing the wrong way.

On another session, I decided to try making Ms. T up in Sonic Forces. I installed it (ProtonDB: Platinum) and loaded my 100% save. I put Ms. T on a white background in GIMP and gave it to ControlNet. Unsurprisingly, OpenPose is not a Sonic fan. It’s trained on human data (now with animals!), but a cartoon kept returning blank outputs until I used a preprocessor called dw_openpose_full, which –while it still doesn’t like cartoon animal people– did cooperate on Ms. T’s right hand. Most every node else I dragged into place manually. I then demonstrated an ability to pose her left leg.

Character Sheet

From there, I opened OBS to record an .MP4 file. I used FFmpg to convert to .gif and loaded it in GIMP to… my computer slowed to a crawl, but it did comply without a major crash. I tried to crop and delete removed pixels… another slowdown, GIMP crashed. I adjusted OBS to record just my region of interest. 500+ frames was still a no-fly when each layer only has the changes from the last. I found options to record as .gif and record as slowly as I want. I then separated out my frames with FFmpg, making sure to have a new directory:

ffmpeg -i fileName.gif -vf fps=1 frame_%04d.jpg

I chose ten frames and arranged them in a 5×2 grid in GIMP. I then manually aligned OpenPose skeletons for each and sent that off to ControlNet. Immediately, my results improved. I got another big boost by using my grid of .gif frames, but in both cases Ms. T kept eyes and feet facing toward the viewer – even when her skeleton was pointed the other way. My next thought was to clean up the background on the grid, but compression artifacts got in the way.

Start over. I made a new character with joint visibility and background removal in mind. She looked ridiculous running through a level, but I got her knee placement by moving diagonal toward the camera and jumping. I then put eight new screenshots in a grid. Select-by-color had the background cleared in a minute. I then used Canny for silhouettes intending to reinforce OpenPose. I still got characters generating the wrong way.

Takeaway

This week has had a lot of interplay between study and play. While it’s fun to run the AI and cherry pick what comes out, the prospect of better consistently keeps me coming back to improve my “jungle gym” as I prepare to generate LoRa training images.

Final Question

The challenge that broke this topic into multiple parts is getting characters to face away from the viewer. Have you ever gotten this effect while making character sheets?

I look forward to hearing from you in the comments below or on my Socials!

A1111

ControlNet

Character Sheet

Takeaway

Final Question

Leave a Reply Cancel reply