Which Stable Diffusion UI is Right for Me?

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am exploring Automatic1111 alternatives. Let’s get started!

A1111 is a nice baseline Stable Diffusion interface. A determined beginner should find it approachable, it gives an intermediate audience easy access to a large toolbox, and its community library of extensions and video/text tutorials is large enough to keep experts honing their skills.

Stable Diffusion Forge vs. StableSwarmUI

But A1111 is hardly the only one around. Forge has had my attention as a direct improvement over A1111, if for nothing else than its bug fixes when switching models. I’ve bumped into that limitation while experimenting with ControlNet, and it gets in the way.

But another UI (User Interface) has caught my attention recently: StableSwarmUI. From around an hour of research, it appears to be a beginner-friendly package built on top of ComfyUI, an interface I’d previously written off as well above my skill level. Installation threw in an extra challenge when it assumed browser access while I was working over SSH. Fortunately, I recently learned graphical SSH:

ssh -CY <user@host>
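(Here, -C enables compression and -Y requests trusted X11 forwarding, which lets graphical programs on the remote machine display on my local screen.)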

Otherwise, StableSwarmUI was very easy to install.

Out of the box, my installation of StableSwarmUI was set up to run SDXL models. When I tried SonicDiffusion (Stable Diffusion 1.5 base) from my A1111 installation, I kept getting 50% gray outputs. I took a peek at the ComfyUI backend. Yeah… I have no business making the all-out switch until I’ve properly introduced myself to ComfyUI. Time to research until I can make a basic workflow.

OK, don’t ask me about the gray boxes. Refreshing Firefox did nothing. Some people fixed similar issues by reinstalling or deleting one file or another. I left it over a weekend, then restarted the StableSwarmUI server while installing the Custom Node Manager for ComfyUI.

ComfyUI Workflows

ComfyUI is all about the workflow: a program you build by linking various nodes into a flowchart. I looked up consistent-character workflows to get a better idea of how they work. There are a couple of options, but YouTuber Nerdy Rodent’s Reposer Plus caught my attention first [1]. Custom Node Manager found most of its custom nodes, but Nerdy Rodent used a now-outdated plugin called IPAdapter. I had to study IPAdapter v2 (via the programmer’s video [2]), but it wasn’t too difficult to swap out the relevant nodes once I’d taken my time.

Reposer Plus needed additional models, some of which I already had in A1111. I made a shared models directory and moved StableSwarmUI’s entire models directory over to it. StableSwarmUI has a setting at “Server/Server Configuration/Paths/ModelRoot” to point the UI at my shared directory. A1111 would have me edit a .yaml file directly, but symbolic links are easier.
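As a minimal sketch of the symlink approach (the paths here are placeholders for my own layout, not anything either UI requires):

mv ~/stable-diffusion-webui/models/Stable-diffusion ~/stable-diffusion-webui/models/Stable-diffusion.bak
ln -s /path/to/shared/models/Stable-diffusion ~/stable-diffusion-webui/models/Stable-diffusion

Renaming A1111’s original folder first keeps anything already in it safe before the link takes its place.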

I set the workflow in motion with “Queue Prompt,” but the IPAdapter Advanced node I installed threw an error at me. It took an extra session, but experimentation identified a model mismatch (I tried loading a “Big G” CLIP Vision model when it needed the normal one). The workflow then ran normally, but the final upscale turned sepia. I tried a photorealistic upscale model (as opposed to one for anime), but it turned out to be another issue fixed by a server restart.

Takeaway

I played around with StableSwarmUI a bit more after a string of mediocre results with Nerdy Rodent’s workflow. As with many tech projects, I’m interacting with a large and evolving ecosystem. Being on local hardware, I have both the liberty and the burden of being my own admin while still learning the user’s point of view. And until I know both, I cannot tell whether StableSwarmUI is there yet. I was all primed to complain that I can’t readily draw a ControlNet input directly into the beginner interface, but on closer inspection I was mistaken about how this UI works. I still haven’t found the feature, but that doesn’t mean it isn’t there.

If you are a first-day beginner, I would still recommend EasyDiffusion for its easy installation, image history, and inpainting. If you want anything more, A1111 will let you explore further (Forge appears abandoned) at the cost of image history. If you want to try a cool ComfyUI workflow, StableSwarmUI may be right for you.

Final Question

What is your favorite ComfyUI workflow? I look forward to hearing your answers in the comments below or on my Socials!

Works Cited

[1] N. Rodent, “Stable Diffusion – Face + Pose + Clothing – NO training required!,” youtube.com, Oct. 14, 2023. [Online]. Available: https://youtu.be/ZcCfwTkYSz8. [Accessed June 20, 2024].

[2] L. Vision, “IPAdapter v2: all the new features!,” youtube.com, Mar. 25, 2024. [Online]. Available: https://youtu.be/_JzDcgKgghY. [Accessed June 20, 2024].

Self-Hosted AI Consistent Characters: Part 1

Good Morning from my Robotics Lab! This is Shadow_8472, and today I am on a quest to generate consistent characters using AI. Let’s get started!

It all started with wanting to learn how to make my own “consistent characters” I can summon with a keyword in my prompt. Before I can train the AI to make one, my subject needs consistent source pictures. One way to do that is to chop up a character sheet with multiple angles of the same character all generated at once. Expect all that in a future post. It sounded like a reasonable goal until I discovered just how many moving parts I needed just to approach it.

In particular, my first goal is Ms. T, a Sonic OC (Original Character) by my sister, but once I figure out a successful workflow, it shouldn’t be too hard to make more.

A1111

Automatic1111 (A1111) is the go-to Stable Diffusion (SD) image generation web interface for the vast majority of tutorials out there. While it’s not the easiest SD WebUI, A1111 is approachable by patient AI noobs and EasyDiffusion graduates alike. It exposes a few too many controls by default, but it packs enough power to keep an SD adept busy for a while. I also found Forge, an A1111 fork that reportedly has extra features, bug fixes, and grudging Linux support, though it needs a virtual environment. At the top, I found ComfyUI, which lets you design and share custom workflows.

As a warm-up exercise, I found SonicDiffusion, an SD model geared toward Sonic characters, generated a bunch of Ms. T portraits, and saved my favorites. Talking with my sister, I began cherry-picking for details the model doesn’t control for, such as “cyclops” designs where the eyes join at the whites vs. separate eyes (hedgehogs are usually cyclopses, but not in the live-action movies). SonicDiffusion, to my knowledge, lacks a keyword to force this distinction. Eventually, my expectations outpaced my ability to prompt, and I had to move on.

ControlNet

A major contribution to A1111’s versatility is its ecosystem of extensions. Of interest this week is ControlNet, a tool to include visual data in a Stable Diffusion prompt for precise results. As of this writing, I’m looking at 21 controller types, each needing its own model to work. I downloaded the ones for Canny, Depth, and OpenPose to get started.

My first thought was to use an Xbox One Kinect (AKA Kinect v2) I bought from someone in my area a few Thanksgivings ago. If it worked, I could easily pose for ControlNet. Long story short: I spent a couple of days either last week or the week before tossing code back and forth with a self-hosted AI chatbot in SillyTavern, with no dice. The open-source Linux driver for the Kinect v2 just isn’t maintained for Ubuntu 22.04 and distros built on it. I couldn’t even get it to turn on its infrared LEDs (visible to my PinePhone’s camera) because of broken linkages in the header files or something. Pro tip: don’t argue with a delusional LLM unless you can straighten it out in a reply or two. On the plus side, the AI did help me approach a job I’d expect to have taken weeks to months without it. If/when I return to it, I expect to bodge it with Podman, but I may need to update the driver anyway if the kernel matters.

Even if I had gotten the Kinect to work, I doubt it would have been the miracle I was hoping for. Sonic-style characters (Mobians) have different proportions than humans, most notably everything from the shoulders up. I ended up finding an embedding for making turnaround/character sheets, but it too was trained on humans, and I got inconsistent results compared to before. I did find a turnaround template for chibi characters that gave me OK-ish results when run through Canny, but Ms. T kept generating facing the wrong way.

In another session, I decided to try making Ms. T in Sonic Forces. I installed the game (ProtonDB: Platinum) and loaded my 100% save. I put Ms. T on a white background in GIMP and gave it to ControlNet. Unsurprisingly, OpenPose is not a Sonic fan. It’s trained on human data (now with animals!), but the cartoon kept returning blank outputs until I used a preprocessor called dw_openpose_full, which, while it still doesn’t like cartoon animal people, did cooperate on Ms. T’s right hand. Almost every other node I dragged into place manually. I then demonstrated an ability to pose her left leg.

Character Sheet

From there, I opened OBS to record an .MP4 file. I used FFmpeg to convert it to .gif and loaded it in GIMP to… my computer slowed to a crawl, but it did comply without a major crash. I tried to crop and delete removed pixels… another slowdown, and GIMP crashed. I adjusted OBS to record just my region of interest. 500+ frames was still a no-go, even when each layer only holds the changes from the last. I found options to record as .gif and to record as slowly as I want. I then separated out my frames with FFmpeg, making sure to run it in a new directory:

ffmpeg -i fileName.gif -vf fps=1 frame_%04d.jpg
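(In that command, -vf fps=1 keeps one frame per second, and frame_%04d.jpg numbers the extracted frames.) The earlier .MP4-to-.gif conversion was a similar one-liner; a minimal sketch, with placeholder file names:

ffmpeg -i recording.mp4 recording.gif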

I chose ten frames and arranged them in a 5×2 grid in GIMP. I then manually aligned OpenPose skeletons for each and sent that off to ControlNet. Immediately, my results improved. I got another big boost by using my grid of .gif frames, but in both cases Ms. T kept her eyes and feet facing the viewer, even when her skeleton was pointed the other way. My next thought was to clean up the background on the grid, but compression artifacts got in the way.
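For anyone who would rather skip the manual arranging, FFmpeg’s tile filter can build a comparable grid straight from the extracted frames. A sketch, assuming exactly ten frames named as above:

ffmpeg -i frame_%04d.jpg -vf "tile=5x2" -frames:v 1 grid.png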

Time to start over. I made a new character with joint visibility and background removal in mind. She looked ridiculous running through a level, but I got her knee placement by moving diagonally toward the camera and jumping. I then put eight new screenshots in a grid. Select-by-color had the background cleared in a minute. I then used Canny for silhouettes, intending to reinforce OpenPose. I still got characters generating facing the wrong way.

Takeaway

This week has had a lot of interplay between study and play. While it’s fun to run the AI and cherry-pick what comes out, the prospect of better consistency keeps me coming back to improve my “jungle gym” as I prepare to generate LoRA training images.

Final Question

The challenge that broke this topic into multiple parts is getting characters to face away from the viewer. Have you ever gotten this effect while making character sheets?

I look forward to hearing from you in the comments below or on my Socials!

Patriot Day, 2023

This morning of September 11 marks the 22nd anniversary of one of the greatest tragedies within living memory for the American people. In 2001, Islamic terrorists hijacked airplanes and crashed them into the Twin Towers in New York, leading to the collapse of both and the loss of many lives. People around the world came out in solidarity with the US, as was fitting. Queen Elizabeth broke with centuries of tradition by having the Star-Spangled Banner played at Buckingham Palace, not just on September 13, 2001, but again on the 20th anniversary of the attacks.

I pitched the idea of a Stable Diffusion image and a short post, and my sister, Tzarina8472, suggested a firefighter saluting a US flag at half mast.

Producing this image required an update and a reboot (something I’d put off for weeks). There were around a couple dozen “solid nope!” attempts before it as I fought with Automatic1111, the Stable Diffusion WebUI I am learning. The flag came out backwards, and the firefighter has some metal socket where his backbone should be, not to mention shoulders that are all wrong. Even so, the piece invokes the emotional response I set out to invoke, so I’m happy enough with how it turned out that I’m not upset my graphics card refused to refine it further without optimizations I didn’t have time for.

Thank you to firefighters and other first responders everywhere for your public service.

We never know what tomorrow may bring. We may wake up and it will be a whole new world.

Leo_8472 on the night of Sept. 10, 2001.

Happy Birthday Stable Diffusion!

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am spending a week with Stable Diffusion to improve my skills at it. Let’s get started!

The science of AI art goes back to around the time complete CPUs were first integrated onto a single computer chip in the late ’60s/early ’70s. At least a couple of waves of AI craze came and went, but on August 22, 2022, Stable Diffusion was released as free and open-source software.

In the year since, Stable Diffusion has proven to be quite the disruptive technology. I’ve never had the cash to commission an online artist, but with a little effort, a decent amount of patience, and only an ounce of experience, I’ve gotten subjectively better results than commissioned works posted by low-end digital artists. I feel sorry for the people losing their dream jobs to machines, but at the same time this is a frontier I can have fun exploring.

One Week of Study

I’m setting myself a goal of spending two hours dedicated to learning Stable Diffusion every day this week. We’ll see what happens.

Monday

We won’t talk about what didn’t happen on Monday.

Tuesday

I finally started researching for this topic after midnight. I started up Easy Diffusion, an intuitive web UI for Stable Diffusion, and generated a number of images with a project for my sister in mind.

I ended up looking up tips and tutorials. Looks like the hot-shot web UI these days is Automatic1111. It has more options, but is proportionally harder to use. I might try it later in the week. Otherwise, most of my time actually working today was writing the introduction.

Wednesday

Easy Diffusion is definitely the way to go if all you’re looking to do is goof around, because that is exactly what I did. So far as I can tell, I am at the exact bottom of the range of graphics cards that can do this. I’m finding it useful to go smaller for faster feedback while learning to prompt. Conclusion: img2img has a tendency to muddle things.

Still, the draw of potentially more powerful techniques is calling. I found a piece of software called Stability Matrix, which supports a number of web UIs, including Automatic1111, the one every Stable Diffusion tutorial out there tends to go for. I ran into trouble with its integrated Python while setting it up (as a portable install, in the end). I’m hoping I can replace it with a later version tomorrow.

Thursday

I switched approaches from last night and did an online search for my error:

error while loading shared libraries: libcrypt.so.1: cannot open shared object file: No such file or directory

Multiple results came from people trying Python projects on Arch-family systems like the one I’m on. One source from December 2022 recommended a multi-step process involving the AUR. I figured rifling through the project’s GitHub issues was worth a shot, to report the problem if nothing else. I searched for ‘libcrypt.so.1’, and the fix was to install libxcrypt-compat; I found it in the more trusted pacman repository [1].

AUR: Arch User Repository
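On an Arch-family system like mine, that fix boils down to one command (assuming the package is still in the official repositories):

sudo pacman -S libxcrypt-compat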

I installed Automatic1111 using Stability Matrix and loaded it up. My first impression when compared to Easy Diffusion: Wall of controls. Easy is easy in both the setup AND the relatively intuitive control scheme, but it seemingly doesn’t support a lot of the tools I’ve seen and want to learn.

Per tradition, I made a photo of an astronaut riding a horse. It was a flop, but I got an image nonetheless. Its immediate follow-up didn’t finish when I told it to fix faces, and I ran out of vRAM on my graphics card (to be fair, I had plenty of other things still open).

Sabbath starts tomorrow, and I’ve been writing these mostly late at night. I can tell I’m not likely to meet my time goal of a couple hours every day, but I feel getting to this step is a major accomplishment. Word count says 700+ words, so I could end it here and feel fine about it. I’ll see what happens. I want to find the control that tells it my graphics card is barely up to this stuff.

Friday

Time to start optimizing! For context, I’m on an NVIDIA graphics card with 4GB of vRAM, which is enough to get a feel for the software if you have a minute or two of patience per image, but having more would be prudent. After trying a couple of online videos, I found that AUTOMATIC1111’s GitHub has a list of optimizations [2] I’ll be passing as --flags to the COMMANDLINE_ARGS variable in my start script (see the sketch after the list below). I don’t have time this evening for a full test, but perhaps tomorrow night or Sunday I can do some benchmarking.

vRAM: Video RAM (Random Access Memory) *So glad to have finally looked this one up!*

xformers

For NVIDIA cards, there is a library called xformers. It speeds up image generation and lowers vRAM usage, but at the cost of consistency between runs, which may not be a bad thing depending on the situation.

opt-split-attention/opt-sub-quad-attention/opt-split-attention-v1

A “black magic” optimization that should be automatically handled. I’ll be selecting one via the webUI, though.

medvram/lowvram

This optimization breaks up the model to accommodate lesser graphics cards. The smaller the pieces, though, the more time it needs to swap them in and out. Side note: I believe it’s MEDvram as in MEDium, as opposed to the naive pronunciation I heard, MEDvram as in MEDical.

opt-channelslast

This is an experimental optimization; it’s literally unknown whether it’s worth it at this time. I’m skipping it.
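Putting it together, here is a minimal sketch of the flag line in my start script (the flag selection is just an example, assuming a plain A1111 install that reads COMMANDLINE_ARGS from webui-user.sh):

# webui-user.sh - example flag selection, not a recommendation
export COMMANDLINE_ARGS="--xformers --medvram"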

Saturday Night

I took it off.

Sunday

I joined my father on a shopping trip, and we ran out of gas at a car wash. By the time I sat down to work on Stable Diffusion, I wasn’t up to much more than an unguided self-tour of the settings. I don’t know what most of the categories are supposed to do! I’ll look each one up in time.

Monday Morning

As usual in recent months, I spent a while writing the Takeaway and Final Question, dressing up the citations, and copying everything out of LibreOffice and into WordPress for publication at noon.

Takeaway

Progress! It might not be what I expected this week, but I’m still satisfied that I have something to show off. The point I’m at now is getting back to the same place I was with Easy Diffusion before looking up the toys I came to Automatic1111 for.

As one final note, this week is also the anniversary of this blog. That caused a bit of a delay in getting this post scheduled by noon, but it makes only the third late post I can remember in twice as many years. I feel bad about it, but at the same time, it’s still a decent track record.

Final Question

Do you have a favorite interface for using Stable Diffusion?

[1] PresTrembleyIIIEsq, et al., “SD.Next / ComfyUI Install: Unexpected Error #54,” github.com, July 30, 2023. [Online]. Available: https://github.com/LykosAI/StabilityMatrix/issues/54. [Accessed Aug. 8, 2023].

[2] AUTOMATIC1111, “Optimizations,” github.com, Aug. 2023. [Online]. Available: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Optimizations. [Accessed Aug. 8, 2023].