Which Stable Diffusion UI is Right for Me?

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am exploring Automatic1111 alternatives. Let’s get started!

A1111 is a nice baseline Stable Diffusion interface. A determined beginner should find it approachable, it provides easy access to a large toolbox for an intermediate audience, and the community library of extensions and video/text tutorials is large enough to keep experts honing their skills.

Stable Diffusion Forge vs. StableSwarmUI

But A1111 is hardly the only one around. Forge has had my attention as a direct improvement on A1111, for –if nothing else– bugfixes when switching models. I’ve bumped into this limitation while experimenting with ControlNet, and it gets in the way.

But another UI (User Interface) has caught my attention recently: StableSwarmUI. From around one hour of research, it appears to be a beginner-friendly package built off ComfyUI, an interface I’d previously written off as well above my skill level. Installation threw an extra challenge when it assumed browser access and I was working over SSH. I recently learned graphical SSH though, where -C compresses the connection and -Y enables trusted X11 forwarding:

ssh -CY <user@host>

Otherwise, StableSwarmUI was very easy to install.

Out of the box, my installation of StableSwarmUI was set up to run SDXL models. When I tried SonicDiffusion (Stable Diffusion 1.5 base) from my A1111 installation, I kept getting 50% gray outputs. I took a peek at the ComfyUI backend. Yeah… I have no business making the all-out switch until I’ve properly introduced myself to ComfyUI. Time to research until I can make a basic workflow.

OK, don’t ask me about the gray boxes. Refreshing Firefox did nothing. Some people fixed similar issues by reinstalling or deleting one file or another. I left it over a weekend, then restarted the StableSwarmUI server while installing the Custom Node Manager for ComfyUI.

ComfyUI Workflows

ComfyUI is all about the workflow: a program you make by linking various nodes into a flowchart. I looked up consistent character workflows to get a better idea of how they work. There are a couple of options, but YouTuber NerdyRodent’s Reposer Plus caught my attention first [1]. Custom Node Manager found most of its custom nodes, but NerdyRodent used a now outdated plugin called IPAdapter. I had to study IPAdapter v2 (programmer video [2]), but it wasn’t too difficult to swap out the relevant nodes once I’d taken my time.

Reposer Plus needed additional models – some of which I already had in A1111. I made a shared models directory and moved StableSwarmUI’s entire models directory over. I found a setting in StableSwarmUI at “Server/Server Configuration/Paths/ModelRoot” to point the UI at my models directory. A1111 would have me edit a .yaml file directly, but symbolic links are easier.
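For anyone wanting to try the same kind of layout, here is a minimal sketch of the move-and-link step. Everything in it is an assumption for illustration: the directory names, the install locations, and the example model file are placeholders rather than my actual setup, and the sketch works inside a scratch directory so it is safe to run as-is.

```shell
#!/bin/sh
# Sketch only: all paths and folder names are hypothetical placeholders.
set -eu

# Work in a scratch directory by default so this is safe to run as-is.
BASE="${BASE:-$(mktemp -d)}"
SHARED="$BASE/shared-models"              # shared models directory
SWARM_DIR="$BASE/StableSwarmUI"           # stand-in for a StableSwarmUI install
A1111_DIR="$BASE/stable-diffusion-webui"  # stand-in for an A1111 install

# Fake "existing" installs so the sketch has something to move.
mkdir -p "$SWARM_DIR/Models/Stable-Diffusion" "$A1111_DIR/models"
touch "$SWARM_DIR/Models/Stable-Diffusion/example.safetensors"

# 1. Move StableSwarmUI's entire models directory to the shared location.
mv "$SWARM_DIR/Models" "$SHARED"

# 2. StableSwarmUI itself is pointed at $SHARED through its ModelRoot
#    setting in the web UI, so nothing more is needed on that side.

# 3. Give A1111 a symbolic link where it expects its checkpoints folder.
#    (If that folder already exists, merge its contents into the shared
#    directory and remove it first; otherwise ln would place the link
#    inside the existing folder.)
ln -s "$SHARED/Stable-Diffusion" "$A1111_DIR/models/Stable-diffusion"

# Show where the link points.
ls -l "$A1111_DIR/models"
```

The same ln -s line can be repeated for any other model subfolder that both interfaces use, as long as each link is named the way the receiving UI expects.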

I set the workflow in motion with “Queue Prompt,” but the IPAdapter Advanced node I installed threw an error on me. It took an extra session, but experimentation identified a model mismatch (I tried loading a “Big G” CLIP Vision model when it needed the normal one). The workflow then ran normally, but the final upscale turned sepia. I tried a photorealistic upscale model (as opposed to one for anime), but it turned out this was another server restart issue.

Takeaway

I played around with StableSwarmUI a bit more after a line of mediocre results with NerdyRodent’s workflow. Like with many tech projects, I’m interacting with a large and evolving ecosystem. Being on local hardware, I have both the liberty and burden of being my own admin while still learning the user’s point of view. And until I know both, I cannot tell if StableSwarmUI is there yet or not. I was all primed to complain about how I can’t readily draw into the beginner interface for a ControlNet input, but on closer inspection I was mistaken about how this UI works. I still haven’t found the feature, but that doesn’t mean it’s not there.

If you are a first-day beginner, I would still recommend EasyDiffusion for its easy installation, image history, and inpainting. If you want anything more, A1111 will let you explore further (Forge appears abandoned) at the cost of image history. If you want to try a cool ComfyUI workflow, StableSwarmUI may be right for you.

Final Question

What is your favorite ComfyUI workflow? I look forward to hearing your answers in the comments below or on my Socials!

Works Cited

[1] N. Rodent, “Stable Diffusion – Face + Pose + Clothing – NO training required!,” youtube.com, Oct. 14, 2023. [Online]. Available: https://youtu.be/ZcCfwTkYSz8. [Accessed Jun. 20, 2024].

[2] L. Vision, “IPAdapter v2: all the new features!,” youtube.com, Mar. 25, 2024. [Online]. Available: https://youtu.be/_JzDcgKgghY. [Accessed Jun. 20, 2024].

Linux Phone Milestone: Moving In

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am moving into my PinePhone. Let’s get started!

Previously

After around four years of ownership, my PinePhone UBports Edition has PostmarketOS/Phosh and is working on a network. Now that the big solar storm is over, it actually gets signal.

Password Adjustment

Before I properly move into my phone, I have a couple more topics to explore first. One: I need a relatively short user password to unlock from sleep. At the same time, I also want to require a longer password for admin functions. Two: I set up Full Disk Encryption (FDE) while installing PostmarketOS, but I chose its passphrase with testing in mind. I need something a bit less guessable.

The root password can be required in a number of ways from a special admin account to not having sudo. I tried the latter, and dependencies insisted it stay. Online search results were frequently more into removing admin privileges entirely, but I did pick up on the history and intended context of sudo as being programmed to easily revoke root access when no longer needed – as well as log commands used so IT knows who did what and hopefully how to fix it. One detail of note was the wheel group (as in “big wheel,” slang for someone with power). Early Unix required wheel membership to su (Substitute User) into root, but it’s not a universal standard.

Or I can configure sudo by editing /etc/sudoers. Using visudo is recommended to check syntax, but it dropped me into vi/vim, which I’m having none of. I installed nano, then tried/failed to set it as system default text editor. Otherwise, I might have tried a configuration where sudo just asks for the root password. Instead, I commented out the line giving users with the wheel group sudo privileges. (NOTE: While finalizing this post, I found this may break app stores. Next post about PinePhone, I will try requiring the root password instead.)
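For reference, here is a minimal sketch of the “ask for the root password” configuration I have in mind. It is untested on my phone. It relies on the sudoers option rootpw, and it only writes a candidate file to a scratch location and syntax-checks it, so nothing under /etc is touched. The scratch file name is a placeholder, and where the fragment finally belongs (the main sudoers file or a drop-in directory) depends on the distribution.

```shell
#!/bin/sh
# Sketch only: builds a candidate sudoers fragment in a scratch file.
set -eu

# Hypothetical scratch file; the real rules live in /etc/sudoers.
CANDIDATE="$(mktemp)"

cat > "$CANDIDATE" <<'EOF'
# Prompt for root's password instead of the invoking user's password.
Defaults rootpw
# Members of wheel may run any command, subject to the prompt above.
%wheel ALL=(ALL) ALL
EOF

# Check the syntax without touching the live configuration,
# when visudo is installed on this machine.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$CANDIDATE" || echo "visudo reported a problem" >&2
else
    echo "visudo not found; skipping syntax check"
fi
```

As for the editor problem, on sudo builds where the env_editor option is enabled, running visudo with EDITOR=nano set in the environment should open nano instead of vi. I have not confirmed which way PostmarketOS builds it.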

With sudo hobbled, I learned more about FDE. As it just so happens, PostmarketOS was built for it. I got the name LUKS (Linux Unified Key Setup) and a hint that the boot partition might need to be left unencrypted by the time a member of the Pine64 community gave me the exact command to change my key:

$ sudo cryptsetup luksChangeKey /dev/mmcblk2p2 -S 0

In my case, I double checked the partition name before performing it with su instead of sudo.

Update on August 8, 2024: I disabled SSH password login, requiring key login instead.
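A minimal sketch of the kind of settings involved, assuming OpenSSH’s sshd on the phone. The scratch file is a placeholder: these lines would be merged into /etc/ssh/sshd_config (or a drop-in file, if the main config includes one), and older OpenSSH releases use a different name for the keyboard-interactive option. A public key should already be installed on the phone before switching passwords off, or the next login will be locked out.

```shell
#!/bin/sh
# Sketch only: writes example sshd settings to a scratch file for review.
set -eu

# Hypothetical scratch file; nothing under /etc/ssh is modified here.
SNIPPET="$(mktemp)"

cat > "$SNIPPET" <<'EOF'
# Refuse password-style logins; accept public keys only.
PasswordAuthentication no
KbdInteractiveAuthentication no
PubkeyAuthentication yes
EOF

# Print the settings so they can be reviewed before being merged by hand.
cat "$SNIPPET"
```

After merging the lines, the SSH service needs a restart for them to take effect, and it is worth keeping an existing session open until a fresh key-based login has been confirmed.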

Moving In

Confident I had secured it to my skill level, I did some more “normal” new phone/computer behaviors, such as finding the dark theme, using AM/PM, and adjusting automatic sleep times. My background image had to be moved over with scp (Secure CoPy) and it took a reboot before it showed up. I also moved my ringtone and notification sound over from my previous phone in the same way, installed them, and rebooted again. While it would have made things a little easier, I’m forgoing NFS access until my homelab servers are moved away from common LAN subnet addresses.

Moving my old contacts list wasn’t too hard either, though I did get help from an automation app on the Android side to export to .vcf format (vCard). I then used an SD card to move them over. Phosh’s default contacts app by The GNOME Project accepted it no problem. The longest part was weeding out stale contacts going back to high school.

The luck ended with Waydroid. One goal for this phone is demoing Android apps on Linux. Waydroid looks like the best option. I installed it no problem, but initialization took a few attempts when the 2-minute sleep kept corrupting a large download. From there, I tried installing an F-Droid client, but I got an error in-terminal about the Waydroid session stopping.

Takeaway

My Linux phone is not completely stable by any stretch of the imagination. It keeps crashing, the battery feels like a joke – and overall, the thing feels slow no matter what I do. But remember that I have an early prototype aimed at developers and enthusiasts. Pine64 has production models released, and they aren’t the only ones making phones to run Linux. I am just thankful they didn’t make the screen so big they had to mutilate it to accommodate the camera.

Final Question

Blunder! I just noticed the local app store hangs when trying to install… maybe? Probably because of my sudo configuration, but I will need more time with it as I build up a to-do list for another follow-up post. What all should that to-do list contain?

Self-Hosted AI Consistent Characters: Part 2

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am continuing work towards a consistent character using Stable Diffusion AI image generation software. Let’s get started!

Previously

Last time I talked about making a consistent character on local hardware, I went over using the Automatic1111 (A1111) web interface (running on my father’s computer), installing the ControlNet extension for Stable Diffusion and equipping it with models for OpenPose, and then using OpenPose to generate eight skeletons based on screenshots from Sonic Forces. All common enough stuff, but for context, I am following a tutorial on YouTube by Not4Talent, “Create consistent characters with Stable diffusion!!” [1].

Character Switch

While I had previously been working on a Sonic fan character for my sister whom I am calling Ms. T, I switched over to working on my own character in the same setting, whom I’m calling Smokey Fox. He’s just spent several years studying in a foreign culture with a human sense of modesty, so I generated an orange Mobian fox with blue eyes and wearing red sneakers, blue jeans, and a red trench coat while applying a bunch of little things I picked up along the way, such as quality prompts and negative prompts.

Along the way, the AI came up with details I liked, such as a white shirt, black gloves, black tipped ears, and some of the time, he even generated with a red thing on a glove I decided was some kind of accessory crystal. Quality was spotty. It took me a few attempts before I cleaned up the hair in his profile shots by prompting for a bald head. Only four poses consistently gave him a tail, but one was almost never usable. It also tried giving him a black tail tip a few times, but I didn’t like that.

Along the way, I grabbed pictures with poses I liked and stacked them in GIMP. Because I was using a fixed seed, I was able to assemble more poses until I had eight portraits I’d touched up. Notably: I had to extend his coat in his from-behind shot, his shoes needed a lot of help, and I had to draw one of his tails from scratch. The crystal thing on his glove also got interesting to transfer around, and I did have to draw it myself a few times. During this process, I took screenshots of my work in progress and shared them on Discord.

No Auto Save

Disaster!! At some point, my computer randomly crashed. I don’t remember the details, but it was several days later when I returned to work on Smokey that I learned that GIMP doesn’t auto-save the way LibreOffice does. Thankfully, I had the screenshots to work with. I also lost my original prompts through a side project where I helped my mother troll a friend from elementary school regarding a Noah’s Ark baby quilt she made for her. In total, I made an island chain, a forest scene, and just as it was about to arrive, I made up a beach scene with the Taco Bell logo embedded using a ControlNet model for making fancy QR codes.

Back to Smokey Fox, the next step in the tutorial was upscaling. Pain followed. The Not4Talent tutorial [1] didn’t make sense to me, so I spent a day or two unenthusiastically bumbling around trying to learn enough to feel ready to post. I played around with several ControlNet models. Most are variations on making white-on-black detail maps. One late night session landed me an upscale tutorial by Olivio Sarikas [2] that clicked with me. As with other tutorials, A1111 has moved or removed things between updates in the six months to a year since it was popular to introduce ControlNet – not to mention various plugins which may differ between our setups. Olivio’s tutorial rescued my project, and I got back to having fun cleaning up details with GIMP.

Takeaway

I may need to take a closer look at Forge instead of A1111. A1111 has a known bug where it has trouble unloading models, and while I was playing with various ControlNet models, I managed to defeat the VRAM capacity on the GPU.

Final Question

Forge will require a virtual environment, which I don’t know how to do properly yet. What tutorial would you recommend? I look forward to hearing your answers in the comments below or on my Socials!

Works Cited

[1] Not4Talent, “Create consistent characters with Stable diffusion!!,” youtube.com, Jun. 2, 2023. [Online]. Available: https://youtu.be/aBiGYIwoN_k. [Accessed Jun. 7, 2024].

[2] O. Sarikas, “ULTIMATE Upscale for SLOW GPUs – Fast Workflow, High Quality, A1111,” youtube.com, May 6, 2023. [Online]. Available: https://youtu.be/3z4MKUqFEUk. [Accessed Jun. 7, 2024].

Generative AI: Ethics on the Frontier

Good Morning from my Robotics Lab! This is Shadow_8472 and today I have a few thoughts about ethics when living on a frontier. Let’s get started.

Law and New Technology

Law follows innovation. A world without motor vehicles or electricity won’t require cars to stop at a red light. Conversely, new technologies bring legal uncertainty. A nuclear-powered laptop might be ready for 20 years of abuse in an elementary classroom without leaking any radiation, but expect more courtroom pushback than a mind-reading camera – at least until the legal system can parse the respective technologies.

Generative AI in 2024 is data hungry. More training data makes for a better illusion of understanding. OpenAI’s GPT-4o reportedly can read a handwritten note and display emotion in a verbal reply in real time. If they haven’t already, they will soon have a model trained off every scrap of text, video, and audio freely available as well as whatever databases they have access to. But the legal-moral question is: what is fair game?

Take drones as a recent, but more mature point of comparison. Generally speaking, drones should be welcome wherever recreational R/C aircraft already are. Hover like you might be spying on someone expecting privacy, and there might be trouble. Laws defining the boundaries between these and similar behaviors protect drone enthusiasts and homeowners alike. Before that compromise was solidified, the best anyone could do was not be a jerk while flying/complaining.

The AI Art War

But not everyone’s idea of jerk behavior is the same. Many AI trainers echo the refrain, “It’s not illegal, so we can scrape.” Then digital artists on rough times see AI duplicating their individualized styles, and they fight back. Soon, jerks are being jerks to jerks because they’re both jerks.

Model trainers practically need automated scraping, precluding an opt-in consent model like what artists want. Trainers trying not to be jerks can respect name blacklists, but improperly tagged re-uploads sneak in anyway. Artists can use tools like Glaze and Nightshade to poison training sets, but it’s just a game of cat and mouse so far.

Those were the facts, stated as objectively as I can. My thoughts are that artists damage their future livelihood more by excluding their work from training data. The whole art market will be affected as they lose commissioners to a machine that does “good enough.” Regulars who care about authentic pieces will be unaffected. Somewhere between these two groups are would-be art forgers in their favorite style and people using AI to shop for authentic commissions. I expect the latter to be larger, so the moral decision is to make an inclusive model.

At the same time, some countries have a right to be forgotten. Verbally abusing AI art jerks provides digital artists with a much-needed sense of control. While artists’ livelihoods are threatened on many sides, AI is approachable enough to attack, so they vent where they can. I believe most of the outcry is overreaction, but remember that I’m biased in favor of Team Technology, though I am not wholly unsympathetic to their cause. I am in favor of letting them exclude themselves, just not for the reasons they would rather hear.

Takeaway

I see the AI situation in 2024 as comparable to China’s near monopoly on consumer electronics and open secret about committing human rights violations. In theory you could avoid unethically sourced consumer goods, but oftentimes going without is not an option. You can then see the situation as forcing you to support immoral practices or you can see yourself as making the effort to find the best –though only– reasonable option available. The same thing applies to AI. All other factors equal, I intend to continue using AI tools as my conscience allows.

Final Question

Do you disagree with my stance? Feel free to let me know in the comments below or on my Socials!