KDE Wayland: My Second Impressions

Good Morning from my Robotics Lab! This is Shadow_8472 with a side project for the week. Let’s get started!

Riding the bleeding edge with EndeavourOS, I upgraded from KDE Plasma 5 to 6 when the default session was moved from Xorg to Wayland. The deal breaker for me was Discord not displaying messages properly as I typed. I retreated to KDE 6 on Xorg.

That was almost a couple of months ago. Xorg has worked since then, but it has been a little love-starved, having to share developer time with KDE Wayland. I understand, though; there are only so many developers, and their main focus is getting Wayland working. On an update/reboot cycle, I decided to check out their progress. I have to say it’s not bad. The bug from earlier is resolved, at least.

Not all my settings transferred over. In particular, my mouse pointer got stuck when crossing a boundary on my multi-screen display. The culprit was a setting under System Settings/Mouse & Touchpad/Screen Edges/Edge barrier. I set it to 0 pixels, disabling the feature.


OBS (Open Broadcaster Software) had trouble seeing the screen because windows in Wayland cannot see outside themselves. Screen capture now goes through a screen-casting portal, which shows up in my system tray.
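If the capture source ever goes missing, the screen-casting path is easy to sanity check from a terminal. A quick sketch, assuming a systemd-based distro like EndeavourOS:

$ echo $XDG_SESSION_TYPE
$ systemctl --user status pipewire xdg-desktop-portal

The first command should print “wayland”; the second confirms PipeWire and the portal service are running. In OBS itself, the Wayland-native source should appear as “Screen Capture (PipeWire)” – the old X11 capture sources can’t see native Wayland windows.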

The one obvious difference I can’t do anything about is spotty performance from my taskbar window previews. Oftentimes, they don’t even load when I mouse over them. Other than that, there are new interface sounds I’ve been living with, but I haven’t found the way to turn them off yet.

Takeaway

Technology marches on, and legacy software becomes obsolete. There are bumps along the way, but KDE Wayland’s are now smoothed out enough for me to clear. In another couple of months, I expect the transition will be virtually unnoticeable.

Final Question

Have you ever ridden the bleeding edge and had to come back later? I look forward to hearing from you in the comments below or on my Socials!

I’m Sold On StableSwarmUI

Good Morning from my Robotics Lab! This is Shadow_8472, and I’ve made up my mind on StableSwarmUI as a replacement for A1111. Let’s get started!

Generative AI (Artificial Intelligence) is the technology buzzword of the decade so far, thanks to open-source models. Automatic1111 has an extensive community library, but ComfyUI’s flexibility may yet challenge it as the next favorite. While not yet polished to A1111’s visual aesthetic, StableSwarmUI should be navigable for a total AI noob while letting them peek at the Comfy beneath.

Learning ComfyUI Basics

I’m taking that peek… ComfyUI looks like boxes and spaghetti. The correct term is “workflow.” Each node represents a unit of work you would find in any other UI. The power of Comfy is the ability to arbitrarily link and re-arrange nodes. Once my first impression –intimidation– wore off, I found that grouping familiar options by node and color coding their connections made the basic workflow more intuitive while highlighting my gaps in understanding of the Stable Diffusion process.

Let’s define some terms before continuing. Be warned: I’m still working on my intuition, so don’t quote me on this.

  • Latent Space: data structure for concepts trained by [counter]examples. Related concepts are stored close to each other for interpolation between them.
  • Latent Image: a graphical point in a latent space.
  • Model/Checkpoint: save files for a latent space. From what I can tell: checkpoints can be trained further, but finished models are more flexible.
  • CLIP (Contrastive Language-Image Pretraining): a part of the model that turns text into concepts.
  • Sampler: explores the model’s latent space for a given number of “steps” with respect to concepts specified in the CLIP conditioning as well as additional sliders.
  • VAE (Variational Autoencoder): a model that translates images to and from latent space.

The basic Stable Diffusion workflow starts with an Empty Latent Image node defining height, width, and batch size. Alongside this, a model or checkpoint is loaded. CLIP Text Encode nodes are used to enter prompts (typically both positive and negative). A KSampler node does the heavy lifting, combining everything as it denoises the latent image (with a low-resolution preview, if enabled). Finally, a VAE Decode node turns the latent image into a normal picture.
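To make that concrete, here is a minimal sketch of the same workflow in ComfyUI’s API (JSON) format, posted to a local server with curl. The checkpoint filename and prompt text are placeholders, and the address assumes a stock install listening on 127.0.0.1:8188:

$ curl -X POST http://127.0.0.1:8188/prompt -H "Content-Type: application/json" -d '{
  "prompt": {
    "4": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "5": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "6": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a buff angel with a glowing sword", "clip": ["4", 1]}},
    "7": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "blurry, low quality", "clip": ["4", 1]}},
    "3": {"class_type": "KSampler",
          "inputs": {"seed": 42, "steps": 20, "cfg": 7.0,
                     "sampler_name": "euler", "scheduler": "normal", "denoise": 1.0,
                     "model": ["4", 0], "positive": ["6", 0],
                     "negative": ["7", 0], "latent_image": ["5", 0]}},
    "8": {"class_type": "VAEDecode",
          "inputs": {"samples": ["3", 0], "vae": ["4", 2]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "basic_workflow"}}
  }
}'

Each connection is just a node ID plus an output slot number (e.g. ["4", 1] is the checkpoint loader’s CLIP output), which is all the spaghetti really encodes.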

While I’m still developing an intuition for how a latent space works, I’m imagining a tent held up by a number of poles defining its shape. You are free to interpolate between these points, but quirks can arise when concepts bleed into each other: like how you’d tend to imagine bald people as male.

ControlNet

The next process I wish to demystify for myself is ControlNet. A second model is loaded to extract information from an existing image. This information is then applied to your positive prompt. (Let me know if you get any interesting results conditioning negative prompts.) Add in a second ControlNet or more, and combining them presents its own artistic opportunities.

For this exercise, I used a picture I made during my first attempt at Stable Diffusion: a buff angel with a glowing sword. As a challenge to myself, I redid it with SDXL (Stable Diffusion XL). I used matching ControlNet models for Canny and OpenPose. Some attempts came up with details I liked and tried to keep. I added the SDXL refiner model to try to fix his sword hand. It didn’t work, but in the end, I had made a generation I liked with a few golden armor pieces and a red, white, and blue “(kilt:1.4).” Happy 4th of July!

Practical Application

A recent event has inspired me to try making a landscape picture with a pair of mason jars –one full of gold coins, and the other empty– both on a wooden table in front of a recognizable background. It’s a bit complex to generate straight out of text, but it shouldn’t be too hard with regional conditioning, right?

Impossible. Even if my background came out true, I’d still want the mason jars to match, which didn’t happen. This would have been the end of the line if I were limiting myself to A1111 without researching additional plugins for my already confusing-to-manage cocktail. With Comfy, my basic idea is to generate a jar, generate another, filled jar based off it, then generate them together in front of my background.

Again: easier said than done. Generating the initial mason jar was simple. I even arranged it into a tidy group. From there, I made a node group for ControlNet Canny and learned about Latent Composite – both of which allowed me to consistently put the same jar into a scene twice (once I figured out my dimensions and offsets), but filling/emptying one jar’s gold proved tricky. “Filling” it only ever gave me a quarter jar of coins (limited by the table visible through the glass), and emptying it left the glass surface horribly deformed. What’s more, the coins I did get would often morph into something else –such as maple syrup– with too high of a denoise in the KSampler. On the other hand, too low a value, and the halves of the image don’t fuse. I even had coins wind up in the wrong jar with an otherwise clean workflow.

Even though I got a head start on this project, I must lay it down here, incomplete. I have seen backgrounds removed properly with masking, so I’ll be exploring that when I come back.

Takeaway

ComfyUI looks scary, but a clean workflow is its own work of art. Comfy’s path to mastery is clearer than A1111’s. Even if you stick to basics, StableSwarmUI has simpler interfaces – a simple prompt box and an unpolished, A1111-esque front panel for loading pre-made workflows.

Final Question

I’m probably being too hard on myself compositing glass in-workflow. Let me know what you think. What tips and tricks might you know for advanced AI composition? I look forward to hearing from you in the comments below or on my Socials!

Can Linux See a GameCube Controller?

Good Morning from my Robotics Lab! This is Shadow_8472 with a side project for the week. Let’s get started!

I’ve had a Nintendo GameCube controller on my desk ever since I got a USB adapter some months ago. I poke at it every so often, trying to confirm it works in Linux, but in late June of this year, things came together.

Computer: Derpy Chips
Distribution: PopOS 22.04
Desktop Environment: KDE Plasma 5.24.7 (Qt 5.15.3)
Product: DragonRise Inc. Controller Adapter

In my research, I read about this product working with Dolphin Emulator on Linux, if not elsewhere. Dolphin sounded like a good first stop, and one day I sat down with enough patience to compile it. It took a few tries before I read a guide on Dolphin’s GitHub explaining that the project has dropped Qt5 support [1]. Fortunately, a new enough Qt was available to me through backports.
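The compile itself boils down to the usual CMake dance described on the wiki [1] – roughly the following, with the exact Qt6 and other dependency packages varying by distribution:

$ git clone --recursive https://github.com/dolphin-emu/dolphin.git
$ cd dolphin && mkdir build && cd build
$ cmake ..
$ make -j$(nproc)
$ sudo make install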

I installed my compiled Dolphin package. Now for a ROM. Commercial games are illegal to download, but I can either dump my own games (not in my skill set yet) or find a homebrew game. GameCube only has one such title I found worth mentioning: Toy Wars. It’s not even an exclusive – probably because the GameCube is basically a baby Wii/Wii U on the inside.

Long story short: Toy Wars gave me a black screen. I happen to know the Wii has tons of homebrew, so I found another guide [2] that walked me through performing a system update, netting me the Wii Menu, the Homebrew Channel, and then a content browser layered on top of that. While significantly hampered navigating this browser with the emulated Wii Remote, I found and downloaded a free homebrew game about dodging space junk.

And still nothing from the GameCube controller. It showed up with the command “lsusb,” but Dolphin’s configuration options said it didn’t have permission. That was the missing link. By default, Linux is a lot more locked down about strange USB peripherals than Windows. I had to make a file under “/etc/udev/rules.d” describing my controller adapter and granting the missing permissions.

$ cat /etc/udev/rules.d/51-gcadapter.rules
SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="057e", ATTRS{idProduct}=="0337", MODE="0666"

The 51 in the name has to do with the order in which this and similar rules override each other. There are a ton of possible parameters for the file contents, but idVendor and idProduct can be found with the “lsusb” command where it says ID vvvv:pppp. MODE uses the same octal digits as file permissions ([user, group, everybody] x [read=4, write=2, execute=1]).
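To apply the rule without rebooting, udev can be told to reload and re-trigger – these are standard udev commands, though unplugging and replugging the adapter also works:

$ sudo udevadm control --reload-rules
$ sudo udevadm trigger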

Takeaway

From further observation, I concluded these changes let Dolphin reach out to find the state of my controller[s]; no events are triggered in Xorg, as happens for the mouse and keyboard. Long term, I have a gag goal of writing a custom driver so I can use my GameCube controller however I like, but I didn’t get to that this go-around. Oh well.

Final Question

I couldn’t find out what the leading 0 is supposed to represent. If you know, I look forward to hearing from you in the comments below or on my Socials!

Works Cited

[1] Dolphin Emulator, “Building for Linux,” github.com, May 31, 2024. [Online]. Available: https://github.com/dolphin-emu/dolphin/wiki/Building-for-Linux. [Accessed: June 25, 2024].

[2] Nintendo Homebrew, “Installing Homebrew Channel on Dolphin Emulator,” 2024. [Online]. Available: https://wii.hacks.guide/homebrew-dolphin.html. [Accessed: June 25, 2024].

Which Stable Diffusion UI is Right for Me?

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am exploring Automatic1111 alternatives. Let’s get started!

A1111 is a nice baseline StableDiffusion interface. A determined beginner should find it approachable, it provides easy access to a large toolbox for an intermediate audience, and the community library of extensions and video/text tutorials is large enough to keep experts honing their skills.

Stable Diffusion Forge vs. StableSwarmUI

But A1111 is hardly the only one around. Forge has had my attention as a direct improvement on A1111 for –if nothing else– bug fixes when switching models. I’ve bumped into that limitation while experimenting with ControlNet, and it gets in the way.

But another UI (User Interface) has caught my attention recently: StableSwarmUI. From around one hour of research, it appears to be a beginner-friendly package built on top of ComfyUI, an interface I’d previously written off as well above my skill level. Installation threw an extra challenge when it assumed browser access while I was working over SSH. I recently learned graphical SSH though:

ssh -CY <user@host>

Otherwise, StableSwarmUI was very easy to install.

Out of the box, my installation of StableSwarmUI was set up to run SDXL models. When I tried SonicDiffusion (Stable Diffusion 1.5 base) from my A1111 installation, I kept getting 50% gray outputs. I took a peek at the ComfyUI backend. Yeah… I have no business making the all-out switch until I’ve properly introduced myself to ComfyUI. Time to research until I can make a basic workflow.

OK, don’t ask me about the gray boxes. Refreshing Firefox did nothing. Some people fixed similar issues by reinstalling or deleting one file or another. I left it over a weekend, then restarted the StableSwarmUI server while installing the Custom Node Manager for ComfyUI.

ComfyUI Workflows

ComfyUI is all about the workflow: a program you make by linking various nodes into a flowchart. I looked up consistent character workflows to get a better idea of how they work. There are a couple of options, but YouTuber NerdyRodent’s Reposer Plus caught my attention first [1]. Custom Node Manager found most of its custom nodes, but NerdyRodent used a now-outdated plugin called IPAdapter. I had to study IPAdapter v2 (programmer video [2]), but it wasn’t too difficult to swap out the relevant nodes once I’d taken my time.

Reposer Plus needed additional models – some of which I already had in A1111. I made a shared models directory and moved StableSwarmUI’s entire models directory over. I found a setting in StableSwarmUI at “Server/Server Configuration/Paths/ModelRoot” to point the UI at my models directory. A1111 would have me edit a .yaml file directly, but symbolic links are easier.
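As a sketch of the symlink approach on the A1111 side (paths are examples; point them at wherever your shared directory and A1111 install actually live):

$ mv ~/stable-diffusion-webui/models/Stable-diffusion ~/shared-models/Stable-diffusion
$ ln -s ~/shared-models/Stable-diffusion ~/stable-diffusion-webui/models/Stable-diffusion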

I set the workflow in motion with “Queue Prompt,” but the IPAdapter Advanced node I installed threw an error at me. It took an extra session, but experimentation identified a model mismatch (I tried loading a “Big G” CLIP Vision model when it needed the normal one). The workflow then ran normally, but the final upscale turned sepia. I tried a photorealistic upscale model (as opposed to one for anime), but it turned out this was another server restart issue.

Takeaway

I played around with StableSwarmUI a bit more after a string of mediocre results with NerdyRodent’s workflow. As with many tech projects, I’m interacting with a large and evolving ecosystem. Being on local hardware, I have both the liberty and the burden of being my own admin while still learning the user’s point of view. And until I know both, I cannot tell whether StableSwarmUI is there yet or not. I was all primed to complain about how I can’t readily draw into the beginner interface for a ControlNet input, but on closer inspection I was mistaken about how this UI works. I still haven’t found the feature, but that doesn’t mean it’s not there.

If you are a first-day beginner, I would still recommend EasyDiffusion for its easy installation, image history, and inpainting. If you want anything more, A1111 will let you explore further (Forge appears abandoned) at the cost of image history. If you want to try a cool ComfyUI workflow, StableSwarmUI may be right for you.

Final Question

What is your favorite ComfyUI workflow? I look forward to hearing your answers in the comments below or on my Socials!

Works Cited

[1] N. Rodent, “Stable Diffusion – Face + Pose + Clothing – NO training required!,” youtube.com, Oct. 14, 2023. [Online]. Available: https://youtu.be/ZcCfwTkYSz8. [Accessed: June 20, 2024].

[2] Latent Vision, “IPAdapter v2: all the new features!,” youtube.com, Mar. 25, 2024. [Online]. Available: https://youtu.be/_JzDcgKgghY. [Accessed: June 20, 2024].

Linux Phone Milestone: Moving In

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am moving into my PinePhone. Let’s get started!

Previously

After around four years of ownership, my PinePhone UBports Edition has PostmarketOS/Phosh and is working on a network. Now that the big solar storm is over, it actually gets signal.

Password Adjustment

Before I properly move into my phone, I have a couple more topics to explore. One: I need a relatively short user password to unlock from sleep. At the same time, I also want to require a longer password for admin functions. Two: I set up Full Disk Encryption (FDE) while installing PostmarketOS, with testing in mind. I need something a bit less guessable.

The root password can be required in a number of ways, from a special admin account to not having sudo at all. I tried the latter, and dependencies insisted it stay. Online search results were frequently more interested in removing admin privileges entirely, but I did pick up on the history and intended context of sudo: it was programmed to easily revoke root access when no longer needed – as well as to log commands used so IT knows who did what and hopefully how to fix it. One detail of note was the wheel group (as in a car’s steering wheel). Early Unix required wheel membership to su (Substitute User) into root, but it’s not a universal standard.

Or I can configure sudo by editing /etc/sudoers. Using visudo is recommended to check syntax, but it dropped me into vi/vim, which I’m having none of. I installed nano, then tried and failed to set it as the system default text editor. Otherwise, I might have tried a configuration where sudo just asks for the root password. Instead, I commented out the line giving users in the wheel group sudo privileges. (NOTE: While finalizing this post, I found this may break app stores. Next post about the PinePhone, I will try requiring the root password instead.)
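For anyone following along, the edit looks something like this – a sketch, since the exact wheel line varies by distribution, and whether visudo honors $EDITOR depends on its env_editor setting:

$ EDITOR=nano visudo
%wheel ALL=(ALL:ALL) ALL

Commenting out that %wheel line is what I did. The gentler option I plan to try next time is adding “Defaults rootpw”, which keeps sudo working but makes it prompt for the root password instead of the user’s.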

With sudo hobbled, I learned more about FDE. As it just so happens, PostmarketOS was built for it. I got the name LUKS (Linux Unified Key Setup) and a hint that the boot partition might need to be left unencrypted by the time a member of the Pine64 community gave me the exact command to change my key:

$ sudo cryptsetup luksChangeKey /dev/mmcblk2p2 -S 0

In my case, I double-checked the partition name before performing it with su instead of sudo.
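For the curious, verifying which partition holds the LUKS container and which key slots are in use looks something like this (the device path is specific to my PinePhone’s eMMC):

$ lsblk -f
$ sudo cryptsetup luksDump /dev/mmcblk2p2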

Update on August 8, 2024: I disabled SSH password login, requiring key login instead.
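That change typically amounts to a couple of lines in /etc/ssh/sshd_config plus a service restart – the restart command below assumes postmarketOS’s OpenRC, and make sure key login already works before closing your session:

PasswordAuthentication no
PubkeyAuthentication yes

$ su -c 'rc-service sshd restart'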

Moving In

Confident I had secured it to my skill level, I did some more “normal” new phone/computer behaviors, such as finding the dark theme, using AM/PM, and adjusting automatic sleep times. My background image had to be moved over with scp (Secure CoPy), and it took a reboot before it showed up. I also moved my ringtone and notification sound over from my previous phone the same way, installed them, and rebooted again. While it would have made things a little easier, I’m forgoing NFS access until my homelab servers are moved away from common LAN subnet addresses.
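For reference, the copy itself is a one-liner run from the machine holding the files (the hostname and paths here are placeholders):

$ scp background.jpg ringtone.ogg user@pinephone.local:~/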

Moving my old contacts list wasn’t too hard either, though I did get help from an automation app on the Android side to export to the .vcf (vCard) format. I then used an SD card to move them over. Phosh’s default contacts app by The GNOME Project accepted it no problem. The longest part was weeding out stale contacts going back to high school.

The luck ended with Waydroid. One goal for this phone is demoing Android apps on Linux, and Waydroid looks like the best option. I installed it no problem, but initialization took a few attempts when the two-minute sleep timeout kept corrupting a large download. From there, I tried installing an F-Droid client, but I got an error in-terminal about the Waydroid session stopping.
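For anyone retrying this, the commands involved are short; the init step is the big download that kept getting interrupted. A sketch of the standard sequence, with the .apk path as a placeholder:

$ sudo waydroid init
$ waydroid session start
$ waydroid show-full-ui
$ waydroid app install F-Droid.apk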

Takeaway

My Linux phone is not completely stable by any stretch of the imagination. It keeps crashing, the battery feels like a joke – and overall, the thing feels slow no matter what I do. But remember that I have an early prototype aimed at developers and enthusiasts. Pine64 has production models released, and they aren’t the only ones making phones to run Linux. I am just thankful they didn’t make the screen so big they had to mutilate it to accommodate the camera.

Final Question

Blunder! I just noticed the local app store hangs when trying to install… maybe? Probably because of my sudo configuration, but I will need more time with it as I build up a to-do list for another follow-up post. What all should that to-do list contain?

Self-Hosted AI Consistent Characters: Part 2

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am continuing work towards a consistent character using Stable Diffusion AI image generation software. Let’s get started!

Previously

Last time I talked about making a consistent character on local hardware, I went over using the Automatic1111 (A1111) web interface (running on my father’s computer), installing the ControlNet extension for Stable Diffusion and equipping it with models for OpenPose, and then using OpenPose to generate eight skeletons based on screenshots from Sonic Forces. All common enough stuff, but for context, I am following a tutorial on YouTube by Not4Talent, “Create consistent characters with Stable diffusion!!” [1].

Character Switch

While I had previously been working on a Sonic fan character for my sister whom I am calling Ms. T, I switched over to working on my own character in the same setting, whom I’m calling Smokey Fox. He’s just spent several years studying in a foreign culture with a human sense of modesty, so I generated an orange Mobian fox with blue eyes and wearing red sneakers, blue jeans, and a red trench coat while applying a bunch of little things I picked up along the way, such as quality prompts and negative prompts.

Along the way, the AI came up with details I liked, such as a white shirt, black gloves, black tipped ears, and some of the time, he even generated with a red thing on a glove I decided was some kind of accessory crystal. Quality was spotty. It took me a few attempts before I cleaned up the hair in his profile shots by prompting for a bald head. Only four poses consistently gave him a tail, but one was almost never usable. It also tried giving him a black tail tip a few times, but I didn’t like that.

Along the way, I grabbed pictures with poses I liked and stacked them in GIMP. Because I was using a fixed seed, I was able to assemble more poses until I had eight portraits I’d touched up. Notably: I had to extend his coat on his behind shot, his shoes needed a lot of help, and I had to draw one of his tails from scratch. The crystal thing on his glove also got interesting to transfer around, and I did have to draw it myself a few times. During this process, I took screenshots of my work in progress and shared them on Discord.

No Auto Save

Disaster!! At some point, my computer randomly crashed. I don’t remember the details, but it was several days later when I returned to work on Smokey that I learned that GIMP doesn’t auto-save the way LibreOffice does. Thankfully, I had the screenshots to work with. I also lost my original prompts through a side project where I helped my mother troll a friend from elementary school regarding a Noah’s Ark baby quilt she made for her. In total, I made an island chain, a forest scene, and just as it was about to arrive, a beach scene with the Taco Bell logo embedded using a ControlNet model for making fancy QR codes.

Back to Smokey Fox, the next step in the tutorial was upscaling. Pain followed. The Not4Talent tutorial [1] didn’t make sense to me, so I spent a day or two unenthusiastically bumbling around trying to learn enough to feel ready to post. I played around with several ControlNet models. Most are variations on making white-on-black detail maps. One late-night session landed me an upscale tutorial by Olivio Sarikas [2] that clicked with me. As with other tutorials, A1111 has moved (or removed) things between updates in the six months to a year since introducing ControlNet was the popular thing to cover – not to mention various plugins which may differ between our setups. Olivio’s tutorial rescued my project, and I got back to having fun cleaning up details with GIMP.

Takeaway

I may need to take a closer look at Forge instead of A1111. A1111 has a known bug where it has trouble unloading models, and while I was playing with various ControlNet models, I managed to exhaust the VRAM capacity of my GPU.

Final Question

Forge will require a virtual environment, which I don’t know how to set up properly yet. What tutorial would you recommend? I look forward to hearing your answers in the comments below or on my Socials!

Works Cited

[1] Not4Talent, “Create consistent characters with Stable diffusion!!,” youtube.com, Jun. 2, 2023. [Online]. Available: https://youtu.be/aBiGYIwoN_k. [Accessed: Jun. 7, 2024].

[2] O. Sarikas, “ULTIMATE Upscale for SLOW GPUs – Fast Workflow, High Quality, A1111,” youtube.com, May 6, 2023. [Online]. Available: https://youtu.be/3z4MKUqFEUk. [Accessed: Jun. 7, 2024].

Generative AI: Ethics on the Frontier

Good Morning from my Robotics Lab! This is Shadow_8472 and today I have a few thoughts about ethics when living on a frontier. Let’s get started.

Law and New Technology

Law follows innovation. A world without motor vehicles or electricity won’t require cars to stop at a red light. Conversely, new technologies bring legal uncertainty. A nuclear-powered laptop might be ready for 20 years of abuse in an elementary classroom without leaking any radiation, but expect more courtroom pushback than a mind-reading camera – at least until the legal system can parse the respective technologies.

Generative AI in 2024 is data hungry. More training data makes for a better illusion of understanding. OpenAI’s ChatGPT-4o reportedly can read a handwritten note and display emotion in a verbal reply in real time. If they haven’t already, they will soon have a model trained off every scrap of text, video, and audio freely available as well as whatever databases they have access to. But the legal-moral question is: what is fair game?

Take drones as a recent but more mature point of comparison. Generally speaking, drones should be welcome wherever recreational R/C aircraft already are. Hover like you might be spying on someone expecting privacy, and there might be trouble. Laws defining the boundaries between these and similar behaviors protect drone enthusiasts and homeowners alike. Before that compromise was solidified, the best anyone could do was not be a jerk while flying/complaining.

The AI Art War

But not everyone’s idea of jerk behavior is the same. Many AI trainers echo the refrain, “It’s not illegal, so we can scrape.” Then digital artists on hard times see AI duplicating their individualized styles, and they fight back. Soon, jerks are being jerks to jerks because they’re both jerks.

Model trainers practically need automated scraping, precluding an opt-in consent model like what artists want. Trainers trying not to be jerks can respect name blacklists, but improperly tagged re-uploads sneak in anyway. Artists can use tools like Glaze and Nightshade to poison training sets, but it’s just a game of cat and mouse so far.

Those are the facts, stated as objectively as I can. My thoughts are that artists damage their future livelihood more by excluding their work from training data. The whole art market will be affected as they lose commissioners to a machine that does “good enough.” Regulars who care about authentic pieces will be unaffected. Somewhere between these two groups are would-be art forgers in their favorite style and people using AI to shop for authentic commissions. I expect the latter to be larger, so the moral decision is to make an inclusive model.

At the same time, some countries have a right to be forgotten. Verbally abusing AI art jerks provides digital artists with a much-needed sense of control. While artists’ livelihoods are threatened on many sides, AI is approachable enough to attack, so they vent where they can. I believe most of the outcry is overreaction – but remember, I’m biased in favor of Team Technology, though I am not wholly unsympathetic to their cause. I am in favor of letting them exclude themselves, just not for the reasons they would rather hear.

Takeaway

I see the AI situation in 2024 as comparable to China’s near monopoly on consumer electronics and its open secret of human rights violations. In theory you could avoid unethically sourced consumer goods, but oftentimes going without is not an option. You can then see the situation as forcing you to support immoral practices, or you can see yourself as making the effort to find the best –though only– reasonable option available. The same thing applies to AI. All other factors equal, I intend to continue using AI tools as my conscience allows.

Final Question

Do you disagree with my stance? Feel free to let me know in the comments below or on my Socials!

Self-Hosted AI Consistent Characters: Part 1

Good Morning from my Robotics Lab! This is Shadow_8472, and today I am on a quest to generate consistent characters using AI. Let’s get started!

It all started with wanting to learn how to make my own “consistent characters” I can summon with a keyword in my prompt. Before I can train the AI to make one, my subject needs consistent source pictures. One way to do that is to chop up a character sheet with multiple angles of the same character all generated at once. Expect all that in a future post. It sounded like a reasonable goal until I discovered just how many moving parts I needed just to approach it.

In particular, my first goal is Ms. T, a Sonic OC (Original Character) by my sister, but once I figure out a successful workflow, it shouldn’t be too hard to make more.

A1111

Automatic1111 (A1111) is the go-to StableDiffusion (SD) image generation web interface for the vast majority of tutorials out there. While it’s not the easiest SD WebUI, A1111 is approachable by patient AI noobs and EasyDiffusion graduates alike. It exposes a few too many controls by default, but it packs enough power to keep an SD adept busy for a while. I also found Forge, an A1111 fork that reportedly has extra features, bug fixes, and grudging Linux support, and needs a virtual environment. At the top, I found ComfyUI, which lets you design and share custom workflows.

As a warm-up exercise, I found SonicDiffusion, an SD model geared for Sonic characters, generated a bunch of Ms. T portraits, and saved my favorites. Talking with my sister, I began cherry picking for details the model doesn’t control for, such as “cyclops” designs where the eyes join at the whites vs. separate eyes (hedgehogs are usually cyclopses, but not in the live action movies). SonicDiffusion –to my knowledge– lacks a keyword to force this distinction. Eventually, my expectations outpaced my ability to prompt, and I had to move on.

ControlNet

A major contribution to A1111’s versatility is its ecosystem of extensions. Of interest this week is ControlNet, a tool to include visual data in a StableDiffusion prompt for precision results. As of writing, I’m looking at 21 controller types – each needing a model to work. I downloaded the ones for Canny, Depth, and OpenPose to get started.

My first thought was to use an Xbox One Kinect (AKA Kinect v2) I bought from someone in my area a few Thanksgivings ago. If it works, I can easily pose for ControlNet. Long story short: I spent a couple of days either last week or the week before tossing code back and forth with a self-hosted AI chatbot in SillyTavern with no dice. The open source Linux driver for the Kinect v2 just isn’t maintained for Ubuntu 22.04 and distros built on it. I couldn’t even get it to turn on its infrared LEDs (visible to my PinePhone’s camera) because of broken linkages in the header files or something. Pro tip: Don’t argue with a delusional LLM unless you can straighten it out in a reply or two. On the plus side, the AI did help me approach this job where I’d expect to have taken weeks to months without it. If/when I return, I expect to bodge it with Podman, but I may need to update the driver anyway if the kernel matters.

Even if I did get the Kinect to work, I doubt it would have been the miracle I was hoping for. Sonic style characters (Mobians) have different proportions than humans – most notably everything from the shoulders up. I ended up finding an embedding for making turnaround/character sheets, but it was again trained to make humans and I got inconsistent results compared to before. I did find a turnaround template for chibi characters that gave me OK-ish results running it through Canny, but Ms. T kept generating facing the wrong way.

In another session, I decided to try making Ms. T up in Sonic Forces. I installed it (ProtonDB: Platinum) and loaded my 100% save. I put Ms. T on a white background in GIMP and gave it to ControlNet. Unsurprisingly, OpenPose is not a Sonic fan. It’s trained on human data (now with animals!), but a cartoon kept returning blank outputs until I used a preprocessor called dw_openpose_full, which –while it still doesn’t like cartoon animal people– did cooperate on Ms. T’s right hand. Most every other node I dragged into place manually. I then demonstrated an ability to pose her left leg.

Character Sheet

From there, I opened OBS to record an .MP4 file. I used FFmpeg to convert to .gif and loaded it in GIMP to… my computer slowed to a crawl, but it did comply without a major crash. I tried to crop and delete removed pixels… another slowdown, and GIMP crashed. I adjusted OBS to record just my region of interest. 500+ frames was still a no-fly, even when each layer only holds the changes from the last. I found options to record as .gif and record as slowly as I want. I then separated out my frames with FFmpeg, making sure to have a new directory:

ffmpeg -i fileName.gif -vf fps=1 frame_%04d.jpg

I chose ten frames and arranged them in a 5×2 grid in GIMP. I then manually aligned OpenPose skeletons for each and sent that off to ControlNet. Immediately, my results improved. I got another big boost by using my grid of .gif frames, but in both cases Ms. T kept eyes and feet facing toward the viewer – even when her skeleton was pointed the other way. My next thought was to clean up the background on the grid, but compression artifacts got in the way.
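If you’d rather not arrange the grid by hand, ImageMagick’s montage tool can tile the extracted frames in one command – a hedged alternative to the GIMP step, assuming the frames are all the same size (adjust the glob to the frames you actually kept):

$ montage frame_00*.jpg -tile 5x2 -geometry +0+0 grid.png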

Start over. I made a new character with joint visibility and background removal in mind. She looked ridiculous running through a level, but I got her knee placement by moving diagonal toward the camera and jumping. I then put eight new screenshots in a grid. Select-by-color had the background cleared in a minute. I then used Canny for silhouettes intending to reinforce OpenPose. I still got characters generating the wrong way.

Takeaway

This week has had a lot of interplay between study and play. While it’s fun to run the AI and cherry pick what comes out, the prospect of better consistency keeps me coming back to improve my “jungle gym” as I prepare to generate LoRA training images.

Final Question

The challenge that broke this topic into multiple parts is getting characters to face away from the viewer. Have you ever gotten this effect while making character sheets?

I look forward to hearing from you in the comments below or on my Socials!

Milestone: First Linux Phone Call

Good Morning from my Robotics Lab! This is Shadow_8472 and today I am messing around with my prototype PinePhone to see if I can’t get it on the cell network for good. Let’s get started!

My Carrier History

Around four years ago, my family had to switch away from a cellular company that let its coverage degrade. We’d been with them since I was small, but for whatever reason, they opted to wait for new technology before replacing a destroyed tower. They lost us as customers over it. I had just gotten my PinePhone at the time. I had made one short call on it.

I made an honest effort to research network compatibility and thought I had made a match, but our then-new carrier turned out to be very closed-minded about allowed 3rd party devices. I poked at it for a while, learning a little bit each time, but progress was very slow.

In recent months, the family’s phones have been succumbing to planned obsolescence. I found a carrier for my area on the PinePhone’s compatibility chart, and we made the switch.

Linux Phone Basics

Unlike phones in the Apple/Android ecosystems, Linux phones run Linux. It won’t argue if you install –say– the full version of Firefox instead of one optimized for a mobile desktop environment. While using an app store is an option, the command line is available for those who wish for a challenge on a touch screen.

I am the proud owner of a PinePhone UBports Edition, the second prototype phone produced by Pine64. It originally came with Ubuntu Touch installed, but the experience was kind of slow. This led me to look into lightweight options, and I flashed PostmarketOS/Plasma Mobile to an SD card to explore.

Recent Developments

I finally committed. While working on another project within the past month, I installed PostmarketOS internally. My first mistake was trying to approach this installation as a normal Linux installer. Nope. It had me configure everything from the command line. My second mistake was installing a desktop-grade version of XFCE. While I still had access to a terminal, the sub-compact on-screen keyboard was a crutch at best, but I used it to 1. connect to Wi-Fi, 2. update and install Plasma Mobile, and 3. remove XFCE – all while trying to get it ready to test at the new carrier’s store the next day.
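From memory, the recovery boiled down to a handful of commands on PostmarketOS’s Alpine base – a sketch, since the exact UI metapackage names can differ by release, the network name and password are placeholders, and the Wi-Fi step assumes NetworkManager’s nmcli is present:

$ nmcli device wifi connect "MyNetwork" password "secret"
$ sudo apk update && sudo apk upgrade
$ sudo apk add postmarketos-ui-plasma-mobile
$ sudo apk del postmarketos-ui-xfce4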

The next day came, and things worked out so I could be at the store. Good thing too, because I had previously disabled my modem with a DIP switch under the back cover. I also noticed a bunch of apps missing from my minimal Plasma Mobile installation, and I kept mistaking some sort of configuration app for a browser. I made the connection later.

Ultimately Plasma Mobile kept crashing, so when I went back to my SD card, I did some more research and chose Phosh (Phone Shell), an even lighter-weight desktop environment developed by Purism for their Librem 5 phones. So far, no memorable crashes, but I’ve not stress tested it yet.

Access Point Name

So, I put my new SIM card into my PinePhone running PostmarketOS/Phosh, and I got intermittent signal, thanks in part to a combination of the phone only using 4G technology and solar activity strong enough to decorate night skies across the US with aurora borealis. The catch was an error manifesting as an orange square with a black exclamation mark.

While waiting to help out at the church office for the afternoon, I reached out to the Pine64 community on a whim. Shortly after, a helpful user there walked me through setting up the correct Access Point Name for my carrier. Minutes later, I received an important incoming call, and the connection held up for minutes, unlike the seconds I would get out of Plasma Mobile (Thank you, Jesus, for that timing!).

Takeaway

I am thankful to have a working phone again. I still have challenges ahead, like filching apps from the Play Store using Waydroid (or a similar compatibility layer) and having a simple unlock password while using a longer password for disk encryption and administrative tasks.

Final Question

Did you get a chance to see the northern lights this time around? I look forward to hearing from you in the comments below or on my Socials!

Building Up My SillyTavern Suite

Good Morning from my Robotics Lab! This is Shadow_8472, and today I am going farther into the world of AI chat from the comfort of my own home. Let’s get started!

Last week, I got into SillyTavern, a highly configurable AI chat playground with tons of features. Accomplishing a functional setup was rewarding on its own, but I am having my mind blown reading about some of the more advanced features. I want to explore further. Notably: I am interested in a long-term goal of posing characters with text and “taking pictures,” as well as locally powered AI web search.

Stable Diffusion

My first serious exploration into AI was image generation. SillyTavern can have the LLM (Large Language Model) write a prompt for Stable Diffusion, then interface with tools such as Automatic1111 through an API (Application Programming Interface) to generate an image. Similarly to the LLM engine, A1111 must be launched with the --api flag. I haven’t yet spent much time on this since getting it working.
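For reference, on Linux that flag goes on A1111’s launcher script – a minimal sketch, with --listen only needed if SillyTavern runs on a different machine than A1111:

$ ./webui.sh --api --listen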

Web Search

It is possible with a plugin to give your AI character the ability to search the Web. While historically this was done through something called the Extras API, the official documentation noted how it is no longer maintained as of last month and that most of the plugins work without it. The step I missed on both this and Stable Diffusion last week was connecting to their repository to download stuff. Anything else I tried kept getting ignored.

I configured AI search to use DuckDuckGo through Firefox. Let’s just say that while my AI search buddies appear to have a knack for finding obscure documentation, they do suffer from hallucinations when asked about exact products, so always double-check the AI’s work.

A favorite AI search interaction was looking up how much I probably paid for my now-dying tablet (Praise God for letting me finish backing it up first!), a Samsung Galaxy Tab A 10.1 (2016). The bot said it originally sold for around $400, citing MSRP (Manufacturer’s Suggested Retail Price – a term I did not know previously). I went and found the actual price, which was $50 cheaper and closer to what I remember its price tag being.

LLM Censorship

While my experience with Artificial Intelligence so far has been a fun journey of discovery, I’ve already run into certain limitations. The companies releasing LLMs typically install a number of guardrails. I used AI to find a cell phone’s IMEI number, but Crazy Grandpa Joe might use it to cook up bombs or crack in his son’s kitchen from common ingredients. This knowledge is legal, but the people training LLMs don’t want to be seen as responsible for being accessories to crime. So they draw a line.

But where should they draw this line? Every sub-culture differs in values. [Social] media platforms often only allow a subset of what’s legal for more universal appeal; your .pdf giveaway of Crime This From Home! will likely draw attention from moderators to limit the platform/community’s liability before someone does something stupid with it. In the same line of reasoning, if LLM trainers wish to self-censor, then that is their prerogative. However, progressive liberal American culture doesn’t distinguish between potential for danger and danger itself. LLMs tend to be produced under this and similar mentalities. It is no surprise then that raw models –when given the chance– are ever eager to lecture about environmental hyper-awareness and promote “safe” environments.

It gets in the way, though. For example: I played a scenario in which the ruthless Princess Azula (Avatar: The Last Airbender) is after a fight. The initial prompt has her threatening to “…incinerate you where you stand…” for bonking her with a volleyball. I goaded her about my diplomatic immunity, and suddenly she merely wanted my head. At, “I will find a way to make you pay for this,” I jokingly tossed her a unit of currency. It went over poorly, but she still refused to get physical. I ended up taking her out for coffee. I totally understand the reasoning behind this kind of censorship, but it makes the LLM so averse to causing harm that it cannot effectively play a bad guy doing bad things to challenge you as the hero.

Takeaway

AI is already a powerful genie. The “uncensored” LLMs I have looked at draw their line at bomb and crack recipes, but sooner or later truly uncensored LLMs will pop up as consumer-grade hardware grows powerful enough to train models from scratch. Or perhaps by then access to precursor datasets will be restricted and distribution of such models regulated. For now though, those with the power to let technologies like LLMs out of the AI bottle have chosen to do so slowly in the hopes we don’t destroy ourselves by the time we learn to respect and use them responsibly.

Final Question

I tested pacifist Azula against a few other cards in a group chat and found that fights can happen, but the LLM I’m using (kunoichi-dpo-v2-7b) – reportedly one of the better ones for my hardware – gives {user} Mary Sue-grade plot armor, as elaborated above. Have you found a 7B model and configuration that gives interesting results? I look forward to hearing from you in the comments below or on my Socials!