Show HN: Shoggoth Mini – A Soft Tentacle Robot Powered by GPT-4o and RL
Robots are increasingly capable of performing tasks with ease, but it's easy to forget that expressiveness is key to making them feel alive. In an effort to push that boundary, I've been working on building Shoggoth Mini – a soft tentacle robot powered by GPT-4o and reinforcement learning (RL). This post retraces my journey, from happy accidents to what I learned about building robots.
The Challenge: Creating a Testbed for SpiRobs
My journey began with creating a testbed to explore the control of SpiRobs – soft tentacle robots that already felt oddly alive. I started with a simple setup, consisting of a plate to hold three motors and a dome to lift the tentacle above them. However, halfway through 3D printing, I ran out of black filament and had to finish the dome in grey. This resulted in a design that looked like it had a mouth, which my flatmate jokingly added eyes to.
This accident would become the form factor for Shoggoth Mini. Mounting stereo cameras on the dome let me track the tentacle and drew even more attention to the face. However, this setup also required fixing an issue with cable tension: without constant tension, the cables could leave the spool and tangle around the motor shafts.
Adding Simple Solutions to the Design
To address this, I added simple spool covers that eliminated most tangles and made iteration dramatically faster. Another key step was adding a calibration script and pre-rolling extra wire length onto the spools, which made behavior reproducible across sessions.
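The calibration script itself is nothing fancy. Here is a minimal sketch of the idea, assuming a generic position-controlled servo interface (the Motor class and pre-roll amount below are stand-ins, not the project's actual driver or values):

```python
import json
import time


class Motor:
    """Stand-in for a real servo driver; only read_position/set_position are assumed."""

    def __init__(self, motor_id: int):
        self.motor_id = motor_id
        self._pos = 0.0

    def read_position(self) -> float:
        return self._pos

    def set_position(self, pos: float) -> None:
        self._pos = pos


def calibrate(motors, pre_roll_deg: float = 720.0, out_path: str = "calibration.json"):
    """Record a per-motor zero offset and pre-roll extra wire onto each spool.

    Storing the offsets lets every session start from the same tendon lengths,
    so behaviors tuned once stay reproducible later.
    """
    offsets = {}
    for m in motors:
        # Bring the tendon to its reference mark by hand before running this.
        zero = m.read_position()
        offsets[m.motor_id] = zero
        # Wind extra wire so the cable never goes slack enough to leave the spool.
        m.set_position(zero + pre_roll_deg)
        time.sleep(0.5)

    with open(out_path, "w") as f:
        json.dump(offsets, f, indent=2)
    return offsets


if __name__ == "__main__":
    calibrate([Motor(i) for i in range(3)])
```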
Shoggoth Mini faced another challenge: sagging under its own weight, which made consistent behavior hard to reproduce. To address this, I thickened the spine just enough to prevent sag, but not so much that it would deform permanently under high load.
The Power of GPT-4o
With the hardware ready, I moved on to getting a feel for how the tentacle moved. To simplify control, I reduced the tentacle's three tendon lengths to two intuitive dimensions that could be manipulated with a trackpad. This simple 2D-to-3D mapping became the backbone of the entire system and was reused by all automated control policies.
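The repo has the real mapping; as a rough sketch of the idea, assuming three tendons spaced 120° apart when viewed from above (the names and gain below are illustrative, not the project's code):

```python
import numpy as np

# Unit vectors for the three tendons as seen from above, 120 degrees apart.
TENDON_DIRECTIONS = np.array([
    [np.cos(a), np.sin(a)]
    for a in (np.pi / 2, np.pi / 2 + 2 * np.pi / 3, np.pi / 2 + 4 * np.pi / 3)
])


def control_2d_to_tendon_deltas(cursor_xy: np.ndarray, gain: float = 1.0) -> np.ndarray:
    """Map a 2D control point (e.g. from a trackpad) to three tendon length deltas.

    Leaning toward a tendon's direction shortens that tendon (bending the tip that
    way) while letting the opposite tendons out, so total length stays roughly constant.
    """
    return -gain * TENDON_DIRECTIONS @ cursor_xy


# Example: lean the tip toward +y.
print(control_2d_to_tendon_deltas(np.array([0.0, 0.5])))
```

Because the three direction vectors sum to zero, the deltas always cancel out, which keeps the overall cable budget balanced as you move the cursor around.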
GPT-4o played a crucial role in enabling Shoggoth Mini to interact with its environment. Its Realtime API streams audio and text, allowing the robot to continuously listen to speech and respond to high-level visual events, with GPT-4o deciding zero-shot which low-level API calls to make.
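I won't reproduce the Realtime API plumbing here, but the orchestration boils down to mapping the model's tool calls onto a small set of control primitives. A minimal sketch of that dispatch layer, with placeholder function names rather than the project's actual API:

```python
import json

# Low-level control primitives the model is allowed to call (names are illustrative).
def wave(times: int = 2):
    print(f"waving {times} times")

def follow_finger(duration_s: float = 5.0):
    print(f"following finger for {duration_s}s")

def curl_up():
    print("curling up")

TOOLS = {"wave": wave, "follow_finger": follow_finger, "curl_up": curl_up}


def handle_tool_call(event: dict):
    """Dispatch a tool call emitted by the model to the matching control primitive."""
    name = event["name"]
    args = json.loads(event.get("arguments", "{}"))
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**args)}


# Example event shaped like a function call coming out of a streaming session.
handle_tool_call({"name": "wave", "arguments": json.dumps({"times": 3})})
```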
Perception: The Key to Understanding
For hand tracking, I used MediaPipe; for the tentacle tip, I collected a dataset across varied lighting, positions, and backgrounds, using k-means clustering to select diverse frames. Roboflow's auto-labeling and active learning sped up annotation, and I augmented the dataset synthetically by extracting tentacle tips via the Segment Anything demo.
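The hand side is standard MediaPipe usage. A minimal sketch of tracking one index fingertip per webcam frame (the tentacle-tip detector is a separately trained model and isn't shown):

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands


def track_index_fingertip(camera_index: int = 0):
    """Yield the normalized (x, y) of the index fingertip for each webcam frame."""
    cap = cv2.VideoCapture(camera_index)
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.5,
                        min_tracking_confidence=0.5) as hands:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV captures BGR.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                tip = results.multi_hand_landmarks[0].landmark[
                    mp_hands.HandLandmark.INDEX_FINGER_TIP]
                yield tip.x, tip.y
    cap.release()
```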
The final calibration step used a DeepLabCut notebook to compute camera intrinsics and extrinsics, enabling 3D triangulation of the tentacle tip and hand positions; the triangulation itself is sketched below.
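Once the projection matrices are known, triangulation is a few lines of OpenCV. A minimal sketch, assuming undistorted pixel coordinates and the 3x4 projection matrices from calibration:

```python
import cv2
import numpy as np


def triangulate(P_left: np.ndarray, P_right: np.ndarray,
                pt_left: tuple, pt_right: tuple) -> np.ndarray:
    """Triangulate one 3D point from a matched pair of pixel coordinates.

    P_left / P_right are 3x4 projection matrices (intrinsics @ [R | t]) from the
    calibration step; the pixel coordinates are assumed already undistorted.
    """
    pts_l = np.array(pt_left, dtype=np.float64).reshape(2, 1)
    pts_r = np.array(pt_right, dtype=np.float64).reshape(2, 1)
    point_h = cv2.triangulatePoints(P_left, P_right, pts_l, pts_r)  # 4x1 homogeneous
    return (point_h[:3] / point_h[3]).ravel()  # (x, y, z) in the calibration frame
```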
Reinforcement Learning: The Final Piece
Programming open-loop behaviors for soft robots is uniquely hard due to their unpredictable behavior. To overcome this, I turned to reinforcement learning (RL), starting with a policy that follows a user's finger. This came from an old idea I'd always wanted to build – a robotic wooden owl that follows you with its big eyes.
I recreated SpiRobs in MuJoCo and set up a target-following environment with smooth, randomized trajectories. I used PPO with a simple MLP and frame stacking to provide temporal context, adding dynamics randomization to improve sim-to-real transfer.
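One way to wire this up is with Stable-Baselines3; the sketch below shows the frame stacking and MLP policy, while TentacleTargetEnv is a placeholder for the actual MuJoCo environment (the real one steps the simulated tentacle, rewards small tip-to-target distance, and randomizes dynamics on reset):

```python
import numpy as np
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack


class TentacleTargetEnv(gym.Env):
    """Placeholder target-following environment (stands in for the MuJoCo model)."""

    def __init__(self):
        # Observation: e.g. tip position, target position, velocities (shape is illustrative).
        self.observation_space = gym.spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32)
        # Actions live in the same 2D control space as the manual trackpad mapping.
        self.action_space = gym.spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # Dynamics randomization (e.g. friction, cable stiffness) would happen here.
        return self.observation_space.sample(), {}

    def step(self, action):
        # A real step would advance MuJoCo and reward closeness to the moving target.
        obs = self.observation_space.sample()
        return obs, 0.0, False, False, {}


# Stack the last 4 observations so the MLP policy gets temporal context.
env = VecFrameStack(DummyVecEnv([TentacleTargetEnv]), n_stack=4)
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=1_000_000)
```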
Lessons Learned
One thing I noticed toward the end was that, even though the robot remained expressive, it started feeling less alive. Early on, its motions surprised me: I had to interpret them, infer intent. But as I internalized how it worked, the prediction error faded.
Expressiveness is about communicating internal state. Perceived aliveness, however, depends on something else – unpredictability, a certain opacity. This raises a question: do we actually want to build robots that feel alive? Or is there a threshold, somewhere past expressiveness, where the system becomes too agentic and too unpredictable for people to feel comfortable around it?
Looking forward, I see several short-term paths worth exploring. In the meantime, fork the repo, build your own, or get in touch if you'd like to discuss robotics, RL, or LLMs! Thanks to Maxime Vidal for his brilliant creative input and guidance.