Show HN: Open-Source Multi-Modal Personal AI Assistant

In the midst of all the hype surrounding Jony Ive's new AI device, Rabbit R1's mixed reception, the defunct Humane AI Pin, and other attempts to build the next breakthrough AI device, fundamental questions remain unanswered. How should such a device function, and what should it look like? Do we even need a new hardware device? Will these devices make your phone obsolete? How will they differ from Apple HomePod, Google Home, Alexa, and the rest?

There are many questions and few answers. To answer them, we first have to step back and look at the limitations of the current generation of devices. This post explores those gaps and shares my journey building Ubo Pod, an open-source, hackable personal AI assistant designed to address them.

Big tech companies develop devices in secrecy for years without user input, often creating products misaligned with real needs. While software can be updated post-launch, hardware design decisions are permanent once in production. This inflexibility likely explains the Humane AI Pin's failure—early user testing might have prevented building something nobody wanted.

Many of us have seen ads for topics we only discussed privately at the dinner table, making us wonder which device was listening. Since both the hardware and the software are closed source, verifying manufacturers' privacy claims is very difficult. One must either trust the brand or hope that security researchers will verify those claims through reverse engineering.

Big tech devices are typically closed source, preventing tech-savvy users from customizing core functionality. This lack of hackability leaves devices dependent on manufacturer updates and unable to evolve with users' specific needs.

We feed personal data to AI models, which companies use to improve their services and increase user engagement. However, you don't own the models trained on your data; this deliberate design locks you into specific providers. What if, instead, your personalized AI model stayed on your device and remained fully portable?

Most connected devices become nearly useless without internet, limiting their value in areas with poor connectivity like RVs, boats, or off-grid locations. This dependence on cloud services, combined with closed-source design, also raises significant privacy and security concerns.

Current devices' SDKs are opinionated and tied to a specific language and platform. Devices like Alexa and Apple HomePod do offer SDKs, such as the Alexa Skills Kit (ASK) and HomeKit, but they impose rigid boundaries on what developers can and cannot do.

Current AI assistants only perform pre-defined functions through command templates mapped to specific actions. If you request something outside their programmed skills, they can't help, which means large developer teams must build functionality for every possible use case.
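To make the limitation concrete, here is a minimal sketch of the template-based "skills" model described above. The patterns and handlers are illustrative, not any vendor's actual skill format; the point is that anything outside the templates simply fails.

```python
import re

# Each "skill" is a fixed phrase pattern mapped to a handler.
# Patterns and responses here are illustrative only.
SKILLS = {
    r"what time is it": lambda m: "It is 10:00.",
    r"set a timer for (\d+) minutes": lambda m: f"Timer set for {m.group(1)} minutes.",
}

def handle(utterance: str) -> str:
    for pattern, action in SKILLS.items():
        m = re.fullmatch(pattern, utterance.lower().strip())
        if m:
            return action(m)
    # The failure mode: no matching template, no answer.
    return "Sorry, I can't help with that."
```

A request that fits a template works; anything else, however reasonable, gets the canned refusal, which is exactly why every new capability requires more developer-written templates.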

Modularity and Customization

AI assistants have traditionally been voice-only, with vision recently added to some devices. However, additional sensors could enable more operational modes and context awareness.

Existing devices use one agent triggered by a wake word, lacking the ability to define multiple specialized agents with their own triggers. Multi-agent systems could offer domain-specific experts fine-tuned for narrower task sets.
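The multi-agent idea above can be sketched as a registry that routes each utterance by its wake word to a specialized agent. Everything here (names, the word-prefix trigger) is a hypothetical illustration, not the Ubo implementation.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: several specialized agents, each registered
# under its own wake word, instead of one monolithic assistant.
@dataclass
class Agent:
    name: str
    domain: str
    respond: Callable[[str], str]

REGISTRY: dict[str, Agent] = {}

def register(wake_word: str, agent: Agent) -> None:
    REGISTRY[wake_word.lower()] = agent

def dispatch(transcript: str) -> str:
    # Route the utterance to whichever agent's wake word starts it.
    words = transcript.lower().split(maxsplit=1)
    agent = REGISTRY.get(words[0]) if words else None
    if agent is None:
        return "No agent matched."
    query = words[1] if len(words) > 1 else ""
    return agent.respond(query)

register("chef", Agent("chef", "cooking", lambda q: f"[cooking expert] {q}"))
register("medic", Agent("medic", "health", lambda q: f"[health expert] {q}"))
```

Each agent could wrap a model fine-tuned for its narrow domain, so "chef …" and "medic …" reach different experts rather than one generalist.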

Proprietary devices are tied to their company's cloud services (Alexa uses AWS, Google Home uses Google's services), creating vendor lock-in and the risk of service discontinuation bricking devices. Some also require monthly subscriptions.

Devices designed by big tech companies are limited to a single, specific hardware platform. There is often zero flexibility to DIY your own hardware or choose a different platform.

Consumer devices typically last 3-5 years before disposal due to limited upgrade options—a profit-driven design that generates significant waste.

Ubo Pod: A Hackable and Open Source Personal AI Assistant

Frustrated with proprietary assistant devices and the struggle of building polished UX on non-mobile/desktop platforms, I decided to build Ubo Pod. What started as a UX layer for the Raspberry Pi evolved into a multi-modal AI assistant platform for developers.

Users simply add API keys (obtained from the respective service providers) through our web interface, while Ubo Pro 5 supports local, cloud-free models with extensive customization options. Developers can make shallow or deep customizations depending on their level of comfort.
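One plausible way the per-provider API keys could be wired in is via environment variables, with only configured providers becoming available. This is a hypothetical sketch; the variable names below are assumptions, not the actual Ubo configuration keys.

```python
import os

# Illustrative provider-to-env-var mapping (not the real Ubo config).
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "deepgram": "DEEPGRAM_API_KEY",
    "elevenlabs": "ELEVENLABS_API_KEY",
}

def load_provider_keys() -> dict[str, str]:
    # Only providers whose key is actually set become available.
    return {
        provider: key
        for provider, env_var in PROVIDER_ENV_VARS.items()
        if (key := os.environ.get(env_var))
    }
```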

Key Features

  • Modularity: Our design principle allows for modularity in both hardware and software, making it easier to update and repair devices without creating waste.
  • Persistence: Local AI models allow users to keep their personalized data on the device; it never leaves it.
  • Hardware Acceleration (Available on Pi 5 only): Dedicated AI accelerators from Google Coral and Hailo enable faster processing of multiple video and audio streams in parallel.
  • Multi-Agents: Each agent can have its own wake word or trigger mechanism, enabling multi-modal agents that take text, audio, video/image, sensor readings as inputs.

The Future of AI Assistants

We are launching a Kickstarter campaign to bring this device to mass production. Right now we have a small inventory from our pilot production run, which we use for beta testing and gathering user feedback.

You can reserve yours for $1 or sign up for the pre-launch campaign to be amongst the very first to get your Ubo Pod. The following services will be supported out of the box:

  • Speech-to-text: AssemblyAI, Amazon Transcribe, Azure, Cartesia, Deepgram, Fal Wizper, Gladia, Google Cloud, Groq (Whisper), OpenAI Whisper (Local), Parakeet NVIDIA (Local), Picovoice Cheetah (Local), Vosk (Local), Ultravox
  • Language models: Anthropic, AWS, Azure, Cerebras, DeepSeek, Fireworks AI, Gemini, Grok, Groq, NVIDIA NIM, Ollama, OpenAI, OpenRouter, Perplexity, Qwen, Together AI
  • Text-to-speech: AssemblyAI, Amazon Transcribe, Azure, Cartesia, Deepgram, ElevenLabs, FastPitch NVIDIA (Local), Fish (Local), Google, LMNT, MiniMax, Neuphonic, OpenAI, Piper (Local), PlayHT, Rime, Sarvam, XTTS (Local)
  • Realtime, transports, and audio processing: AWS Nova Sonic, Gemini Multimodal Live, OpenAI Realtime, Daily (WebRTC), FastAPI Websocket, SmallWebRTCTransport, WebSocket Server, gRPC, Local, Silero VAD (Local), Krisp, Picovoice Koala (Local), Noisereduce (Local)

The diagram below shows the Ubo software architecture. A more in-depth article on this will be published soon.

The official core software is already open-sourced on GitHub. You can also open a discussion on our GitHub repo to discuss any specific topic you have in mind.

Below you can find links to the various design files for this project.

GitHub Design Files