The Case for TreeOS, Part 1

In December, I released the first beta version of my small operating system, TreeOS. It is freely available on GitHub and is accompanied by a B2B solution called OnTree.co that lets you run internal or private AI-powered apps and skills 100% locally.

I am not happy with the software quality yet, but looking back, that is not surprising. When I began working with agents, it made sense to start with the most extreme approach: writing the entire code base with agents. Unsurprisingly, agentic engineering is very different from classical software engineering. I have learned a lot along the way, as the result shows. I summarized these learnings in a conference talk I gave yesterday in Berlin, and I will probably present them at one of our upcoming Agentic Coding Meetups in Hamburg as well. I also have a much clearer picture of how to improve the code base.

This is the first in a series about the reasons that led me to build TreeOS.

The LLM Gap

There is a gap between what current computers and smartphones can do locally and what is possible in the cloud with massive GPUs. Here is a rough classification:

Mobile models (2B-8B)

There are some excellent models in this size class, often highly specialized for a single use case. For example, I use models of this size for speech-to-text and text-to-speech locally. Modern smartphones have up to 16 GB of memory, so the operating system and these models can run in parallel. From a battery perspective, it is important that these models run only in short bursts; otherwise the battery drains too quickly.
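
As an illustration (a minimal sketch, not TreeOS code; the checkpoint and file name are placeholders), local speech-to-text can be as simple as:

```python
# A minimal sketch of local speech-to-text with the open-source
# openai-whisper package (pip install openai-whisper; needs ffmpeg).
import whisper

model = whisper.load_model("small")          # downloaded once, then runs fully offline
result = model.transcribe("voice_memo.wav")  # hypothetical recording
print(result["text"])
```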

Local models (8B-32B)

This is the classic local model class, with a huge number of optimized models to choose from. Users of these models typically have a powerful gaming graphics card with 16-32 GB of VRAM, which lets them keep using the CPU normally while long-running tasks execute on the graphics card.
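
A common setup in this class is a local inference server such as Ollama serving a quantized model from the graphics card. Here is a minimal sketch of calling its HTTP API, assuming Ollama runs on its default port and a model of this size has been pulled (qwen2.5:14b is a placeholder):

```python
# A minimal sketch: query a locally served quantized model through
# Ollama's HTTP API (default port 11434). Assumes the model was pulled
# beforehand, e.g. `ollama pull qwen2.5:14b`.
import json
import urllib.request

payload = json.dumps({
    "model": "qwen2.5:14b",  # placeholder; any pulled model tag works
    "prompt": "In one sentence: why run language models locally?",
    "stream": False,         # return one JSON object instead of a stream
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```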

Cloud models (200B and up)

The exact sizes of the major cloud models are not disclosed. One of the leading open-source models is Kimi K2 Thinking, which has 1 trillion parameters; even aggressively quantized, that equates to roughly 250 GB of RAM. Besides a maxed-out Mac Studio with 512 GB of unified memory, I am not aware of any off-the-shelf hardware that can run these models locally.
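
The back-of-envelope math behind such figures: weight memory is roughly the parameter count times the bytes stored per parameter, so everything hinges on quantization. A small sketch (the 250 GB above corresponds to roughly 2 bits per weight):

```python
# Back-of-envelope: weight memory ≈ parameter count × bytes per parameter.
# A 1-trillion-parameter model at ~2 bits per weight lands near the
# 250 GB figure quoted above; at 4 bits it would already need ~500 GB.
PARAMS = 1_000_000_000_000  # 1 trillion

for label, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4), ("~2-bit", 2)):
    gigabytes = PARAMS * bits / 8 / 1e9
    print(f"{label:>6}: {gigabytes:6,.0f} GB")
```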

The gap (32B-200B)

This gap is not receiving much attention at the moment, but I am optimistic that this will change in 2026. We have already seen that mixture-of-experts models such as GPT-OSS 120B are highly capable and can run at reasonable speeds. Machines in this class can also run a 32B model while holding a lot of context, which is important for agentic workflows, where the agent must build a mental map of the problem, whether it is code-related or not. However, since this context (the KV cache) must also sit in graphics memory alongside the weights, consumer graphics cards are not suitable for these workloads; a rough estimate follows below.
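
To make the memory pressure concrete, here is a rough sketch of how the KV cache grows with context length. The dimensions are hypothetical but typical for a 32B-class model with grouped-query attention:

```python
# Rough KV-cache estimate: 2 (keys and values) × layers × KV heads
# × head dimension × bytes per value × tokens in context.
# Hypothetical but typical dimensions for a 32B-class model (FP16 cache).
LAYERS, KV_HEADS, HEAD_DIM, BYTES = 64, 8, 128, 2

def kv_cache_gb(tokens: int) -> float:
    return 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * tokens / 1e9

for tokens in (8_000, 32_000, 128_000):
    print(f"{tokens:>7} tokens ≈ {kv_cache_gb(tokens):4.1f} GB of KV cache")
# A 128k-token context alone takes ~34 GB — on top of the model weights.
```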

With the AMD Ryzen AI 300 and 400 series, affordable machines are now available that are ideally suited to long-running asynchronous tasks.

The first bet of TreeOS is that this niche will grow in 2026.

