
Why You Should Buy an AMD machine for Local LLM Inference in 2025


We’ve covered why NVIDIA consumer cards hit a 32GB wall and why Apple’s RAM pricing is prohibitive. Now let’s talk about the actual solution: AMD Ryzen AI Max+ 395 with 128GB unified memory.

This is the hardware I chose for my home LLM inference server. Here’s why.

It’s Open, Baby!

Unlike its two big competitors, NVIDIA and Apple, AMD keeps a huge amount of its stack open source. ROCm is to AMD what CUDA is to NVIDIA and MLX is to Apple, except it's fully open source, available on GitHub, and sees a huge amount of activity. That not only gives me a warm and fuzzy feeling, but also real confidence that the stack will keep moving in the right direction.

The Hardware That Changes the Game

The AMD Ryzen AI Max+ 395 offers something unique in the prosumer market: 128GB of unified memory, most of which is accessible to the GPU.

To make this more concrete: you can run a 70B model quantized to 4-bit (~38GB) and still have 50GB+ left for context. That's enough for 250K+ token contexts, which means genuinely long documents, extensive conversation history, and complex RAG workflows.
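A rough back-of-the-envelope makes the 50GB context budget tangible. The architecture numbers below are assumptions for a Llama-3-70B-style model (80 layers, 8 KV heads via grouped-query attention, head dimension 128), not measurements from my machine:

```python
# Back-of-the-envelope KV-cache budget for a 70B-class model.
# Assumed Llama-3-70B-like architecture; illustrative values only.
LAYERS = 80          # transformer layers
KV_HEADS = 8         # KV heads (grouped-query attention)
HEAD_DIM = 128       # dimension per attention head
BYTES_FP16 = 2       # bytes per value at fp16

# Keys and values are both cached, hence the leading factor of 2.
bytes_per_token = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES_FP16
print(f"KV cache per token: {bytes_per_token / 1024:.0f} KiB")  # 320 KiB

budget_bytes = 50 * 1024**3  # ~50 GiB left after the ~38GB 4-bit model
max_context_fp16 = budget_bytes // bytes_per_token
print(f"Max context at fp16 KV: {max_context_fp16:,} tokens")

# With 8-bit KV-cache quantization the budget roughly doubles,
# which is where 250K+ token contexts become realistic.
print(f"Max context at int8 KV: {max_context_fp16 * 2:,} tokens")
```

At fp16 this lands around 160K tokens; with the 8-bit KV-cache quantization that llama.cpp and friends support, you clear 250K comfortably.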

Looking a bit into the future, it's not hard to imagine AMD shipping this system with 256GB of RAM at a reasonable price. It's very hard to imagine Apple shipping a 256GB machine at a reasonable price. Memory upsells are simply how they make their money.

Comparison to the DGX Spark

The recently released DGX Spark is a valid competitor to AMD's AI Max series. It also features 128GB of unified memory. From a pure hardware-value perspective, the NVIDIA DGX Spark is the most compelling alternative on the market in October 2025. The street price is around 4500 Euro right now, almost double that of the AMD machine. You get a beautiful box with very comparable hardware and better driver support. You even get a good starting point for your first experiments, like downloading LLMs and training your own model. But everything you build on is closed source. You're 100% dependent on NVIDIA staying on top of the game, on a machine that doesn't make a lot of money for NVIDIA. I'm not that optimistic.

With the recent explosion in software development speed, driven in part by coding agents, I'm not confident any single company can stay on top of all of it with a closed stack. Especially not in a product line that doesn't earn them their biggest profits.

Also, the NVIDIA DGX Spark is Arm-based. That isn't a problem for inference or training, but it does matter for another use case that's becoming important.

Running Apps and LLMs Side by Side

If you are doing LLM inference on a local machine, the easiest setup is to also run the apps that need the inference on the same machine. Running two machines is possible, but it opens a huge can of worms. Even though it might not seem so intuitively, such distributed systems are complex. Not twice as complex; more like exponentially complex. Here's a classic question from 10 years ago on Stack Overflow that tries to explain it.

So running everything on one machine is much simpler. With AMD you stay on the most common CPU architecture available, x86-64. With the DGX Spark, you're in Arm land. That architecture is gaining traction, but it's still a long way from being universally supported. If you're planning to experiment with lots of small open source Dockerized apps like I do, this is a big plus for the AMD route.
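You can see the gap yourself by checking which platforms an image actually publishes, e.g. with `docker manifest inspect`. The sketch below parses a hypothetical excerpt of such a manifest list (the JSON is made up for illustration; real output has more fields). Many small open source images only list `linux/amd64`, which means emulation or a broken container on an Arm box:

```python
import json

# Hypothetical excerpt of a `docker manifest inspect` result for some
# small open source app image. The data here is invented for the example.
MANIFEST = json.loads("""
{
  "manifests": [
    {"platform": {"os": "linux", "architecture": "amd64"}},
    {"platform": {"os": "linux", "architecture": "arm64"}}
  ]
}
""")

def supported_architectures(manifest: dict) -> set:
    """Collect the CPU architectures a manifest list advertises for Linux."""
    return {
        m["platform"]["architecture"]
        for m in manifest.get("manifests", [])
        if m.get("platform", {}).get("os") == "linux"
    }

archs = supported_architectures(MANIFEST)
print("runs natively on x86-64:", "amd64" in archs)
print("runs natively on Arm:   ", "arm64" in archs)
```

This particular image happens to ship both; plenty don't, and that's the daily friction you sign up for on Arm.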

The Driver Reality

This is the real trade-off: AMD’s software support lags behind NVIDIA and Apple by 1-3 months for bleeding-edge models.

We saw this firsthand in our Qwen3-Next case study.

Important context: This is about bleeding-edge model support, not general capability. I run Qwen3 32B, Llama 3.1 70B, DeepSeek, and multimodal models without issues. The hardware is capable; the ecosystem just needs time to catch up. Whether and when AMD fully closes that gap is unknown. I just want to make clear that it's a bet.

Why Not Regular AMD GPUs?

Before we conclude, let’s address another obvious question: what about regular AMD GPUs?

Take the AMD Radeon AI PRO R9700 (32GB) or similar: these cards face the same memory ceiling as NVIDIA's consumer cards. Yes, driver support has improved significantly with ROCm 6.x and 7.0, but you're still dealing with the same fundamental limitation. They're cheaper, so you can stack several of them together, like Level1Techs does.

Two reasons speak against this: First, you're building a highly custom machine, with all sorts of compatibility issues. Second, at around 300W per card, the power draw adds up quickly.
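To put that power draw in perspective, here's a worst-case sketch. The card count, wattage, and electricity price are my assumptions (four cards at ~300W, ~€0.30/kWh), and real systems idle far below full load, so treat this as an upper bound, not a bill:

```python
# Rough worst-case power-cost sketch for a stacked-GPU build.
# Assumed numbers: 4 cards at ~300W under load, ~€0.30/kWh electricity.
CARDS = 4
WATTS_PER_CARD = 300
PRICE_PER_KWH = 0.30       # euros; varies a lot by country
HOURS_PER_YEAR = 24 * 365

load_kw = CARDS * WATTS_PER_CARD / 1000   # 1.2 kW for the GPUs alone
yearly_kwh = load_kw * HOURS_PER_YEAR     # if pinned at full load 24/7
yearly_cost = yearly_kwh * PRICE_PER_KWH

print(f"GPU draw under load: {load_kw:.1f} kW")
print(f"Worst-case yearly energy: {yearly_kwh:,.0f} kWh")
print(f"Worst-case yearly cost: €{yearly_cost:,.0f}")
```

Even if the machine runs flat out only a fraction of the time, the GPUs alone dwarf the whole-system draw of a single efficient APU box.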

Conclusion

The Ryzen AI Max+ 395 is special because it’s the only prosumer-priced hardware offering 128GB of unified memory accessible to the GPU, coming in a standardized package with decent energy efficiency.

Previously: Why you shouldn’t buy an NVIDIA GPU and Why you shouldn’t buy into the Apple ecosystem.

This concludes our three-part hardware series. The message is simple: 128GB unified memory at a reasonable price changes everything for local LLM inference, and right now, AMD is the only one delivering that.

