Versions:

  • 10.0.1
  • 10.0.0
  • 9.4.1
  • 9.4.0
  • 9.3.4
  • 9.3.3
  • 9.3.2
  • 9.3.1
  • 9.3.0
  • 9.2.0
  • 9.1.4
  • 9.1.3
  • 9.1.2
  • 9.1.1
  • 9.1.0
  • 9.0.8

Lemonade Server 10.0.1, published by AMD, is a lightweight server that exposes local large-language-model inference through the familiar OpenAI-compatible REST API, so any application originally written for cloud endpoints can redirect its calls to private, on-device compute. Aimed at developers who need low-latency, offline generative capabilities, it offloads execution to available GPUs or NPUs, turning a workstation into a self-contained LLM backend with no subscription fees and no data leaving the machine.

Typical deployments plug the server into coding assistants, chat front-ends, or automation scripts so that prompts are answered locally, keeping proprietary prompts on-premises and eliminating network round-trips. The package also suits researchers who need repeatable, sandboxed benchmarking across model families, as well as OEMs prototyping AI features in embedded Windows environments. Although the current stable release is 10.0.1, the lineage spans sixteen numbered builds, each refining memory footprint, token throughput, and silicon-specific kernels for Radeon and Ryzen AI accelerators.

The software belongs to the “Developer Tools / Machine Learning Servers” category and installs as a headless Windows service that listens on a configurable port, serving chat, completion, and embedding endpoints that conform to the OpenAI specification. Once the server is running, compatible clients need only change their base URL to start using quantized models stored on the same machine. Lemonade Server is available for free on get.nero.com, with downloads provided via trusted Windows package sources such as winget, always supplying the latest version and supporting batch installation alongside other applications.
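Because the endpoints follow the OpenAI specification, a client written for a cloud endpoint can be retargeted by changing only its base URL. The sketch below builds an OpenAI-style chat-completion request body aimed at a local instance; the port, path, and model name are placeholders, since the listening port is configurable and the installed models vary per machine.

```python
import json

# Hypothetical local endpoint: the port and path are assumptions for
# illustration, as Lemonade Server's listening port is configurable.
BASE_URL = "http://localhost:8000/api/v1"

def build_chat_request(prompt, model="local-quantized-model"):
    """Build an OpenAI-style chat-completion request body.

    The model name is a placeholder; substitute whichever quantized
    model is installed on the machine.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

# A client would POST this JSON to BASE_URL + "/chat/completions",
# exactly as it would against a cloud OpenAI-compatible endpoint.
body = build_chat_request("Summarize this changelog in one sentence.")
print(json.dumps(body, indent=2))
```

The only client-side change from a cloud deployment is the base URL; the request and response schemas stay the same, which is what lets existing tooling work unmodified.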

Tags: