Versions:

0.0.8

uchardet is a lightweight command-line character encoding detector developed under the freedesktop umbrella and currently maintained by PolarGoose. Released in a single stable version, 0.0.8, the tool inspects the raw bytes of a plain-text file and prints the most probable charset name. This makes it useful for data-recovery workflows, bulk migration projects, and automated ETL pipelines that must normalize legacy archives whose encoding metadata is missing or unreliable.

uchardet implements a universal detection algorithm originally contributed to the Mozilla project. It can distinguish among dozens of legacy single-byte sets (such as Windows-1252, the ISO-8859 family, KOI8, and IBM/DOS codepages) and multibyte schemes such as UTF-8, UTF-16 LE/BE, Shift-JIS, and EUC-KR, all without relying on byte-order marks or locale hints.

Typical use cases include feeding its terse output into iconv or recode for on-the-fly transcoding, populating database charset columns during forensic indexing, and embedding the executable in shell scripts that sort incoming log files into language-specific folders. Because it is distributed as a single, dependency-free binary, the utility drops easily into portable toolchains on Windows, macOS, or Linux CI runners. It uses only standard I/O and returns POSIX-style exit codes, which simplifies error handling.

Although the codebase is frozen at 0.0.8, the detection library remains actively mirrored in larger desktop stacks such as GLib, KDE, and LibreOffice, so incremental improvements to the training data are eventually back-ported.

The software is available for free on get.nero.com. Downloads are provided via trusted Windows package sources (e.g. winget), always deliver the latest version, and support batch installation of multiple applications.
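The detect-then-transcode use case mentioned above can be sketched in Python. This is a minimal illustration, not part of uchardet itself: it assumes the `uchardet` executable is on the PATH, that it prints a single charset name for the given file, and that the name is one Python's codec registry recognizes. The helper names `detect_charset` and `to_utf8`, and the handling of an "unknown" result, are hypothetical choices made for this example.

```python
import subprocess


def detect_charset(path, run=subprocess.run):
    """Ask the uchardet CLI for the most probable charset of `path`.

    `run` is injectable so the function can be exercised without
    the uchardet binary installed (an assumption of this sketch).
    """
    result = run(
        ["uchardet", str(path)],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the tool's output as text
        check=True,           # raise CalledProcessError on a nonzero exit code
    )
    return result.stdout.strip()


def to_utf8(raw: bytes, charset: str) -> bytes:
    """Re-encode `raw` from the detected charset to UTF-8."""
    # Assumption: an empty or "unknown" name means detection failed.
    if charset.strip().lower() in ("", "unknown"):
        raise ValueError("charset could not be detected")
    # bytes.decode raises LookupError if Python has no codec by that name.
    return raw.decode(charset).encode("utf-8")
```

A caller might read a file's bytes, pass the file path to `detect_charset`, and hand both results to `to_utf8`. Names that Python does not know would need a manual mapping, and detection results are probabilistic, so output for short or ambiguous inputs is worth checking before overwriting originals.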

Tags: