Why does wprintf transliterate Russian text in Unicode into Latin on Linux?

linux小助理 · published 2022-09-05 00:00:56

Question

Why does the following program

#include <stdio.h>
#include <wchar.h>

int main() {
  wprintf(L"Привет, мир!");
}

print "Privet, mir!" on Linux? Specifically, why does it transliterate Russian text in Unicode into Latin as opposed to transcoding it into UTF-8 or using replacement characters?

Demonstration of this behavior on Godbolt: https://godbolt.org/z/36zEcG

The non-wide version printf("Привет, мир!") prints this text as expected ("Привет, мир!").

Answers

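Because wide characters are converted to bytes according to the currently set locale. A C program always starts in the "C" locale, whose character set on glibc covers only ASCII. Cyrillic letters cannot be represented in ASCII, so glibc's wide-character streams fall back to transliteration, which is where "Privet, mir!" comes from.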

You have to switch to a locale whose encoding can represent Cyrillic first, for example a Russian or any UTF-8 locale:

setlocale(LC_ALL, "ru_RU.utf8"); // Russian Unicode
setlocale(LC_ALL, "en_US.utf8"); // English US Unicode

Or to the locale configured in the user's environment (which is most likely what you need):

setlocale(LC_ALL, "");

The full program will be:

#include <stdio.h>
#include <wchar.h>
#include <locale.h>

int main() {
  setlocale(LC_ALL, "ru_RU.utf8");
  wprintf(L"Привет, мир!\n");
}

As for your code working as-is on other machines: that depends on how the libc there operates. Some implementations (such as musl) do not support non-Unicode locales and can therefore unconditionally convert wide characters to a UTF-8 sequence.