UTF-8 to Unicode converter for an embedded system

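I have an embedded system that receives UTF-8 encoded data over UPnP. The display device can display Unicode characters. I need a way to convert the UTF-8 data I receive over UPnP to Unicode. The display sits on a PIC, and the data reaches it through a UPnP bridge running Linux. Is there a simple way to do the conversion on the Linux side before I send the data to the display board?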


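If you have a real operating system and a hosted C environment, the best approach is simply to make sure your program runs in a locale that uses UTF-8 encoding, and to use mbrtowc or mbtowc to convert UTF-8 sequences to Unicode code point values (wchar_t holds a Unicode code point number on Linux, and on any C implementation that defines __STDC_ISO_10646__).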

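If you want to skip the system library routines and do the UTF-8 decoding yourself, be careful. I once did a casual survey using Google Code Search and found that somewhere between 1/3 and 2/3 of the UTF-8 decoders I came across were dangerously wrong. Here is a fully correct, fast, and simple implementation that I would highly recommend: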

http://bjoern.hoehrmann.de/utf-8/decoder/dfa/

My implementation in musl is somewhat smaller in binary size and seems to be faster, but it is also a bit harder to understand.

To convert an array of bytes encoded as UTF-8 into an array of Unicode code points:

The trick is to detect the various encoding errors.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
      uint32_t UnicodePoint;  // Accumulated code point
      uint32_t Min;           // Minimum acceptable code point (rejects overlong forms)
      int i;                  // Number of continuation bytes still expected
      bool e;                 // Error flag
    } UTF_T;

    static bool IsSurrogate(unsigned c) {
      return (c >= 0xD800) && (c <= 0xDFFF);
    }

    // Feed one byte. Return true if more bytes are needed to complete the code point.
    static bool Put8(UTF_T *U, unsigned ch) {
      ch &= 0xFF;
      if (U->i == 0) {
        if (ch <= 0x7F) {          // ASCII
          U->UnicodePoint = ch;
          return false;            /* No more needed */
        } else if (ch <= 0xBF) {   // Stray continuation byte
          goto fail;
        } else if (ch <= 0xDF) {   // 2-byte lead
          U->Min = 0x80;
          U->UnicodePoint = ch & 0x1F;
          U->i = 1;
        } else if (ch <= 0xEF) {   // 3-byte lead
          U->Min = 0x800;
          U->UnicodePoint = ch & 0x0F;
          U->i = 2;
        } else if (ch <= 0xF7) {   // 4-byte lead
          U->Min = 0x10000;
          U->UnicodePoint = ch & 0x07;
          U->i = 3;
        } else {
          goto fail;
        }
        return true;               /* More needed */
      }
      // If expected continuation byte is missing ...
      if ((ch & (~0x3Fu)) != 0x80) {
        goto fail;
      }
      U->UnicodePoint <<= 6;
      U->UnicodePoint |= (ch & 0x3F);
      // If last continuation byte ...
      if (--(U->i) == 0) {
        // If code point is overlong, out of range, or a surrogate ...
        if ((U->UnicodePoint < U->Min) || (U->UnicodePoint > 0x10FFFF)
            || IsSurrogate(U->UnicodePoint)) {
          goto fail;
        }
        return false;              /* No more needed */
      }
      return true;                 /* More needed */

    fail:
      U->UnicodePoint = (uint32_t)-1;
      U->i = 0;
      U->e = true;
      return false;                /* No more needed */
    }

    /* Returns false on success, true on error */
    bool ConvertUTF8toUnicodeCodepoints(const char *UTF8, size_t Length,
                                        uint32_t *CodePoints, size_t *OutLen) {
      UTF_T U = { 0 };
      *OutLen = 0;
      for (size_t i = 0; i < Length;) {
        while (Put8(&U, (unsigned char)UTF8[i++])) {
          // Needed bytes not available?
          if (i >= Length) {
            return true;
          }
        }
        if (U.e) break;
        CodePoints[(*OutLen)++] = U.UnicodePoint;
      }
      return U.e;
    }

This is based on some old code, so please let me know if you spot problems; it may not meet current standards.

It is not the prettiest, what with the gotos and magic numbers.

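What is nice about this approach is that CodePoints[(*OutLen)++] = U.UnicodePoint is the single place where a code point is consumed. If you want to produce UTF-16 (BE or LE) instead, you can easily write consumer code for the UTF_T block without changing the UTF-8 -> code point part.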
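A UTF-16 consumer along those lines might look like the following sketch. The function names CodepointToUTF16 and PutUTF16Unit are invented for this illustration and are not part of the code above; the idea is to call them where the code point would otherwise be stored.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16 code units.
 * Returns the number of units written (1 or 2), or 0 if cp is not a
 * valid Unicode scalar value (a surrogate, or above 0x10FFFF). */
size_t CodepointToUTF16(uint32_t cp, uint16_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        return 0;
    }
    if (cp < 0x10000) {                   /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                        /* 20 bits remain, split 10 + 10 */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: serialize one code unit as two bytes,
 * big-endian if big_endian is nonzero, little-endian otherwise. */
void PutUTF16Unit(uint16_t u, int big_endian, unsigned char out[2])
{
    unsigned char hi = (unsigned char)(u >> 8);
    unsigned char lo = (unsigned char)(u & 0xFF);

    if (big_endian) {
        out[0] = hi;
        out[1] = lo;
    } else {
        out[0] = lo;
        out[1] = hi;
    }
}
```

Which byte order the PIC display expects is not stated in the question, so the big_endian flag is left to the caller.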

I would use the Unicode manipulation functions in GLib (an LGPL-licensed utility library). It sounds like g_utf8_to_ucs4() is what you are looking for.
