How do I map code points to Unicode characters depending on the font used?

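The client prints labels and has been using a set of symbol (?) fonts to do so. The application uses a single-byte database (Oracle with Latin-1). The old application I am replacing was not Unicode-aware, yet it somehow worked. The replacement application I am writing is supposed to handle the old data.

The symbols picked from the charmap application usually map to particular Unicode characters, but sometimes they don't. For example, what looks like the moon in the LAB3 font is in fact U+2014 (EM DASH). When users paste this character into a Swing text field, the character has the code point 8212. It was "moved" into the Private Use Area (by Windows? by Java?). When this character is saved to the database, Oracle decides that it cannot be encoded safely and replaces it with the dreaded ¿. So I started shifting the characters by 8000: -= 8000 when saving, += 8000 when displaying the field. Unfortunately, I discovered that other characters are not shifted by the same amount. In one particular font, for example, ž has the code point 382, so I shifted it by +/- 256 to "fix" it.

Now I am dreading the discovery of even more weird offsets, and I wonder: can I get at this mapping using Java? Perhaps the TTF font has a list of the 255 glyphs it encodes and of the Unicode characters they correspond to, so that I can do it "right"?

Right now I am using the following kludge: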

```java
import java.awt.Font;

static String fromDatabase(String str, String fontFamily) {
    if (str != null && fontFamily != null) {
        Font font = new Font(fontFamily, Font.PLAIN, 1);
        boolean changed = false;
        char[] chars = str.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // Try each known offset and keep the first one the font can display.
            if (font.canDisplay(chars[i] + 0xF000)) {
                // WE8MSWIN1252 + WinXP
                chars[i] += 0xF000;
                changed = true;
            } else if (chars[i] >= 128 && font.canDisplay(chars[i] + 8000)) {
                // WE8ISO8859P1 + WinXP
                chars[i] += 8000;
                changed = true;
            } else if (font.canDisplay(chars[i] + 256)) {
                // ž in LAB1 Eastern = 382
                chars[i] += 256;
                changed = true;
            }
        }
        if (changed) str = new String(chars);
    }
    return str;
}

static String toDatabase(String str, String fontFamily) {
    if (str != null && fontFamily != null) {
        boolean changed = false;
        char[] chars = str.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            // Undo whichever offset fromDatabase applied, judged by magnitude.
            if (chars[i] > 0xF000) {
                // WE8MSWIN1252 + WinXP
                chars[i] -= 0xF000;
                changed = true;
            } else if (chars[i] > 8000) {
                // WE8ISO8859P1 + WinXP
                chars[i] = (char) (chars[i] - 8000);
                changed = true;
            } else if (chars[i] > 256) {
                // ž in LAB1 Eastern = 382
                chars[i] = (char) (chars[i] - 256);
                changed = true;
            }
        }
        if (changed) return new String(chars);
    }
    return str;
}
```

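The font file definitely has a mapping from Unicode to glyphs. Unfortunately, the glyph is completely arbitrary and need not bear any relation to the character it is mapped to, as you found with the moon/em dash. The mapping from single-byte characters to Unicode code points can be found in Windows code page 1252, but that is not what you want: you want character 0x97 to equal the moon glyph, for example ☽ FIRST QUARTER MOON (U+263D), rather than EM DASH (U+2014). Unfortunately, I can't suggest anything other than going through each character in the font and comparing it to the available Unicode characters.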
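Going through each character in the font could start from something like the sketch below, which only lists the code points a font claims to cover. It assumes `java.awt.Font.canDisplay(int)` reflects the font's character map closely enough; the class and method names (`GlyphCoverage`, `displayable`, `coverage`) are made up for illustration. It cannot tell you what a glyph looks like, so comparing each listed code point with the intended symbol is still manual work.

```java
import java.awt.Font;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class GlyphCoverage {
    /** Returns every code point in [from, to] that the predicate accepts. */
    static List<Integer> displayable(IntPredicate canDisplay, int from, int to) {
        List<Integer> result = new ArrayList<>();
        for (int cp = from; cp <= to; cp++) {
            if (canDisplay.test(cp)) {
                result.add(cp);
            }
        }
        return result;
    }

    /** Lists the Basic Multilingual Plane code points the given font reports as displayable. */
    static List<Integer> coverage(Font font) {
        // Font has canDisplay(char) and canDisplay(int); IntPredicate selects the int overload.
        return displayable(font::canDisplay, 0x0020, 0xFFFF);
    }
}
```

Calling `GlyphCoverage.coverage(new Font("LAB3", Font.PLAIN, 12))` and printing each entry with `Character.getName(cp)` gives a list to check by eye. If the named family is not installed, Java silently substitutes a default font, so the result is only meaningful on a machine that has the label fonts.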

Just to provide closure, this seems to work:

```java
import java.io.UnsupportedEncodingException;

static String fromDatabase(String str, String fontFamily) {
    if (str != null && fontFamily != null) {
        try {
            byte[] bytes = str.getBytes("ISO-8859-1"); // database encoding
            if (fontFamily.startsWith("LAB")) {
                // Reinterpret the stored bytes as Windows-1252.
                str = new String(bytes, "Windows-1252");
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
    return str;
}

static String toDatabase(String str, String fontFamily) {
    if (str != null && fontFamily != null) {
        try {
            if (fontFamily.startsWith("LAB")) {
                // Encode as Windows-1252, then hand the bytes over as Latin-1 text.
                str = new String(str.getBytes("Windows-1252"), "ISO-8859-1");
            }
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
    return str;
}

public void exportFormData(Eigentumsbezeichnung@R_389_4045@ionFormData formData) throws ProcessingException {
    super.exportFormData(formData);
    formData.getWert().setValue(toDatabase(formData.getWert().getValue(), formData.getSchrift().getValue()));
}

public void importFormData(Eigentumsbezeichnung@R_389_4045@ionFormData formData) throws ProcessingException {
    super.importFormData(formData);
    getWertField().setValue(fromDatabase(formData.getWert().getValue(), formData.getSchrift().getValue()));
}
```

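Here is my interpretation: the database uses ISO 8859-1 (aka Latin-1). Ten years ago, the client commissioned some special-purpose fonts that deliberately claim to be Latin-1 encoded but actually display different characters. Note how this example has an omega in place of Ù!

In addition, some characters are used that Latin-1 leaves unused. The simplest solution seems to be to assume that the corresponding Windows code page is in use. This allows Java to transcode the bytes from the database's "fake Windows-1252" into Unicode and back. After the translation, the Swing application displays Ù using a font that shows an omega glyph in its place. Problem solved.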
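To make the transcoding concrete, here is a minimal sketch of the same idea without the font check, plus a helper that reports which byte values do not survive the round trip. The class name `FakeLatin1` and the helper `lossyBytes` are made up for illustration, and the sketch assumes the JDK ships a `windows-1252` charset, which is not among the charsets the Java specification guarantees.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FakeLatin1 {
    // Assumption: this charset is available on the JDK in use.
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Reinterprets text that was decoded as Latin-1 as if its bytes were windows-1252. */
    static String fromDatabase(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), CP1252);
    }

    /** Inverse direction: encode as windows-1252, then present the bytes as Latin-1 text. */
    static String toDatabase(String s) {
        return new String(s.getBytes(CP1252), StandardCharsets.ISO_8859_1);
    }

    /** Byte values that do not survive fromDatabase followed by toDatabase. */
    static List<Integer> lossyBytes() {
        List<Integer> lossy = new ArrayList<>();
        for (int b = 0; b < 256; b++) {
            String stored = String.valueOf((char) b);
            if (!toDatabase(fromDatabase(stored)).equals(stored)) {
                lossy.add(b);
            }
        }
        return lossy;
    }
}
```

With this, the stored byte 0x97 comes back as U+2014, which is the code point the LAB3 moon glyph sits on, and `toDatabase` turns it into 0x97 again. Bytes that windows-1252 leaves unassigned (0x81 is one) may not round-trip, depending on how the JDK decodes them, so it is worth running `lossyBytes()` once and checking whether any of the reported values occur in the old data.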
