Wednesday, April 24, 2024

ORAEC's New Unicode Texts

Hi folks. As you know, our focus is on Unicode. Our first blog posts were about converting reusable digital data of Egyptian texts to Unicode and publishing recommendations for using Unicode hieroglyphs. Among other things, we took the great repository of JSesh texts and converted them to Unicode. Since then, a lot has happened in that repository: new texts have been added, and existing texts have been released under a free license. So we decided to update our conversion. In this post we explain the results:

The repository formerly-mdc-now_unicode now contains 16 new texts, namely:

We would like to thank the authors Kaan Eraslan, Émil Joubert, R. Monfort, and S. Rosmorduc for encoding the new data, and Serge Rosmorduc for publishing it in the reusable repository.

As expected, the MdC data of these texts contain a number of encodings that we have not yet encountered in the transformation to Unicode, such as O42C or W17D. In most cases, these are variants. However, according to Unicode principles and our recommendations, the actual characters should be encoded. Accordingly, we have extended our mapping so that an O42C is converted to 𓊏.
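To make the idea of the mapping a little more concrete, here is a minimal Python sketch. It is not our actual transformation code: the names `VARIANT_TO_UNICODE` and `convert_codes` are made up for illustration, the table contains only the single O42C entry mentioned above, and `ZZ99` is an invented code that stands in for anything the table does not cover.

```python
# Hypothetical excerpt of a mapping from MdC codes to Unicode signs.
# Only the O42C entry comes from this post; the real mapping is much larger.
VARIANT_TO_UNICODE = {
    "O42C": "\U0001328F",  # the sign O42C is converted to (U+1328F)
}


def convert_codes(codes, mapping=VARIANT_TO_UNICODE):
    """Convert a list of MdC sign codes.

    Returns a pair (converted, unmapped): mapped codes are replaced by
    their Unicode sign, unmapped codes are kept unchanged and also
    collected so they can be reviewed later.
    """
    converted = []
    unmapped = []
    for code in codes:
        sign = mapping.get(code)
        if sign is None:
            unmapped.append(code)
            converted.append(code)  # keep the raw MdC code as a placeholder
        else:
            converted.append(sign)
    return converted, unmapped


if __name__ == "__main__":
    # "ZZ99" is an invented code, used only to show the unmapped case.
    signs, missing = convert_codes(["O42C", "ZZ99"])
    print(signs, missing)
```

Collecting the unmapped codes separately, rather than dropping them, is one way to end up with a list of codes that still need a decision.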

In addition, some codes represent characters that cannot yet be represented in Unicode. We had already compiled some such characters during our earlier transformation. Here are the cases of missing Unicode characters resulting from the current transformation:

