FreeComputerBooks.com
Links to Free Computer, Mathematics, Technical Books all over the World
- Title: Programming with Unicode
- Author/Editor(s): Victor Stinner
- Publisher: Self-Publishing; Internet Archive
- Hardcover/Paperback: N/A
- eBook: HTML, PDF, ePub, Kindle, etc.
- Language: English
- ISBN-10: N/A
- ISBN-13: N/A
Unicode is the nightmare of many developers (and users) for different, and sometimes good, reasons.
In the 1980s, only a few people read documents in languages other than their mother tongue and English. A computer supported only a small number of languages, and the user configured their region to support the languages of neighboring countries. Memory and disks were expensive, so all applications were written to use byte strings in 8-bit encodings: one byte per character was a good compromise.
Today, with the Internet and globalization, we all read and exchange documents from everywhere around the world (even if we don't understand everything). The problem is that documents rarely indicate their encoding, and displaying a document with the wrong encoding leads to a well-known problem: mojibake.
It is difficult to get, or worse, to guess, the encoding of a document. Except for encodings of the UTF family (coming from the Unicode standard), there is no reliable algorithm for that. We have to rely on statistics to guess the most probable encoding, which is what most web browsers do.
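To make mojibake concrete, here is a minimal Python sketch. It assumes text stored as UTF-8 that is then read back as cp1252 (the common Windows Western European code page). The example string is illustrative. The wrong decoding raises no error; it just silently produces the wrong characters.

```python
text = "Mojibake: é, ü, €"
data = text.encode("utf-8")  # the bytes actually stored in the file

# Reading the same bytes with the wrong encoding silently produces mojibake:
# each multi-byte UTF-8 sequence turns into several cp1252 characters.
wrong = data.decode("cp1252")
print(wrong)  # Mojibake: Ã©, Ã¼, â‚¬

# Decoding with the right encoding gives back the original text.
print(data.decode("utf-8") == text)  # True
```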
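The UTF encodings can be recognized far more reliably than legacy 8-bit encodings. The sketch below is a hypothetical helper, `guess_utf`, and is not taken from the book. It first looks for a byte order mark (BOM), then tries a strict UTF-8 decode. The heuristic behind the second step is that non-ASCII text in a legacy encoding rarely happens to form valid UTF-8. Anything else would still need a statistical guess.

```python
import codecs

def guess_utf(data: bytes):
    """Return a UTF codec name if `data` looks like UTF text, else None.

    Hypothetical helper, a sketch only: BOM check, then strict UTF-8 decode.
    """
    # Check UTF-32 BOMs before UTF-16 ones: the UTF-32-LE BOM (FF FE 00 00)
    # starts with the UTF-16-LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    # No BOM: a strict UTF-8 decode either succeeds or raises an error.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # not UTF-8; fall back to statistical guessing
    return "utf-8"  # note: pure ASCII also lands here (ASCII is a UTF-8 subset)

print(guess_utf("héllo".encode("utf-8")))    # utf-8
print(guess_utf("héllo".encode("latin-1")))  # None
```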
Unicode support by operating systems, programming languages and libraries varies a lot. In general, the support is basic or non-existent. Each operating system manages Unicode differently. For example, Windows stores filenames as Unicode, whereas UNIX and BSD operating systems use bytes.
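On UNIX, a filename is just a sequence of bytes and need not be valid in any encoding. Python handles this with the `surrogateescape` error handler (PEP 383), which it uses for filenames on UNIX. The sketch below calls it explicitly with UTF-8 so the result does not depend on the platform. The filename is an illustrative assumption.

```python
# A Latin-1 encoded filename, as a UNIX kernel might return it: just bytes.
raw_name = b"caf\xe9.txt"

# Decoding with UTF-8 + surrogateescape never fails: each undecodable byte
# becomes a lone surrogate in the range U+DC80..U+DCFF.
name = raw_name.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding back with the same error handler restores the original bytes,
# so the file can still be opened even though its name is not valid UTF-8.
print(name.encode("utf-8", "surrogateescape") == raw_name)  # True
```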
Mixing documents stored as bytes is possible even if they use different encodings, but it leads to mojibake. Because libraries and programs also ignore encode and decode warnings or errors, writing a single character with a diacritic (or any other non-ASCII character) is sometimes enough to get an error.
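A short Python illustration of mixing byte strings from two encodings. The sample words are assumptions chosen for the example. Concatenation never checks encodings. But once the bytes are mixed, no single codec decodes the result correctly: Latin-1 yields mojibake, and strict UTF-8 raises an error on the Latin-1 byte.

```python
part_utf8 = "café".encode("utf-8")       # b'caf\xc3\xa9'
part_latin1 = "naïve".encode("latin-1")  # b'na\xefve'

# Concatenating byte strings always "works": no encoding is checked.
mixed = part_utf8 + b" " + part_latin1

# Latin-1 accepts any byte, so it decodes without error but mangles the UTF-8 part.
print(mixed.decode("latin-1"))  # cafÃ© naïve

# Strict UTF-8 decoding rejects the lone Latin-1 byte 0xEF.
try:
    mixed.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decode failed:", exc.reason)
```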
Full Unicode support is complex because the Unicode charset is bigger than any other charset. For example, ISO 8859-1 contains 256 code points including 191 characters, whereas Unicode version 6.0 contains 248,966 assigned code points. The Unicode standard is larger than just a charset: it also explains how to display characters (e.g. left-to-right for English and right-to-left for Persian), how to normalize a character string (e.g. precomposed characters versus the decomposed form), etc.
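The Python standard library module `unicodedata` exposes some of these rules. The sketch below shows that a precomposed "é" and its decomposed form are different code point sequences until both are normalized. It also shows the bidirectional class that governs display direction. The choice of U+0627 (ARABIC LETTER ALEF, a letter also used in Persian) is an illustrative assumption.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single code point (U+00E9)
decomposed = "e\u0301"  # e followed by U+0301 COMBINING ACUTE ACCENT

# Both render as "é", but they are different code point sequences.
print(precomposed == decomposed)          # False
print(len(precomposed), len(decomposed))  # 1 2

# Normalizing to the same form (NFC composes, NFD decomposes) makes them equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)  # True True

# Each character also has a bidirectional class used for display order:
# "L" (left-to-right) for Latin letters, "AL" (right-to-left Arabic letter).
print(unicodedata.bidirectional("a"), unicodedata.bidirectional("\u0627"))  # L AL
```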
This book explains how to sympathize with Unicode, and how you should modify your program to avoid most, or all, issues related to encodings and Unicode. It offers specific guidance on integrating Unicode with other technologies.
About the Authors
- N/A
- Programming with Unicode (Victor Stinner)
- The Mirror Site (1) - PDF, ePub, Kindle, etc.
- The Mirror Site (2) - PDF
- Programming with Unicode: A Gentle Introduction (Triangles)
- The Unicode Cookbook for Linguists (Steven Moran, et al)
This book is a practical guide to Unicode for linguists and programmers who work with data in multilingual computational environments. It describes a formal specification of orthography profiles and provides recipes using open source tools.
- The Unicode Standard (The Unicode Consortium)
This is the one book all developers using Unicode must have. The core specification gives the general principles, requirements for conformance, and guidelines for implementers. The code charts show representative glyphs for all the Unicode characters.
- What is the Text Encoding Initiative? (Lou Burnard)
This simple and straightforward book is intended to help beginners make their own choices from the full range of Text Encoding Initiative (TEI) options. It explains the XML technology used by the TEI in language accessible to non-technical readers.
- From ASCII Art to Comic Sans: Typography in the Digital Age
Offers an original vision of the history of typography and computing in the digital age, viewed through the lens of offbeat typography. It shows how text is always an image that conveys meaning, and how typography has shaped modern visual and material culture.
- The Elements of Typographic Style (Robert Bringhurst)
The typographic rules in this book aren't specific to particular software. You can apply these rules in just about any modern page-layout program or word processor; the book skips implementation issues that are especially basic or especially complicated.