"…but it would perhaps be one of the greatest missed opportunities of our time if free software freed nothing but code…"

UTF-8/Unicode

  • http://www.unicode.org/faq/unicode_web.html
  • http://jimmyg.org/work/code/stringconvert/index.html (‘Any application you are working with should deal with Unicode strings internally. You should never work with ordinary Python strings because as soon as someone enters a non-ASCII character in your application it is likely to break in an unpredictable way because ordinary 8-bit Python strings can’t handle these characters. Best practice is to always decode strings to Unicode objects from whatever encoding they are in (often UTF-8) as soon as they enter your application. You then work with Unicode throughout your application and then encode the Unicode back to whatever is needed (again often UTF-8) as the string leaves your application’)
  • http://www.tbray.org/ongoing/When/200x/2003/04/06/Unicode (‘Quite a few software professionals have learned that they need to worry about internationalizing software, and some of those have learned how to go about doing it. For those getting started, herewith a brief introduction to Unicode, the one technology that you have to get comfortable with if you’re going to do a good job as a software citizen of the world.’)
  • http://pylonsbook.com/en/1.1/unicode.html (‘If you’ve ever come across text in a foreign language that contains lots of question mark characters in unexpected positions or if you’ve written Python code that causes an exception such as the following one to be raised, then chances are you have run into a problem with character sets, encodings, and Unicode…Encoding Unicode characters with a variable number of bytes for each character as UTF-8 has an interesting side effect. It means that UTF-8 encoded Unicode for the characters represented by the ASCII character set has the same binary representation as ASCII itself. This means computers can treat UTF-8 encoded Unicode as ASCII without any errors being raised as long as characters used are in the first 128 Unicode code points. This explains why your application might already be working perfectly well with certain Unicode strings even though you haven’t made a special effort to work with any character set except ASCII. This is also why as soon as a character such as £ or é is entered, the application will break because these are not ASCII characters; therefore, treating their UTF-8 encoded versions as ASCII will cause the kind of UnicodeDecodeError shown at the start of the chapter.’)
  • http://diveintopython3.org/strings.html (‘…Western European languages like French, Spanish, and German have more letters than English. Or, more precisely, they have letters combined with various diacritical marks, like the ñ character in Spanish. The most common encoding for these languages is CP-1252, also called “windows-1252” because it is widely used on Microsoft Windows. The CP-1252 encoding shares characters with ASCII in the 0–127 range, but then extends into the 128–255 range for characters like n-with-a-tilde-over-it (241), u-with-two-dots-over-it (252), &c. It’s still a single-byte encoding, though; the highest possible number, 255, still fits in one byte…Unicode is a system designed to represent every character from every language. Unicode represents each letter, character, or ideograph as a 4-byte number. Each number represents a unique character used in at least one of the world’s languages…’)
  • http://rishida.net/scripts/uniview/ (from http://simonwillison.net/2009/Dec/15/unicode/ : ‘Fantastically useful tool to convert strings of characters in to every unicode and/or escaping syntax you can possibly imagine.’)
    • http://rishida.net/scripts/uniview/help.html (‘UniView is an XHTML-based application to look up characters, character blocks, paste in and discover unknown characters, store your own info about characters, search on character data, do hex/dec/ncr conversions, highlight character types, etc. etc. It supports Unicode 5.2 (beta) and is written with Web Standards to work on a variety of browsers’)
  • http://www.stereoplex.com/two-voices/python-unicode-and-unicodedecodeerror (‘In the years I’ve been developing in Python, Unicode seems to be the topic which causes the greatest amount of confusion amongst developers. Hopefully much of this confusion should go away in Python 3, for reasons I’ll come to at the end; but until then, the UnicodeDecodeError is the bane of many developers’ lives.’)
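
The decode-early, encode-late advice in the jimmyg.org quote can be sketched in Python 3, where `str` is the Unicode type and `bytes` plays the role of the old 8-bit strings. This is my own illustration, not code from the linked page: `process_message` is a hypothetical name, the upper-casing step stands in for real application logic, and UTF-8 is assumed as the default external encoding.

```python
def process_message(raw: bytes, encoding: str = "utf-8") -> bytes:
    """Hypothetical handler: bytes in, bytes out, Unicode in between."""
    # Decode as soon as the data enters the application.
    text = raw.decode(encoding)
    # Work only with str (Unicode) internally; upper() stands in for real logic.
    result = text.upper()
    # Encode again only as the text leaves the application.
    return result.encode(encoding)


incoming = "café".encode("utf-8")  # e.g. bytes read from a socket or file
print(process_message(incoming))   # prints b'CAF\xc3\x89'
```

Because decoding happens once at the boundary, the é never reaches the core logic as raw bytes, which is the failure mode the quote warns about.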
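
The Pylons book's two claims, that UTF-8 is byte-for-byte identical to ASCII for the first 128 code points and that a £ or é then triggers a UnicodeDecodeError, are easy to check in Python 3. A minimal sketch; `decodes_as_ascii` is a hypothetical helper written only for this illustration.

```python
def decodes_as_ascii(data: bytes) -> bool:
    """Return True if the bytes are valid ASCII, False otherwise."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError:
        return False
    return True


plain = "hello"
# Below code point 128, UTF-8 and ASCII produce identical bytes.
print(plain.encode("utf-8") == plain.encode("ascii"))  # True

price = "£5"
utf8_bytes = price.encode("utf-8")  # b'\xc2\xa35': the £ takes two bytes
print(decodes_as_ascii(plain.encode("utf-8")))  # True
print(decodes_as_ascii(utf8_bytes))             # False: 0xC2 is above 127

try:
    utf8_bytes.decode("ascii")
except UnicodeDecodeError as exc:
    # exc.start is the index of the first byte the codec rejected.
    print(type(exc).__name__, exc.start)  # UnicodeDecodeError 0
```

This is why an ASCII-only application can appear to handle UTF-8 correctly until the first non-ASCII character arrives.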
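
The Dive Into Python 3 excerpt contrasts single-byte CP-1252 with Unicode. The short Python 3 check below (my own, not from the book) confirms the values it quotes (ñ = 241, ü = 252), shows what happens when UTF-8 bytes are misread as CP-1252, and measures how many bytes UTF-8 spends per character. That last point refines the excerpt's "4-byte number" wording: Unicode code points run from U+0000 to U+10FFFF, and the number of bytes each one occupies depends on the encoding form (UTF-8, UTF-16, UTF-32), not on Unicode itself.

```python
# CP-1252 is single-byte: each character maps to one value in 0-255.
for ch in "ñü":
    print(ch, ord(ch), ch.encode("cp1252"))  # ñ 241 b'\xf1', then ü 252 b'\xfc'

# The same ñ in UTF-8 takes two bytes.
utf8 = "ñ".encode("utf-8")
print(utf8)  # b'\xc3\xb1'

# Misreading UTF-8 bytes as CP-1252 silently produces mojibake, not an error.
mojibake = utf8.decode("cp1252")
print(mojibake)  # Ã±

# UTF-8 is variable-length: 1 to 4 bytes depending on the code point.
sizes = {ch: len(ch.encode("utf-8")) for ch in ["A", "é", "€", "😀"]}
for ch, n in sizes.items():
    print(f"U+{ord(ch):04X}", n)
```

The mojibake case is the quieter cousin of UnicodeDecodeError: nothing fails, the text is simply wrong.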
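
The hex/dec/NCR conversions that UniView offers can be approximated offline with Python's standard library. A minimal sketch; `describe` is a hypothetical helper, and the character names come from the Unicode database bundled with whichever Python version runs it, which may differ from the Unicode 5.2 data UniView mentions. `bytes.hex()` with a separator argument needs Python 3.8 or later.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return several notations for a single character (hypothetical helper)."""
    cp = ord(ch)
    return {
        "char": ch,
        "code_point": f"U+{cp:04X}",          # conventional Unicode notation
        "decimal": cp,
        "ncr_hex": f"&#x{cp:X};",             # HTML/XML numeric character reference, hex
        "ncr_dec": f"&#{cp};",                # HTML/XML numeric character reference, decimal
        "utf8": ch.encode("utf-8").hex(" "),  # UTF-8 bytes as space-separated hex
        "name": unicodedata.name(ch, "<no name>"),
    }


for ch in "é€":
    print(describe(ch))
```

For é this yields U+00E9, 233, `&#xE9;`, `&#233;`, the UTF-8 bytes `c3 a9` and the name LATIN SMALL LETTER E WITH ACUTE.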
