Peter Duniho wrote:
> Tony Johansson wrote:
>> Hi!
>>
>> Below is a simple program that is using the Comparer class to compare
>> two strings named str1 and str2.
>> If I use 0x040A as the first argument to the CultureInfo constructor, I get
>> the traditional sort order, according to the MSDN documentation that you
>> can find at the bottom.
>
> At the bottom of what?
>
>> The WriteLine statement in the program is writing 1 as the value
>> meaning that str1 > str2.
>> Can somebody explain how this works, given that the comparison is not
>> based on the ASCII table?
>
> What do you want to know? If you want all the gory details of the
> comparison, you need to just look at the implementation (which may or
> may not involve diving into the unmanaged Windows API).
>
> The basic answer is: duh, of course a culture-specific comparison must
> not be based on the ASCII character values. That's the whole point of a
> culture-specific comparison, as ASCII is itself not a culturally-based
> character encoding.
>
> Instead, when you do a culture-specific comparison, it uses whatever
> ordering rules exist for that specific culture. Humans being the kind
> of animal they are, these rules aren't always logical. Even when they
> are logical, the logic does not necessarily follow the representation of
> characters and words as found in a computer.
>
> But, those rules _are_ what a human being expects when the computer is
> asked to order the input, which is the whole reason for having
> culture-specific support in various APIs, including .NET.
>
>> I mean, if we used the normal ASCII table, we would have said that
>> str1 < str2, because the letter l is less than u.
>
> The 0x040A LCID is not even listed on the reference that I looked at
> (http://msdn.microsoft.com/en-us/goglobal/bb896001.aspx). But, we can
> see on the documentation for the CultureInfo class that it's used to
> indicate a "traditional" Spanish-specific sorting.
>
> And for whatever reason (I don't speak Spanish, so I couldn't tell you
> why), the word "llegar" is alphabetized after "lugar". So that's what
> the Compare() method tells you when you compare them.
>
> If you want to know why in the "traditional" ordering, "llegar" comes
> after "lugar", but in the "international" ordering, it comes before, you
> need to ask someone who knows about Spanish culture. It's not a
> programming question.
The Spanish alphabet is, officially, a, b, c, ch, d, e, f, g, h, i, j,
k, l, ll, m, n, ñ, o, p, q, r, s, t, u, v, w, x, y, z. The digraph "ll",
which has its own pronunciation distinct from that of "l", has been
treated as a single letter, in the same way as the digraph "ch".
However, a 1994 international language reform passed during the Tenth
Congress of the Association of Spanish Language Academies decreed that
henceforth, for purposes of sorting, "ch" and "ll" should be treated as
two separate letters, so the official order would now be (llegar,
lugar), despite the fact that "llegar" is still officially considered to
consist of five letters. Weird, but official, and perhaps enacted in
order to avoid the kinds of problems involved in international,
computerized data exchange, given that everybody, Spanish speakers
included, *types* "ch" and "ll" each as a sequence of two letters
instead of as a digraph.
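To make the difference concrete, here is a minimal sketch in Python (chosen only for illustration; the original question is about .NET) contrasting a plain code-point comparison with a simplified "traditional" ordering in which "ch" and "ll" each count as one letter. The names traditional_key and compare are hypothetical, and the key only understands the letters of the alphabet listed above (case is folded). It is not how Windows or .NET actually implement the traditional Spanish collation selected by LCID 0x040A.

```python
# Hypothetical sketch: approximate the *traditional* Spanish alphabet, where
# "ch" sorts as a single letter after "c", "ll" as a single letter after "l",
# and "ñ" after "n". Accents and other real-collation details are ignored;
# any character not in the list raises KeyError.
TRADITIONAL_ALPHABET = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]
RANK = {letter: i for i, letter in enumerate(TRADITIONAL_ALPHABET)}


def traditional_key(word):
    """Split a word into traditional Spanish letters and return their ranks."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in ("ch", "ll"):  # digraph: treat the two characters as one letter
            key.append(RANK[pair])
            i += 2
        else:
            key.append(RANK[word[i]])
            i += 1
    return key


def compare(a, b, key):
    """Return -1, 0 or 1, following the sign convention of a Compare() method."""
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


if __name__ == "__main__":
    words = ["lugar", "llegar", "mano", "luz"]
    print(sorted(words))                       # plain code-point order
    print(sorted(words, key=traditional_key))  # traditional Spanish order
    print(compare("llegar", "lugar", traditional_key))
    print(compare("llegar", "lugar", lambda s: s))
```

Under code-point order "llegar" comes first, because its second character "l" is less than "u"; under the traditional key its first letter is "ll", which ranks after every word starting with plain "l", so compare("llegar", "lugar", traditional_key) returns 1, matching the sign of the result in the original question.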