Unicode Basic Latin and Latin-1 coverage of Western Europe

What gaps would there be in the coverage of Western European 
languages that would not be included in Unicode Basic Latin 
and Latin-1?


NoSpam8358
11/14/2007 12:49:26 AM
vc.mfc


"Peter Olcott" <NoSpam@SeeScreen.com> wrote in message 
news:vCr_i.2768$gd3.2472@newsfe18.lga...
> What gaps would there be in the coverage of Western European languages 
> that would not be included in Unicode Basic Latin and Latin-1 ?

The Greek alphabet.

Euro currency. 

ndiamond1
11/14/2007 3:11:42 AM
"Norman Diamond" <ndiamond@community.nospam> wrote in 
message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
>
> "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message 
> news:vCr_i.2768$gd3.2472@newsfe18.lga...
>> What gaps would there be in the coverage of Western 
>> European languages that would not be included in Unicode 
>> Basic Latin and Latin-1 ?
>
> The Greek alphabet.
>
> Euro currency.

Anything else? 


NoSpam8358
11/14/2007 3:39:36 AM
Peter Olcott wrote:
:: "Norman Diamond" <ndiamond@community.nospam> wrote in
:: message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
:::
::: "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
::: news:vCr_i.2768$gd3.2472@newsfe18.lga...
:::: What gaps would there be in the coverage of Western
:::: European languages that would not be included in Unicode
:::: Basic Latin and Latin-1 ?
:::
::: The Greek alphabet.
:::
::: Euro currency.
::
:: Anything else?

Central Europe.  :-)

It isn't as easy as east and west anymore!


Bo persson


bop
11/14/2007 6:06:42 PM
"Bo Persson" <bop@gmb.dk> wrote in message 
news:5q0rp5Fth4grU1@mid.individual.net...
> Peter Olcott wrote:
> :: "Norman Diamond" <ndiamond@community.nospam> wrote in
> :: message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
> :::
> ::: "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
> ::: news:vCr_i.2768$gd3.2472@newsfe18.lga...
> :::: What gaps would there be in the coverage of Western
> :::: European languages that would not be included in Unicode
> :::: Basic Latin and Latin-1 ?
> :::
> ::: The Greek alphabet.
> :::
> ::: Euro currency.
> ::
> :: Anything else?
>
> Central Europe.  :-)
>
> Its isn't as easy as east and west anymore!
>
>
> Bo persson
>
>

Meaning which languages? 


NoSpam8358
11/14/2007 6:31:02 PM
Peter Olcott wrote:
:: "Bo Persson" <bop@gmb.dk> wrote in message
:: news:5q0rp5Fth4grU1@mid.individual.net...
::: Peter Olcott wrote:
::::: "Norman Diamond" <ndiamond@community.nospam> wrote in
::::: message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
::::::
:::::: "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
:::::: news:vCr_i.2768$gd3.2472@newsfe18.lga...
::::::: What gaps would there be in the coverage of Western
::::::: European languages that would not be included in Unicode
::::::: Basic Latin and Latin-1 ?
::::::
:::::: The Greek alphabet.
::::::
:::::: Euro currency.
:::::
::::: Anything else?
:::
::: Central Europe.  :-)
:::
::: Its isn't as easy as east and west anymore!
:::
:::
::: Bo persson
:::
:::
::
:: Meaning which languages?

Like Polish and Czech.

http://europa.eu/abc/european_countries/index_en.htm


Note that most "eastern" countries lie west of Greece.  :-)
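To make the Polish/Czech point concrete: Latin-1 maps each byte value 0x00-0xFF directly to the Unicode code point with the same value, so any letter whose code point is above U+00FF cannot be represented. A minimal C sketch checking a handful of Polish and Czech letters (the sample list and the function names are illustrative only):

```c
#include <stdbool.h>
#include <stddef.h>

/* Latin-1 (ISO 8859-1) maps byte values 0x00-0xFF one-to-one onto
   code points U+0000-U+00FF, so a code point fits only if it is <= 0xFF. */
static bool fits_latin1(unsigned long cp)
{
    return cp <= 0xFFul;
}

/* A few letters used in Polish and Czech, as Unicode code points. */
static const unsigned long central_european[] = {
    0x0105, /* a with ogonek (Polish) */
    0x0142, /* l with stroke (Polish) */
    0x017C, /* z with dot above (Polish) */
    0x010D, /* c with caron (Czech) */
    0x0159, /* r with caron (Czech) */
    0x016F, /* u with ring above (Czech) */
    0x00E1, /* a with acute (Czech; also present in Latin-1) */
};

/* Count how many of the sample letters Latin-1 cannot represent. */
static size_t count_missing_from_latin1(void)
{
    size_t missing = 0;
    for (size_t i = 0; i < sizeof central_european / sizeof central_european[0]; i++)
        if (!fits_latin1(central_european[i]))
            missing++;
    return missing;
}
```

Six of the seven sample letters fall outside Latin-1; only a with acute (U+00E1) fits.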


Bo Persson


bop
11/14/2007 7:58:33 PM
"Bo Persson" <bop@gmb.dk> wrote in message 
news:5q12arFtioo0U1@mid.individual.net...
> Peter Olcott wrote:
> :: "Bo Persson" <bop@gmb.dk> wrote in message
> :: news:5q0rp5Fth4grU1@mid.individual.net...
> ::: Peter Olcott wrote:
> ::::: "Norman Diamond" <ndiamond@community.nospam> wrote in
> ::::: message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
> ::::::
> :::::: "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
> :::::: news:vCr_i.2768$gd3.2472@newsfe18.lga...
> ::::::: What gaps would there be in the coverage of Western
> ::::::: European languages that would not be included in Unicode
> ::::::: Basic Latin and Latin-1 ?
> ::::::
> :::::: The Greek alphabet.
> ::::::
> :::::: Euro currency.
> :::::
> ::::: Anything else?
> :::
> ::: Central Europe.  :-)
> :::
> ::: Its isn't as easy as east and west anymore!
> :::
> :::
> ::: Bo persson
> :::
> :::
> ::
> :: Meaning which languages?
>
> Like Polish and Czech.
>
> http://europa.eu/abc/european_countries/index_en.htm
>
>
> Note that most "eastern" countries lies west of Greece.  :-)
>
>
> Bo Persson
>
>

Is Basic Latin and Latin-1 already provided in the fonts of 
U.S. English? 


NoSpam8358
11/14/2007 11:00:21 PM
"Peter Olcott" <NoSpam@SeeScreen.com> wrote in message 
news:86L_i.7281$LZ7.3823@newsfe15.lga...

> Is Basic Latin and Latin-1 already provided in the fonts of U.S. English?

ASCII is a subset of Latin-1.  If a U.S. English font is really a Latin-1 
font then it provides Latin-1.  If a U.S. English font only provides glyphs 
for ASCII then it provides half of Latin-1. 
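Norman's "half" can be checked by counting code positions: ASCII (Unicode Basic Latin) is U+0000-U+007F and Latin-1 is U+0000-U+00FF, so Latin-1 adds 128 positions on top of ASCII. A small C sketch (function names are illustrative):

```c
#include <stdbool.h>

/* ASCII / Unicode Basic Latin: code points U+0000-U+007F (7 bits). */
static bool is_ascii(unsigned long cp) { return cp <= 0x7Ful; }

/* Latin-1 (ISO 8859-1): code points U+0000-U+00FF (8 bits). */
static bool is_latin1(unsigned long cp) { return cp <= 0xFFul; }

/* Count code points that are in Latin-1 but not in ASCII. */
static unsigned latin1_only_count(void)
{
    unsigned n = 0;
    for (unsigned long cp = 0; cp <= 0xFF; cp++)
        if (is_latin1(cp) && !is_ascii(cp))
            n++;
    return n;
}
```

Note that in ISO 8859-1 the positions 0x80-0x9F are C1 control codes, so only 96 of those 128 added positions (0xA0-0xFF) carry printable characters.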

ndiamond1
11/15/2007 1:07:41 AM
"Norman Diamond" <ndiamond@community.nospam> wrote in 
message news:OGs8tPyJIHA.1620@TK2MSFTNGP03.phx.gbl...
> "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message 
> news:86L_i.7281$LZ7.3823@newsfe15.lga...
>
>> Is Basic Latin and Latin-1 already provided in the fonts 
>> of U.S. English?
>
> ASCII is a subset of Latin-1.  If a U.S. English font is 
> really a Latin-1 font then it provides Latin-1.  If a U.S. 
> English font only provides glyphs for ASCII then it 
> provides half of Latin-1.

What I am asking is: can the characters that use the high 
bit of the eight-bit byte, and that have associated glyphs, 
reasonably be counted on to represent the Latin-1 character 
set on U.S. English computers? 


NoSpam8358
11/15/2007 2:06:39 AM
Euro currency is covered.  But Czech is not, and I think it still counts as Western
European.  Cyrillic is also out.  It's easy to check.  Look at something like Arial
Unicode MS and see how many characters appear beyond 0xFF.
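One way to do that check programmatically, sketched under assumptions: on Windows, GetFontUnicodeRanges reports a font's coverage as a list of runs (each a starting code point plus a glyph count). The C sketch below does not call that API; it takes runs from a hypothetical `range_t` array and counts how many covered code points lie above U+00FF. Any sample ranges are made up for illustration, not real Arial data.

```c
#include <stddef.h>

/* Shaped like a Win32 WCRANGE: first code point plus run length.
   This type is a hypothetical stand-in, not the Windows struct itself. */
typedef struct {
    unsigned long low;   /* first code point in the run */
    unsigned long count; /* number of consecutive code points */
} range_t;

/* Count how many covered code points lie above U+00FF, i.e. outside Latin-1. */
static unsigned long count_beyond_latin1(const range_t *ranges, size_t n)
{
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned long lo = ranges[i].low;
        unsigned long hi = lo + ranges[i].count; /* one past the last */
        if (hi <= 0x100)
            continue;           /* run lies entirely within U+0000-U+00FF */
        if (lo < 0x100)
            lo = 0x100;         /* clip the Latin-1 part of a straddling run */
        total += hi - lo;
    }
    return total;
}
```

For example, runs {0x20, 95}, {0xA0, 96}, {0x100, 128} give 128, because the first two runs lie entirely within Latin-1.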
						joe

On Wed, 14 Nov 2007 19:06:42 +0100, "Bo Persson" <bop@gmb.dk> wrote:

>Peter Olcott wrote:
>:: "Norman Diamond" <ndiamond@community.nospam> wrote in
>:: message news:e0KsXwmJIHA.4712@TK2MSFTNGP04.phx.gbl...
>:::
>::: "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
>::: news:vCr_i.2768$gd3.2472@newsfe18.lga...
>:::: What gaps would there be in the coverage of Western
>:::: European languages that would not be included in Unicode
>:::: Basic Latin and Latin-1 ?
>:::
>::: The Greek alphabet.
>:::
>::: Euro currency.
>::
>:: Anything else?
>
>Central Europe.  :-)
>
>Its isn't as easy as east and west anymore!
>
>
>Bo persson
>
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
newcomer
11/15/2007 2:26:04 AM
I think you would have to deal not just with the ASCII characters, but with code page issues
as well.  See my Locale Explorer.
					joe

On Wed, 14 Nov 2007 20:06:39 -0600, "Peter Olcott" <NoSpam@SeeScreen.com> wrote:

>
>"Norman Diamond" <ndiamond@community.nospam> wrote in 
>message news:OGs8tPyJIHA.1620@TK2MSFTNGP03.phx.gbl...
>> "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message 
>> news:86L_i.7281$LZ7.3823@newsfe15.lga...
>>
>>> Is Basic Latin and Latin-1 already provided in the fonts 
>>> of U.S. English?
>>
>> ASCII is a subset of Latin-1.  If a U.S. English font is 
>> really a Latin-1 font then it provides Latin-1.  If a U.S. 
>> English font only provides glyphs for ASCII then it 
>> provides half of Latin-1.
>
>What I am asking is whether or not the characters that use 
>the high bit of the eight-bit byte that have associated 
>glyphs can be reasonably counted on to represent the Latin-1 
>character set on U.S. English computers? 
>
Joseph M. Newcomer [MVP]
newcomer
11/15/2007 3:33:13 AM
> What gaps would there be in the coverage of Western European 
> languages that would not be included in Unicode Basic Latin 
> and Latin-1 ? 

Unicode Basic Latin is 0 to 127. That is plain ASCII,
which is completely covered by Latin-1. So don't bother
mentioning Unicode in this case.

Technically, Latin-1 is ISO 8859-1, not Windows-1252,
which means you will be missing the Euro, a problem
for all Western European languages.
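The difference shows up when the same byte is decoded under each encoding. A C sketch (function names are illustrative; the table is the Windows-1252 mapping for bytes 0x80-0x9F, and returning U+FFFD for the five unassigned bytes is a choice made here, since real converters treat them differently):

```c
#include <stdint.h>

/* ISO 8859-1: every byte value equals its Unicode code point.
   0x80-0x9F are C1 control codes, so there is no Euro sign. */
static uint16_t decode_iso8859_1(unsigned char b)
{
    return b;
}

/* Windows-1252 agrees with ISO 8859-1 except in 0x80-0x9F, where it
   places printable characters such as the Euro sign. Unassigned bytes
   (0x81, 0x8D, 0x8F, 0x90, 0x9D) are returned as U+FFFD here. */
static uint16_t decode_cp1252(unsigned char b)
{
    static const uint16_t high[32] = {
        0x20AC, 0xFFFD, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0xFFFD, 0x017D, 0xFFFD,
        0xFFFD, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0xFFFD, 0x017E, 0x0178,
    };
    if (b >= 0x80 && b <= 0x9F)
        return high[b - 0x80];
    return b;
}
```

Byte 0x80 decodes to U+0080 (a control code) as ISO 8859-1 but to U+20AC (the Euro sign) as Windows-1252; bytes 0xA0-0xFF decode identically in both.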

Then it also depends on what you mean by "Western European".
By the United Nations classification that means only
Austria, Belgium, France, Germany, Liechtenstein,
Luxembourg, Monaco, Netherlands, Switzerland 
(http://unstats.un.org/unsd/methods/m49/m49regin.htm#europe)
So you end up with German, French, Dutch, Italian,
plus "minority languages" (like Romansh, Basque, Catalan, 
Walloon, Provençal, etc.)
(http://en.wikipedia.org/wiki/European_languages
and
http://en.wikipedia.org/wiki/European_Charter_for_Regional_or_Minority_Languages)

No problem for any of them, if you want basic support.
But many of them will have some problems for proper support
(Latin 1 does not have smart quotes, the Euro, the oe ligature,
the en dash or the em dash).
Then some of them might have even more problems:
Catalan uses an l with a dot, which is not present in 1252.
It is sometimes "faked" (badly) with l + a bullet (but the bullet
is not in Latin 1).
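The "proper support" characters listed above can be checked against both character sets. The C sketch below (illustrative names; the list holds the code points Windows-1252 assigns to bytes 0x80-0x9F) shows that smart quotes, the Euro, the oe ligature, the en and em dashes, and the bullet are all in Windows-1252 but not in Latin-1, while the Catalan l with middle dot (U+0140) is in neither.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Code points that Windows-1252 places at bytes 0x80-0x9F
   (the five unassigned bytes are omitted). */
static const uint16_t cp1252_extras[] = {
    0x20AC, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021, 0x02C6,
    0x2030, 0x0160, 0x2039, 0x0152, 0x017D, 0x2018, 0x2019, 0x201C,
    0x201D, 0x2022, 0x2013, 0x2014, 0x02DC, 0x2122, 0x0161, 0x203A,
    0x0153, 0x017E, 0x0178,
};

/* ISO 8859-1 covers exactly U+0000-U+00FF. */
static bool in_latin1(uint16_t cp)
{
    return cp <= 0xFF;
}

/* Windows-1252 covers U+0000-U+007F and U+00A0-U+00FF like Latin-1,
   plus the 27 extras above in place of the C1 controls. */
static bool in_cp1252(uint16_t cp)
{
    if (cp <= 0x7F || (cp >= 0xA0 && cp <= 0xFF))
        return true;
    for (size_t i = 0; i < sizeof cp1252_extras / sizeof cp1252_extras[0]; i++)
        if (cp1252_extras[i] == cp)
            return true;
    return false;
}
```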

Anyway, starting something these days without proper Unicode support
is a strategic mistake.


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
11/15/2007 9:51:15 AM
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
news:Xns99E912CB28DF6MihaiN@207.46.248.16...

> Anyway, starting something these days without proper Unicode support
> is a strategic mistake.

Agreed.   As well as non-Latin script languages (like Greek), the languages 
of Central Europe and the Baltic states cannot be represented by Latin-1. 
And Welsh cannot be represented by Latin-1 or Windows-1252 either, as it 
requires a w with a ^ on it :-)

Given that Microsoft have dropped support for Windows 98, it seems reasonable now 
to write for the NT4-2000-XP-Vista series of operating systems, where Unicode 
is the natural choice.

Dave
-- 
David Webber
Author of 'Mozart the Music Processor'
http://www.mozart.co.uk
For discussion/support see
http://www.mozart.co.uk/mozartists/mailinglist.htm 

dave9996
11/15/2007 12:26:29 PM
"Mihai N." <nmihai_year_2000@yahoo.com>  wrote

>> What gaps would there be in the coverage of Western European
>> languages that would not be included in Unicode Basic Latin
>> and Latin-1 ?
>
> Unicode Basic Latin is 0 to 127. That is plain ASCII,
> which is completely covered by Latin-1. So don't bother
> mentioning Unicode in this case.
>
> Technically, Latin-1 is ISO 8859-1, not Windows-1252.
> Which means you will be missing the Euro, a problem
> for all Western European languages.

AFAIK Latin-9 aka ISO 8859-15 includes the Euro Symbol (€).
I'll take the chance to check my newsreader settings at this point.
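ISO 8859-15 (Latin-9) does have the Euro, though at a different byte than Windows-1252. Latin-9 differs from Latin-1 in exactly eight positions; a C decoding sketch (the function name is illustrative):

```c
#include <stdint.h>

/* ISO 8859-15 (Latin-9) is ISO 8859-1 with eight positions changed.
   Every other byte decodes to the code point of the same value. */
static uint16_t decode_iso8859_15(unsigned char b)
{
    switch (b) {
    case 0xA4: return 0x20AC; /* Euro sign (Latin-1 has the currency sign here) */
    case 0xA6: return 0x0160; /* S with caron */
    case 0xA8: return 0x0161; /* s with caron */
    case 0xB4: return 0x017D; /* Z with caron */
    case 0xB8: return 0x017E; /* z with caron */
    case 0xBC: return 0x0152; /* OE ligature */
    case 0xBD: return 0x0153; /* oe ligature */
    case 0xBE: return 0x0178; /* Y with diaeresis */
    default:   return b;
    }
}
```

So 0xA4 is the Euro sign in Latin-9 but the generic currency sign (U+00A4) in Latin-1, while 0x80 is the Euro in Windows-1252 but a control code in both ISO encodings; a byte stream is only meaningful if you know which of these it is in.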

Hans

news9141
11/15/2007 3:08:36 PM
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
news:Xns99E912CB28DF6MihaiN@207.46.248.16...
>> What gaps would there be in the coverage of Western European
>> languages that would not be included in Unicode Basic Latin
>> and Latin-1 ?
>
> Unicode Basic Latin is 0 to 127. That is plain ASCII,
> which is completely covered by Latin-1. So don't bother
> mentioning Unicode in this case.
>
> Technically, Latin-1 is ISO 8859-1, not Windows-1252.
> Which means you will be missing the Euro, a problem
> for all Western European languages.
>
> Then it also depends what you mean by "Western European"
> By the United Nations classification that means only
> Austria, Belgium, France, Germany, Liechtenstein,
> Luxembourg, Monaco, Netherlands, Switzerland
> (http://unstats.un.org/unsd/methods/m49/m49regin.htm#europe)
> So you end up with German, French, Dutch, Italian,
> plus "minority languages" (like Romansh, Basque, Catalan,
> Walon, Provencal, etc.)
> (http://en.wikipedia.org/wiki/European_languages
> and
> http://en.wikipedia.org/wiki/European_Charter_for_Regional_or_Minority_Languages)
>
> No problem for any of them, if you want basic support.
> But many of them will have some problems for proper support
> (Latin 1 does not have smart quotes, Euro, oe ligature, n-dash, m-dash)
> Then some of them might have even more problems
> (Catalan uses a l with a dot, which is not present in 1252.
> It is sometimes "faked" (badly) with l + a bullet (but the bullet
> is not in Latin 1).
>
> Anyway, starting something these days without proper Unicode support
> is a strategic mistake.
>
>

How can I go about supporting all languages other than the 
Asian ones? Is there software that I need to download 
so that TextOut() will display characters beyond the first 
8 bits? How can I know exactly which Unicode characters are 
required for each country? 


NoSpam8358
11/15/2007 3:13:54 PM
Look at Character Map.  It will show you the scope of each font.  If you program in
Unicode, you can get every glyph that is shown in a chosen font.  For example, I use Arial
Unicode MS which has a huge number of characters.  But regular Arial has a lot of
characters, too.  Some fonts have only the minimal 1252 set.  Others have more.  Arial has
about 1100 characters, Tahoma something over 1300, and Times New Roman the same number as
Arial; Arial Unicode MS has something like 40,000 characters.  
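Character Map answers this interactively. A program could do a similar check against a font's coverage runs (on Windows, GetFontUnicodeRanges reports runs shaped like the hypothetical `coverage_range` below). This C sketch, under that assumption and handling BMP characters only, finds the first UTF-16 unit in a string that the font cannot display:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coverage record: a run of consecutive code points,
   shaped like the WCRANGE entries GetFontUnicodeRanges reports. */
typedef struct {
    uint16_t low;   /* first code point in the run */
    uint16_t count; /* number of consecutive code points */
} coverage_range;

static bool covers(const coverage_range *r, size_t n, uint16_t cp)
{
    for (size_t i = 0; i < n; i++)
        if (cp >= r[i].low && (uint32_t)cp < (uint32_t)r[i].low + r[i].count)
            return true;
    return false;
}

/* Return the index of the first UTF-16 unit the font cannot display,
   or -1 if all are covered. Surrogate pairs are not handled in this sketch. */
static long first_uncovered(const coverage_range *r, size_t n,
                            const uint16_t *text, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (!covers(r, n, text[i]))
            return (long)i;
    return -1;
}
```

For a font covering only printable Latin-1, "Dvořák" fails at index 3 (r with caron, U+0159).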
						joe

On Thu, 15 Nov 2007 09:13:54 -0600, "Peter Olcott" <NoSpam@SeeScreen.com> wrote:

>
>"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
>news:Xns99E912CB28DF6MihaiN@207.46.248.16...
>>> What gaps would there be in the coverage of Western European
>>> languages that would not be included in Unicode Basic Latin
>>> and Latin-1 ?
>>
>> Unicode Basic Latin is 0 to 127. That is plain ASCII,
>> which is completely covered by Latin-1. So don't bother
>> mentioning Unicode in this case.
>>
>> Technically, Latin-1 is ISO 8859-1, not Windows-1252.
>> Which means you will be missing the Euro, a problem
>> for all Western European languages.
>>
>> Then it also depends what you mean by "Western European"
>> By the United Nations classification that means only
>> Austria, Belgium, France, Germany, Liechtenstein,
>> Luxembourg, Monaco, Netherlands, Switzerland
>> (http://unstats.un.org/unsd/methods/m49/m49regin.htm#europe)
>> So you end up with German, French, Dutch, Italian,
>> plus "minority languages" (like Romansh, Basque, Catalan,
>> Walon, Provencal, etc.)
>> (http://en.wikipedia.org/wiki/European_languages
>> and
>> http://en.wikipedia.org/wiki/European_Charter_for_Regional_or_Minority_Languages)
>>
>> No problem for any of them, if you want basic support.
>> But many of them will have some problems for proper support
>> (Latin 1 does not have smart quotes, Euro, oe ligature, n-dash, m-dash)
>> Then some of them might have even more problems
>> (Catalan uses a l with a dot, which is not present in 1252.
>> It is sometimes "faked" (badly) with l + a bullet (but the bullet
>> is not in Latin 1).
>>
>> Anyway, starting something these days without proper Unicode support
>> is a strategic mistake.
>>
>>
>
>How can I go about supporting all languages besides the 
>Asian characters? Is there software that I need to download 
>so that TextOut() will display characters beyond the first 
>8-Bits? How can I know exactly which Unicode characters are 
>required for each Country? 
>
Joseph M. Newcomer [MVP]
newcomer
11/16/2007 3:38:19 AM
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
message news:4efnj3ts10u52blmu6seete0hdc743u7nd@4ax.com...
>I think you would have to deal with not the ASCII characters, but look at code page issues
> as well.  See my Locale Explorer.
> joe
>

Thanks, and thanks again for your advice to consider COM; it 
was the ideal choice for my needs. After spending at least a 
hundred hours studying it, I found the fifteen minutes' worth 
that I really needed.

> On Wed, 14 Nov 2007 20:06:39 -0600, "Peter Olcott" <NoSpam@SeeScreen.com> wrote:
>
>>
>>"Norman Diamond" <ndiamond@community.nospam> wrote in
>>message news:OGs8tPyJIHA.1620@TK2MSFTNGP03.phx.gbl...
>>> "Peter Olcott" <NoSpam@SeeScreen.com> wrote in message
>>> news:86L_i.7281$LZ7.3823@newsfe15.lga...
>>>
>>>> Is Basic Latin and Latin-1 already provided in the fonts
>>>> of U.S. English?
>>>
>>> ASCII is a subset of Latin-1.  If a U.S. English font is
>>> really a Latin-1 font then it provides Latin-1.  If a U.S.
>>> English font only provides glyphs for ASCII then it
>>> provides half of Latin-1.
>>
>>What I am asking is whether or not the characters that use
>>the high bit of the eight-bit byte that have associated
>>glyphs can be reasonably counted on to represent the Latin-1
>>character set on U.S. English computers?
>>


NoSpam8358
11/16/2007 3:40:56 AM
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
message news:m43qj3t7p0uqg3p70hf7mu36fthoovjbjh@4ax.com...
> Look at Character Map.  It will show you the scope of each font.  If you program in
> Unicode, you can get every glyph that is shown in a chosen font.  For example, I use Arial
> Unicode MS which has a huge number of characters.  But regular Arial has a lot of
> characters, too.  Some fonts have only the minimal 1252 set.  Others have more.  Arial has
> about 1100 characters. Tahoma something over 1300, Times New Roman has the same number as
> Arial, Arial Unicode MS has something like 40,000 characters.
> joe
>

I need to make my system work everywhere except Asia. Are 
there special fonts defined for other locales, so 
that the same variety of fonts is available for these other 
countries?  For example, Old English Text MT in Russian?

> On Thu, 15 Nov 2007 09:13:54 -0600, "Peter Olcott" <NoSpam@SeeScreen.com> wrote:
>
>>
>>"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message
>>news:Xns99E912CB28DF6MihaiN@207.46.248.16...
>>>> What gaps would there be in the coverage of Western European
>>>> languages that would not be included in Unicode Basic Latin
>>>> and Latin-1 ?
>>>
>>> Unicode Basic Latin is 0 to 127. That is plain ASCII,
>>> which is completely covered by Latin-1. So don't bother
>>> mentioning Unicode in this case.
>>>
>>> Technically, Latin-1 is ISO 8859-1, not Windows-1252.
>>> Which means you will be missing the Euro, a problem
>>> for all Western European languages.
>>>
>>> Then it also depends what you mean by "Western European"
>>> By the United Nations classification that means only
>>> Austria, Belgium, France, Germany, Liechtenstein,
>>> Luxembourg, Monaco, Netherlands, Switzerland
>>> (http://unstats.un.org/unsd/methods/m49/m49regin.htm#europe)
>>> So you end up with German, French, Dutch, Italian,
>>> plus "minority languages" (like Romansh, Basque, Catalan,
>>> Walon, Provencal, etc.)
>>> (http://en.wikipedia.org/wiki/European_languages
>>> and
>>> http://en.wikipedia.org/wiki/European_Charter_for_Regional_or_Minority_Languages)
>>>
>>> No problem for any of them, if you want basic support.
>>> But many of them will have some problems for proper support
>>> (Latin 1 does not have smart quotes, Euro, oe ligature, n-dash, m-dash)
>>> Then some of them might have even more problems
>>> (Catalan uses a l with a dot, which is not present in 1252.
>>> It is sometimes "faked" (badly) with l + a bullet (but the bullet
>>> is not in Latin 1).
>>>
>>> Anyway, starting something these days without proper Unicode support
>>> is a strategic mistake.
>>>
>>>
>>
>>How can I go about supporting all languages besides the
>>Asian characters? Is there software that I need to download
>>so that TextOut() will display characters beyond the first
>>8-Bits? How can I know exactly which Unicode characters are
>>required for each Country?
>>


NoSpam8358
11/16/2007 3:47:04 AM
> AFAIK Latin-9 aka ISO 8859-15 includes the Euro Symbol (€).
> I'll take the chance to check my newsreader settings at this point.
But the question was about Latin 1, not 9.


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
11/16/2007 6:19:49 AM
> How can I go about supporting all languages besides the 
> Asian characters?
Unicode is your only option.
There is no single code page that can cover all the European
languages (including Greek, Cyrillic, Baltic, Turkish, Eastern and Central European).

> Is there software that I need to download 
> so that TextOut() will display characters beyond the first 
> 8-Bits?
You don't need to download anything (if you are on Win 2000 or
newer), all the support you need is there by default.

> How can I know exactly which Unicode characters are 
> required for each Country? 
If you use Unicode, you don't have to know.
In fact, the set of characters used in a language is not 100% defined.
Is e with accent required in English? 
First thought is "no", but think about "resume"
(http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)

But the thing is, you don't need to know. Does it matter if the
text is French or German when you put it on screen?
Does it matter if a French text contains a sharp s (typical
for German)? Or if a German text has an n with tilde (typical
for Spanish)? They are all present in Latin 1.
Similarly, when you move to Unicode, you don't care
anymore if a certain character is only used in Turkish
or in German; they are all covered by the code page you
are using (Unicode).
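The point that the text's language does not matter can be seen in a UTF-16 buffer: French, German, Spanish, Turkish and Czech letters sit side by side, and nothing in the data says which language each belongs to. A C sketch (the sample text and function names are illustrative; it uses `uint16_t` rather than `wchar_t`, because `wchar_t` is 16 bits on Windows but 32 bits on many other platforms) that also shows why one UTF-16 unit is one character inside the Basic Multilingual Plane, while characters beyond it take a surrogate pair:

```c
#include <stddef.h>
#include <stdint.h>

/* "café Straße niño ğ Dvořák" as UTF-16 code units, zero-terminated. */
static const uint16_t mixed[] = {
    'c', 'a', 'f', 0x00E9, ' ',
    'S', 't', 'r', 'a', 0x00DF, 'e', ' ',
    'n', 'i', 0x00F1, 'o', ' ',
    0x011F, ' ',
    'D', 'v', 'o', 0x0159, 0x00E1, 'k', 0
};

/* Number of UTF-16 code units before the terminator. */
static size_t count_units(const uint16_t *s)
{
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Number of characters, counting a high+low surrogate pair as one. */
static size_t count_chars(const uint16_t *s)
{
    size_t n = 0;
    for (size_t i = 0; s[i] != 0; i++) {
        if (s[i] >= 0xD800 && s[i] <= 0xDBFF &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF)
            i++; /* skip the low surrogate of the pair */
        n++;
    }
    return n;
}
```

All 25 characters of the sample are in the BMP, so unit and character counts agree; a character such as U+1F600 (an emoji) would take two units.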


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
11/16/2007 6:28:51 AM
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
news:Xns99E9E49F59A59MihaiN@207.46.248.16...
>> How can I go about supporting all languages besides the
>> Asian characters?
> Unicode is your only option.
> There is no single code page than can cover all the European
> languages (including Greek, Cyrillic, Baltic, Turkish, EE, CE).
>
>> Is there software that I need to download
>> so that TextOut() will display characters beyond the first
>> 8-Bits?
> You don't need to download anything (if you are on Win 2000 or
> newer), all the support you need is there by default.
>
>> How can I know exactly which Unicode characters are
>> required for each Country?
> If you are Unicode, you don't have to knew.
> In fact, the characters used in a language are not 100% defined.
> Is e with accent required in English?
> First thought is "no", but think about "resume"
> (http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)
>
> But thing is, you don't need to know. Does it matter if the
> text is French or German when you put it on screen?
> Does it matter if a French text contains a sharp s (typical
> for German)? Or if a German text has a n tilde (typical
> for Spanish). They are all present in Latin 1.
> Similarily, when you move to Unicode, you don't care
> anymore if a chertain character is only used in Turkish
> or in German, they are all covered by the code page you
> are using (Unicode).
>

My screen reader needs to know in advance exactly which 
characters it needs to recognize to conserve memory.
   www.SeeScreen.com
Therefore I do need to know exactly which characters belong 
to each locale.

>


NoSpam8358
11/16/2007 1:12:44 PM
> My screen reader needs to know in advance exactly which 
> characters it needs to recognize to conserve memory.
>    www.SeeScreen.com
> Therefore I do need to know exactly which characters belong 
> to each locale.

Tough luck then. As I was saying, it is not possible.

And, by the way, what will your screen reader do with
my Romanian emails on my English system?
And occasionally I go read French, Italian, Spanish.
My system is XP, uses Unicode, and I take advantage of it.

Even if you find a way to know exactly which characters
are used in each language, it is kind of pointless,
because you don't know what language the currently
displayed text is in.


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
11/17/2007 9:20:34 AM
Hmmm, so if I have the ENU locale, and I'm reading a Web page from Greece, Russia, Israel,
or Qatar, your program will refuse to work because the characters I see are not the
characters of my locale?  Doesn't sound reasonable.  The Locale does not, and SHOULD NOT,
limit the characters I see on my screen.  Perhaps the reason I need to get recognition is
because I *don't* speak or read the language.

(And what about my friend who edits a newsletter in Hebrew on her ENU machine?  Or another
friend who communicates regularly with people in Russia, and has learned Russian so he can
be more effective in communicating with the groups he works with? Not everyone is
monolingual...I should be able to use as many fonts as I want.  It might even be
incredibly valuable to me to be able to handle fonts that are non-native to me.)
				joe

On Sat, 17 Nov 2007 01:20:34 -0800, "Mihai N." <nmihai_year_2000@yahoo.com> wrote:

>> My screen reader needs to know in advance exactly which 
>> characters it needs to recognize to conserve memory.
>>    www.SeeScreen.com
>> Therefore I do need to know exactly which characters belong 
>> to each locale.
>
>Tough luck then. As I was saying, not possible.
>
>And, by the way, what will your screen reader do with
>me Romanian emails on my English system?
>And ocasionally I go read French, Italian, Spanish.
>My system is XP, uses Unicode, and I take advantage of it.
>
>Even if you find a way to know exactly which characters
>are used in each language, it is kind of pointless,
>because you don't know in what language is the text
>currently displayed.
Joseph M. Newcomer [MVP]
newcomer
11/18/2007 5:48:37 AM
See below...
On Thu, 15 Nov 2007 22:28:51 -0800, "Mihai N." <nmihai_year_2000@yahoo.com> wrote:

>> How can I go about supporting all languages besides the 
>> Asian characters?
>Unicode is your only option.
>There is no single code page than can cover all the European
>languages (including Greek, Cyrillic, Baltic, Turkish, EE, CE).
>
>> Is there software that I need to download 
>> so that TextOut() will display characters beyond the first 
>> 8-Bits?
>You don't need to download anything (if you are on Win 2000 or
>newer), all the support you need is there by default.
****
Except for the font issue; some of the common fonts don't have all the characters in them.
But I got Arial Unicode MS free from MS, so it isn't "software" you have to download, it
is merely "data".  If you have Office 2003 installed, you already have this font.
****
>
>> How can I know exactly which Unicode characters are 
>> required for each Country? 
>If you are Unicode, you don't have to knew.
>In fact, the characters used in a language are not 100% defined.
>Is e with accent required in English? 
>First thought is "no", but think about "resume"
>(http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)
>
>But thing is, you don't need to know. Does it matter if the
>text is French or German when you put it on screen?
>Does it matter if a French text contains a sharp s (typical
>for German)? Or if a German text has a n tilde (typical
>for Spanish). They are all present in Latin 1.
>Similarily, when you move to Unicode, you don't care
>anymore if a chertain character is only used in Turkish
>or in German, they are all covered by the code page you
>are using (Unicode).
****
In my résumé, I might want to indicate that I have been successful in getting coöperation
from the more naïve readers of the Encyclopædia Britannica, and that I worked with R�lf  
Ångstrom's group in Rømskog, and André Château-Gaillard in his work on laïcité government,
not to mention my writings on the work of Antonín Dvorák (actually, if my newsreader
supported Unicode the r should have a little v over it...), the genealogical research on
my nagyszülo (except the o should have two little //s over it)...there is no limit to what
I might want to do.  

I mentioned at some point my work at the American Museum of Natural History; the goal was
to represent authors, titles, and abstracts in the native language in which they were
written, and present an English translation if possible.  The point here is that while
you may gain certain efficiencies by limiting the characters whose representations are
kept in memory, it should be MY choice which characters are selected; if I do
bilingual or multilingual work, I may well want to have more characters.  Read any
page in Wikipedia that deals with any country outside the United States and it is clear
that Unicode is a Way Of Life in the modern world.  So I should be able to point to a
character on the screen and say "learn", and you would scan my font looking for a match,
even if you have to bring the characters in one at a time, compare, and free them up.  Or
I could choose a range of characters (perhaps I'm writing or reading something on
pronunciation, and it's full of IPA symbols): I would want a convenient dropdown with an
entry "IPA symbols" that tells it to include the IPA glyphs, and it would remember my
desired configuration for the next time.
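Such a dropdown could be backed by a table of named presets. The C sketch below is hypothetical: the preset names are Unicode block names and the ranges are those blocks' published boundaries, but the struct and the lookup function are illustrative only, not part of any existing product.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical preset entry: a display name plus an inclusive
   code point range (first and last code point of a Unicode block). */
typedef struct {
    const char *name;
    unsigned first;
    unsigned last;
} char_preset;

static const char_preset presets[] = {
    { "Basic Latin",        0x0000, 0x007F },
    { "Latin-1 Supplement", 0x0080, 0x00FF },
    { "Latin Extended-A",   0x0100, 0x017F },
    { "IPA Extensions",     0x0250, 0x02AF },
    { "Greek and Coptic",   0x0370, 0x03FF },
    { "Cyrillic",           0x0400, 0x04FF },
    { "Hebrew",             0x0590, 0x05FF },
};

/* Look up a preset by its display name; NULL if unknown. */
static const char_preset *find_preset(const char *name)
{
    for (size_t i = 0; i < sizeof presets / sizeof presets[0]; i++)
        if (strcmp(presets[i].name, name) == 0)
            return &presets[i];
    return NULL;
}
```

A real "IPA symbols" preset would likely need a union of several blocks, since IPA also uses Spacing Modifier Letters (U+02B0-U+02FF), some Greek letters such as theta, and ordinary Latin letters.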
					joe
*****
Joseph M. Newcomer [MVP]
newcomer
11/18/2007 6:15:28 AM
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
message news:v6kvj3pnbnus2ntbgr4ov7g4l4ktggonp6@4ax.com...
> Hmmm, so if I have the ENU locale, and I'm reading a Web 
> page from Greece, Russia, Israel,
> or Qatar, your program will refuse to work because the 
> characters I see are not the
> characters of my locale?  Doesn't sound reasonable.  The 
> Locale does not, and SHOULD NOT,
> limit the characters I see on my screen.  Perhaps the 
> reason I need to get recognition is
> because I *don't* speak or read the language.
>
> (And what about my friend who edits a newsletter in Hebrew 
> on her ENU machine?  Or another
> friend who communicates regularly to people in Russia, and 
> has learned Russian so he can
> be more effective in communicating to the groups he works 
> with? Not everyone is
> monolingual...I should be able to use as many fonts as I 
> want.  It might even be
> incredibly valuable to me to be able to handle fonts that 
> are non-native to me)
> joe
>

I must limit the character set or my system becomes 
infeasible. I have decided that the interface for this 
limitation will be that the user specifies the set of UTF-16 
BMP characters that they want to recognize. I need to know 
how I would implement this interface. Can I simply provide a 
UTF-16 character to a Unicode build, and all the Windows 
character functions will know how to deal with it, or are 
there other levels of indirection involved?

> On Sat, 17 Nov 2007 01:20:34 -0800, "Mihai N." 
> <nmihai_year_2000@yahoo.com> wrote:
>
>>> My screen reader needs to know in advance exactly which
>>> characters it needs to recognize to conserve memory.
>>>    www.SeeScreen.com
>>> Therefore I do need to know exactly which characters 
>>> belong
>>> to each locale.
>>
>>Tough luck then. As I was saying, not possible.
>>
>>And, by the way, what will your screen reader do with
>>my Romanian emails on my English system?
>>And occasionally I go read French, Italian, Spanish.
>>My system is XP, uses Unicode, and I take advantage of it.
>>
>>Even if you find a way to know exactly which characters
>>are used in each language, it is kind of pointless,
>>because you don't know which language the text
>>currently displayed is in.


NoSpam8358 (375)
11/18/2007 2:42:40 PM
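On Olcott's question above: in a Windows Unicode build TCHAR is wchar_t, a 16-bit UTF-16 code unit, so any BMP character outside the surrogate range U+D800..U+DFFF is a single WCHAR that the W functions (TextOutW, GetWindowTextW, ...) accept directly. The character value itself needs no extra translation; which glyph appears still depends on the selected font. A portable sketch of the user-chosen set follows, assuming one bit per BMP code point (65,536 bits, i.e. 8 KB whatever the selection size) and using char16_t in place of WCHAR. BmpCharSet is a hypothetical name, not SeeScreen code.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// A user-chosen subset of the Basic Multilingual Plane, one bit per code point.
class BmpCharSet {
public:
    // Adds every code point in [first, last]. UTF-16 surrogates (D800-DFFF)
    // are halves of non-BMP characters, not characters, so they are skipped.
    void addRange(char16_t first, char16_t last) {
        assert(first <= last);
        const std::size_t lo = first, hi = last;
        for (std::size_t cp = lo; cp <= hi; ++cp)
            if (!isSurrogate(static_cast<char16_t>(cp))) bits_.set(cp);
    }

    void add(char16_t ch) { addRange(ch, ch); }
    bool contains(char16_t ch) const { return bits_.test(ch); }
    std::size_t size() const { return bits_.count(); }

    // True if every code unit of a UTF-16 string is a selected character.
    // Surrogate pairs are never selected, so non-BMP text reports false.
    bool coversText(const std::u16string& text) const {
        for (char16_t ch : text)
            if (!contains(ch)) return false;
        return true;
    }

private:
    static bool isSurrogate(char16_t ch) { return ch >= 0xD800 && ch <= 0xDFFF; }
    std::bitset<0x10000> bits_;
};
```

coversText would let the tool warn up front when sample text contains characters the user has not selected.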
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
message news:vikvj3lh68qh0o2ucchoq97gd6gertd01r@4ax.com...
> See below...
> On Thu, 15 Nov 2007 22:28:51 -0800, "Mihai N." 
> <nmihai_year_2000@yahoo.com> wrote:
>
>>> How can I go about supporting all languages besides the
>>> Asian characters?
>>Unicode is your only option.
>>There is no single code page that can cover all the
>>European languages (including Greek, Cyrillic, Baltic,
>>Turkish, EE, CE).
>>
>>> Is there software that I need to download
>>> so that TextOut() will display characters beyond the
>>> first 8 bits?
>>You don't need to download anything (if you are on Win
>>2000 or newer); all the support you need is there by default.
> ****
> Except for the font issue; some of the common fonts don't 
> have all the characters in them.
> But I got Arial Unicode MS free from MS, so it isn't 
> "software" you have to download, it
> is merely "data".  If you have Office 2003 installed, you 
> already have this font.
> ****

Where can I get this?

>>
>>> How can I know exactly which Unicode characters are
>>> required for each Country?
>>If you are using Unicode, you don't have to know.
>>In fact, the characters used in a language are not 100%
>>defined.
>>Is e with an accent required in English?
>>First thought is "no", but think about "resume"
>>(http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)
>>
>>But the thing is, you don't need to know. Does it matter if
>>the text is French or German when you put it on screen?
>>Does it matter if a French text contains a sharp s (typical
>>for German)? Or if a German text has an n with tilde (typical
>>for Spanish)? They are all present in Latin-1.
>>Similarly, when you move to Unicode, you don't care
>>anymore if a certain character is only used in Turkish
>>or in German; they are all covered by the code page you
>>are using (Unicode).
> ****
> In my resumé, I might want to indicate that I have been
> successful in getting coöperation
> from the more naïve readers of the Encyclopædia
> Britannica, and that I worked with Rølf
> Ångstrom's group in Rømskog, and André Château-Gaillard in
> his work on laïcité government,
> not to mention my writings on the work of Antonín Dvorák
> (actually, if my newsreader
> supported Unicode the r should have a little v over
> it...), the genealogical research on
> my nagyszülo (except the o should have two little //s over
> it)...there is no limit to what
> I might want to do.
>
There is a memory limit to what my system is capable of 
recognizing. If there are very few overlapping glyphs and 
the glyphs are rendered without font smoothing this limit is 
quite large. If there are many overlapping glyphs and the 
glyphs are rendered with font smoothing then the limit is 
much smaller. Since the primary purpose of this system is to 
provide a GUI Scripting system, the only fonts that must be 
dealt with are those of the user interface, and these can be 
specified.

> I mentioned at some point my work at the American Museum 
> of Natural History; the goal was
> to represent authors, titles, and abstracts in the native 
> language in which they were
> written, and present an English translation if 
> possible...the point being here that while
> you may gain certain efficiencies by limiting the 
> characters whose representations are
> kept in memory, it should be MY choice as to what 
> characters are selected; if I do
> bilingual work or multilingual work, I may well want to 
> have more characters.  Read any
> page in Wikipedia that deals with any country outside the
> United States and it is clear
> that Unicode is a Way Of Life in the modern world.  So I
> should be able to point to a
> character on the screen and say "learn", and you would
> scan my font looking for a match,
> even if you have to bring the characters in one at a time,
> compare, and free them up.  Or I should be able to
> choose a range of characters (perhaps I'm writing
> something on pronunciation, or reading
> something on it, and it's full of IPA symbols): I would
> want to drop down a convenient
> list, have it offer an entry "IPA symbols", and tell it
> to include the IPA
> glyphs.  And it will remember my desired configuration for
> the next time.
> joe


NoSpam8358 (375)
11/18/2007 2:48:19 PM
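Newcomer's examples also answer the thread's original question about Latin-1 gaps: é, ï, æ, ø, å, â, í, á and ü all lie in U+0000..U+00FF, but the ř of Dvořák (U+0159), the ő of nagyszülő (U+0151) and the euro sign (U+20AC) do not. A minimal C++ sketch that lists the UTF-16 code units of a string falling outside Latin-1; outsideLatin1 is a hypothetical name, and a character beyond the BMP would show up as its two surrogate code units.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns the UTF-16 code units in `text` that fall outside Latin-1
// (U+0000..U+00FF, i.e. Basic Latin plus the Latin-1 Supplement).
std::vector<char16_t> outsideLatin1(const std::u16string& text) {
    std::vector<char16_t> missing;
    for (char16_t ch : text)
        if (ch > 0x00FF) missing.push_back(ch);
    return missing;
}
```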
See below...
On Sun, 18 Nov 2007 08:48:19 -0600, "Peter Olcott" <NoSpam@SeeScreen.com> wrote:

>
>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
>message news:vikvj3lh68qh0o2ucchoq97gd6gertd01r@4ax.com...
>> See below...
>> On Thu, 15 Nov 2007 22:28:51 -0800, "Mihai N." 
>> <nmihai_year_2000@yahoo.com> wrote:
>>
>>>> How can I go about supporting all languages besides the
>>>> Asian characters?
>>>Unicode is your only option.
>>>There is no single code page that can cover all the
>>>European languages (including Greek, Cyrillic, Baltic,
>>>Turkish, EE, CE).
>>>
>>>> Is there software that I need to download
>>>> so that TextOut() will display characters beyond the
>>>> first 8 bits?
>>>You don't need to download anything (if you are on Win
>>>2000 or newer); all the support you need is there by default.
>> ****
>> Except for the font issue; some of the common fonts don't 
>> have all the characters in them.
>> But I got Arial Unicode MS free from MS, so it isn't 
>> "software" you have to download, it
>> is merely "data".  If you have Office 2003 installed, you 
>> already have this font.
>> ****
>
>Where can I get this?
****
My recollection is that it is a free download from the MS Web site.
****
>
>>>
>>>> How can I know exactly which Unicode characters are
>>>> required for each Country?
>>>If you are using Unicode, you don't have to know.
>>>In fact, the characters used in a language are not 100%
>>>defined.
>>>Is e with an accent required in English?
>>>First thought is "no", but think about "resume"
>>>(http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)
>>>
>>>But the thing is, you don't need to know. Does it matter if
>>>the text is French or German when you put it on screen?
>>>Does it matter if a French text contains a sharp s (typical
>>>for German)? Or if a German text has an n with tilde (typical
>>>for Spanish)? They are all present in Latin-1.
>>>Similarly, when you move to Unicode, you don't care
>>>anymore if a certain character is only used in Turkish
>>>or in German; they are all covered by the code page you
>>>are using (Unicode).
>> ****
>> In my resumé, I might want to indicate that I have been
>> successful in getting coöperation
>> from the more naïve readers of the Encyclopædia
>> Britannica, and that I worked with Rølf
>> Ångstrom's group in Rømskog, and André Château-Gaillard in
>> his work on laïcité government,
>> not to mention my writings on the work of Antonín Dvorák
>> (actually, if my newsreader
>> supported Unicode the r should have a little v over
>> it...), the genealogical research on
>> my nagyszülo (except the o should have two little //s over
>> it)...there is no limit to what
>> I might want to do.
>>
>There is a memory limit to what my system is capable of 
>recognizing. If there are very few overlapping glyphs and 
>the glyphs are rendered without font smoothing this limit is 
>quite large. If there are many overlapping glyphs and the 
>glyphs are rendered with font smoothing then the limit is 
>much smaller. Since the primary purpose of this system is to 
>provide a GUI Scripting system, the only fonts that must be 
>dealt with are those of the user interface, and these can be 
>specified.
*****
But if I'm bilingual and got a program from my friend in Whereverstan, who is only
monolingual, I may want to just use it because I'm comfortable in both languages, but I
need the characters from his alphabet to automate it.

I think it is not unreasonable to provide an interface to the users by which they can say
which blocks of the Unicode world they want.  These blocks are specified by the Unicode
Consortium and I just downloaded a text file, which I parse in my Locale Explorer to get
the ranges.  So you have a dropdown check box control, and you just have the user check
off what they need.  If they select so many characters that you run out of memory, well,
maybe they just have to live with that and not try to do IPA, Early Chaldean, and
Hieroglyphics all in the same automated environment.  I don't see why, moving beyond the
base code pages, you couldn't just let the user tell you what is needed.  It saves a lot
of problems (when confronted with complex decisions, push the decisions downstream to
the end user).
					joe
Joseph M. Newcomer [MVP]
email: newcomer@flounder.com
Web: http://www.flounder.com
MVP Tips: http://www.flounder.com/mvp_tips.htm
newcomer (15972)
11/19/2007 1:15:03 AM
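Newcomer's approach of parsing the Unicode Consortium's block list to fill a check-box dropdown might look like this. It assumes the Blocks.txt line layout (`start..end; Block Name`, hexadecimal code points, `#` comments); UnicodeBlock and parseBlocks are hypothetical names, and this is a sketch rather than the parser in his Locale Explorer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// One Unicode block, e.g. {0x0250, 0x02AF, "IPA Extensions"}.
struct UnicodeBlock {
    std::uint32_t first;
    std::uint32_t last;
    std::string name;
};

// Reads lines in the Blocks.txt layout ("0250..02AF; IPA Extensions").
// Lines starting with '#', blank lines and anything malformed are skipped.
std::vector<UnicodeBlock> parseBlocks(std::istream& in) {
    std::vector<UnicodeBlock> blocks;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        if (line.empty() || line[0] == '#') continue;

        const std::size_t dots = line.find("..");
        const std::size_t semi = line.find(';');
        if (dots == std::string::npos || semi == std::string::npos || semi < dots) continue;

        const std::size_t nameStart = line.find_first_not_of(' ', semi + 1);
        if (nameStart == std::string::npos) continue;

        try {
            UnicodeBlock b;
            b.first = static_cast<std::uint32_t>(std::stoul(line.substr(0, dots), nullptr, 16));
            b.last = static_cast<std::uint32_t>(
                std::stoul(line.substr(dots + 2, semi - dots - 2), nullptr, 16));
            b.name = line.substr(nameStart);
            if (b.first <= b.last) blocks.push_back(b);
        } catch (const std::logic_error&) {
            // std::invalid_argument / std::out_of_range from stoul: not a hex range
        }
    }
    return blocks;
}
```

Typical use, with <fstream>: `std::ifstream f("Blocks.txt"); auto blocks = parseBlocks(f);`, then keep only entries whose `last` is at most 0xFFFF if recognition is limited to the BMP.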
"Joseph M. Newcomer" <newcomer@flounder.com> wrote in 
message news:igo1k3dl5tbt5haj967rriijkptlr0gq4s@4ax.com...
> See below...
> On Sun, 18 Nov 2007 08:48:19 -0600, "Peter Olcott" 
> <NoSpam@SeeScreen.com> wrote:
>
>>
>>"Joseph M. Newcomer" <newcomer@flounder.com> wrote in
>>message news:vikvj3lh68qh0o2ucchoq97gd6gertd01r@4ax.com...
>>> See below...
>>> On Thu, 15 Nov 2007 22:28:51 -0800, "Mihai N."
>>> <nmihai_year_2000@yahoo.com> wrote:
>>>
>>>>> How can I go about supporting all languages besides the
>>>>> Asian characters?
>>>>Unicode is your only option.
>>>>There is no single code page that can cover all the
>>>>European languages (including Greek, Cyrillic, Baltic,
>>>>Turkish, EE, CE).
>>>>
>>>>> Is there software that I need to download
>>>>> so that TextOut() will display characters beyond the
>>>>> first 8 bits?
>>>>You don't need to download anything (if you are on Win
>>>>2000 or newer); all the support you need is there by default.
>>> ****
>>> Except for the font issue; some of the common fonts
>>> don't have all the characters in them.
>>> But I got Arial Unicode MS free from MS, so it isn't
>>> "software" you have to download, it
>>> is merely "data".  If you have Office 2003 installed,
>>> you already have this font.
>>> ****
>>
>>Where can I get this?
> ****
> My recollection is that it is a free download from the MS 
> Web site.
> ****

Not for years. It is included in MS Word 2002, which I just
bought for $47.00, or I could have bought just the font
itself for $99.00.

>>
>>>>
>>>>> How can I know exactly which Unicode characters are
>>>>> required for each Country?
>>>>If you are using Unicode, you don't have to know.
>>>>In fact, the characters used in a language are not 100%
>>>>defined.
>>>>Is e with an accent required in English?
>>>>First thought is "no", but think about "resume"
>>>>(http://en.wikipedia.org/wiki/R%C3%A9sum%C3%A9)
>>>>
>>>>But the thing is, you don't need to know. Does it matter if
>>>>the text is French or German when you put it on screen?
>>>>Does it matter if a French text contains a sharp s (typical
>>>>for German)? Or if a German text has an n with tilde (typical
>>>>for Spanish)? They are all present in Latin-1.
>>>>Similarly, when you move to Unicode, you don't care
>>>>anymore if a certain character is only used in Turkish
>>>>or in German; they are all covered by the code page you
>>>>are using (Unicode).
>>> ****
>>> In my resumé, I might want to indicate that I have been
>>> successful in getting coöperation
>>> from the more naïve readers of the Encyclopædia
>>> Britannica, and that I worked with Rølf
>>> Ångstrom's group in Rømskog, and André Château-Gaillard
>>> in his work on laïcité government,
>>> not to mention my writings on the work of Antonín Dvorák
>>> (actually, if my newsreader
>>> supported Unicode the r should have a little v over
>>> it...), the genealogical research on
>>> my nagyszülo (except the o should have two little //s
>>> over it)...there is no limit to what
>>> I might want to do.
>>>
>>There is a memory limit to what my system is capable of
>>recognizing. If there are very few overlapping glyphs and
>>the glyphs are rendered without font smoothing this limit is
>>quite large. If there are many overlapping glyphs and the
>>glyphs are rendered with font smoothing then the limit is
>>much smaller. Since the primary purpose of this system is to
>>provide a GUI Scripting system, the only fonts that must be
>>dealt with are those of the user interface, and these can be
>>specified.
> *****
> But if I'm bilingual and got a program from my friend in 
> Whereverstan, who is only
> monolingual, I may want to just use it because I'm 
> comfortable in both languages, but I
> need the characters from his alphabet to automate it.
>
> I think it is not unreasonable to provide an interface to 
> the users by which they can say
> which blocks of the Unicode world they want.  These blocks 
> are specified by the Unicode
> Consortium and I just downloaded a text file, which I 
> parse in my Locale Explorer to get
> the ranges.  So you have a dropdown check box control, and 
> you just have the user check
> off what they need.  If they select so many characters 
> that you run out of memory, well,
> maybe they just have to live with that and not try to do 
> IPA, Early Chaldean, and
> Hieroglyphics all in the same automated environment.  I 
> don't see why, moving beyond the
> base code pages, you couldn't just let the user tell you
> what is needed.  It saves a lot
> of problems (when confronted with complex decisions,
> push the decisions downstream to
> the end user).
> joe

I think that I will be able to let users select any 256
glyphs from the Basic Multilingual Plane (BMP). That should
cover most of the world. Asian glyphs will have to be
treated as if they were icons rather than character glyphs.



NoSpam8358 (375)
11/19/2007 4:19:30 AM
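One reading of the "any 256 glyphs from the BMP" plan: give each selected character a one-byte slot so per-glyph recognition data can live in 256-entry arrays indexed by std::uint8_t. This is a sketch under that assumption only; the thread does not describe SeeScreen's internal tables, and GlyphSlots is a hypothetical name. It needs C++17 for std::optional.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical table mapping up to 256 user-selected BMP characters to
// one-byte slots, so per-glyph data can be stored in arrays of 256 entries.
class GlyphSlots {
public:
    static constexpr std::size_t kMaxSlots = 256;

    // Returns the slot for ch, assigning the next free one if ch is new.
    // Returns std::nullopt when all 256 slots are already in use.
    std::optional<std::uint8_t> add(char16_t ch) {
        if (auto existing = slot(ch)) return existing;
        if (count_ == kMaxSlots) return std::nullopt;
        const auto s = static_cast<std::uint8_t>(count_++);
        slotOf_[ch] = s;
        charOf_[s] = ch;
        return s;
    }

    // Looks up the slot of an already selected character.
    std::optional<std::uint8_t> slot(char16_t ch) const {
        const auto it = slotOf_.find(ch);
        if (it == slotOf_.end()) return std::nullopt;
        return it->second;
    }

    // Maps a slot back to its character (e.g. to emit recognized text).
    char16_t character(std::uint8_t s) const {
        assert(s < count_);
        return charOf_[s];
    }

    std::size_t size() const { return count_; }

private:
    std::unordered_map<char16_t, std::uint8_t> slotOf_;
    std::array<char16_t, kMaxSlots> charOf_{};
    std::size_t count_ = 0;
};
```

The 257th distinct character is refused rather than silently evicting an earlier one, which leaves the decision with the user, in the spirit of Newcomer's suggestion.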
> Since the primary purpose of this system is to 
> provide a GUI Scripting system, the only fonts that must be 
> dealt with are those of the user interface, and these can be 
> specified.

If all you care about is the UI font, then it means you deal with
standard controls. Custom controls are custom exactly because
they want to look different.
You cannot read a web page. You cannot read a PDF document.
You cannot read the UI of a Flex, or WPF application, or a
more fancy VB or .NET application, where the designer decided
to use other fonts.
You probably cannot read a window covered by another window
(because in most cases it is not drawn).

And if you can only read standard controls, with standard fonts,
then why not just call GetWindowText (or use WM_GETTEXT,
LB_GETTEXT, CB_GETLBTEXT)?
That way you get 100% accurate Unicode text: no guessing,
no performance problems, no problems with RTL, context
shaping, or ligatures, and the font does not matter.

"Screen OCR" is great if you can use it to access stuff
that is not possible to access otherwise.
And exactly that is the stuff you are cutting out.
At that point I don't see any benefit in the "screen OCR"
approach.


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
11/19/2007 8:11:40 AM
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
news:Xns99ED1F57D1E4MihaiN@207.46.248.16...
>> Since the primary purpose of this system is to
>> provide a GUI Scripting system, the only fonts that must be
>> dealt with are those of the user interface, and these can be
>> specified.
>
> If all you care about is the UI font, then it means you deal with
> standard controls. Custom controls are custom exactly because
> they want to look different.
> You cannot read a web page. You cannot read a PDF document.
> You cannot read the UI of a Flex, or WPF application, or a
> more fancy VB or .NET application, where the designer decided
> to use other fonts.
> You probably cannot read a window covered by another window
> (because in most cases it is not drawn).
>
> And if you can only read standard controls, with standard fonts,
> then why not just call GetWindowText (or use WM_GETTEXT,
> LB_GETTEXT, CB_GETLBTEXT)?
> That way you get 100% accurate Unicode text: no guessing,
> no performance problems, no problems with RTL, context
> shaping, or ligatures, and the font does not matter.
>
> "Screen OCR" is great if you can use it to access stuff
> that is not possible to access otherwise.
> And exactly that is the stuff you are cutting out.
> At that point I don't see any benefit in the "screen OCR"
> approach.
>

I think the target audience is QA.  Since the devs can give them a list of 
the fonts they used to create the screens, the thinking is the fonts are 
known and can be input into SeeScreen for recognition.

Font substitution (when another font is substituted because the requested one is not
available) probably will make this not work, though.  Also, I agree with you
that the usage scenarios are quite limited.  Too bad; this would be a really
useful app.

-- David 


dc2983 (3206)
11/19/2007 1:33:34 PM
"Mihai N." <nmihai_year_2000@yahoo.com> wrote in message 
news:Xns99ED1F57D1E4MihaiN@207.46.248.16...
>> Since the primary purpose of this system is to
>> provide a GUI Scripting system, the only fonts that must 
>> be
>> dealt with are those of the user interface, and these can 
>> be
>> specified.
>
> If all you care about is the UI font, then it means you
> deal with standard controls. Custom controls are custom
> exactly because they want to look different.
> You cannot read a web page. You cannot read a PDF 
> document.
> You cannot read the UI of a Flex, or WPF application, or a
> more fancy VB or .NET application, where the designer 
> decided
> to use other fonts.
> You probably cannot read a window covered by another 
> window
> (because in most cases it is not drawn).
>
> And if you can only read standard controls, with standard 
> fonts,
> then why not just call GetWindowText (or use WM_GETTEXT,
> LB_GETTEXT, CB_GETLBTEXT)?
> That way you get 100% accurate Unicode text: no
> guessing, no performance problems, no problems with RTL,
> context shaping, or ligatures, and the font does not matter.
>
> "Screen OCR" is great if you can use it to access stuff
> that is not possible to access otherwise.
> And exactly that is the stuff you are cutting out.
> At that point I don't see any benefit in the "screen OCR"
> approach.
>

My system is capable of recognizing any graphical element 
constructed with pixels. It can simultaneously process many 
different FontInstances and character sets. The only 
limitation is the amount of memory required. This morning I
figured out how it could easily handle many different character
sets. The more character sets it handles, the fewer typefaces
will be available because of memory constraints. If a typeface
has very few overlapping glyphs (and font smoothing is turned
off), it will be able to handle every non-Asian character set in
the typeface for several different typefaces.


NoSpam8358 (375)
11/19/2007 2:04:19 PM
"David Ching" <dc@remove-this.dcsoft.com> wrote in message 
news:Jfg0j.482$Dt4.194@newssvr19.news.prodigy.net...
>
> I think the target audience is QA.  Since the devs can 
> give them a list of the fonts they used to create the 
> screens, the thinking is the fonts are known and can be 
> input into SeeScreen for recognition.
>
> Font substitution (when another font is substituted because
> the requested one is not available) probably will make this
> not work, though.  Also, I agree with you that the usage
> scenarios are quite limited.  Too bad; this would be a
> really useful app.
>
> -- David
>

I just can't delay its release indefinitely. It will be easy
enough to enable many different character sets
simultaneously, so I will do that in the first release. I
just figured out a way to do that which will only grow memory
requirements linearly with the increase in the number of
character sets rather than exponentially with the increase
in the number of characters.


NoSpam8358 (375)
11/19/2007 2:17:39 PM
> My system is capable of recognizing any graphical element 
> constructed with pixels.
....
> If a typeface has very few overlapping glyphs
> (and font smoothing is turned off), it will be able to handle
> every non-Asian character set in the typeface for several
> different typefaces.

Ok, maybe I am overly pessimistic.
Several big inventions were made by people who did not know
it was not possible, so they went ahead and did it :-D

So, just because I don't see it working better than GetWindowText
does not mean it cannot be done.
Good luck, and don't take me too seriously.


-- 
Mihai Nita [Microsoft MVP, Windows - SDK]
http://www.mihai-nita.net
------------------------------------------
Replace _year_ with _ to get the real email
11/20/2007 2:16:38 AM

Similar Articles:

Which control should I use to display Unicode characters?
Hi, there, I want to show Unicode characters, such as Chinese and Arabic. Which control should I use? Could you please give me a bit of example code? Thanks a lot. Gary You do not need specific control to display Unicode strings. Normally, you would use SetWindowText function to set the control text. If you look in winuser.h file this 'function' #defined as SetWindowTextA or SetWindowTextW (depending on the type of the build). If you want to use Unicode text regardless of build type, specify SetWindowTextW explicitly. Note that you need to have appropriate fonts installed to a...

Upgrading to 1.2 01-20-04
I am about to install a version 1.0 of MS CRM and wonder how one upgrades it to 1.2. In other words, how do I obtain the upgrade software. Is it an on-line upgrade? Do I have to purchase it separately, etc? Thanks - Larry -- Larry Lentz, MCSE+Internet & W2K, MCDBA GoldMine Certified Professional Lentz Computer Services www.LentzComputer.Net Larry@LentzComputer.Net I believe it depends on what service plan you have. From what I read (we have something called the Foundation Services plan), our plan entitles us to all upgrades, including 1.2, of which was mailed directly to us. Where d...

Converting Lotus 1-2-3 charts to Excel
I am in the process of converting Lotus 1-2-3 spreadsheets which have charts in them to Excel. After starting Excel and opening the Lotus file in Excel. I do a 'save as' and give it the workbook format. The charts look almost identical. I had to change colors, change some ranges, etc. The one thing I can not figure out is the following: The original Lotus file had the years 1992 through 2003 on the Y-axis. The bar for 1992 was directly above the 1992 and indented about 1/2 inch from the vertcal x-axis. In Excel the chart has the years 1992 through 2003 on the Y-axis BUT the ...

Formatting problem with basic excel sheet
Hi, sorry if this has come up before but I have a problem with my excel sheet that I use to store/collate my data for work. Recently (past 2-3 days) the cells have started to display data in a peculiar manner. my data is entered into the cell as a mixture of text and numbers across the cell with the word wrap on some of the borders are formated, there are no formulae in the individual cells. however over the past days the formatting of the data has changed in that the text now runs verticaly down the cell e.g. from AName to A ...

unicode application
hi, Priyanka here, this is a question related to VB. i want to make the VB application unicode based. i am not able to do it right now. if anybody knows please let me know about this. ...

uploading 2 or more mailboxes on 1 user account
I was able to reconnect 1 mailbox to 1 user account, but what i' looking is the way wherein i can reconnect 2 or more mailboxes into user account because instead of giving a 1:1 user account to eac person we will be replacing it and giving them an account per area Hence they already have their own personal account we will b deleting it; How can i put all their messages from their persona account into the single account that we wil be giving to them Hi, More than one mailbox per account is not possible in Exchange 200x. You can do that with Exchange 5.x Regards, -- Menko den Ouden ...

Upgrading from Office 2003 Basic OEM to Excel 2007
I have a slightly unique situation, so I'm trying to figure out exactly what is allowed per the licenses. Let's say I have two desktop machines both running Office 2003 Basic OEM. I want to upgrade to Excel 2007 on machine one. From what I've been able to find on the Internet, I can purchase a retail Excel 2007 upgrade license and be in compliance, since I have the licensed OEM version on this machine. Let's say that later I want to remove Excel 2007 from machine one and install it on machine two. Since I purchased a retail upgrade license, and machine two has it...


VERIFY and TRACE, how to implement for debug/release, unicode-aware, no warnings at lvl 4
Hi group and apologize because this isn't a pure MFC-question. I have to finish a raw win32 app written in C++. I really miss the macros TRACE and VERIFY, which I tend to use when writing MFC programs. I googled for "win32 verify macro" and the very first hit is for a codeproject.com project with implementations to be used in a win32 program, however. The project contains a single header file, debug.h, but it doesn't compile without warnings (using warning level 4), doesn't seem to be unicode aware and doesn't compile at all in release mode. I was just wonderin...

Unicode
I want to make an application for international use that should support all languages (that Unicode supports). So I want to know how to change the language, or in other words, how to change my locale at run time. ...

Input must be last entry + 1?
I would like to set up some sort of validation to ensure that the number entered in a column is the last number + 1. The hitch is that the last number in that column may be several rows back, so I can't use "previous cell+1". I tried MAX, but couldn't get it to work (which may just be my ignorance). Any help is appreciated. Ed Hi Ed, Since you did try MAX, perhaps you can use this formula: select cell B3, then select the entire column with Ctrl+Spacebar (I have active words, which uses that shortcut), then choose Data, Validation and enter =MAX(B$2:OFFSET(B3,-1,0))+1=B3 This may be ra...

Visual Basic Error #2
Hi, When I try to copy and paste some data from one worksheet into a new worksheet, I get a couple of errors. In the first one, the title bar says "Microsoft Visual Basic" and the error is "File not found". If I click OK, I get another error: "Microsoft Excel cannot paste the data". The information pastes all right, though. This is in Excel 97 running on Windows XP. I've tried reinstalling Office to no avail. The same info can be copied and pasted on another machine just fine. -- Message posted from http://www.ExcelForum.com This is a guess. It sounds like there...

Visual Basic
How can I use the "CountIf" function in Visual Basic for Excel 2003? If this function isn't available in Visual Basic, are there any similar formulas that I can use in a Visual Basic program? -- Shi Gharib On Thu, 6 Dec 2007 02:45:01 -0800, Shi Gharib <shigharib@discussions.microsoft.com> wrote: >How can I use the "CountIf" function in visual basic for Excel 2003 version. >If this function isn't available in visual basic can I have any similar >formulas that I can use in visual basic program You should try an Excel group; this one is for ...

Dirty shutdown - private store will not mount
While trying to mount the private store from within Exchange System Manager, I receive the following error: "The database files in this store are inconsistent. ID no: c1041739 Exchange System Manager". After shutting down the Information Store service, I ran the eseutil utility with the /mh switch to check the header information. I did this on both the public and private databases and their stream files. All 4 files reported a dirty shutdown. I attempted a recovery with eseutil using the /r switch and got error 1018 (JET_errReadVerifyFailure, checksum error on a database page). I then ran the repair...

How to convert to Unicode in ADSI
Hi all, I made a VBS script to add an extra e-mail address to a distribution list group. The e-mail address should be <group name>@dnv.com. As we are a Norwegian company, the group name sometimes contains Norwegian characters such as "Ø". When I look in ADSI, the string with "Ø" does get written to the "mail" attribute. But when I try to send e-mail to this distribution group, I get the error "The format of the e-mail address is incorrect". I found that if I create the e-mail address manually, the system automatically converts ...

ftpd Unicode
In WinCE 5.0, file and folder names may contain Unicode characters (e.g. Cyrillic or Chinese). I was not able to transfer such files via FTP. I think it is a problem with ftpd. Any suggestions? Thank you. You've got the full source code of FTPD, so clone it (see my blog) and change it to whatever you need, or buy a third-party FTP implementation for CE. Note that FTPD is shipped as a "Sample", meaning it's not a full-featured final program. Good luck, Michel Verhagen, eMVP Check out my blog: http://GuruCE.com/blog GuruCE Microsoft Embedded Partner http://...

Include 1-x columns in @Sum depending on date
I have 12 columns, one per month. The formula for the Total column should include only the columns that make up the year-to-date total. How would I do this?
Jan  Feb  Mar  Apr  May
1    1    1    1    1
In March the total would be 3; in April it would be 4. ...

Clipboard for Unicode and Non-Unicode
Hi all, We have a program written as non-Unicode. I found that in some languages, characters cannot be pasted into a CEdit correctly. For example, if our application is running on Russian XP, a string copied from IE can be pasted into the CEdit properly. But if I paste the same string into Notepad and then copy it from Notepad into our CEdit, the string does not display correctly. Is it a coding problem? Is it possible to make CEdit accept the string from Notepad? Thanks, Justin If it is a question of pasting a UTF-16 string into a single-byte edit control, you could try it in two stages: L...

CRM 1.2 server synchronization
Dear Friends, We are a multinational company with branches in different countries. Each branch has a local CRM server. We are looking for a solution that would let us schedule all our CRM servers to synchronize with each other (that is, we want to synchronize the CRM databases). I would like to know whether something like this is possible and, if so, whether there is any documentation from Microsoft about it. Hope to get help from someone. Regards BurhanM There is no documentation on this. I assume each install of CRM is using the local language? This alone would make it dif...

How do you enter Unicode characters that don't have an Alt-nnnn shortcut?
I need to enter a range of non-English names into a combo box on a form in an Access 2003 application. Sometimes this requires accented characters that do not appear on a UK keyboard. If I refer to the Windows XP Character Map facility (charmap.exe), I can see that many such foreign characters have been allocated an Alt-nnnn code, which is displayed at the bottom right of the Character Map screen. This makes it easy to enter these characters. However, many other characters have not been given an Alt-nnnn keyboard shortcut. They only have a U+FFFF code, displayed at...

Integrating CRM 1.2 and GP8
If you already have GP8 and you are integrating CRM 1.2 with it, what software is required? CRM + BizTalk? I have seen references to BizTalk Partner Edition; does this come with CRM? If anyone can find any documentation on this, please point me in the right direction. What you actually need is the MS CRM 1.2 Great Plains Integration product (which in essence is another Web App and a BizTalk server). I think the only place that has all the information is CustomerSource or PartnerSource (I might be wrong). I found all the documentation on it, and can send it through to you if you want....

DPM2007, HP 1/8 G2 Autoloader and compression
We have a 1/8 G2 920 Autoloader and are using Microsoft DPM2007 as our backup software. We are using 800GB Ultrium tapes (C7973A) as the backup medium. We are backing up files, Exchange, and SQL Server .bak files, and have tape spanning enabled. The problem is that we would like to compress the data written to the tapes, so that we can use the full 800GB instead of 400GB, but we are unable to get the data compressed. There are 2 compression options: compression by DPM or compression by the tape drive. We've experimented with enabling and disabling one or both of them, but we are u...

Basic questions
Hello, I have never seen or used Windows Server, and I have a few basic questions before switching from Windows XP Pro to WS 2008 R2. Will I be able to install and use other software normally, as if I were using Windows XP? I have other servers I would like to use with it, and I need to know in advance whether they will be 100% compatible: Oracle XE 10g, MySQL (latest version), Apache 2.2, FileZilla Server, Windows Media Player 12 sharing, and UltraVNC. So, please let me know if the above software would be OK. Besides that, I would have to test some programs I develop myself, so I ne...

Opening a CSV file (saved in Unicode) in Excel
Hi, If anyone can help me with this, I would really appreciate it. I have a file saved with the .csv extension (in Unicode format). For example, say I have the following data: "text1","text2",123,"text3",450.00 When I save the above data as .csv (in ASCII format) and open it in Excel, each value is placed in its own adjacent column. When I save the same data as .csv (Unicode), the whole line appears in a single cell (the first cell only). Is there a limitation when opening a Unicode CSV file in Excel? Any inputs/...

Visual Basic.net
I currently work in finance and have been told that VB.NET is a useful tool for financial modeling, which I want to learn. Any opinions on this? Thanks -- Mark "Mark" <Mark@discussions.microsoft.com> wrote in message news:14FAA95A-4D62-452F-A21D-24D2CB4F0396@microsoft.com... >I currently work in Finance and have been told that VB.net is a useful tool > for financial modeling which I want to learn. Any opinions on this? This group is for VB 6 and earlier; VB.NET is a substantially different language. You need to ask in a group with "dotnet" i...