converting capital letters into small letters

that jefu guy jefu.jefu@verizon.net
26 Jul 2002 02:58:25 -0700


On Thu, 2002-07-25 at 19:07, Andrew J Bromage wrote:
> G'day all.
> 
> On Fri, Jul 26, 2002 at 01:27:48AM +0000, Karen Y wrote:
> 
> > 1. How would I convert capital letters into small letters?
> > 2. How would I remove vowels from a string?
> 
> As you've probably found out, these are very hard problems.

> Glossing over that concern, current implementations don't support the
> relevant UnicodePrims fully, so to do it properly you'll probably need
> to parse the case folding files yourself.  See:
> 
> 	http://www.unicode.org/unicode/reports/tr21/
> 
> Vowels are even harder because I don't think the Unicode standard even
> defines what a "vowel" is.  Removing vowel _marks_ should be
> straightforward once you expand combining characters, but that doesn't
> help with the general case.  Frankly, I don't like your chances.

Shouldn't the solution also take care of languages that have no upper
case?  Clearly the conversion itself is easy enough for such languages
("id" will work just fine), but determining (from context?) that the
string is in such a language is more than a bit difficult, especially
given that numeric codes can correspond to almost anything.
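
For what it is worth, here is a minimal sketch of the easy part.  It
assumes an implementation whose Data.Char.toLower knows the simple
(one character to one character) Unicode case mappings, which the
message quoted above suggests may not be true everywhere.  The name
lowerCase is only for illustration.  Characters with no lower-case
counterpart come back unchanged, so caseless text gets the "id"
treatment without anyone having to detect the language.  It does not
attempt the full case folding described in TR21.

```haskell
import Data.Char (toLower)

-- Lower-case a string one character at a time.
-- Assumption: toLower implements the simple Unicode case mappings.
-- Characters that have no lower-case form are returned unchanged.
lowerCase :: String -> String
lowerCase = map toLower
```

For example, lowerCase "Hello, World" gives "hello, world", and a
string of digits or punctuation is returned as it was.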

Vowels are much more difficult - even given that the language is
recognizable, what would happen with languages such as Chinese or Arabic
which (I believe) have nothing that even resembles a vowel?
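
If one narrows the question drastically and assumes that "vowel" means
nothing more than the ASCII letters a, e, i, o and u in either case,
then a sketch is short.  The names isAsciiVowel and removeVowels are
hypothetical, and this says nothing about y, accented letters, or any
other script.

```haskell
-- Assumption: a "vowel" is one of the ASCII letters a, e, i, o, u,
-- in upper or lower case.  Everything else is kept.
isAsciiVowel :: Char -> Bool
isAsciiVowel c = c `elem` "aeiouAEIOU"

-- Drop every character that isAsciiVowel accepts.
removeVowels :: String -> String
removeVowels = filter (not . isAsciiVowel)
```

For example, removeVowels "converting capital letters" gives
"cnvrtng cptl lttrs".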

Of course, Chinese is a whole problem by itself. 

--
jeff putnam -- jefu.jefu@verizon.net -- http://home1.get.net/res0tm0p