<div dir="ltr">The algorithm in the new module (GHC.IO.Encoding.CodePage.API) is rather intricate, so I've commented it quite thoroughly. The changes to other modules are minimal: we simply now use a real code page encoding instead of brokenly using latin1 when GHC doesn't have the code page built in, so there isn't much of a change to document.<div>
<br></div><div style>Max<br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On 24 April 2013 08:12, Simon Peyton-Jones <span dir="ltr"><<a href="mailto:simonpj@microsoft.com" target="_blank">simonpj@microsoft.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div lang="EN-GB" link="blue" vlink="purple">
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Great stuff.
<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d">One thing: have you left enough documentation in the code that, when someone comes along in 3 years time, they can understand the problem and how you have dealt
with it? Lot of “Note [Blah]” stuff? Or something.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><br>
Thanks<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d">Simon<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Verdana","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<div style="border:none;border-left:solid blue 1.5pt;padding:0cm 0cm 0cm 4.0pt">
<div>
<div style="border:none;border-top:solid #b5c4df 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> <a href="mailto:ghc-devs-bounces@haskell.org" target="_blank">ghc-devs-bounces@haskell.org</a> [mailto:<a href="mailto:ghc-devs-bounces@haskell.org" target="_blank">ghc-devs-bounces@haskell.org</a>]
<b>On Behalf Of </b>Max Bolingbroke<br>
<b>Sent:</b> 23 April 2013 21:29<br>
<b>To:</b> <a href="mailto:ghc-devs@haskell.org" target="_blank">ghc-devs@haskell.org</a><br>
<b>Subject:</b> DBCS encoding support on Windows<u></u><u></u></span></p>
</div>
</div><div><div class="h5">
<p class="MsoNormal"><u></u> <u></u></p>
<div>
<p class="MsoNormal">Hi GHCers,<u></u><u></u></p>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">I've implemented support in GHC for extra Windows code pages on the branch "dbcs" of the base library.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The problem this solves is that currently users of Haskell on a Windows machine running in a locale which uses a double-byte code page such as CP936 (GBK) or CP950 (Big5) cannot properly interact with the Windows console in their native
language. Unfortunately code page support is a prerequisite for getting this to work correctly because for all Microsoft's fine talk about Unicode being the future, the Windows console does not seem to support it properly - code pages are the only way to go
for console input and output.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">As the standard Windows locale encodings in many regions, these code pages are also the predominant method of encoding text files in many countries, so they are useful outside the console.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">The solution is along the lines suggested in <a href="http://hackage.haskell.org/trac/ghc/ticket/3977" target="_blank">http://hackage.haskell.org/trac/ghc/ticket/3977</a>, i.e. we create an iconv-like interface to Window's MultiByteToWideChar and WideCharToMultiByte
APIs by the judicious use of binary search. In my branch, these APIs will be used whenever we don't have a built-in native Haskell TextEncoding for the code page (we used to fall back on using latin1 for such code pages).<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Unless there are any objections I'll merge this into the base library main branch next week.<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal"><u></u> <u></u></p>
</div>
<div>
<p class="MsoNormal">Cheers,<u></u><u></u></p>
</div>
<div>
<p class="MsoNormal">Max<u></u><u></u></p>
</div>
</div>
</div></div></div>
</div>
</div>
<br>_______________________________________________<br>
ghc-devs mailing list<br>
<a href="mailto:ghc-devs@haskell.org">ghc-devs@haskell.org</a><br>
<a href="http://www.haskell.org/mailman/listinfo/ghc-devs" target="_blank">http://www.haskell.org/mailman/listinfo/ghc-devs</a><br>
<br></blockquote></div><br></div>