Difference between revisions of "Unicode-symbols"

From HaskellWiki

Revision as of 13:19, 7 May 2013


Overview

An overview of the packages that provide Unicode symbols.

Naming: A package X-unicode-symbols defines new symbols for functions and operators from the package X.

All symbols are documented with their actual definition and information regarding their Unicode code point. They should be completely interchangeable with their definitions.

Alternatives for existing operators have the same fixity. New operators will have a suitable fixity defined.
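As a rough illustration of why the fixity matters, the sketch below defines a Unicode alias for (*) locally (it is not imported from any of the packages, and the name example is made up for this sketch). The alias is given the same fixity as (*), infixl 7. Without the fixity declaration the operator would default to infixl 9 and bind tighter than (^).

```haskell
{-# LANGUAGE UnicodeSyntax #-}

-- Same fixity as (*), so the alias is interchangeable with the original.
infixl 7 ⋅

-- Local alias for (*), defined here only for illustration.
(⋅) ∷ Num a ⇒ a → a → a
(⋅) = (*)

-- Parses as 2 ⋅ (3 ^ 2), exactly like 2 * 3 ^ 2.
-- With the default fixity (infixl 9) it would parse as (2 ⋅ 3) ^ 2.
example ∷ Int
example = 2 ⋅ 3 ^ 2
```

Loaded in GHCi, example should agree with 2 * 3 ^ 2.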

UnicodeSyntax

GHC offers the UnicodeSyntax language extension. If you decide to use Unicode in your Haskell source then this extension can greatly improve how it looks.

To enable Unicode syntax, put the following pragma at the top of a module:

 {-# LANGUAGE UnicodeSyntax #-}
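A minimal sketch of what the extension allows. The function count is a made-up example, not part of any package; it only shows the Unicode forms of ::, =>, -> and <- in use.

```haskell
{-# LANGUAGE UnicodeSyntax #-}

-- Count how many elements of a list equal a given value.
-- ∷ stands for ::, ⇒ for =>, → for -> and ← for <-.
count ∷ Eq a ⇒ a → [a] → Int
count x xs = length [y | y ← xs, y == x]
```

The extension only affects built-in syntax. Operators such as ≡ or ∧ are ordinary definitions and have to come from a library or be defined in the module.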

base-unicode-symbols

Extra symbols for the base package.

 API docs: http://hackage.haskell.org/package/base-unicode-symbols
 github: https://github.com/roelvandijk/base-unicode-symbols
 checkout: git clone git://github.com/roelvandijk/base-unicode-symbols.git

Problematic symbols

Original   Symbol   Code point   Name
not        ¬        U+00AC       NOT SIGN
lambda     λ        U+03BB       GREEK SMALL LETTER LAMDA

The problem with the NOT symbol is that you would like to use it as a unary prefix operator:

 ¬(¬x) ≡ x

Unfortunately this is not valid Haskell. The following is:

 (¬)((¬)x) ≡ x

But you can hardly call that an improvement over the simple:

 not (not x) ≡ x
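The following sketch makes the same point in compilable form. The operator (¬) is defined locally as a stand-in (an assumption of this sketch, not an import from the package), and doubleNegation is a hypothetical helper name.

```haskell
{-# LANGUAGE UnicodeSyntax #-}

-- Local stand-in for a NOT SIGN alias of 'not'.
(¬) ∷ Bool → Bool
(¬) = not

-- The natural spelling is rejected by the parser, because an expression
-- cannot start with an operator symbol:
--
--   doubleNegation x = ¬(¬x) == x
--
-- The parenthesised form is accepted, since (¬) is then an ordinary function.
doubleNegation ∷ Bool → Bool
doubleNegation x = (¬) ((¬) x) == x
```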

The problem with the LAMBDA symbol is that it is classified as an alphabetic character, so it can be used as part of a name. See the GHC discussion (http://hackage.haskell.org/trac/ghc/ticket/1102).
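A small sketch of the consequence: because λ is a lowercase letter, GHC accepts it in ordinary identifiers, so something like λx would be read as one name rather than as the start of a lambda abstraction. The names below are made up for illustration.

```haskell
-- λ is a lowercase letter, so this is an ordinary variable.
λ :: Int
λ = 2

-- λx is a single identifier, not "lambda applied to x".
λx :: Int
λx = λ + 1

-- A lambda abstraction therefore still needs the backslash.
identity :: Int -> Int
identity = \x -> x
```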

New symbol ideas

(please add your own)

I'm thinking of adding the following symbol as another alternative for (*).

Original   Symbol   Code point   Name
(*)        ×        U+00D7       MULTIPLICATION SIGN

 2 * 3 ≡ 6
 2 ⋅ 3 ≡ 6
 2 × 3 ≡ 6

A disadvantage of this symbol is its similarity to the letter x:

 sqr x = x × x

Original   Symbol   Code point   Name
Bool       𝔹        U+1D539      MATHEMATICAL DOUBLE-STRUCK CAPITAL B

This idea is an extension of

 type ℤ = Integer

and

 type ℚ = Ratio ℤ

The advantage is that it looks nice and that it is a logical extension of ℤ, ℚ and ℝ. The disadvantage is that there is no documented prior use of this character to denote boolean values. This could be detrimental to the readability of code.

Example:

 (∧) ∷ 𝔹 → 𝔹 → 𝔹
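Put together, the idea could look like the sketch below. All definitions are local to the sketch: the synonym 𝔹 is only the proposal discussed here, (∧) is defined on the spot with the fixity of (&&), and half is a made-up example value.

```haskell
{-# LANGUAGE UnicodeSyntax #-}

import Data.Ratio (Ratio, (%))

-- The synonyms this idea builds on.
type ℤ = Integer
type ℚ = Ratio ℤ

-- The proposed synonym (hypothetical, not an existing definition).
type 𝔹 = Bool

-- Conjunction with the same fixity as (&&).
infixr 3 ∧
(∧) ∷ 𝔹 → 𝔹 → 𝔹
(∧) = (&&)

-- A rational number written with the ℚ synonym.
half ∷ ℚ
half = 1 % 2
```

This compiles because ℤ, ℚ and 𝔹 are all classified as uppercase letters, so they are valid type names.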

containers-unicode-symbols

Extra symbols for the containers package.

 API docs: http://hackage.haskell.org/package/containers-unicode-symbols
 github: https://github.com/roelvandijk/containers-unicode-symbols
 checkout: git clone git://github.com/roelvandijk/containers-unicode-symbols.git  

New symbol ideas

(please add your own)

Input methods

These symbols are all very nice but how do you type them?

Wikipedia has a helpful article: http://en.wikipedia.org/wiki/Unicode_input

(please add info for other editors)

Emacs

Direct

Enter symbols directly: C-x 8 RET (ucs-insert), then type either the character's name or its hexadecimal code point.

TeX input method

The TeX input method, invoked with M-x set-input-method followed by TeX, allows you to enter Unicode characters by typing TeX-like sequences. For example, typing \lambda inserts a λ.

This is probably the most convenient input method for casual use.

A list of available sequences may be viewed with M-x describe-input-method.

Custom input method

I wrote my own input method:

 github: https://github.com/roelvandijk/emacs-haskell-unicode-input-method
 checkout: git clone git://github.com/roelvandijk/emacs-haskell-unicode-input-method.git
 

To load it automatically in haskell-mode, put the following code in your .emacs file:

 (require 'haskell-unicode-input-method)
 (add-hook 'haskell-mode-hook 
   (lambda () (set-input-method "haskell-unicode")))

Make sure the directory containing the .el file is in your load-path, for example:

 (add-to-list 'load-path "~/.elisp/emacs-haskell-unicode-input-method")

To enable it manually, use M-x set-input-method or C-x RET C-\ and enter haskell-unicode. Note that the elisp file must have been evaluated for this to work.

Now you can simply type -> and it is immediately replaced with →. Use C-\ to toggle the input method. To see a table of all key sequences use M-x describe-input-method haskell-unicode. A sequence like <= is ambiguous and can mean either ⇐ or ≤. Typing it presents you with a choice. Type 1 or 2 to select an option, or keep typing to use the default option.

If you don't like the highlighting of partially matching tokens you can turn it off:

 (setq input-method-highlight-flag nil)

Abbrev mode

The Abbrev mode is not suitable since it only deals with words, not operators.

Agda

Use Agda's input method (http://wiki.portal.chalmers.se/agda/pmwiki.php?n=Docs.UnicodeInput).

Vim

(real Vim users might want to expand this section)

Direct

  • Decimal value: type C-Vnnn where 0 ≤ nnn ≤ 255.
  • Octal value: type C-VOnnn or C-Vonnn where 0 ≤ nnn ≤ 377.
  • Hex value: type C-VXnn or C-Vxnn where 0 ≤ nn ≤ FF.
  • Hex value for BMP codepoints: type C-Vunnnn where 0 ≤ nnnn ≤ FFFF.
  • Hex value for any codepoint: type C-VUnnnnnnnn where 0 ≤ nnnnnnnn ≤ FFFFFFFF.
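To find the digits for a given symbol, a small helper along the following lines can be used from GHCi. This is only a convenience sketch: padTo and vimDigits are hypothetical names, and the helper covers just the two hexadecimal forms (u with four digits, U with eight).

```haskell
import Data.Char (ord)
import Numeric (showHex)

-- Left-pad a string with zeros up to the given width.
padTo :: Int -> String -> String
padTo n s = replicate (n - length s) '0' ++ s

-- The characters to type after C-V for a given symbol:
-- 'u' plus four hex digits for BMP code points,
-- 'U' plus eight hex digits for everything above.
vimDigits :: Char -> String
vimDigits c
  | cp <= 0xFFFF = 'u' : padTo 4 hex
  | otherwise    = 'U' : padTo 8 hex
  where
    cp  = ord c
    hex = showHex cp ""
```

For instance, vimDigits '→' yields u2192, so the arrow is entered as C-V followed by u2192.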

Automatic Unicode Transformation

There is a Haskell file type plugin called unicode-haskell (https://github.com/frerich/unicode-haskell) which automatically transforms ASCII character sequences (e.g. -> and many others) to Unicode when loading Haskell source code, and converts them back when saving. That way, the source code remains plain ASCII on disk but uses nice Unicode characters in vim/gvim. The plugin also replaces ASCII sequences with their Unicode equivalents as you type.

Alternatively, the Vim conceal definitions in haskellmode-vim mask most of the usual symbols with their Unicode equivalents but have no effect on the actual source code (in my experience, this is much faster than unicode-haskell and much easier to edit). In normal mode, the concealed characters on the current line are displayed as ASCII. In insert mode, and on lines other than the current one in normal mode, the Unicode characters are displayed.

SciTE

See Tips_for_using_SciTE_with_Haskell

Sublime Text 2

Syntax highlighting for GHC's Unicode syntax is not supported in the default configuration as of version 2.0.1. However, applying the following patch to Packages/Haskell/Haskell.tmLanguage enables it: https://gist.github.com/3744568

Insert the following snippet into user key bindings to conveniently type unicode operators in Haskell code: https://gist.github.com/3766192 . For example, typing "->" will automatically insert "→".

System wide

m17n input methods

A set of input methods has been written by Urs Holzer for the m17n library (http://www.m17n.org). His main goal is to build input methods for mathematical characters, but most of the symbols used in the *-unicode-symbols packages can be typed with them. More information is available on the Input Methods for Mathematics page (http://www.andonyar.com/rec/2008-03/mathinput/). For most Linux distributions, just download the tarball (http://www.andonyar.com/rec/2008-03/mathinput/methods.tar.gz), extract the *.mim files to /usr/share/m17n and enable iBus for input methods.

Fonts

The following free fonts have good Unicode coverage: