Difference between revisions of "User:Michiexile/MATH198/Lecture 1"

From HaskellWiki
m (→‎Categories: Typo.)
 
(33 intermediate revisions by 2 users not shown)
Line 1: Line 1:
  +
This page now includes additional information based on the notes taken in class. Hopefully this will make the notes reasonably complete for everybody.
==Welcome, administrativia==
 
  +
  +
  +
==Welcome, administratrivia==
  +
  +
I'm Mikael Vejdemo-Johansson. I can be reached in my office 383-BB, especially during my office hours; or by email to mik@math.stanford.edu.
  +
  +
I encourage, strongly, student interactions.
  +
  +
I will be out of town September 24 - 29. I will monitor forum and email closely, and recommend electronic ways of getting in touch with me during this week. I will be back again in time for the office hours on the 30th.
   
 
==Introduction==
 
==Introduction==
Why this course? What will we cover? What do we require?
+
===Why this course?===
  +
  +
An introduction to Haskell will usually come with pointers toward Category Theory as a useful tool, though not with much more than the mention of the subject. This course is intended to fill that gap, and provide an introduction to Category Theory that ties into Haskell and functional programming as a source of examples and applications.
  +
  +
===What will we cover?===
  +
  +
The definition of categories, special objects and morphisms, functors, natural transformations, (co-)limits and special cases of these, adjunctions, freeness and presentations as categorical constructs, monads and Kleisli arrows, recursion with categorical constructs.
  +
  +
Maybe, just maybe, if we have enough time, we'll finish with looking at the definition of a topos, and how this encodes logic internal to a category. Applications to fuzzy sets.
  +
  +
===What do we require?===
  +
  +
Our examples will be drawn from discrete mathematics, logic, Haskell programming and linear algebra. I expect the following concepts to be at least vaguely familiar to anyone taking this course:
  +
* Sets
  +
* Functions
  +
* Permutations
  +
* Groups
  +
* Partially ordered sets
  +
* Vector spaces
  +
* Linear maps
  +
* Matrices
  +
* Homomorphisms
  +
  +
===Good references===
  +
  +
On reserve in the mathematics/CS library are:
  +
* ''Mac Lane'': '''Categories for the working mathematician'''
  +
:This book is technical, written for a mathematical audience, and puts in more work than is strictly necessary in many of the definitions. When Awodey and Mac Lane deviate, we will give Awodey priority.
  +
* ''Awodey'': '''Category Theory'''
  +
:This book is also available as an ebook, accessible from the Stanford campus network. The coursework webpage has links to the ebook under Materials.
  +
  +
===Monoids===
  +
  +
In order to settle notation and ensure everybody's seen a definition before:
  +
  +
'''Definition''' A ''monoid'' is a set <math>M</math> equipped with a binary associative operation <math>*</math> (in Haskell: <hask>mappend</hask>) and an identity element <math>\emptyset</math> (in Haskell: <hask>mempty</hask>).
  +
  +
A ''semigroup'' is a monoid without the requirement for an identity element.
  +
  +
A function <math>f:M\to N</math> is a monoid homomorphism if the following conditions hold:
  +
* <math>f(\emptyset) = \emptyset</math>
  +
* <math>f(m*m') = f(m)*f(m')</math>
  +
  +
'''Examples'''
  +
* Any group is a monoid. Thus, specifically, the integers with addition form a monoid.
  +
* The natural numbers, with addition.
  +
* Strings <math>L^*</math> in an alphabet <math>L</math> form a monoid, with string concatenation as the operation and the empty string as the identity.
  +
* Non-empty strings form a semigroup.
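To tie the definition to Haskell, here is a minimal sketch using the standard <hask>Monoid</hask> class, where <hask>(<>)</hask> is the infix synonym for <hask>mappend</hask>. The names <hask>greeting</hask>, <hask>natSum</hask>, <hask>wordLength</hask> and <hask>isHomomorphicOn</hask> are illustrative and not part of the lecture. The last function only spot-checks the two homomorphism conditions on sample inputs; it proves nothing.

```haskell
import Data.Monoid (Sum (..))

-- Strings under concatenation: mempty is "" and (<>) is (++).
greeting :: String
greeting = "cat" <> mempty <> "egory"

-- The natural numbers under addition, via the Sum wrapper from Data.Monoid.
natSum :: Sum Int
natSum = Sum 2 <> Sum 3 <> mempty

-- Length sends a string to its number of characters. It is a monoid
-- homomorphism from (String, ++, "") to (Int, +, 0).
wordLength :: String -> Sum Int
wordLength = Sum . length

-- Spot-check both homomorphism conditions on a pair of sample inputs.
isHomomorphicOn :: String -> String -> Bool
isHomomorphicOn m m' =
  wordLength mempty == mempty
    && wordLength (m <> m') == wordLength m <> wordLength m'
```

Here <hask>greeting</hask> evaluates to <hask>"category"</hask> and <hask>getSum natSum</hask> to <hask>5</hask>.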
  +
  +
For more information, see the Wikipedia page on Monoids. [[http://en.wikipedia.org/wiki/Monoid]].
  +
  +
Awodey: p. 10.
  +
  +
===Partially ordered set===
  +
  +
'''Definition''' A ''partially ordered set'', or a ''partial order'', or a ''poset'' is a set <math>P</math> equipped with a binary relation <math>\leq</math> which is:
  +
* Reflexive: <math>x\leq x</math> for all <math>x\in P</math>
  +
* Anti-symmetric: <math>x\leq y</math> and <math>y\leq x</math> implies <math>x=y</math> for all <math>x,y\in P</math>.
  +
* Transitive: <math>x\leq y</math> and <math>y\leq z</math> implies <math>x\leq z</math> for all <math>x,y,z\in P</math>.
  +
  +
If <math>x\leq y</math> or <math>y\leq x</math>, we call <math>x,y</math> ''comparable''. Otherwise we call them ''incomparable''. A poset where all elements are mutually comparable is called a ''totally ordered set'' or a ''total order''.
  +
  +
If we drop the requirement for anti-symmetry, we get a ''pre-ordered set'' or a ''pre-order''.
  +
  +
If we have several posets, we may indicate which poset we're comparing ''in'' by indicating the poset as a subscript to the relation symbol.
  +
  +
A ''monotonic'' map of posets is a function <math>f:P\to Q</math> such that <math>x\leq_P y</math> implies <math>f(x)\leq_Q f(y)</math>.
  +
  +
'''Examples'''
  +
* The reals, natural numbers, integers all are posets with the usual comparison relation. Each of these is a total order: all elements are comparable.
  +
* The natural numbers, excluding 0, form a poset with <math>a\leq b</math> if <math>a|b</math>.
  +
* Any family of subsets of a given set forms a poset with the order given by inclusion.
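The divisibility example and the definition of a monotonic map can be sketched in Haskell as follows. This is an illustration under stated assumptions: the names <hask>divides</hask>, <hask>comparable</hask> and <hask>monotoneOn</hask> are made up for this sketch, orders are represented as plain Boolean-valued functions, <hask>divides</hask> assumes positive arguments, and monotonicity is only tested on a finite sample.

```haskell
-- Divisibility order on the positive integers: a <= b precisely if a | b.
-- Assumes both arguments are positive.
divides :: Int -> Int -> Bool
divides a b = b `mod` a == 0

-- Two elements are comparable if they are related in either direction.
comparable :: Int -> Int -> Bool
comparable a b = a `divides` b || b `divides` a

-- Check monotonicity of f on a finite sample:
-- whenever x <=_P y holds, f x <=_Q f y must hold as well.
monotoneOn :: (a -> a -> Bool) -> (b -> b -> Bool) -> (a -> b) -> [a] -> Bool
monotoneOn leP leQ f xs =
  and [f x `leQ` f y | x <- xs, y <- xs, x `leP` y]
```

For instance, <hask>comparable 4 6</hask> is <hask>False</hask>, so divisibility is not a total order. The identity map is monotonic from divisibility to the usual order, <hask>monotoneOn divides (<=) id [1..20]</hask>, but not in the other direction, since 2 is at most 3 without dividing it.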
  +
  +
For more information, see the Wikipedia page on Partially ordered set. [[http://en.wikipedia.org/wiki/Partially_ordered_set]]
  +
  +
Awodey: p. 6. Preorders are defined on page 8-9.
   
 
==Category==
 
==Category==
A ''graph'' is a collection <math>G_0</math> of ''vertices'' and a collection <math>G_1</math> of ''arrows''. The structure of the graph is captured in the existence of two functions, that we shall call ''source'' and ''target'', both going from <math>G_1</math> to <math>G_0</math>. In other words, each arrow has a source and a target.
 
   
  +
Awodey has a slightly different exposition. Relevant pages in Awodey for this lecture are: sections 1.1-1.4 (except Def. 1.2), 1.6-1.8.
We denote by ''[v,w]'' the collection of arrows with source ''v'' and target ''w''.
 
  +
  +
===Graphs===
  +
  +
We recall the definition of a ''(directed) graph''. A graph <math>G</math> is a collection of ''edges (arrows)'' and ''vertices (nodes)''. Each edge is assigned a ''source'' node and a ''target'' node.
  +
  +
<math>source \to target</math>
  +
  +
Given a graph <math>G</math>, we denote the collection of nodes by <math>G_0</math> and the collection of arrows by <math>G_1</math>. These two collections are connected, and the graph given its structure, by two functions: the source function <math>s:G_1\to G_0</math> and the target function <math>t:G_1\to G_0</math>.
  +
  +
We shall not, in general, require either of the collections to be a set, but will happily accept larger collections; dealing with set-theoretical paradoxes as and when we have to. A graph where both nodes and arrows are sets shall be called ''small''. A graph where either is a class shall be called ''large''.
  +
  +
If both <math>G_0</math> and <math>G_1</math> are finite, the graph is called ''finite'' too.
  +
  +
The ''empty graph'' has <math>G_0 = G_1 = \emptyset</math>.
  +
  +
A ''discrete graph'' has <math>G_1=\emptyset</math>.
  +
  +
A ''complete graph'' has <math>G_1 = \{ (v,w) | v,w\in G_0\}</math>.
  +
  +
A ''simple graph'' has at most one arrow between each pair of nodes. Any relation on a set can be interpreted as a simple graph.
  +
  +
* Show some examples.
  +
  +
A ''homomorphism'' <math>f:G\to H</math> of graphs is a pair of functions <math>f_0:G_0\to H_0</math> and <math>f_1:G_1\to H_1</math> such that sources map to sources and targets map to targets, or in other words:
  +
* <math>s(f_1(e)) = f_0(s(e))</math>
  +
* <math>t(f_1(e)) = f_0(t(e))</math>
  +
  +
By a ''path'' in a graph <math>G</math> from the node <math>x</math> to the node <math>y</math> of length <math>k</math>, we mean a sequence of edges <math>(f_1,f_2,\dots,f_k)</math> such that:
  +
* <math>s(f_1)=x</math>
  +
* <math>t(f_k)=y</math>
  +
* <math>s(f_i) = t(f_{i-1})</math> for all <math>i</math> with <math>1<i\leq k</math>.
  +
  +
Paths with start and end point identical are called ''closed''. For any node <math>x</math>, there is a unique closed path <math>()</math> starting and ending in <math>x</math> of length 0.
  +
  +
For any edge <math>f</math>, there is a unique path from <math>s(f)</math> to <math>t(f)</math> of length 1: <math>(f)</math>.
  +
  +
We denote by <math>G_k</math> the set of paths in <math>G</math> of length <math>k</math>.
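For finite graphs, the definitions above can be sketched in Haskell. The representation is an assumption made for illustration: nodes are <hask>Int</hask>s, and each edge stores its own source and target, which stands in for the two functions <math>s,t:G_1\to G_0</math>. The names <hask>Edge</hask>, <hask>Path</hask>, <hask>isPath</hask> and <hask>runsFromTo</hask> are not from the lecture.

```haskell
-- A directed edge carrying its source and target node.
data Edge = Edge { edgeName :: String, source :: Int, target :: Int }
  deriving (Eq, Show)

-- A path is a list of edges; the empty list is a path of length 0.
type Path = [Edge]

-- Check that consecutive edges line up: s(f_i) = t(f_{i-1}).
isPath :: Path -> Bool
isPath es = and (zipWith (\e e' -> target e == source e') es (drop 1 es))

-- Check that a path runs from x to y.
-- The empty path only runs from a node to itself.
runsFromTo :: Int -> Int -> Path -> Bool
runsFromTo x y [] = x == y
runsFromTo x y es@(e : _) =
  isPath es && source e == x && target (last es) == y
```

With <hask>f = Edge "f" 1 2</hask> and <hask>g = Edge "g" 2 3</hask>, the list <hask>[f, g]</hask> is a path of length 2 from node 1 to node 3, while <hask>[g, f]</hask> is not a path at all.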
  +
  +
===Categories===
  +
  +
We now are ready to define a category. A ''category'' is a graph <math>G</math> equipped with an associative composition operation <math>\circ:G_2\to G_1</math>, and an identity element for composition <math>1_x</math> for each node <math>x</math> of the graph.
  +
  +
Note that <math>G_2</math> can be viewed as a subset of <math>G_1\times G_1</math>, the set of all pairs of arrows. It is intentional that we define the composition operator on only a subset of the set of all pairs of arrows - the composable pairs. Whenever you'd want to compose two arrows that don't line up to a path, you'll get nonsense, and so any statement about the composition operator has an implicit "whenever defined" attached to it.
  +
  +
The definition is not quite done yet - this composition operator and the identity arrows both have a few rules to fulfill, and before I state these rules, there is some notation we need to cover.
  +
  +
====Backwards!====
  +
  +
If we have a path given by the arrows <math>(f,g)</math> in <math>G_2</math>, we expect <math>f:A\to B</math> and <math>g:B\to C</math> to compose to something that goes <math>A\to C</math>. The origin of all these ideas lies in geometry and algebra, and so the abstract arrows in a category are ''supposed'' to behave like functions under function composition, even though we don't say it explicitly.
  +
  +
Now, we are used to writing function application as <math>f(x)</math> - and possibly, from Haskell, as <hask>f x</hask>. This way, the composition of two functions would read <math>g(f(x))</math>.
  +
  +
On the other hand, the way we write our paths, we'd read <math>f</math> then <math>g</math>. This juxtaposition makes one of the two ways we write things seem backwards. We can resolve it either by making our paths in the category go backwards, or by reversing how we write function application.
  +
  +
In the latter case, we'd write <math>x.f</math>, say, for the application of <math>f</math> to <math>x</math>, and then write <math>x.f.g</math> for the composition. It all ends up looking a lot like Reverse Polish Notation, and has its strengths, but feels unnatural to most. It does, however, have the benefit that we can write out function composition as <math>(f,g) \mapsto f.g</math> and have everything still make sense in all notations.
  +
  +
In the former case, which is the most common in the field, we accept that paths as we read along the arrows and compositions look backwards, and so, if <math>f:A\to B</math> and <math>g:B\to C</math>, we write <math>g\circ f:A\to C</math>, remembering that ''elements'' are introduced from the right, and the functions have to consume the elements in the right order.
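Both conventions are available in Haskell, which makes for a small illustration. The standard <hask>(.)</hask> composes "backwards", while <hask>(>>>)</hask> from <hask>Control.Category</hask> composes in the order the path is read. The functions <hask>f</hask> and <hask>g</hask> below are arbitrary examples chosen for this sketch.

```haskell
import Control.Category ((>>>))

f :: Int -> Int
f = (+ 1)

g :: Int -> String
g = show

-- Standard convention: g . f applies f first, then g.
backwards :: Int -> String
backwards = g . f

-- Diagrammatic order: f >>> g reads along the path, f then g.
forwards :: Int -> String
forwards = f >>> g
```

Both <hask>backwards 41</hask> and <hask>forwards 41</hask> evaluate to <hask>"42"</hask>; the two notations describe the same composite arrow.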
  +
  +
----
  +
  +
The existence of the identity map can be captured in a function language as well: it is the existence of a function <math>u:G_0\to G_1</math>.
  +
  +
Now for the remaining rules for composition. Whenever defined, we expect associativity - so that <math>h\circ(g\circ f)=(h\circ g)\circ f</math>. Furthermore, we expect:
  +
# Composition respects sources and targets, so that:
  +
#* <math>s(g\circ f) = s(f)</math>
  +
#* <math>t(g\circ f) = t(g)</math>
  +
# <math>s(u(x)) = t(u(x)) = x</math>
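Haskell's <hask>Control.Category</hask> module has a type class whose two methods, <hask>id</hask> and <hask>(.)</hask>, correspond to the identity arrows and the composition described here; the types play the role of sources and targets. As a sketch, here is an instance where arrows are partial functions, with <hask>Nothing</hask> meaning "undefined at this input". The names <hask>Partial</hask>, <hask>halve</hask> and <hask>predecessor</hask> are made up for this example, and the laws are not checked by the compiler: they remain a proof obligation.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Arrows are partial functions; Nothing means "undefined at this input".
newtype Partial a b = Partial { applyPartial :: a -> Maybe b }

instance Category Partial where
  -- The identity arrow is defined everywhere and changes nothing.
  id = Partial Just
  -- g . f is defined at x only when f is defined at x and g at f x.
  Partial g . Partial f = Partial (\x -> f x >>= g)

-- Two sample partial functions on the integers.
halve :: Partial Int Int
halve = Partial (\n -> if even n then Just (n `div` 2) else Nothing)

predecessor :: Partial Int Int
predecessor = Partial (\n -> if n > 0 then Just (n - 1) else Nothing)
```

For example, <hask>applyPartial (predecessor . halve) 4</hask> gives <hask>Just 1</hask>, whereas <hask>applyPartial (halve . predecessor) 4</hask> gives <hask>Nothing</hask>, because 3 cannot be halved.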
  +
  +
In a category, arrows are also called ''morphisms'', and nodes are also called ''objects''. This ties in with the algebraic roots of the field.
  +
  +
We denote by <math>Hom_C(A,B)</math>, or if <math>C</math> is obvious from context, just <math>Hom(A,B)</math>, the set of all arrows from <math>A</math> to <math>B</math>. This is the ''hom-set'' or ''set of morphisms'', and may also be denoted <math>C(A,B)</math>.
  +
  +
If a category is large or small or finite as a graph, it is called a large/small/finite category.
  +
  +
A category is called ''concrete'' if its objects are a collection of sets, its morphisms are a selection from all possible functions between those sets that includes the identity function for each object, and composition in the category is just composition of functions. Concrete categories form a very rich source of examples, though far from all categories are concrete.
  +
  +
Again, the Wikipedia page on Category (mathematics) [[http://en.wikipedia.org/wiki/Category_%28mathematics%29]] is a good starting point for many things we will be looking at throughout this course.
  +
  +
===New Categories from old===
  +
  +
As with most other algebraic objects, one essential part of our tool box is to take known objects and form new examples from them. This allows us to generate a wealth of examples from the ones that shape our intuition.
  +
  +
Typical things to do here would be to talk about ''subobjects'', ''products'' and ''coproducts'', sometimes obvious ''variations on the structure'', and what a ''typical object'' looks like. Remember from linear algebra how ''subspaces'', ''cartesian products'' (which for finite-dimensional vector spaces cover both products and coproducts) and ''dual spaces'' show up early, as well as the theorems giving ''dimension'' as a complete descriptor of a vector space.
  +
  +
We'll go through the same sequence here; with some significant small variations.
  +
  +
A category <math>D</math> is a ''subcategory'' of the category <math>C</math> if:
  +
* <math>D_0\subseteq C_0</math>
  +
* <math>D_1\subseteq C_1</math>
  +
* <math>D_1</math> contains <math>1_X</math> for all <math>X\in D_0</math>
  +
* sources and targets of all the arrows in <math>D_1</math> are all in <math>D_0</math>
  +
* the composition in <math>D</math> is the restriction of the composition in <math>C</math>.
  +
  +
Written this way, it does look somewhat obnoxious. It does become easier though, with the realization - studied closer in homework exercise 2 - that the really important part of a category is the collection of arrows. Thus, a subcategory is a subcollection of the collection of arrows - with identities for all objects present, and with at least all objects that the existing arrows imply.
  +
  +
A subcategory <math>D\subseteq C</math> is ''full'' if <math>D(A,B)=C(A,B)</math> for all objects <math>A,B</math> of <math>D</math>. In other words, a full subcategory is completely determined by the selection of objects in the subcategory.
  +
  +
A subcategory <math>D\subseteq C</math> is ''wide'' if the collection of objects is the same in both categories. Hence, a wide subcategory picks out a subcollection of the morphisms.
  +
  +
The ''dual'' of a category is to a large extent inspired by vector space duals. In the dual <math>C^*</math> of a category <math>C</math>, we have the same objects, and the morphisms are given by the equality <math>C^*(A,B)=C(B,A)</math> - every morphism from <math>C</math> is present, but it goes in the ''wrong'' direction. Dualizing has a tendency to add the prefix ''co-'' when it happens, so for instance coproducts are the dual notion to products. We'll return to this construction many times in the course.
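The dual construction can be sketched with the <hask>Category</hask> class from <hask>Control.Category</hask>. The wrapper <hask>Dual</hask> below is a name chosen for this example: an arrow from <hask>a</hask> to <hask>b</hask> in <hask>Dual k</hask> is stored as an arrow from <hask>b</hask> to <hask>a</hask> in <hask>k</hask>, mirroring <math>C^*(A,B)=C(B,A)</math>.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- An arrow from a to b in the dual is an arrow from b to a in the original.
newtype Dual k a b = Dual { runDual :: k b a }

instance Category k => Category (Dual k) where
  -- Identities are the same arrows as before.
  id = Dual id
  -- Composition is reversed: g after f in the dual is f after g in k.
  Dual g . Dual f = Dual (f . g)
```

Taking <hask>k</hask> to be ordinary functions, <hask>Dual length . Dual show</hask> (with <hask>length</hask> used on strings and <hask>show</hask> on <hask>Int</hask>) unwraps to <hask>show . length</hask>, so <hask>runDual</hask> applied to it sends <hask>"abc"</hask> to <hask>"3"</hask>.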
  +
  +
Given two categories <math>C,D</math>, we can combine them in several ways:
  +
# We can form the category that has as objects the disjoint union of all the objects of <math>C</math> and <math>D</math>, and that sets <math>Hom(A,B)=\emptyset</math> whenever <math>A,B</math> come from different original categories. If <math>A,B</math> come from the same original category, we simply take over the homset from that category. This yields a categorical ''coproduct'', and we denote the result by <math>C+D</math>. Composition is inherited from the original categories.
  +
# We can also form the category with objects <math>\langle A,B\rangle</math> for every pair of objects <math>A\in C, B\in D</math>. A morphism in <math>Hom(\langle A,B\rangle,\langle A',B'\rangle)</math> is simply a pair <math>\langle f:A\to A',g:B\to B'\rangle</math>. Composition is defined componentwise. This category is the categorical correspondent to the cartesian ''product'', and we denote it by <math>C\times D</math>.
  +
  +
In these three constructions - the dual, the product and the coproduct - the arrows in the categories are formal constructions, not functions; even if the original category was given by functions, the result is no longer given by a function.
  +
  +
Given a category <math>C</math> and an object <math>A</math> of that category, we can form the ''slice category'' <math>C/A</math>. Objects in the slice category are arrows <math>B\to A</math> for some object <math>B</math> in <math>C</math>, and an arrow <math>\phi:f\to g</math> is an arrow <math>s(f)\to s(g)</math> such that <math>f=g\circ\phi</math>. Composites of arrows are just the composites in the base category.
   
  +
Notice that the same arrow <math>\phi</math> in the base category <math>C</math> represents potentially many different arrows in <math>C/A</math>: it represents one arrow for each choice of source and target compatible with it.
A ''category'' is a graph with some special structure:
 
   
  +
There is a dual notion: the ''coslice category'' <math>A\backslash C</math>, where the objects are paired with maps <math>A\to B</math>.
* Each ''[v,w]'' is a set and equipped with a composition operation <math>[u,v] \times [v,w] \to [u,w]</math>. In other words, any two arrows, such that the target of one is the source of the other, can be composed to give a new arrow with target and source from the ones left out.
 
   
  +
Slice categories can be used, among other things, to specify the idea of parametrization. The slice category <math>C/A</math> gives a sense to the idea of ''objects from <math>C</math> labeled by elements of <math>A</math>''.
We write <math>f:u\to v</math> if <math>f\in[u,v]</math>.
 
   
  +
We get this characterization by interpreting the arrow representing an object as representing its source and a ''type function''. Hence, in a way, the <hask>Typeable</hask> type class in Haskell builds a slice category on an appropriate subcategory of the category of datatypes.
<math>u \to v \to w</math> => <math>u \to w</math>
 
   
  +
Alternatively, we can phrase the importance of the arrow in a slice category of, say, Set, by looking at preimages of the slice functions. That way, an object <math>f:B\to A</math> gives us a family of (disjoint) subsets of <math>B</math> ''indexed'' by the elements of <math>A</math>.
* The composition of arrows is associative.
 
* Each vertex ''v'' has a dedicated arrow <math>1_v</math> with source and target ''v'', called the identity arrow.
 
* Each identity arrow is a left- and right-identity for the composition operation.
 
   
  +
Finally, any graph yields a category by just filling in the arrows that are missing. The result is called the ''free category generated by the graph'', and is a concept we will return to in some depth. Free objects have a strict categorical definition, and they serve to give a model of thought for the things they are free objects for. Thus, categories are essentially graphs, possibly with restrictions or relations imposed; and monoids are essentially strings in some alphabet, with restrictions or relations.
The composition of <math>f:u\to v</math> with <math>g:v\to w</math> is denoted by <math>gf:u\to v\to w</math>. A mnemonic here is that you write things so associativity looks right. Hence, ''(gf)(x) = g(f(x))''. This will make more sense once we get around to ''generalized elements'' later on.
 
   
 
===Examples===
 
===Examples===
   
* The empty category with no vertices and no arrows.
+
* The empty category.
  +
** No objects, no morphisms.
* The category ''1'' with a single vertex and only its identity arrow.
 
* The category ''2'' with two objects, their identity arrows and the arrow <math>a\to b</math>.
+
* The one object/one arrow category <math>1</math>.
  +
** A single object and its identity arrow.
* For vertices take vector spaces. For arrows, take linear maps. This is a category, the identity arrow is just the identity map <math>f(x) = x</math> and composition is just function composition.
 
  +
* The categories <math>2</math> and <math>1+1</math>.
* For vertices take finite sets. For arrows, take functions.
 
  +
** Two objects, <math>A,B</math> with identity arrows and a unique arrow <math>A\to B</math>.
* For vertices take logical propositions. For arrows take proofs in propositional logic. The identity arrow is the empty proof: ''P'' proves ''P'' without an actual proof. And if you can prove ''P'' using ''Q'' and then ''R'' using ''P'', then this composes to a proof of ''R'' using ''Q''.
 
  +
* The category Set of sets.
* For vertices, take data types. For arrows take (computable) functions. This forms a category, in which we can discuss an abstraction that mirrors most of Haskell. There are issues making Haskell not quite a category on its own, but we get close enough to draw helpful conclusions and analogies.
 
  +
** Sets for objects, functions for arrows.
* Suppose ''P'' is a set equipped with a partial ordering relation ''<''. Then we can form a category out of this set with elements for vertices and with a single element in ''[v,w]'' if and only if ''v<w''. Then the transitivity and reflexivity of partial orderings show that this forms a category.
 
  +
* The category FSet of finite sets.
  +
** Finite sets for objects, functions for arrows.
  +
* The category PFn of sets and partial functions.
  +
** Sets for objects. Arrows are pairs <math>(S'\subseteq S,f:S'\to T)\in PFn(S,T)</math>.
  +
** <math>PFn(A,B)</math> is a partially ordered set. <math>(S_f,f)\leq(S_g,g)</math> precisely if <math>S_f\subseteq S_g</math> and <math>f=g|_{S_f}</math>.
  +
** The exposition at Wikipedia uses the construction here: [[http://en.wikipedia.org/wiki/Partial_function]].
  +
* There is an alternative way to define a category of partial functions: For objects, we take sets, and for morphisms <math>S\to T</math>, we take subsets <math>F\subseteq S\times T</math> such that each element in <math>S</math> occurs in at most one pair in the subset. Composition is by an interpretation of these subsets corresponding to the previous description. We'll call this category <math>PFn'</math>.
  +
* Every partial order is a category. Each hom-set has at most one element.
  +
** Objects are the elements of the poset. Arrows are unique, with <math>A\to B</math> precisely if <math>A\leq B</math>.
  +
* Every monoid is a category. Only one object. The elements of the monoid correspond to the endo-arrows of the one object.
  +
* The category of Sets and injective functions.
  +
** Objects are sets. Morphisms are injective functions between the sets.
  +
* The category of Sets and surjective functions.
  +
** Objects are sets. Morphisms are surjective functions between the sets.
  +
* The category of <math>k</math>-vector spaces and linear maps.
  +
* The category with objects the natural numbers and <math>Hom(m,n)</math> the set of <math>m\times n</math>-matrices.
  +
** Composition is given by matrix multiplication.
  +
* The category of Data Types with Computable Functions.
  +
** Our ideal programming language has:
  +
*** Primitive data types.
  +
*** Constants of each primitive type.
  +
*** Operations, given as functions between types.
  +
*** Constructors, producing elements from data types, and producing derived data types and operations.
  +
** We will assume that the language is equipped with
  +
*** A do-nothing operation for each data type. Haskell has <hask>id</hask>.
  +
*** A unit type <math>1</math>, with exactly one value, and with the property that each type has exactly one function to this type. Haskell has <hask>()</hask>. We will use this to define the constants of type <math>t</math> as functions <math>1\to t</math>. Thus, constants end up being 0-ary functions.
  +
*** A composition constructor, taking an operator <math>f:A\to B</math> and another operator <math>g:B\to C</math> and producing an operator <math>g\circ f:A\to C</math>. Haskell has <hask>(.)</hask>.
  +
** This allows us to model a functional programming language with a category.
  +
* The category with objects logical propositions and arrows proofs.
  +
* The category Rel has objects finite sets and morphisms <math>A\to B</math> being subsets of <math>A\times B</math>. Composition is by <math>(a,c)\in g\circ f</math> if there is some <math>b\in B</math> such that <math>(a,b)\in f, (b,c)\in g</math>. The identity morphism is the diagonal <math>\{(a,a): a\in A\}</math>.
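As an illustration of the last example in the list, composition and identities in Rel can be sketched with <hask>Data.Set</hask> from the <hask>containers</hask> package (assumed to be available, as it is with a standard GHC installation). The names <hask>Rel</hask>, <hask>after</hask> and <hask>diagonal</hask> are chosen for this sketch only.

```haskell
import qualified Data.Set as Set

-- A relation from a to b is a set of pairs, that is, a subset of A x B.
type Rel a b = Set.Set (a, b)

-- (a, c) is in g `after` f if some b has (a, b) in f and (b, c) in g.
after :: (Ord a, Ord c, Eq b) => Rel b c -> Rel a b -> Rel a c
after g f =
  Set.fromList
    [(a, c) | (a, b) <- Set.toList f, (b', c) <- Set.toList g, b == b']

-- The identity on a finite set A is the diagonal {(a, a) : a in A}.
diagonal :: Ord a => Set.Set a -> Rel a a
diagonal = Set.map (\a -> (a, a))
```

Composing any relation <hask>f</hask> from <hask>A</hask> to <hask>B</hask> with <hask>diagonal</hask> of <hask>A</hask> on the right, or <hask>diagonal</hask> of <hask>B</hask> on the left, returns <hask>f</hask> again, which is the identity law for this category.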
   
Some language we want settled:
 
   
A category is ''concrete'' if it is like the vector spaces and the sets among the examples - the collection of all sets-with-specific-additional-structure equipped with all functions-respecting-that-structure. We require already that ''[v,w]'' is always a set.
 
   
A category is ''small'' if the collection of all vertices, too, is a set.
 
   
==Morphisms==
 
The arrows of a category are called ''morphisms''. This is derived from ''homomorphisms''.
 
   
  +
===Homework===
Some arrows have special properties that make them extra helpful; and we'll name them:
 
   
  +
For a passing mark, a written, acceptable solution to at least 3 of the 6 questions should be given no later than midnight before the next lecture.
;Endomorphism:A morphism with the same object as source and target.
 
;Monomorphism:A morphism that is left-cancellable. Corresponds to injective functions. We say that ''f'' is a monomorphism if for any <math>g_1,g_2</math>, the equation <math>fg_1 = fg_2</math> implies <math>g_1=g_2</math>. In other words, with a concrete perspective, ''f'' doesn't introduce additional relations when applied.
 
;Epimorphism:A morphism that is right-cancellable. Corresponds to surjective functions. We say that ''f'' is an epimorphism if for any <math>g_1,g_2</math>, the equation <math>g_1f = g_2f</math> implies <math>g_1=g_2</math>.
 
Note, by the way, that cancellability does not imply the existence of an inverse. Epi's and mono's that have inverses realizing their cancellability are called ''split''.
 
;Isomorphism:A morphism is an isomorphism if it has an inverse.
 
   
  +
For each lecture, there will be a few exercises marked with the symbol *. These will be more difficult than the other exercises given, will require significant time and independent study, and will aim to complement the course with material not covered in lectures, but nevertheless interesting for the general philosophy of the lecture course.
==Objects==
 
In a category, we use a different name for the vertices: ''objects''. This comes from the roots in describing concrete categories - thus objects may be actual mathematical objects, but they may just as well be something completely different.
 
   
  +
# Prove the general associative law: that for any path, and any bracketing of that path, the same composition results.
Some objects, if they exist, give us strong
 
  +
# Which of the following form categories? Proof and disproof for each:
  +
#* Objects are finite sets, morphisms are functions such that <math>|f^{-1}(b)|\leq 2</math> for all morphisms f, objects B and elements b.
  +
#* Objects are finite sets, morphisms are functions such that <math>|f^{-1}(b)|\geq 2</math> for all morphisms f, objects B and elements b.
  +
#* Objects are finite sets, morphisms are functions such that <math>|f^{-1}(b)|<\infty</math> for all morphisms f, objects B and elements b.
  +
:Recall that <math>f^{-1}(b)=\{a\in A: f(a)=b\}</math>.
  +
# Suppose <math>u:A\to A</math> in some category <math>C</math>.
  +
## If <math>g\circ u=g</math> for all <math>g:A\to B</math> in the category, then <math>u=1_A</math>.
  +
## If <math>u\circ h=h</math> for all <math>h:B\to A</math> in the category, then <math>u=1_A</math>.
  +
## These two results characterize the objects in a category by the properties of their corresponding identity arrows completely. Specifically, there is a way to rephrase the definition of a category such that everything is stated in terms of arrows.
  +
# For as many of the examples given as you can, prove that they really do form a category. Passing mark is at least 60% of the given examples.
  +
#* Which of the categories are subcategories of which other categories? Which of these are wide? Which are full?
  +
# For this question, all parts are required:
  +
## For which sets is the free monoid on that set commutative?
  +
## Prove that for any category <math>C</math>, the set <math>Hom(A,A)</math> is a monoid under composition for every object <math>A</math>.
  +
:For details on the construction of a free monoid, see the Wikipedia pages on the Free Monoid [[http://en.wikipedia.org/wiki/Free_monoid]] and on the Kleene star [[http://en.wikipedia.org/wiki/Kleene_star]].
  +
# * Read up on <math>\omega</math>-complete partial orders. Suppose <math>S</math> is some set and <math>\mathfrak P</math> is the set of partial functions <math>S\to S</math> - in other words, an element of <math>\mathfrak P</math> is some pair <math>(S_0,f:S_0\to S)</math> with <math>S_0\subseteq S</math>. We give this set a poset structure by <math>(S_0,f)\leq(S_1,g)</math> precisely if <math>S_0\subseteq S_1</math> and <math>f(s)=g(s)\forall s\in S_0</math>.
  +
#* Show that <math>\mathfrak P</math> is a strict <math>\omega</math>-CPO.
  +
#* An element <math>x</math> of <math>S</math> is a ''fixpoint'' of <math>f:S\to S</math> if <math>f(x)=x</math>. Let <math>\mathfrak N</math> be the <math>\omega</math>-CPO of partially defined functions on the natural numbers. We define a function <math>\phi:\mathfrak N\to\mathfrak N</math> by sending some <math>h:\mathbb N\to\mathbb N</math> to a function <math>k</math> defined by
  +
#*# <math>k(0) = 1</math>
  +
#*# <math>k(n)</math> is defined only if <math>h(n-1)</math> is defined, and then by <math>k(n)=n*h(n-1)</math>.
  +
:Describe <math>\phi(n\mapsto n^2)</math> and <math>\phi(n\mapsto n^3)</math>. Show that <math>\phi</math> is ''continuous''. Find a fixpoint <math>(S_0,f)</math> of <math>\phi</math> such that any other fixpoint of the same function is larger than this one.
  +
:Find a continuous endofunction on some <math>\omega</math>-CPO that has the Fibonacci function <math>F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2)</math> as the least fixed point.
  +
:Implement a Haskell function that finds fixed points in an <math>\omega</math>-CPO. Implement the two fixed points above as Haskell functions - using the <math>\omega</math>-CPO fixed point approach in the implementation. It may well be worth looking at <hask>Data.Map</hask> to provide a Haskell context for a partial function for this part of the task.

Latest revision as of 02:50, 22 June 2012

This page now includes additional information based on the notes taken in class. Hopefully this will make the notes reasonably complete for everybody.


Welcome, administratrivia

I'm Mikael Vejdemo-Johansson. I can be reached in my office 383-BB, especially during my office hours; or by email to mik@math.stanford.edu.

I encourage, strongly, student interactions.

I will be out of town September 24 - 29. I will monitor forum and email closely, and recommend electronic ways of getting in touch with me during this week. I will be back again in time for the office hours on the 30th.

Introduction

Why this course?

An introduction to Haskell will usually come with pointers toward Category Theory as a useful tool, though not with much more than the mention of the subject. This course is intended to fill that gap, and provide an introduction to Category Theory that ties into Haskell and functional programming as a source of examples and applications.

What will we cover?

The definition of categories, special objects and morphisms, functors, natural transformations, (co-)limits and special cases of these, adjunctions, freeness and presentations as categorical constructs, monads and Kleisli arrows, recursion with categorical constructs.

Maybe, just maybe, if we have enough time, we'll finish with looking at the definition of a topos, and how this encodes logic internal to a category. Applications to fuzzy sets.

What do we require?

Our examples will be drawn from discrete mathematics, logic, Haskell programming and linear algebra. I expect the following concepts to be at least vaguely familiar to anyone taking this course:

  • Sets
  • Functions
  • Permutations
  • Groups
  • Partially ordered sets
  • Vector spaces
  • Linear maps
  • Matrices
  • Homomorphisms

Good references

On reserve in the mathematics/CS library are:

  • Mac Lane: Categories for the working mathematician
This book is technical, written for a mathematical audience, and puts in more work than is strictly necessary in many of the definitions. When Awodey and Mac Lane deviate, we will give Awodey priority.
  • Awodey: Category Theory
This book is also available as an ebook, accessible from the Stanford campus network. The coursework webpage has links to the ebook under Materials.

Monoids

In order to settle notation and ensure everybody's seen a definition before:

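Definition A monoid is a set <math>M</math> equipped with a binary associative operation <math>*</math> (in Haskell: mappend) and an identity element <math>e</math> (in Haskell: mempty).

A semigroup is a monoid without the requirement for an identity element.

A function <math>f:M\to N</math> between monoids is a monoid homomorphism if the following conditions hold:

  • <math>f(e) = e</math>
  • <math>f(m*m') = f(m)*f(m')</math> for all <math>m,m'\in M</math>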

Examples

  • Any group is a monoid. Thus, specifically, the integers with addition form a monoid.
  • The natural numbers, with addition.
  • Strings over an alphabet form a monoid, with string concatenation as the operation and the empty string as the identity.
  • Non-empty strings form a semigroup.

For more information, see the Wikipedia page on Monoids.

Awodey: p. 10.
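As a small illustration in Haskell (a sketch added for these notes, not taken from Awodey): strings under concatenation form a monoid, and taking the length is a monoid homomorphism into the integers under addition. The names lengthHom, preservesIdentity and preservesOperation are made up for this example.

```haskell
import Data.Monoid (Sum (..))

-- Hypothetical name: the length of a string, landing in the additive monoid.
lengthHom :: String -> Sum Int
lengthHom = Sum . length

-- The identity (the empty string) is sent to the identity (Sum 0).
preservesIdentity :: Bool
preservesIdentity = lengthHom mempty == mempty

-- The operation is preserved, checked here for one pair of strings.
preservesOperation :: String -> String -> Bool
preservesOperation xs ys =
  lengthHom (xs <> ys) == lengthHom xs <> lengthHom ys
```

Loaded in GHCi, preservesOperation can be tried on any pair of strings; a check on samples is of course no proof, but it shows what the two homomorphism conditions say.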

Partially ordered set

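Definition A partially ordered set, or a partial order, or a poset is a set <math>P</math> equipped with a binary relation <math>\leq</math> which is:

  • Reflexive: <math>x\leq x</math> for all <math>x\in P</math>.
  • Anti-symmetric: <math>x\leq y</math> and <math>y\leq x</math> implies <math>x=y</math> for all <math>x,y\in P</math>.
  • Transitive: <math>x\leq y</math> and <math>y\leq z</math> implies <math>x\leq z</math> for all <math>x,y,z\in P</math>.

If <math>x\leq y</math> or <math>y\leq x</math>, we call <math>x,y</math> comparable. Otherwise we call them incomparable. A poset where all elements are mutually comparable is called a totally ordered set or a total order.

If we drop the requirement for anti-symmetry, we get a pre-ordered set or a pre-order.

If we have several posets, we may indicate which poset we're comparing in by indicating the poset as a subscript to the relation symbol, as in <math>\leq_P</math>.

A monotonic map of posets is a function <math>f:P\to Q</math> such that <math>x\leq_P y</math> implies <math>f(x)\leq_Q f(y)</math>.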

Examples

  • The reals, natural numbers, integers all are posets with the usual comparison relation. Each of these is a total order: all elements are comparable.
  • The natural numbers, excluding 0, form a poset with <math>a\leq b</math> if <math>a|b</math>.
  • Any family of subsets of a given set form a poset with the order given by inclusion.

For more information, see the Wikipedia page on Partially ordered set.

Awodey: p. 6. Preorders are defined on page 8-9.
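The divisibility example can be explored in Haskell. The following is a minimal sketch that checks the three poset axioms by brute force on a finite sample; the function names are hypothetical, and a finite check is an illustration rather than a proof.

```haskell
-- Divisibility on the positive integers: a `divides` b if a | b.
divides :: Int -> Int -> Bool
divides a b = b `mod` a == 0

-- Check reflexivity, anti-symmetry and transitivity on a finite sample.
isPartialOrderOn :: [Int] -> (Int -> Int -> Bool) -> Bool
isPartialOrderOn xs le = reflexive && antisymmetric && transitive
  where
    reflexive     = and [ x `le` x | x <- xs ]
    antisymmetric = and [ x == y | x <- xs, y <- xs, x `le` y, y `le` x ]
    transitive    = and [ x `le` z | x <- xs, y <- xs, z <- xs, x `le` y, y `le` z ]

-- A map f is monotonic for divisibility if a | b implies f(a) | f(b),
-- again checked only on the sample.
isMonotonicOn :: [Int] -> (Int -> Int) -> Bool
isMonotonicOn xs f = and [ f x `divides` f y | x <- xs, y <- xs, x `divides` y ]
```

For instance, 2 and 3 are incomparable under divisibility, so this poset is not a total order, while doubling is a monotonic map from it to itself.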

Category

Awodey has a slightly different exposition. Relevant pages in Awodey for this lecture are: sections 1.1-1.4 (except Def. 1.2), 1.6-1.8.

Graphs

We recall the definition of a (directed) graph. A graph is a collection of edges (arrows) and vertices (nodes). Each edge is assigned a source node and a target node.

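Given a graph <math>G</math>, we denote the collection of nodes by <math>G_0</math> and the collection of arrows by <math>G_1</math>. These two collections are connected, and the graph given its structure, by two functions: the source function <math>s:G_1\to G_0</math> and the target function <math>t:G_1\to G_0</math>.

We shall not, in general, require either of the collections to be a set, but will happily accept larger collections; dealing with set-theoretical paradoxes as and when we have to. A graph where both nodes and arrows are sets shall be called small. A graph where either is a class shall be called large. If both <math>G_0</math> and <math>G_1</math> are finite, the graph is called finite too.

The empty graph has <math>G_0 = G_1 = \emptyset</math>.

A discrete graph has <math>G_1 = \emptyset</math>.

A complete graph has <math>G_1 = \{(v,w) \mid v,w\in G_0\}</math>.

A simple graph has at most one arrow between each pair of nodes. Any relation on a set can be interpreted as a simple graph.

  • Show some examples.

A homomorphism <math>f:G\to H</math> of graphs is a pair of functions <math>f_0:G_0\to H_0</math> and <math>f_1:G_1\to H_1</math> such that sources map to sources and targets map to targets, or in other words:

  • <math>s(f_1(e)) = f_0(s(e))</math>
  • <math>t(f_1(e)) = f_0(t(e))</math>

By a path in a graph <math>G</math> from the node <math>x</math> to the node <math>y</math> of length <math>k</math>, we mean a sequence of edges <math>(f_1,f_2,\dots,f_k)</math> such that:

  • <math>s(f_1) = x</math>
  • <math>t(f_k) = y</math>
  • <math>s(f_i) = t(f_{i-1})</math> for all other <math>i</math>.

Paths with start and end point identical are called closed. For any node <math>x</math>, there is a unique closed path <math>()</math> starting and ending in <math>x</math> of length 0.

For any edge <math>f</math>, there is a unique path from <math>s(f)</math> to <math>t(f)</math> of length 1: <math>(f)</math>.

We denote by <math>G_k</math> the set of paths in <math>G</math> of length <math>k</math>.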
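To make the definitions concrete, here is one possible Haskell encoding of a finite graph and of its paths of a given length. It is only a sketch: the record Graph, the type Path and the sample graph are assumptions made for this illustration, and a path is stored together with its start node so that the paths of length 0 (one per node) can be told apart.

```haskell
-- A finite graph: the nodes, the edges, and the source and target functions.
data Graph n e = Graph
  { nodes  :: [n]
  , edges  :: [e]
  , source :: e -> n
  , target :: e -> n
  }

-- A path: its start node and the edges traversed, in order.
type Path n e = (n, [e])

-- All paths of length k; each edge must start where the previous one ended.
pathsOfLength :: Eq n => Int -> Graph n e -> [Path n e]
pathsOfLength k g
  | k < 0     = []
  | k == 0    = [ (x, []) | x <- nodes g ]
  | otherwise =
      [ (source g f, f : fs)
      | f <- edges g
      , (x, fs) <- pathsOfLength (k - 1) g
      , x == target g f
      ]

-- A sample graph on three nodes, with edges written as (source, target) pairs.
exampleGraph :: Graph Char (Char, Char)
exampleGraph = Graph "abc" [('a', 'b'), ('b', 'c'), ('a', 'c')] fst snd
```

In the sample graph there are three paths of length 0, three of length 1 (the edges themselves) and a single path of length 2, running from a through b to c.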

Categories

We now are ready to define a category. A category is a graph <math>G</math> equipped with an associative composition operation <math>\circ:G_2\to G_1</math>, where <math>G_2</math> is the set of paths of length 2, and an identity element <math>1_x</math> for composition for each node <math>x</math> of the graph.

Note that <math>G_2</math> can be viewed as a subset of <math>G_1\times G_1</math>, the set of all pairs of arrows. It is intentional that we define the composition operator on only a subset of the set of all pairs of arrows - the composable pairs. Whenever you'd want to compose two arrows that don't line up to a path, you'll get nonsense, and so any statement about the composition operator has an implicit "whenever defined" attached to it.

The definition is not quite done yet - this composition operator, and the identity arrows both have a few rules to fulfill, and before I state these rules, there is some notation we need to cover.

Backwards!

If we have a path given by the arrows <math>(f,g)</math> in <math>G_2</math>, we expect <math>f:A\to B</math> and <math>g:B\to C</math> to compose to something that goes <math>A\to C</math>. The origin of all these ideas lies in geometry and algebra, and so the abstract arrows in a category are supposed to behave like functions under function composition, even though we don't say it explicitly.

Now, we are used to writing function application as <math>f(x)</math> - and possibly, from Haskell, as f x. This way, the composition of two functions would read <math>g(f(x))</math>.

On the other hand, the way we write our paths, we'd read <math>f</math> then <math>g</math>. This juxtaposition makes one of the two ways we write things seem backwards. We can resolve it either by making our paths in the category go backwards, or by reversing how we write function application.

In the latter case, we'd write <math>x.f</math>, say, for the application of <math>f</math> to <math>x</math>, and then write <math>x.f.g</math> for the composition. It all ends up looking a lot like Reverse Polish Notation, and has its strengths, but feels unnatural to most. It does, however, have the benefit that we can write out function composition as <math>(f,g)\mapsto f.g</math> and have everything still make sense in all notations.

In the former case, which is the most common in the field, we accept that paths as we read along the arrows and compositions look backwards, and so, if <math>f:A\to B</math> and <math>g:B\to C</math>, we write <math>g\circ f:A\to C</math>, remembering that elements are introduced from the right, and the functions have to consume the elements in the right order.
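Haskell offers both reading orders, which may help in keeping them apart. In the sketch below, (.) is the usual "backwards" composition and (>>>) from Control.Category composes in path order; the sample functions f and g are made up for the illustration.

```haskell
import Control.Category ((>>>))

f :: Int -> Int
f = (+ 1)

g :: Int -> String
g = show

-- The common convention: written right to left, "g after f".
backwards :: Int -> String
backwards = g . f

-- Path order: written left to right, "f then g".
forwards :: Int -> String
forwards = f >>> g
```

Both definitions denote the same function; only the order in which the two arrows are written differs.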


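The existence of the identity map can be captured in a function language as well: it is the existence of a function <math>u:G_0\to G_1</math>, sending each node <math>x</math> to its identity arrow <math>1_x</math>.

Now for the remaining rules for composition. Whenever defined, we expect associativity - so that <math>h\circ(g\circ f)=(h\circ g)\circ f</math>. Furthermore, we expect:

  1. Composition respects sources and targets, so that:
    • <math>s(g\circ f) = s(f)</math>
    • <math>t(g\circ f) = t(g)</math>
  2. Identity arrows start and end at their own node: <math>s(u(x)) = t(u(x)) = x</math>.

In a category, arrows are also called morphisms, and nodes are also called objects. This ties in with the algebraic roots of the field.

We denote by <math>Hom_C(A,B)</math>, or if <math>C</math> is obvious from context, just <math>Hom(A,B)</math>, the set of all arrows from <math>A</math> to <math>B</math>. This is the hom-set or set of morphisms, and may also be denoted <math>C(A,B)</math>.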

If a category is large or small or finite as a graph, it is called a large/small/finite category.

A category is called concrete if its objects are sets, its morphisms are a selection of functions between these sets that includes the identity function of each object, and its composition is just composition of functions. Concrete categories form a very rich source of examples, though far from all categories are concrete.

Again, the Wikipedia page on Category (mathematics) is a good starting point for many things we will be looking at throughout this course.
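In Haskell, the class Category from Control.Category captures the two pieces of data in the definition: an identity arrow for every object (type) and a composition. As a hedged sketch, here is an instance whose arrows are partial functions, encoded as functions into Maybe; the wrapper Partial and the two sample arrows are hypothetical names chosen for this illustration.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))
import Control.Monad ((>=>))

-- Hypothetical wrapper: an arrow from a to b is a partial function.
newtype Partial a b = Partial { runPartial :: a -> Maybe b }

instance Category Partial where
  -- The identity arrow is defined everywhere.
  id = Partial Just
  -- "g after f" is defined only where f is, and g is on the result.
  Partial g . Partial f = Partial (f >=> g)

safeRecip :: Partial Double Double
safeRecip = Partial (\x -> if x == 0 then Nothing else Just (1 / x))

safeSqrt :: Partial Double Double
safeSqrt = Partial (\x -> if x < 0 then Nothing else Just (sqrt x))

-- First take the reciprocal, then the square root.
recipThenSqrt :: Partial Double Double
recipThenSqrt = safeSqrt . safeRecip
```

The class itself cannot enforce associativity or the identity rules; those remain obligations on whoever writes the instance.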

New Categories from old

As with most other algebraic objects, one essential part of our tool box is to take known objects and form new examples from them. This allows us to generate a wealth of examples from the ones that shape our intuition.

Typical things to do here would be to talk about subobjects, products and coproducts, sometimes obvious variations on the structure, and what a typical object looks like. Remember from linear algebra how subspaces, cartesian products (which for finite-dimensional vector spaces cover both products and coproducts) and dual spaces show up early, as well as the theorems giving dimension as a complete descriptor of a vector space.

We'll go through the same sequence here; with some significant small variations.

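A category <math>D</math> is a subcategory of the category <math>C</math> if:

  • <math>D_0\subseteq C_0</math>
  • <math>D_1\subseteq C_1</math>
  • <math>D_1</math> contains <math>1_X</math> for all <math>X\in D_0</math>
  • sources and targets of all the arrows in <math>D_1</math> are all in <math>D_0</math>
  • the composition in <math>D</math> is the restriction of the composition in <math>C</math>.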

Written this way, it does look somewhat obnoxious. It does become easier though, with the realization - studied closer in homework exercise 3 - that the really important part of a category is the collection of arrows. Thus, a subcategory is a subcollection of the collection of arrows - with identities for all objects present, and with at least all objects that the existing arrows imply.

A subcategory <math>D\subseteq C</math> is full if <math>D(A,B)=C(A,B)</math> for all objects <math>A,B</math> of <math>D</math>. In other words, a full subcategory is completely determined by the selection of objects in the subcategory.

A subcategory is wide if the collection of objects is the same in both categories. Hence, a wide subcategory picks out a subcollection of the morphisms.

The dual of a category is to a large extent inspired by vector space duals. In the dual <math>C^{op}</math> of a category <math>C</math>, we have the same objects, and the morphisms are given by the equality <math>C^{op}(A,B)=C(B,A)</math> - every morphism from <math>C</math> is present, but it goes in the wrong direction. Dualizing has a tendency to add the prefix co- when it happens, so for instance coproducts are the dual notion to products. We'll return to this construction many times in the course.
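The dual construction can be sketched in Haskell on top of the Category class from Control.Category: keep the objects, and flip the two type arguments of the arrows. The wrapper name Dual and the sample arrows below are assumptions made for this illustration.

```haskell
import Prelude hiding (id, (.))
import Control.Category (Category (..))

-- Hypothetical wrapper: an arrow a -> b in the dual is an arrow b -> a in k.
newtype Dual k a b = Dual { runDual :: k b a }

instance Category k => Category (Dual k) where
  id = Dual id
  -- Composition in the dual is composition in k with the order swapped.
  Dual g . Dual f = Dual (f . g)

-- An arrow Int -> String in the dual of functions: really String -> Int.
lenOp :: Dual (->) Int String
lenOp = Dual length

-- An arrow String -> Int in the dual of functions: really Int -> String.
showOp :: Dual (->) String Int
showOp = Dual show

-- Composing in the dual and unwrapping gives length after show.
digits :: Int -> Int
digits = runDual (showOp . lenOp)
```

Note how showOp . lenOp reads as "showOp after lenOp" in the dual, while the underlying functions are applied in the opposite order.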

Given two categories <math>C,D</math>, we can combine them in several ways:

  1. We can form the category that has as objects the disjoint union of all the objects of <math>C</math> and <math>D</math>, and that sets <math>Hom(A,B)=\emptyset</math> whenever <math>A,B</math> come from different original categories. If <math>A,B</math> come from the same original category, we simply take over the homset from that category. This yields a categorical coproduct, and we denote the result by <math>C+D</math>. Composition is inherited from the original categories.
  2. We can also form the category with objects <math>(A,B)</math> for every pair of objects <math>A\in C, B\in D</math>. A morphism in <math>Hom((A,B),(A',B'))</math> is simply a pair <math>(f:A\to A',g:B\to B')</math>. Composition is defined componentwise. This category is the categorical correspondent to the cartesian product, and we denote it by <math>C\times D</math>.

In these three constructions - the dual, the product and the coproduct - the arrows in the categories are formal constructions, not functions; even if the original category was given by functions, the result is no longer given by a function.
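For the product, a small Haskell sketch shows what "formal construction" means: an arrow is a pair of arrows, not itself a function, and it composes componentwise. All names below are hypothetical; applyProduct is only a convenience for inspecting such a pair when both factors happen to consist of functions.

```haskell
-- An arrow (A, B) -> (A', B') in the product: a pair of arrows.
type ProductArrow a b a' b' = (a -> a', b -> b')

-- Composition is componentwise.
composeProduct
  :: ProductArrow a' b' a'' b''
  -> ProductArrow a b a' b'
  -> ProductArrow a b a'' b''
composeProduct (f', g') (f, g) = (f' . f, g' . g)

-- The identity arrow on (A, B) is the pair of identities.
identityProduct :: ProductArrow a b a b
identityProduct = (id, id)

-- Apply each component to the matching half of a pair of elements.
applyProduct :: ProductArrow a b a' b' -> (a, b) -> (a', b')
applyProduct (f, g) (x, y) = (f x, g y)
```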

Given a category <math>C</math> and an object <math>A</math> of that category, we can form the slice category <math>C/A</math>. Objects in the slice category are arrows <math>B\to A</math> for some object <math>B</math> in <math>C</math>, and an arrow <math>\phi:f\to g</math> is an arrow <math>\phi:s(f)\to s(g)</math> such that <math>f=g\circ\phi</math>. Composites of arrows are just the composites in the base category.

Notice that the same arrow <math>\phi</math> in the base category <math>C</math> represents potentially many different arrows in <math>C/A</math>: it represents one arrow for each choice of source and target compatible with it.

There is a dual notion: the coslice category <math>A\backslash C</math>, where the objects are paired with maps <math>A\to B</math>.

Slice categories can be used, among other things, to specify the idea of parametrization. The slice category <math>C/A</math> gives a sense to the idea of objects from <math>C</math> labeled by elements of <math>A</math>.

We get this characterization by interpreting the arrow representing an object as representing its source and a type function. Hence, in a way, the Typeable type class in Haskell builds a slice category on an appropriate subcategory of the category of datatypes.

Alternatively, we can phrase the importance of the arrow in a slice category of, say, Set, by looking at preimages of the slice functions. That way, an object <math>f:B\to A</math> gives us a family of (disjoint) subsets of <math>B</math> indexed by the elements of <math>A</math>.
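The preimage point of view is easy to try out in Haskell. In this sketch a finite list stands in for the source set, and the function names fibres and isSliceArrow are made up for the illustration; the second function checks the commuting condition for an arrow of the slice category pointwise on that list.

```haskell
import qualified Data.Map as Map

-- The preimages of an object f : B -> A of Set/A, computed on a finite
-- list standing in for B and indexed by the elements of A that are hit.
fibres :: Ord a => (b -> a) -> [b] -> Map.Map a [b]
fibres f bs = Map.fromListWith (flip (++)) [ (f b, [b]) | b <- bs ]

-- phi is an arrow f -> g in the slice category if f = g . phi,
-- checked here pointwise on the finite stand-in for the source of f.
isSliceArrow :: Eq a => [b] -> (b -> a) -> (c -> a) -> (b -> c) -> Bool
isSliceArrow bs f g phi = all (\b -> f b == g (phi b)) bs
```

Taking f to be the remainder modulo 3 splits the sample into three disjoint families, one for each label 0, 1 and 2.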

Finally, any graph yields a category by just filling in the arrows that are missing. The result is called the free category generated by the graph, and is a concept we will return to in some depth. Free objects have a strict categorical definition, and they serve to give a model of thought for the things they are free objects for. Thus, categories are essentially graphs, possibly with restrictions or relations imposed; and monoids are essentially strings in some alphabet, with restrictions or relations.
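One way to picture "filling in the missing arrows" is to take the paths of the graph as the arrows of the free category, with concatenation as composition and the empty paths as identities. The following Haskell sketch does this with hypothetical names (Path, identityPath, after), returning Nothing when two paths do not line up.

```haskell
-- An arrow of the free category: a path, stored as its start node, its
-- end node and the edges traversed, in order.
data Path n e = Path { start :: n, end :: n, steps :: [e] }
  deriving (Eq, Show)

-- The identity arrow on a node: the path of length 0 at that node.
identityPath :: n -> Path n e
identityPath x = Path x x []

-- "q after p": follow p and then q, defined only if p ends where q starts.
after :: Eq n => Path n e -> Path n e -> Maybe (Path n e)
q `after` p
  | end p == start q = Just (Path (start p) (end q) (steps p ++ steps q))
  | otherwise        = Nothing
```

Associativity of composition comes for free from associativity of list concatenation, which is the same reason strings form a monoid.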

Examples

  • The empty category.
    • No objects, no morphisms.
  • The one object/one arrow category .
    • A single object and its identity arrow.
  • The categories and .
    • Two objects, with identity arrows and a unique arrow .
  • The category Set of sets.
    • Sets for objects, functions for arrows.
  • The category FSet of finite sets.
    • Finite sets for objects, functions for arrows.
  • The category PFn of sets and partial functions.
    • Sets for objects. Arrows are pairs <math>(S'\subseteq S, f:S'\to T)\in PFn(S,T)</math>.
    • <math>PFn(S,T)</math> is a partially ordered set. <math>(S_f,f)\leq(S_g,g)</math> precisely if <math>S_f\subseteq S_g</math> and <math>f=g|_{S_f}</math>.
    • The exposition at Wikipedia uses the construction given here.
  • There is an alternative way to define a category of partial functions: For objects, we take sets, and for morphisms <math>S\to T</math>, we take subsets <math>F\subseteq S\times T</math> such that each element in <math>S</math> occurs in at most one pair in the subset. Composition is by an interpretation of these subsets corresponding to the previous description. We'll call this category <math>PFn'</math>.
  • Every partial order is a category. Each hom-set has at most one element.
    • Objects are the elements of the poset. Arrows are unique, with <math>x\to y</math> precisely if <math>x\leq y</math>.
  • Every monoid is a category. Only one object. The elements of the monoid correspond to the endo-arrows of the one object.
  • The category of Sets and injective functions.
    • Objects are sets. Morphisms are injective functions between the sets.
  • The category of Sets and surjective functions.
    • Objects are sets. Morphisms are surjective functions between the sets.
  • The category of <math>k</math>-vector spaces and linear maps, for a fixed field <math>k</math>.
  • The category with objects the natural numbers and the set of -matrices.
    • Composition is given by matrix multiplication.
  • The category of Data Types with Computable Functions.
    • Our ideal programming language has:
      • Primitive data types.
      • Constants of each primitive type.
      • Operations, given as functions between types.
      • Constructors, producing elements from data types, and producing derived data types and operations.
    • We will assume that the language is equipped with
      • A do-nothing operation for each data type. Haskell has id.
      • A one-element type <math>1</math>, with the property that each type has exactly one function to this type. Haskell has (). We will use this to define the constants of type <math>t</math> as functions <math>1\to t</math>. Thus, constants end up being 0-ary functions.
      • A composition constructor, taking an operator <math>f:A\to B</math> and another operator <math>g:B\to C</math> and producing an operator <math>g\circ f:A\to C</math>. Haskell has (.).
    • This allows us to model a functional programming language with a category.
  • The category with objects logical propositions and arrows proofs.
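  • The category Rel has objects finite sets and morphisms <math>A\to B</math> being subsets of <math>A\times B</math>. Composition is by <math>(a,c)\in g\circ f</math> if there is some <math>b\in B</math> such that <math>(a,b)\in f, (b,c)\in g</math>. Identity morphism is the diagonal <math>\{(a,a) : a\in A\}</math>.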
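The last example, Rel, translates almost word for word into Haskell using Data.Set. This is a minimal sketch; the type synonym Rel and the two function names are assumptions made for the illustration.

```haskell
import qualified Data.Set as Set

-- A morphism A -> B in Rel: a set of pairs.
type Rel a b = Set.Set (a, b)

-- (a, c) is in "g after f" if some b has (a, b) in f and (b, c) in g.
composeRel :: (Ord a, Eq b, Ord c) => Rel b c -> Rel a b -> Rel a c
composeRel g f = Set.fromList
  [ (a, c) | (a, b) <- Set.toList f, (b', c) <- Set.toList g, b == b' ]

-- The identity on a finite set, given as a list: the diagonal.
identityRel :: Ord a => [a] -> Rel a a
identityRel xs = Set.fromList [ (x, x) | x <- xs ]
```

Unlike a function, a relation may send one element to several elements or to none, which is why composition has to search for a middle element.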



Homework

For a passing mark, a written, acceptable solution to at least 3 of the 6 questions should be given no later than midnight before the next lecture.

For each lecture, there will be a few exercises marked with the symbol *. These will be more difficult than the other exercises given, will require significant time and independent study, and will aim to complement the course with material not covered in lectures, but nevertheless interesting for the general philosophy of the lecture course.

  1. Prove the general associative law: that for any path, and any bracketing of that path, the same composition results.
  2. Which of the following form categories? Proof and disproof for each:
    • Objects are finite sets, morphisms are functions such that for all morphisms f, objects B and elements b.
    • Objects are finite sets, morphisms are functions such that for all morphisms f, objects B and elements b.
    • Objects are finite sets, morphisms are functions such that for all morphisms f, objects B and elements b.
Recall that .
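  3. Suppose <math>u:A\to A</math> in some category <math>C</math>.
    1. If <math>g\circ u=g</math> for all <math>g:A\to B</math> in the category, then <math>u=1_A</math>.
    2. If <math>u\circ h=h</math> for all <math>h:B\to A</math> in the category, then <math>u=1_A</math>.
    3. These two results characterize the objects in a category by the properties of their corresponding identity arrows completely. Specifically, there is a way to rephrase the definition of a category such that everything is stated in terms of arrows.
  4. For as many of the examples given as you can, prove that they really do form a category. Passing mark is at least 60% of the given examples.
    • Which of the categories are subcategories of which other categories? Which of these are wide? Which are full?
  5. For this question, all parts are required:
    1. For which sets is the free monoid on that set commutative?
    2. Prove that for any category <math>C</math>, the set <math>Hom(A,A)</math> is a monoid under composition for every object <math>A</math>.
For details on the construction of a free monoid, see the Wikipedia pages on the Free Monoid and on the Kleene star.
  6. * Read up on <math>\omega</math>-complete partial orders. Suppose <math>S</math> is some set and <math>\mathfrak P</math> is the set of partial functions <math>S\to S</math> - in other words, an element of <math>\mathfrak P</math> is some pair <math>(S_0,f:S_0\to S)</math> with <math>S_0\subseteq S</math>. We give this set a poset structure by <math>(S_0,f)\leq(S_1,g)</math> precisely if <math>S_0\subseteq S_1</math> and <math>f(s)=g(s)</math> for all <math>s\in S_0</math>.
    • Show that <math>\mathfrak P</math> is a strict <math>\omega</math>-CPO.
    • An element <math>x</math> of <math>S</math> is a fixpoint of <math>f:S\to S</math> if <math>f(x)=x</math>. Let <math>\mathfrak N</math> be the <math>\omega</math>-CPO of partially defined functions on the natural numbers. We define a function <math>\phi:\mathfrak N\to\mathfrak N</math> by sending some <math>h:\mathbb N\to\mathbb N</math> to a function <math>k</math> defined by
      1. <math>k(0) = 1</math>
      2. <math>k(n)</math> is defined only if <math>h(n-1)</math> is defined, and then by <math>k(n)=n*h(n-1)</math>.
Describe <math>\phi(n\mapsto n^2)</math> and <math>\phi(n\mapsto n^3)</math>. Show that <math>\phi</math> is continuous. Find a fixpoint <math>(S_0,f)</math> of <math>\phi</math> such that any other fixpoint of the same function is larger than this one.
Find a continuous endofunction on some <math>\omega</math>-CPO that has the Fibonacci function <math>F(0)=0, F(1)=1, F(n)=F(n-1)+F(n-2)</math> as the least fixed point.
Implement a Haskell function that finds fixed points in an <math>\omega</math>-CPO. Implement the two fixed points above as Haskell functions - using the <math>\omega</math>-CPO fixed point approach in the implementation. It may well be worth looking at Data.Map to provide a Haskell context for a partial function for this part of the task.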
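As a starting point only (this is not a solution to the exercise), a partial function with finite domain of definition can be represented by a Data.Map, and the poset structure on partial functions is then the submap relation. The names PartialFn, lessDefined and bottom are hypothetical.

```haskell
import qualified Data.Map as Map

-- A partial function on the naturals, defined on finitely many arguments.
type PartialFn = Map.Map Integer Integer

-- (S0, f) <= (S1, g): g is defined wherever f is, and agrees with f there.
lessDefined :: PartialFn -> PartialFn -> Bool
lessDefined = Map.isSubmapOf

-- The least element of the order: the nowhere-defined function.
bottom :: PartialFn
bottom = Map.empty
```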