{"id":186,"date":"2012-07-20T00:41:36","date_gmt":"2012-07-19T22:41:36","guid":{"rendered":"http:\/\/pit-claudel.fr\/clement\/blog\/?p=186"},"modified":"2013-11-16T10:58:02","modified_gmt":"2013-11-16T09:58:02","slug":"using-abstract-classes-to-simulate-tagged-unions-aka-sum-types","status":"publish","type":"post","link":"https:\/\/pit-claudel.fr\/clement\/blog\/using-abstract-classes-to-simulate-tagged-unions-aka-sum-types\/","title":{"rendered":"Using abstract classes to simulate tagged unions (aka sum types)"},"content":{"rendered":"<p>Most functional languages offer support for tagged unions (also called sum types), a type of data structure capable of successively holding values of several fixed types. This article shows how to use abstract classes to emulate such behaviour in high-level object-oriented languages such as C#, Java, or VB.Net ((.Net languages have the <code>[StructLayout(LayoutKind.Explicit)]<\/code> attribute, which makes it possible to create structs which behave a lot like C++ unions. But that only works with primitive types.)).<\/p>\n<p><!--more--><\/p>\n<p><span class=\"fast-forward\">This article features a rather long introduction to sum types in functional languages. If you already know about this, you can <a href=\"#abstract-classes\">skip to the core of this article<\/a>.<\/span><\/p>\n<h2>An introduction to sum types and pattern matching<\/h2>\n<h3>A preliminary example: linked lists<\/h3>\n<p>Starting with a simple example, imagine you want to create a linked list structure. Most C#\/Java programmers will write something like<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n  class MyList&lt;T&gt; {\r\n    T value;\r\n    MyList&lt;T&gt; next;\r\n\r\n    MyList(T value, MyList&lt;T&gt; next) {\r\n      this.value = value;\r\n      this.next = next;\r\n    }\r\n  }\r\n\r\n  MyList&lt;int&gt; example = new MyList(1, new MyList(2, null));\r\n<\/pre>\n<p>This relies on a rather ugly trick: <code>null<\/code> is used to represent the empty list. Functional languages prefer to clearly separate two cases : the empty list (usually called <code>nil<\/code>), and the concatenation of a value (called the head of the list) and another list (called the tail).<\/p>\n<pre class=\"brush: fsharp; title: Ocaml\/F#; notranslate\" title=\"Ocaml\/F#\">\r\n  (* 't is a type variable, just like T above *) \r\n  type 't MyList = Nil | Cons of 't * ('t MyList);; \r\n  let example = Cons(1, Cons(2, Nil))\r\n<\/pre>\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  {- t is a type variable, just like T above -}\r\n  data MyList t = Nil | Cons t (MyList t)\r\n  example = Cons 1 (Cons 2 Nil)\r\n<\/pre>\n<p>This introduces the concept of a sum type; a type in which values belong to one of a number of sets (here, either to the singleton set <code>{Nil}<\/code>, or to the set of all values of type <code>t<\/code> &#8212; for example integers if <code>t<\/code> is <code>int<\/code>).<\/p>\n<p>Here is another example, where a value of type <code>Value<\/code> can be either an integer, or a floating-point number, or a string, or a boolean.<\/p>\n<div class=\"left\">\n<pre class=\"brush: fsharp; title: Ocaml\/F#; notranslate\" title=\"Ocaml\/F#\">\r\n  type Value = \r\n    | IntegerValue of int \r\n    | FloatValue of float\r\n    | StringValue of string\r\n    | BooleanValue of bool\r\n<\/pre>\n<\/div>\n<div class=\"right\">\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  data Value = \r\n      IntegerValue Int\r\n    | FloatValue Float\r\n    | StringValue String\r\n    | BooleanValue Bool\r\n<\/pre>\n<\/div>\n<hr class=\"clear\" \/>\n<p>Quite naturally, functional languages include a special construct to handle values whose type is a sum type. This construct is called  <em>pattern matching<\/em>; it is a bit like a <code>switch<\/code> statement, only the switch is on the object&#8217;s inner type (the subset of the sum type which the object belongs to). For example, to identify the contents of a variable whose type is <code>Value<\/code>, functional programmers write<\/p>\n<div class=\"left\">\n<pre class=\"brush: fsharp; title: Ocaml\/F#; notranslate\" title=\"Ocaml\/F#\">\r\n  let identify = function\r\n    | IntegerValue (val) -&gt; &quot;int!&quot; \r\n    | FloatValue (val) -&gt; &quot;float!&quot; \r\n    | StringValue (str) -&gt; &quot;string&quot;\r\n    | BooleanValue (b) -&gt; &quot;bool!&quot;\r\n<\/pre>\n<\/div>\n<div class=\"right\">\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  identify (IntegerValue val) = \"int!\" \r\n  identify (FloatValue val) = \"float!\" \r\n  identify (StringValue str) = \"string\"\r\n  identify (BooleanValue b) = \"bool!\"\r\n<\/pre>\n<\/div>\n<hr class=\"clear\" \/>\n<p>Here&#8217;s another example, on lists this time: if functional programmers want to count elements in a list, they write<\/p>\n<pre class=\"brush: fsharp; title: Ocaml\/F#; notranslate\" title=\"Ocaml\/F#\">\r\n  let rec count = function\r\n    | Nil -&gt; 0\r\n    | Cons (head, tail) -&gt; 1 + (count tail)\r\n  ;;\r\n<\/pre>\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  count Nil = 0\r\n  count (Cons hd tl) = 1 + count tl\r\n<\/pre>\n<p>Let&#8217;s move on to our last example<\/p>\n<h3>XML node trees<\/h3>\n<p>In this section, we use a sum type to represent the structure of an XML document (some node types have been omitted). We first define an attribute to be a pair of two <code>string<\/code>s, and a node to be one of <code>DocumentFragment<\/code> (a list of XML nodes), <code>Element<\/code> (a name, attributes, and children elements), <code>Commment<\/code> (some comment text), or <code>Text<\/code> (plain text).<\/p>\n<div class=\"left\">\n<pre class=\"brush: fsharp; title: Ocaml; notranslate\" title=\"Ocaml\">\r\n  type attr = \r\n    Attribute of string * string;;\r\n\r\n  type xmlnode = \r\n    | DocumentFragment of xmlnode list\r\n    | Element of string \r\n                  * (attr list)\r\n                  * (xmlnode list)\r\n    | Comment of string\r\n    | Text of string\r\n  ;;\r\n<\/pre>\n<\/div>\n<div class=\"right\">\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  data XmlNode = \r\n    | DocumentFragment &#x5B;XmlNode]\r\n    | Attribute String String\r\n    | Element String &#x5B;Attribute]\r\n                     &#x5B;XmlNode]\r\n    | Comment String\r\n    | Text String\r\n<\/pre>\n<\/div>\n<hr class=\"clear\" \/>\n<p>With such a type declaration, writing concise tree-traversal functions is easy: as an example, we define a <code>serialize<\/code> function, which generate XML code from an <code>xmlnode<\/code> object.<\/p>\n<pre class=\"brush: fsharp; title: Ocaml; notranslate\" title=\"Ocaml\">\r\n  let serialize_attribute (Attribute(name, value)) =\r\n    Printf.printf &quot; %s=\\&quot;%s\\&quot;&quot; name value\r\n  ;;\r\n\r\n  let rec serialize = function \r\n    | DocumentFragment nodes -&gt;\r\n        List.iter serialize nodes (* applies serialize to every xmlnode in nodes *)\r\n    | Element (label, attributes, nodes) -&gt;\r\n        Printf.printf &quot;&lt;%s&quot; label;\r\n          List.iter serialize_attribute attributes;\r\n        Printf.printf &quot;&gt;\\n&quot;;\r\n          List.iter serialize nodes;\r\n        Printf.printf &quot;&lt;\/%s&gt;\\n&quot; label;\r\n    | Comment (str) -&gt;\r\n        Printf.printf &quot;&lt;!-- %s --&gt;&quot; str;\r\n    | Text (str) -&gt;\r\n        Printf.printf &quot;%s\\n&quot; str;\r\n  ;;\r\n<\/pre>\n<p>Here is a small example to illustrate the previous function:<\/p>\n<pre class=\"brush: fsharp; title: ; notranslate\" title=\"\">\r\n  serialize (\r\n    Element(\"a\", &#x5B;\r\n      Attribute(\"href\", \"http:\/\/pit-claudel.fr\/clement\/blog\");\r\n      Attribute(\"title\", \"Code crumbs\")\r\n    ], &#x5B;\r\n      Text(\"Code crumbs -- Personal thoughts about programming\")\r\n    ])\r\n  );;\r\n<\/pre>\n<p>This prints <\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n&lt;a href=&quot;http:\/\/pit-claudel.fr\/clement\/blog&quot; title=&quot;Code crumbs&quot;&gt;\r\nCode crumbs -- Personal thoughts about programming\r\n&lt;\/a&gt;\r\n<\/pre>\n<p>We can now move on to the core of this article:<\/p>\n<h2 id=\"abstract-classes\">Simulating sum types using abstract classes<\/h2>\n<h3>The problem<\/h3>\n<p>Returning to a simpler example, suppose we need to represent a generic binary tree (a tree is defined as being either a branch, containing two sub-trees, or a leaf, containing a value). Functional programmers will write something like<\/p>\n<pre class=\"brush: fsharp; title: Ocaml\/F#; notranslate\" title=\"Ocaml\/F#\">\r\n  type 't tree = Branch of ('t tree * 't tree) | Leaf of 't\r\n  let example = Branch (Branch (Leaf 1, Leaf 2), Leaf 3)\r\n<\/pre>\n<pre class=\"brush: haskell; title: Haskell; notranslate\" title=\"Haskell\">\r\n  data Tree t = Branch (Tree t) (Tree t) | Leaf t\r\n  example = Branch (Branch (Leaf 1) (Leaf 2)) (Leaf 3)\r\n<\/pre>\n<p>On the other hand, (hurried) C#\/Java programmers will often implement this as a product type &#8212; that is, they&#8217;ll pack both cases in a single class, and use an extra field to differentiate branches and leaves:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n  enum TreeType = {BRANCH, LEAF};\r\n\r\n  class BadTree&lt;T&gt; {\r\n    TreeType type;\r\n    \r\n    T leaf_value;\r\n    BadTree&lt;T&gt; left_branch, right_branch;\r\n\r\n    BadTree(BadTree&lt;T&gt; left_branch, BadTree&lt;T&gt; right_branch) {\r\n         this.type = BRANCH;\r\n         this.left_branch = left_branch;\r\n         this.right_branch = right_branch;\r\n         \/\/ Memory waste: leaf_value is unused (!)\r\n    }\r\n\r\n    BadTree(T leaf_value) {\r\n         this.type = LEAF;\r\n         this.leaf_value = leaf_value;\r\n         \/\/ Memory waste: left_branch and right_branch are unused (!)\r\n    }\r\n  }\r\n\r\n  BadTree&lt;int&gt; example = new BadTree(new BadTree(new BadTree(1), \r\n                                                 new BadTree(2)),\r\n                                     new Tree(3));\r\n<\/pre>\n<p>This representation wastes a lot of memory and, though rather concise, quickly degenerates when more cases are added to the type definition (for example in the <code>xmlnode<\/code> case &#8212; for another real-world example, see the end of this post).<\/p>\n<h3>The solution<\/h3>\n<p>The idea is to encode the type disjunction (<code>Branch<\/code> <em>or<\/em> <code>Leaf<\/code>, <code>DocumentFragment<\/code> <em>or<\/em> <code>Element<\/code> <em>or<\/em> <code>Comment<\/code> <em>or<\/em> <code>Text<\/code>) in the inheritance hierarchy; practically speaking, the trick is to define an abstract <code>Tree<\/code> class, and make two new classes, <code>Leaf<\/code> and <code>Branch<\/code>, which both inherit from <code>Tree<\/code>:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n  abstract class Tree&lt;T&gt; {\r\n    public static class Leaf&lt;T&gt; : Tree&lt;T&gt; {\r\n      public T value;\r\n\r\n      public Leaf(T value) {\r\n        this.value = value;\r\n      }\r\n    }\r\n\r\n    public static class Branch&lt;T&gt; : Tree&lt;T&gt; {\r\n      public Tree&lt;T&gt; left, right;\r\n\r\n      public Branch(Tree&lt;T&gt; left, Tree&lt;T&gt; right) {\r\n        this.left = left;\r\n        this.right = right;\r\n      }\r\n    }\r\n  }\r\n\r\n  Tree&lt;int&gt; example = new Tree::Branch(new Tree::Branch(new Tree::Leaf(1),\r\n                                                        new Tree::Leaf(2)),\r\n                                       new Tree::Leaf(3));\r\n<\/pre>\n<p>This allows for much cleaner code ; and the memory waste is gone!<\/p>\n<h2>Pattern matching on abstract classes<\/h2>\n<p>The <code>BadTree<\/code> class, which uses an <code>enum<\/code> to distinguish branches and nodes, makes it possible to performe pattern matchings over the <code>Tree<T><\/code> class, by switching on the <code>Tree<T>.type<\/code> field. Though this approach directly translates to the <code>Tree<\/code> abstract class using the <code>is<\/code> keyword ((Java equivalent: <code>instanceof<\/code>)), it&#8217;s not ideal.<\/p>\n<div class=\"left\">\n<pre class=\"brush: csharp; title: Original pattern-matching code; notranslate\" title=\"Original pattern-matching code\">\r\n  Tree&lt;int&gt; tree = (...);\r\n\r\n  switch (tree.type) {\r\n    case TreeType.BRANCH:\r\n      (...); break;\r\n    case TreeType.LEAF:\r\n      (...); break;\r\n  }\r\n<\/pre>\n<\/div>\n<div class=\"right\">\n<pre class=\"brush: csharp; title: Direct translation (not great); notranslate\" title=\"Direct translation (not great)\">\r\n  Tree&lt;int&gt; tree = (...);\r\n\r\n  if (tree is Tree::Branch) {\r\n    Tree::Branch branch = \r\n      (Tree::Branch) btree;\r\n    (...);\r\n  } else if (tree is Tree::Leaf) {\r\n    Tree::Leaf leaf = \r\n      (Tree::Leaf) tree;\r\n    (...);\r\n  }\r\n<\/pre>\n<\/div>\n<hr class=\"clear\" \/>\n<p>Though functional, this new version of the pattern-matching code is not really pretty. A much cleaner approach is to implement the pattern matching directly in the derived classes (<code>Branch<\/code> and <code>Leaf<\/code>), by declaring an abstract method in the parent class, <code>Tree<\/code>. That is, instead of having one big function doing some explicit pattern matching in the <code>Tree<\/code> class, we&#8217;ll have multiple specialized functions &#8212; one in each class inheriting <code>Tree<\/code>. Philosophically speaking, just like we implemented <em>type disjunctions<\/em> as different classes inheriting a common parent, we&#8217;ll implement <em>logical disjunctions<\/em> as different functions overriding a common parent.<\/p>\n<p>For example, here is how we can write a <code>Fork<\/code> function, which returns a new <code>Tree<\/code> with every leaf split in two identical leaves:<\/p>\n<pre class=\"brush: csharp; highlight: [2,9,10,11,19,20,21]; title: ; notranslate\" title=\"\">\r\n  abstract class Tree&lt;T&gt; {\r\n    public abstract Tree&lt;T&gt; Fork();\r\n\r\n    public static class Leaf&lt;T&gt; : Tree&lt;T&gt; {\r\n      public T value;\r\n\r\n      \/\/ Constructor omitted\r\n\r\n      public override Tree&lt;T&gt; Fork() {\r\n        return new Tree::Branch(new Tree::Leaf(value), new Tree::Leaf(value));\r\n      }\r\n    }\r\n\r\n    public static class Branch&lt;T&gt; : Tree&lt;T&gt; {\r\n      public Tree&lt;T&gt; left, right;\r\n\r\n      \/\/ Constructor omitted\r\n\r\n      public override Tree&lt;T&gt; Fork() {\r\n        return new Tree::Branch(left.Fork(), right.Fork());\r\n      }\r\n    }\r\n  }\r\n<\/pre>\n<h2>Going further: a real-life example<\/h2>\n<p>The benefits of using this abstract class approach are particularly visible when the data types involved get more complex. Below is an example implementation of a data structure used to describe arithmetic expressions.<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n  abstract class ArithmeticExpression {\r\n    \/\/ Evaluates an arithmetic expression. context is a dictionary mapping variable\r\n    \/\/  names to their values.\r\n    public abstract float? Eval(Dictionary&lt;String, float?&gt; context);\r\n\r\n    public class Variable : ArithmeticExpression {\r\n      string label;\r\n\r\n      public override float? Eval(Dictionary&lt;String, float?&gt; context) {\r\n        float? value;\r\n        context.TryGetValue(label, out value);\r\n        return value;\r\n      }\r\n    }\r\n    \r\n    public abstract class BinaryOperation : ArithmeticExpression {\r\n      protected ArithmeticExpression left_operand, right_operand;\r\n\r\n      public BinaryOperation(ArithmeticExpression left_operand, ArithmeticExpression right_operand) {\r\n        this.left_operand = left_operand;\r\n        this.right_operand = right_operand;\r\n      }\r\n\r\n      protected abstract float? Apply(float left_value, float right_value);\r\n\r\n      public override float? Eval(Dictionary&lt;String, float?&gt; context) {\r\n        float? left_result = left_operand.Eval(context), right_result = right_operand.Eval(context);\r\n\r\n        if (left_result.HasValue &amp;&amp; right_result.HasValue)\r\n          return Apply(left_result.Value, right_result.Value);\r\n        else\r\n          return null;\r\n      }\r\n\r\n      class Sum : BinaryOperation {\r\n        public Sum(ArithmeticExpression left_operand, ArithmeticExpression right_operand) : base(left_operand, right_operand) { }\r\n\r\n        protected override float? Apply(float left_value, float right_value) {\r\n          return left_value + right_value;\r\n        }\r\n      }\r\n\r\n      class Subtraction : BinaryOperation {\r\n        public Subtraction(ArithmeticExpression left_operand, ArithmeticExpression right_operand) : base(left_operand, right_operand) { }\r\n\r\n        protected override float? Apply(float left_value, float right_value) {\r\n          return left_value - right_value;\r\n        }\r\n      }\r\n\r\n      class Product : BinaryOperation {\r\n        public Product(ArithmeticExpression left_operand, ArithmeticExpression right_operand) : base(left_operand, right_operand) { }\r\n\r\n        protected override float? Apply(float left_value, float right_value) {\r\n          return left_value * right_value;\r\n        }\r\n      }\r\n\r\n      class Division : BinaryOperation {\r\n        public Division(ArithmeticExpression left_operand, ArithmeticExpression right_operand) : base(left_operand, right_operand) { }\r\n\r\n        protected override float? Apply(float left_value, float right_value) {\r\n          return left_value \/ right_value;\r\n        }\r\n      }\r\n    }\r\n  }\r\n<\/pre>\n","protected":false},"excerpt":{"rendered":"<p>After a short introduction to pattern matching in functional programming languages, this article shows how abstract classes can emulate sum types in Java\/C#.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[16,24,25,17],"tags":[26,31,29,28,27,30],"_links":{"self":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/186"}],"collection":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/comments?post=186"}],"version-history":[{"count":6,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/186\/revisions"}],"predecessor-version":[{"id":655,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/posts\/186\/revisions\/655"}],"wp:attachment":[{"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/media?parent=186"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/categories?post=186"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/pit-claudel.fr\/clement\/blog\/wp-json\/wp\/v2\/tags?post=186"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}