Saturday, May 18, 2013

How could Scala do a merge sort?

Merge sort is a classical "divide and conquer" sorting algorithm. You should have to never write one because you'd be silly to do that when a standard library class already will already do it for you. But, it is useful to demonstrate a few characteristics of programming techniques in Scala. Firstly a quick recap on the merge sort. It is a divide and conquer algorithm. A list of elements is split up into smaller and smaller lists. When a list has one element it is considered sorted. It is then merged with the list beside it. When there are no more lists to merged the original data set is considered sorted. Now let's take a look how to do that using an imperative approach in Java.
public void sort(int[] values) {
   int[] numbers = values;
   int[] auxillaryNumbers = new int[values.length];
   mergesort(numbers, auxillaryNumbers, 0, values.length - 1);
}

private void mergesort(int [] numbers, int [] auxillaryNumbers, int low, int high) {
    // Check if low is smaller then high, if not then the array is sorted
    if (low < high) {
        // Get the index of the element which is in the middle
        int middle = low + (high - low) / 2;
       
        // Sort the left side of the array
        mergesort(numbers, auxillaryNumbers, low, middle);
        // Sort the right side of the array
        mergesort(numbers, auxillaryNumbers, middle + 1, high);
      
        // Combine them both
        // Alex: the first time we hit this when there is min difference between high and low.
        merge(numbers, auxillaryNumbers, low, middle, high);
    }
}

/**
 * Merges a[low .. middle] with a[middle..high].
 * This method assumes a[low .. middle] and a[middle..high] are sorted. It returns
 * a[low..high] as an sorted array. 
 */
private void merge(int [] a, int[] aux, int low, int middle, int high) {
    // Copy both parts into the aux array
    for (int k = low; k <= high; k++) {
        aux[k] = a[k];
    }

    int i = low, j = middle + 1;
    for (int k = low; k <= high; k++) {
        if (i > middle)                      a[k] = aux[j++];
        else if (j > high)                   a[k] = aux[i++];
        else if (aux[j] < aux[i])            a[k] = aux[j++];
        else                                 a[k] = aux[i++];
    }
}
 
public static void main(String args[]){
     ...
     ms.sort(new int[] {5, 3, 1, 17, 2, 8, 19, 11});
     ...
}

}
Discussion...
  1. An auxillary array is used to achieve the sort. Elements to be sorted are copied into it and then once sorted copied back. It is important this array is only created once otherwise there can be a performance hit from extensive array created. The merge method does not have to create an auxiliary array however since it changes an object it means the merge method has side effects.
  2. Merge sort big(O) performance is N log N.
Now let's have a go at a Scala solution.
  def mergeSort(xs: List[Int]): List[Int] = {
    val n = xs.length / 2
    if (n == 0) xs
    else {
      def merge(xs: List[Int], ys: List[Int]): List[Int] =
          (xs, ys) match {
          case(Nil, ys) => ys
          case(xs, Nil) => xs
          case(x :: xs1, y :: ys1) =>
            if (x < y) x::merge(xs1, ys)
            else y :: merge(xs, ys1)
      }
      val (left, right) = xs splitAt(n)
      merge(mergeSort(left), mergeSort(right))
    }
  }  
Key discussion points:
  1. It is the same divide and conquer idea.
  2. The splitAt function is used to divide up the data up each time into a tuple. For every recursion this will new a new tuple.
  3. The local function merge is then used to perform the merging. Local functions are a useful feature as they help promote encapsulation and prevent code bloat.
  4. Neiher the mergeSort() or merge() functions have any side effects. They don't change any object. They create (and throw away) objects.
  5. Because the data is not been passed across iterations of the merging, there is no need to pass beginning and ending pointers which can get very buggy.
  6. This merge recursion uses pattern matching to great effect here. Not only is there matching for data lists but when a match happens the data lists are assigned to variables:
    • x meaning the top element in the left list
    • xs1 the rest of the left list
    • y meaning the top element in the right list
    • ys1 meaning the rest of the data in the right list
This makes it very easy to compare the top elements and to pass around the rest of the date to compare. Would the iterative approach be possible in Java? Of course. But it would be much more complex. You don't have any pattern matching and you don't get a nudge to declare objects as immutable as Scala does with making you make something val or var. In Java, it would always be easier to read the code for this problem if it was done in an imperative style where objects are being changed across iterations of a loop. But Scala a functional recursive approach can be quite neat. So here we see an example of how Scala makes it easier to achieve good, clean, concise recursion and a make a functional approach much more possible.

Friday, May 17, 2013

Coursera's Scala course

Coursera run an excellent Scala course which I just had the opportunity of participating in. The course duration is seven weeks. Each week consists of about 1.5 hours of lectures and then an assignment which can take anything between an hour to about 5 hours.  The course syllabus  is outlined here.  So personal opinion time...

Was it worth it?  Absolutely.  Unless you are a complete pro in Scala and Functional Programming you will learn something from this course - most importantly a deeper understanding of the FP paradigm.  

I remember many eons ago when I first started learning OO, like many noobs I thought I understood OO when I understood polymorphism, inheritance, encapsulation and could name check a few design patterns.  It took me a while to really realise the difference between good and bad abstractions and how dependencies that look benign can drastically increase software entropy in a project.  Similarly many people might approach FP thinking it is just all about function literals,  2nd order functions, inner functions and closures.   Well the first important point to make about this course is that it does a super job of emphasising the importance of smart and clever usage of recursion in FP.  This was not apparent to me before the course.  The reason why recursion is a big deal in FP is of course because immutable state is a big deal in FP. That is easier to achieve when you pass data between iterations as in recursion than an imperative style loop which can usually mean some object(s) is being changed across iterations. 

Now, I hope that made some sense. Because the real brain tease is when you are given a problem that you could do with one arm tied behind your back using a for loop and told to do it with recursion.  It takes a lot of practise to get really good at recursion and it is something I still have to practise more myself but the course really made me think about it much much much more than I ever did previously.

So what else did I learn?
  1. Buzz words - exact difference between function application and function type
  2. Left association for function application and right association for function type.
  3. Passing functions around anonymously - you should only be rarely using def for functions that are being passed
  4. The Art of DRY (Don’t repeat yourself) in FP.  Functions should always be short and if it makes sense to abstract out common parts do so
  5. Difference between val, lazy val, def evaluation times (evaluated once, evaluated lazily and evaluated every time respectively
  6. The power of pattern matching, especially when using it with recursion
  7. Streams – lazy lists and the memorization potential
  8. It is extremely difficult doing a Scala course when you have two very young children.
Overall a great course. I hope to elaborate on some of  ideas and topics in future posts.


Sunday, April 21, 2013

Book Review: Beginning Scala (David Pollak)

Firstly, sorry it has been so long since last blog post.  Life is busy when you have two children.  I decided to try and learn Scala in 2013 and I am currently still pluggin' away.  This blog post is a review of David Pollack's Beginning Scala.

David Pollack has been writing software since 1977.  He wrote diagnostic software for the Commodore 64, the first real-time spreadsheet and founded the Lift Web Framework in 2007.   He describes his experience with Scala as an epiphany that changed the way he approach software; he certainly writes with enthusiasm and passion and in this book we even get a forward from the Godfather himself Martin Odersky.

'Beginning Scala' focusses very much on the core fundamentals of Scala.  The book in length is just under 300 pages. This is quite short in comparison to say Martin Odersky's excellent 'Programming in Scala' which is well over twice the length.  Where I see the sweet spot for Beginning Scala is for someone who wants to dip their toe into the water and is curious about the language feels and wants to get a good overview of the language quickly.  For more a detailed and substantial take on the language something like Odersky's book is necessary.

Areas covered include:
  • Scala traits and Scala's type system
  • Call-by-name
  • Scala collections and their immutable nature
  • Functional characteristics (passing functions, returning functions)
  • Pattern matching
  • Actors and concurrency
There are also some tips on how to introduce Scala to your team and some best practise advice.  My favourite parts were the explanation of Scala traits and the demonstration of the classic GoF Visitor Pattern made so simple using pattern matching (I'll cover this in a separate post). The one criticism I'd have is that I think what really sets Scala apart from Java is not that has lots of syntactic sugar but that it offers a functional programming approach.  This doesn't just mean you can pass functions to function, return them functions and so on but that you have to approach problems in a very different way.  In functional programming recursion is favoured over iteration.   There are many problems that developers could solve using iteration with one arm tied behind their back but to solve them using recursion is trickier.  One of Scala's features is that it facilitates both approaches but I think if you are going to embrace the functional paradigm properly you need to ditch iteration and embrace recursion.  This isn't really covered in the book - in fairness it is not really covered in detail Odersky's book either. However, if you look at the Scala course on coursera it is a massive massive massive massive massive massive massive massive part of it.

So overall, a very good book and well worth a dabble for someone that wants a dabble in Scala but if you want more you'll need more.





Sunday, January 27, 2013

Scala pattern matching: A Case for new thinking?

A new thinking?
The 16th President of the United States. Abraham Lincoln once said: "As our case is new we must think and act anew".   In software engineering things probably aren't as dramatic as civil wars and abolishing slavery but we have interesting logical concepts concerning "case". In Java the case statement provides for some limited conditional branching.  In Scala, it is possible to construct some very sophisticated pattern matching logic using the case / match construct which doesn't just bring new possibilities but a new type of thinking to realise new possibilities.

Let's start with a classical 1st year Computer Science homework assignment: a fibonacci series that doesn't start with 0, 1 but that starts with 1, 1.   So the series will look like: 1, 1, 2, 3, 5, 8, 13, ... every number is the sum of the previous two.

In Java, we could do:
public int fibonacci(int i) {
    if (i < 0) 
        return 0;
    switch(i) {
        case 0:
            return 1;
        case 1:
            return 1;
        default:
            return fibonacci(i-1) + fibonacci(i - 2);
    }
}     
All straight forward. If 0 is passed in it counts as the first element in the series so 1 should be returned. Note: to add some more spice to the party and make things a little bit more interesting I added a little bit of logic to return 0 if a negative number is passed in to our fibonacci method.

In Scala to achieve the same behaviour we would do:
def fibonacci(in: Int): Int = {
  in match {
    case n if n <= 0 => 0
    case 0 | 1 => 1
    case n => fibonacci(n - 1) + fibonacci(n- 2)
  }
}
Key points:
  • The return type of the recursive method fibonacci is an Int. Recursive methods must explictly specify the return type (see: Odersky - Programming in Scala - Chapter 2).
  • It is possible to test for multiple values on the one line using the | notation. I do this to return a 1 for both 0 and 1 on line 4 of the example.
  • There is no need for multiple return statements. In Java you must use multiple return statements or multiple break statements.
  • Pattern matching is an expression which always returns something.
  • In this example, I employ a guard to check for a negative number and if it a number is negative zero is returned.
  • In Scala it is also possible to check across different types. It is also possible to use the wildcard _ notation. We didn't use either in the fibonacci, but just to illustrate these features...
    def multitypes(in: Any): String = in match {
      case i:Int => "You are an int!"
      case "Alex" => "You must be Alex"
      case s:String => "I don't know who you are but I know you are a String"
      case _ => "I haven't a clue who you are"
    }
    
Pattern matching can be used with Scala Maps to useful effect.  Suppose we have a Map to capture who we think should be playing in each position of the Lions backline for the Lions series in Austrailia.  The keys of the map will be the position in the back line and the corresponding value will be the player who we think should be playing there.  To represent a Rugby player we use a case class. Now now you Java Heads, think of the case class as an immutable POJO written in extremely concise way - they can be mutable too but for now think immutable.
case class RugbyPlayer(name: String, country: String);
val robKearney = RugbyPlayer("Rob Kearney", "Ireland");
val georgeNorth = RugbyPlayer("George North", "Wales");
val brianODriscol = RugbyPlayer("Brian O'Driscol", "Ireland");
val jonnySexton = RugbyPlayer("Jonny Sexton", "Ireland");  
val benYoungs = RugbyPlayer("Ben Youngs", "England");
    
// build a map
val lionsPlayers = Map("FullBack" -> robKearney, "RightWing" -> georgeNorth, 
      "OutsideCentre" -> brianODriscol, "Outhalf" -> jonnySexton, "Scrumhalf" -> benYoungs);
    
// Note: Unlike Java HashMaps Scala Maps can return nulls. This achieved by returing
// an Option which can either be Some or None. 
    
// So, if we ask for something that exists in the Map like below
println(lionsPlayers.get("Outhalf"));  
// Outputs: Some(RugbyPlayer(Jonny Sexton,Ireland))
    
// If we ask for something that is not in the Map yet like below
println(lionsPlayers.get("InsideCentre"));
// Outputs: None
In this example we have players for every position except inside centre - which we can't make up our mind about.  Scala Maps are allowed to store nulls as values.  Now in our case we don't actually store a null for inside center. So, instead of null being returned for inside centre (as what would happen if we were using a Java HashMap), the type None is returned.

For the other positions in the back line, we have matching values and the type Some is returned which wraps around the corresponding RugbyPlayer. (Note: both Some and None extend from Option).

We can write a function which pattern matches on the returned value from the HashMap and returns us something a little more user friendly.
def show(x: Option[RugbyPlayer]) = x match {
  case Some(rugbyPlayerExt) => rugbyPlayerExt.name  // If a rugby player is matched return its name
  case None => "Not decided yet ?" // 
}
println(show(lionsPlayers.get("Outhalf")))  // outputs: Jonny Sexton
println(show(lionsPlayers.get("InsideCentre"))) // Outputs: Not decided yet
This example doesn't just illustrate pattern matching but another concept known as extraction. The rugby player when matched is extracted and assigned to the rugbyPlayerExt.  We can then return the value of the rugby player's name by getting it from rugbyPlayerExt.  In fact, we can also add a guard and change around some logic. Suppose we had a biased journalist (Stephen Jones) who didn't want any Irish players in the team. He could implement his own biased function to check for Irish players
def biasedShow(x: Option[RugbyPlayer]) = x match {
  case Some(rugbyPlayerExt) if rugbyPlayerExt.country == "Ireland" => 
     rugbyPlayerExt.name + ", don't pick him."
  case Some(rugbyPlayerExt) => rugbyPlayerExt.name
  case None => "Not decided yet ?"
}
println(biasedShow(lionsPlayers.get("Outhalf"))) // Outputs Jonny... don't pick him
println(biasedShow(lionsPlayers.get("Scrumhalf"))) // Outputs Ben Youngs

Pattern matching Collections

Scala also provides some powerful pattern matching features for Collections. Here's a trivial exampe for getting the length of a list.
def length[A](list : List[A]) : Int = list match {
  case _ :: tail => 1 + length(tail)
  case Nil => 0
}
And suppose we want to parse arguments from a tuple...
  def parseArgument(arg : String, value: Any) = (arg, value) match {
    case ("-l", lang) => setLanguage(lang)  
    case ("-o" | "--optim", n : Int) if ((0 < n) && (n <= 3)) => setOptimizationLevel(n)
    case ("-h" | "--help", null) => displayHelp()
    case bad => badArgument(bad)
  }

Single Parameter functions

Consider a list of numbers from 1 to 10. The filter method takes a single parameter function that returns true or false. The single parameter function can be applied for every element in the list and will return true or false for every element. The elements that return true will be filtered in; the elements that return false will be filtered out of the resultant list.
scala> val myList = List(1,2,3,4,5,6,7,8,9,10)
myList: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

scala> myList.filter(x => x % 2 ==1)
res13: List[Int] = List(1, 3, 5, 7, 9)
Now now now, listen up and remember this. A pattern can be passed to any method that takes a single parameter function. Instead of passing a single parameter function which always returned true or false we could have used a pattern which always returns true or false.
scala> myList.filter {
     |     case i: Int => i % 2 == 1   // odd number will return false
     |     case _ => false             // anything else will return false
     | }
res14: List[Int] = List(1, 3, 5, 7, 9)

Use it later?

Scala compiles patterns to a PartialFunction.  This means that not only can Scala pattern expressions be passed to other functions but they can also be stored for later use.
scala> val patternToUseLater = : PartialFunction[String, String] = {
     |   case "Dublin" => "Ireland"
     |   case _ => "Unknown"
      }
What this example is saying is patternToUseLater is a partial function that takes a string and returns a string.  The last statemeent in a function is returned by default and because the case expression is a partial function it will returned as a partial function and assigned to pattenrToUseLater which of course can use it later.

Finally, Johnny Sexton is a phenomenal Rugby player and it is a shame to hear he is leaving Leinster. Obviously, with Sexton's busy schedule we can't be sure if Johnny is reading this blog but if he is, Johnny sorry to see you go we wish you all the best and hopefully will see you back one day in the Blue Jersey.