Saturday, May 12, 2018

Scala Syntax: 7 points

A few years back I dipped into some Scala as a hobby language. Recently, in order to get a quick overview of Spark, I did 'Big Data Analysis with Scala and Spark' from Coursera. It's a great course. But one aspect I found challenging was just getting my head around Scala syntax again. Some of it, even the basic stuff, can be counterintuitive depending on your perspective.

1. Method / Function Definition

Typing on the right rather than the left. Consider this simple function definition:
def sayHello(param: String): String = {
    "Hello" + param
}
Javaholics will note:
  • The return type is specified at the end of the method signature, rather than the beginning. 
  • The type of the parameter is specified after the parameter name rather than before. 
  • Before the function body there is a = 
  • There are two colons (:), one between the parameter and the type and one before the return type.
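
As a small sketch of the same points (the object name and the shorter variant are illustrative additions), the Java equivalent is shown in a comment for comparison:

```scala
object SayHelloExample {
  // Java equivalent, with the types on the left:
  //   String sayHello(String param) { return "Hello" + param; }

  // Scala: types on the right, and '=' before the body
  def sayHello(param: String): String = {
    "Hello" + param
  }

  // For a single expression the braces can be dropped
  // and the result type left to inference
  def sayHelloShort(param: String) = "Hello" + param
}
```

Both versions give the same result; the second just leans on type inference.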

2.  Unit

Google "Unit" and you will quickly be told that Unit is Scala's version of Java's void. But Java’s void is a keyword. Scala’s Unit is a final class which has only one value: () - which is like an alias for no information. Unit indicates a method returns nothing and is therefore called for its side effects, something we don't want to do much of in Scala. So is that counterintuitive? No.
But here is what I do find counterintuitive. If a function has no return type in the function definition and no equals sign, it means Unit is implicitly the return type. (This "procedure syntax" is deprecated from Scala 2.13 and dropped in Scala 3.) Example:
def procedure {
    println("This String is not returned")
}

procedure: ()Unit
Big deal? Of course not. But what about:
def procedure {
     "This String is not returned"
}
You might expect the String to be returned; it won't be. How about this?
def addNumbers(a: Integer, b: Integer) {
    return a + b
}
This will give a compile warning:
:12: warning: enclosing method addNumbers has result type Unit: return value discarded
       return a + b
It will compile but nothing will be returned. Whereas:
def addNumbers(a: Integer, b: Integer) {
    a + b
}
will give no compile warning and will also return nothing.
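
A minimal sketch of the difference, using Int rather than Integer for brevity (the object and method names are illustrative): spelling out the result type and adding the equals sign is what makes the sum come back.

```scala
object UnitExample {
  // Explicit form of the procedure syntax: the result type is Unit,
  // so the value of a + b is computed and then discarded
  def addNumbersDiscarded(a: Int, b: Int): Unit = {
    a + b
  }

  // Declaring a result type and using '=' makes the sum the result
  def addNumbers(a: Int, b: Int): Int = {
    a + b
  }
}
```

The compiler may still warn about the discarded expression in the first method, which is arguably the behaviour you want.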

3.  Underscore

In anonymous Scala functions, _ is like Groovy's it. In Groovy, to multiply all numbers between 1 and 5 by 2 we can do:
(1..5).collect {it * 2}
In Scala we can do:
(1 to 5).map{_*2}
However, in Scala, the second time _ is referenced, it refers to the second parameter:
val ns = List(1, 2, 3, 4)
val s0 = ns.foldLeft (0) (_+_) //10
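
To make the placeholder rule concrete, here is a sketch that puts each underscore form next to its spelled-out equivalent (the val names are illustrative):

```scala
object UnderscoreExample {
  val ns = List(1, 2, 3, 4)

  // One underscore: a function of one parameter
  val doubledShort = ns.map(_ * 2)
  val doubledLong  = ns.map(n => n * 2)

  // Two underscores: the first and the second parameter, in order
  val sumShort = ns.foldLeft(0)(_ + _)
  val sumLong  = ns.foldLeft(0)((acc, n) => acc + n)
}
```

Each underscore stands for a different parameter, so something like n => n * n cannot be written with placeholders alone.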

4. Passing anonymous functions. 

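Pass one anonymous function and curly braces can stand in for the parentheses, as in map{_*2} above. Pass two and you need the parentheses; here each function is also wrapped in its own curly braces, although when the parameter types are known those inner braces are optional.
type R = Double  // assumption: R is a type alias such as Double, needed for this to compile
def compose(g: R => R, h: R => R) = (x: R) => g(h(x))
val f = compose({_ * 2}, {_ - 1})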
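
A quick sketch for checking how the braces behave here. It assumes R is an alias for Double, since the snippet above leaves R undefined; the object and val names are illustrative.

```scala
object ComposeExample {
  // Assumption: R is an alias for Double
  type R = Double

  def compose(g: R => R, h: R => R): R => R = (x: R) => g(h(x))

  // Each anonymous function wrapped in its own braces
  val withBraces = compose({ _ * 2 }, { _ - 1 })

  // The same call with plain placeholder arguments
  val withoutBraces = compose(_ * 2, _ - 1)
}
```

Both values apply h first and then g, so for an input of 3.0 each computes (3.0 - 1) * 2.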

5. Arity-0 

When a method has no arguments (arity-0), the parentheses can be omitted in invocation:
size()
...
size  // do it like this 
But this technique should never be used when the method has side effects. So,
queue.size // ok
println // not ok do println()
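
The same convention applies when declaring your own methods. A sketch with a hypothetical Counter class: the pure accessor is declared without parentheses, the side-effecting method with them.

```scala
object ArityZeroExample {
  final class Counter {
    private var count = 0

    // Pure accessor: declared and called without parentheses
    def current: Int = count

    // Side-effecting method: declared and called with ()
    def increment(): Unit = { count += 1 }
  }
}
```

A caller would then write counter.increment() followed by counter.current, and the parentheses signal which call changes state.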

6. Declare parameter types

Function definitions / method definitions have to declare parameter types but function literals don’t.
def addNumbers(a, b): Number {
:1: error: ':' expected but ',' found.
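
For contrast, a sketch of a version that compiles, alongside function literals whose parameter types are inferred (Int is chosen here only for illustration):

```scala
object ParamTypesExample {
  // Method definition: parameter types are mandatory
  def addNumbers(a: Int, b: Int): Int = a + b

  // Function literal: parameter types inferred from the expected type
  val add: (Int, Int) => Int = (a, b) => a + b

  // ...or from the method the literal is passed to
  val total = List(1, 2, 3).reduce((a, b) => a + b)
}
```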

7. Ternary Operator

There is no ternary operator in Scala. There is one in Java, Groovy and JavaScript, and Python 2.5 added support for an equivalent. Instead you can do if / else on one line, and since if / else is an expression it yields a value. For example, in Java we would do:
(eurovision.winner == "Ireland") ? "Yippee" : "It's a fix"
In Scala, it's:
if (eurovision.winner == "Ireland") "Yippee" else "It's a fix"
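
As a self-contained sketch (the reaction method is a hypothetical stand-in for the eurovision object), the if / else expression can be used directly as a method body:

```scala
object TernaryExample {
  // if / else is an expression, so its value is the result of the method
  def reaction(winner: String): String =
    if (winner == "Ireland") "Yippee" else "It's a fix"
}
```

In Scala, == on Strings compares values, so this check behaves as it reads.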

Friday, May 11, 2018

And some more REST tips

In previous blog posts I have covered some ideas and tips for achieving a REST architecture. In this post, I cover a few more ideas and tips.

Caching

  • Caching is a big part of the original dissertation.  See section 5.1.4 
  • Strategies include validation (client checks it has the latest version) and expiration (client assumes it has the latest version until a specified time)
  • Expiration:
    • Expires header tells client when resource is going to expire. The value 0 means avoid caching
    • Cache-Control
      • Use max-age directive to specify how long response should be considered valid for; s-maxage for shared caches
      • Can also be used in requests: no-cache means revalidate the response with the server
  • Validation
    • ETag - identifies a unique version of the resource. Used in conjunction with the If-None-Match request header
    • Last-Modified - tells client when resource last changed
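
The validation and expiration strategies above can be sketched as a small handler. This is a framework-free illustration under stated assumptions: Response and conditionalGet are hypothetical names, and ETag comparison is simplified to exact string equality.

```scala
object EtagValidation {
  // Hypothetical, minimal response model
  final case class Response(status: Int, headers: Map[String, String], body: Option[String])

  // ifNoneMatch is the value of the If-None-Match request header, if present
  def conditionalGet(ifNoneMatch: Option[String],
                     currentEtag: String,
                     body: String,
                     maxAgeSeconds: Int): Response = {
    val cacheHeaders = Map(
      "ETag"          -> currentEtag,
      "Cache-Control" -> s"max-age=$maxAgeSeconds"
    )
    // Validation: the client's copy is still current, so send no body
    if (ifNoneMatch.contains(currentEtag)) Response(304, cacheHeaders, None)
    // Otherwise send the full representation with fresh caching headers
    else Response(200, cacheHeaders, Some(body))
  }
}
```

A client holding a stale or missing ETag gets a 200 with the body; a client with the current one gets a 304 and reuses its cached copy.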

Controller APIs

  • When something does not fit neatly into a CRUD operation, consider a Controller API

Handling Dates

  • Use ISO-8601 for your dates - better for natural sorting, handles timezones, locale neutral, supported by most programming languages
  • Accept any timezone as anyone in the world may call your API
  • Store in UTC, not in your server's timezone.  There should be no offset when persisted.
  • Return in UTC.  Allow the client to adjust to its timezone as necessary
  • Don't use time if you don't need it.  If Date only suffices, only persist Date. This means, timezone complexity goes away. 
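
A minimal sketch of the accept-any-offset, store-in-UTC, return-in-UTC flow, using java.time from the JDK (the object and method names are illustrative):

```scala
import java.time.{Instant, OffsetDateTime}

object DateHandling {
  // Accept an ISO-8601 timestamp with any offset and normalise it to UTC for storage
  def toUtc(iso8601: String): Instant = OffsetDateTime.parse(iso8601).toInstant

  // Instant.toString renders ISO-8601 in UTC, with a trailing Z and no offset
  def render(stored: Instant): String = stored.toString
}
```

For example, 2018-05-11T14:30:00+01:00 and 2018-05-11T09:30:00-04:00 normalise to the same stored instant, rendered as 2018-05-11T13:30:00Z.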

HEAD

Headers

  • Always return what headers are useful.  Consider: 
    • Content-Type
    • Content-Length
    • Last-Modified
    • ETag
    • Location

Hypermedia (advantages)

  • Less coupling
  • Consistent format for links => cleaner client code
  • Developer productivity: APIs are easier to navigate 
  • Makes it easier to introduce services in a more granular way
  • Code easier to debug - messages always have the URL that created them via the self link

Hypermedia (choices)

  • HAL - reduces Address coupling 
  • SIREN - reduces Address and Actions coupling
  • Collection+JSON (CJ) - reduces Address, Action and Object coupling

Idempotent

  • Can be called several times with the same effect on the resource as calling once
  • OPTIONS, GET, HEAD, PUT and DELETE are all idempotent

Long Running Requests

  • Some operations take a long time.  In such cases, consider returning a 202 with the Location header set to a URL the client can poll to check for operation progress.
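
A sketch of what such a response might look like. The Response model, the accept handler and the /jobs/.../status URL scheme are all hypothetical, chosen only to illustrate the idea:

```scala
object LongRunningExample {
  // Hypothetical, minimal response model
  final case class Response(status: Int, headers: Map[String, String])

  // Accept the work and point the client at a status URL it can poll
  // (the /jobs/<id>/status path is an assumption for illustration)
  def accept(jobId: String): Response =
    Response(202, Map("Location" -> s"/jobs/$jobId/status"))
}
```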

Method not allowed

  • If an API only supports GET, it should return a 405 for any PUT, POST, DELETEs etc

Must Ignore Principle

  • Clients should ignore data they are not interested in. This makes it much easier for APIs to be backwardly compatible.  If an API returns extra data and some clients aren't expecting it, they will just ignore it. 
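
On the client side this can be as simple as reading only the fields you know. A sketch, assuming the response has already been decoded into a Map (Player and its fields are hypothetical):

```scala
object MustIgnoreExample {
  // Hypothetical client-side model with only the fields this client cares about
  final case class Player(name: String, team: String)

  // Pick out the known fields; anything else in the payload is ignored
  def parse(fields: Map[String, String]): Option[Player] =
    for {
      name <- fields.get("name")
      team <- fields.get("team")
    } yield Player(name, team)
}
```

If the API later adds a field such as "nickname", this client keeps working unchanged.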

Not acceptable

  • When a resource doesn't support a specific media type, it should return 406 (see Masse, Rule: 406 (“Not Acceptable”) must be used when the requested media type cannot be served)

OPTIONS

  • OPTIONS should return what actions are available on a resource

Partial Update

  • Handle partial updates with  PATCH

Query

  • The query component of a URI should be used to filter collections

Resource Creation

  • When a Resource has been successfully created a 201 should be returned 
  • The Location header should indicate the URL to get the Resource. 

Safe

  • Actions are considered Safe if they do not modify resources
  • OPTIONS, GET and HEAD are safe
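
The safe and idempotent method lists from this post can be captured in a small lookup. A sketch, with illustrative names:

```scala
object HttpMethodProperties {
  // Safe methods do not modify resources
  val safe: Set[String] = Set("OPTIONS", "GET", "HEAD")

  // Every safe method is also idempotent; PUT and DELETE are idempotent but not safe
  val idempotent: Set[String] = safe ++ Set("PUT", "DELETE")

  def isSafe(method: String): Boolean = safe.contains(method.toUpperCase)

  def isIdempotent(method: String): Boolean = idempotent.contains(method.toUpperCase)
}
```

POST appears in neither set, which is why retrying a POST needs more care than retrying a PUT.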

Self link

  • Response bodies should always include a self link - the URL that was used to return the resource. 

Singular or Plural?

  • Use Singular for Singular Document type resource  - when there can only be one.  For example: /humans/12343343/head
  • Otherwise plural

REST: Using a Controller endpoint?

In REST architectures, the fundamental concept is a Resource. A Resource represents anything that’s important enough to be referenced as a thing in itself. For example, a Shopping Cart, a Book or a Car. The next fundamental concept is the Uniform Interface for accessing and manipulating the Resources. In HTTP land this usually means:
  • Create is POST 
  • Read is GET 
  • Update is PUT (or PATCH for Partial Update) 
  • Delete is DELETE
There are of course other concepts (statelessness, caching etc) but for this blog post, let's just focus on Resources.

In the real world, many things map nicely to Resources. However, inevitably some things won't map so nicely to resources. This is usually a minority of operations, for example reset password. It's possible to model these as either
  •  a PUT on /password/ 
or as
  •  a Controller endpoint and a POST to /resetpassword 
The latter may be considered to be closer to pragmatic REST than pure REST, but there are times when clients and customers will want you to be pragmatic. This article gives suggestions regarding when to consider using the Controller option.

Does the action Map to a CRUD? 

Several actions in a real world application will not map nicely to a Create Read Update Delete (CRUD). For example, PayPal's cancel billing agreement API is:
POST /v1/payments/billing-agreements/agreement_id/cancel
The cancel action rarely maps nicely to a CRUD for a resource. It could be interpreted as:
  • some resource gets created (a cancel record) 
  • some resource gets updated (some status column could be getting set to cancelled) 
  • or some resource gets deleted (an order request gets deleted). 
Why should the client have to care about how cancel is handled? Couldn't it always change? In some cases APIs have got around the "doesn't map nicely to a CRUD" problem using HTTP tunneling. For cancelling a billing agreement this would look like:
POST /v1/payments/billing-agreements/agreement_id
with body:
{
  "operation":"cancel"
}
This is considered an anti-pattern and should never be used. Instead a Controller endpoint should be used.

 

Resource State or Workflow? 

In a REST architecture, every request between Client and Server will usually change a Resource State (write operation) or the Application State (a query or read operation). However, in the real world workflows are inevitable. For example, a reset password flow usually consists of:
  • Asking the user for the userId (usually email) 
  • System checking that email exists on the system 
  • Sending the user an email with a link to reset the password 
  • Ensuring the user only has a set amount of time to click the link 
  • When the user clicks the link they may be asked a bunch of questions 
  • They will be asked to retype their new password to ensure there's no typos 
When a client action is part of a complex workflow, Resource state and Application state changes may not be easy to model. They may not happen synchronously and they could change based on how the workflow is modelled or when the workflow needs to add an extra step. In such scenarios, consider using a Controller endpoint.

 

REST without PUT 

In some situations, arguments can be made for avoiding PUT and instead using POST to a different endpoint which signifies intent. For example, to change address, instead of invoking a PUT to /address/ the client would invoke a POST to /changeaddress, avoiding PUTs altogether. One example where this approach is useful is when handling asynchronous operations and you want to make clear which operations are atomically consistent. So for example, if changing address takes a long time and you would rather return a 202 with a Location header for the client to poll, using /changeaddress means the /address endpoints can remain the ones that are atomically consistent.

So after any PUT or POST to /address, an immediate GET would return a consistent view of the Resource. This approach is also useful if you want to model the Business event rather than the actual resource that is changing. For example, suppose 6 or 7 things need to take place when a Bank account is closed from a Business process perspective, all on the back end in the same thread / transaction. Here a POST to a controller endpoint such as /accountclosed makes more sense than a DELETE to /account.

See this article for more info. 

Summary


So while there may be subjectivity involved in when to use a controller style endpoint, the above may at least help you to make a decision. Remember, it should only ever be a minority of APIs where you consider this approach. You are outside the conventional Uniform Interface for these unique style operations, but you still want to make them feel intuitive to clients of the API. 

Tuesday, February 27, 2018

Testing your code with Spock


Spock is a testing and specification framework for Java and Groovy applications.  Spock:
  • is extremely expressive 
  • facilitates the Given / When / Then syntax for your tests 
  • is compatible with most IDEs and CI Servers.
Sounds interesting? Well you can start playing with Spock very quickly by paying a quick visit to the Spock web console.  When you have a little test you like, you can publish it like I did for this little Hello World test.


This Hello World test serves as a gentle introduction to some of the features of Spock.

Firstly, Spock tests are written in Groovy.  That means some boilerplate code that you have with Java goes away.  There is
  • No need to indicate the class is public, as it is by default.
  • No need to declare firstWord and lastWord as Strings 
  • No need to hurt your little finger with a ; at the end of every line
  • No need to explicitly invoke assert, as every line of code in the expect: (or then:) block gets that automatically.  Just make sure each of those lines evaluates to a boolean expression.  If it is true the test passes, otherwise it fails.  So in this case, it is just an equality expression which will either be true or false. You can have as many expressions as you want.
So, less boilerplate code. What next?  Well, you know those really long test names you get with JUnit tests? Instead of having to call this test helloWorldIntroductionToSpockTest(), which is difficult to read, you can just use a String with spaces to name the test: Hello World introduction to Spock test. This makes things much more readable.

Thirdly, the Given: When: Then: syntax enforces test structure.  No random asserts all over the test.  They are in a designated place.  More complex tests can use this structure to achieve BDD and ATDD.

Fourthly, if I were to make a small change to the test and change the assertion to also include Tony,  the test will of course fail. But when I get a failure in Spock, I get the full context of the expression that is tested.  I see the value of everything in the expression.  This makes it much quicker to diagnose problems when tests fail.


Not bad for an introduction.  Let's now have a look at more features. 

Mocking and Stubbing

Mocking and Stubbing are much more powerful than what is possible with JUnit (and various add-ons). But they are not only super powerful in Spock, they are also very terse, keeping your test code very neat and easy to read.

Suppose we want to Stub a class called PaymentCalculator in our test, more specifically one of its methods, calculate(Product product, Integer count).  In the stubbed version we want to return the count multiplied by 10 irrespective of the value of product.  In Spock we achieve this by:
PaymentCalculator paymentCalculator = Stub(PaymentCalculator)
paymentCalculator.calculate(_, _) >> {p, c -> c * 10}
If you haven't realised how short and neat this is, well then get yourself a coffee.  If you have realised, well you can still have a coffee, but consider these points:
  1. The underscores in the calculate call mean "for all values" 
  2. On the right hand side of the second line, we see a Groovy Closure. For now, think of this as an anonymous method with two inputs: p for the product, c for the count. We don't have to declare their types. That's just more boilerplate code gone. 
  3. The closure will always return the count times 10.  We don't need a return statement.  The value of the last expression is always returned. Again, this means less boilerplate code.  When stubbing becomes this easy and neat, it means you can really focus on the test - cool. 

Parameterised Tests

The best way to explain this is by example.
@Unroll
def "Check that the rugby player #player who has Irish status #isIrish plays for Ireland"(String player, Boolean isIrish) {
    given:"An instance of Rugby player validator"
    RugbyPlayerValidator rugbyPlayerValidator = new RugbyPlayerValidator()

    expect:
    rugbyPlayerValidator.isIrish(player)  == isIrish

    where:
    player               ||  isIrish
    "Johny Sexton"       ||  true
    "Stuart Hogg"        ||  false
    "Conor Murray"       ||  true
    "George North"       ||  false
    "Jack Nowell"        ||  true

}
In this parameterised test we see the following:
  1. The test is parameterised. The test signature having parameters tells us this, as does the where block.  
  2. There is one input parameter, player, and one output parameter, isIrish, which corresponds to an expected value. 
  3. The test runs five times, once per row of the where block.  The input parameters are on the left, output on the right. It is, of course, possible to have more of either; in this test we just have one of each. 
  4. The @Unroll annotation will mean that if the test fails, the values of all parameters will be outputted. The message will substitute the details of player into #player and the details of the Irish status into #isIrish. So for example, "Check that the rugby player Jack Nowell who has Irish status true plays for Ireland"
Again, this makes it much quicker to narrow in on bugs. Is the test wrong or is the code wrong? That becomes a question that can be answered faster.  In this case, of course it is the test that is wrong.

All the benefits of Groovy

What else? Well another major benefit is all the benefits of Groovy.  For example, if you are testing an API that returns JSON or XML, Groovy is brilliant for parsing XML and JSON. Suppose we have an API that returns information about sports players in XML format. The format varies, but only slightly, depending on the sport they play:
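<RugbySummaryCategory><Players><Player>Joey Carberry</Player><Player>Teddy Thomas</Player></Players></RugbySummaryCategory>
<FootballSummaryCategory><Players><Player>Lionel Messi</Player><Player>Cristiano Ronaldo</Player></Players></FootballSummaryCategory>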
We want to just invoke this API and then parse out the players irrespective of the sport. We can parse this polymorphically very simply in Groovy.
def rootNode = new XmlSlurper().parseText(xml)
def players = rootNode.'*'.Players.Player*.text()

Some key points:
  1. The power of dynamic typing is immediate. The expression can be dynamically invoked on the rootNode. No verbose, complex XPath expression needed.
  2. The '*', is like a wildcard. That will cover both RugbySummaryCategory and FootballSummaryCategory.
  3. The Player* means for all Player elements. So no silly verbose for loop needed here 
  4. The text() expression just pulls out the values of the text between the respective Player elements. So we now have a list of all players and can simply do: players.size() == 4. Remember, there is no need for the assert. 
Suppose we want to check the players' names. Well, in this case we don't care about order, so it makes more sense to convert the list to a Set and then check. Simple.
players as Set == ["Joey Carberry", "Teddy Thomas", "Lionel Messi", "Cristiano Ronaldo"] as Set

This will convert both lists to Sets, which means order checking is gone and it is just a Set comparison. There's a tonne more Groovy features we can take advantage of. But the beauty is, we don't actually have to. Almost all Java code is also valid in a Groovy class. The same holds true for Spock. This means there is no steep learning curve for anyone from a Java background. They can code pure Java and then get some Groovy tips from code reviews etc.

Powerful annotations

Spock also has a range of powerful annotations for your tests. Again, we see the power of Groovy here as we can pass a closure to these annotations. For example:
@IgnoreIf({System.getProperty("os.name").toLowerCase().contains("windows")})
def "I'll run anywhere except windows"() {...}
Or just make your tests fail if they take too long to execute:
@Timeout(value = 100, unit=TimeUnit.MILLISECONDS)
def "I better be quick"() {...}
So in summary Spock versus vanilla JUnit has the following advantages:
  1. Test Structure enforced. No more random asserts. Assertions can only be in designated parts of the code. 
  2. Test code is much more readable. 
  3. Much more information on the context of the failed test
  4. Can mock and stub with much less code
  5. Can leverage a pile of Groovy features to make code much less verbose
  6. Very powerful test parameterisation which can be done very neatly
  7. A range of powerful annotations. 
And one of the often forgotten points is that your project doesn't have to be written in Groovy. You can keep it all in Java and leverage the static typing of Java for your production code and use the power and speed of Groovy for your test code.

Until the next time take care of yourselves.