Thursday, May 19, 2016

Adding a strongly typed eager loading feature to an ORM

When I selected the data access technology to create Invotes, I was primarily searching for an ORM that reminded me of Entity Framework.

There were a number of different SQL based frameworks with their own pros and cons. Here's a short list of a few of the more prominent libraries that I evaluated:

  • Slick - The most popular Scala FRM, also known for its steep learning curve.
  • Anorm - Bundled with Play Framework, uses plain SQL to execute queries.
  • Squeryl - A strongly typed ORM that focuses on type safety with a familiar SQL-like DSL.

I decided to give Squeryl a shot since it was easy to pick up and it looked fairly similar to Entity Framework.

A week or so into the project, I ran into some productivity and performance concerns. Squeryl worked exactly as advertised, but I wasn't thrilled with the way I had to retrieve relations for a given entity query. Rather than making a single call with a join, related entities were loaded by making additional calls to the database. Consider this schema:

Database Tables

account
- id
- first_name
- last_name
- email
account_event
- id
- account_id
- event_id
event
- id
- name
- start_timestamp
- location

In Squeryl, the simplest way to retrieve an account with its events is to touch all the relation properties inside of a transaction block. Each collection that's touched executes its own SQL query and populates the relation:
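A minimal sketch of that pattern, assuming a Squeryl schema object (Database) and entity classes along the lines of the ones sketched a little later in this post (names are illustrative, not the actual Invotes code):

    import org.squeryl.PrimitiveTypeMode._

    val (account, events) = transaction {
      val account = Database.accounts.where(a => a.email === "user@example.com").single
      // touching the relation inside the transaction runs the extra query and populates it
      (account, account.accountEvents.toList)
    }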

(Note that if these relations are accessed outside of a transaction block, an exception will be thrown).

While this is a straightforward and effective approach, it's not always performant depending on the size of the dataset being operated on.

Imagine a scenario where there are hundreds of accounts with hundreds or even thousands of events. In that case, thousands of SQL queries would be executed just to populate the model.

Joins are the obvious solution to this issue. When I first started working with Squeryl, I found that this was possible, but very inconvenient, particularly when working to retain immutable data structures.

While it worked, the code was difficult to read and reason about. Whenever it was time to build a new query, my motivation to continue dropped drastically.

My first thought was to build an external library of helper functions that would allow these joins to happen in a generic way. As I started to tackle the problem, it became apparent that this wasn't going to be straightforward, and it felt wasteful to put a large amount of effort into a problem that would be better solved inside the Squeryl library itself.

The Inspiration: Entity Framework

My experience with Entity Framework reminded me that there are pre-established, elegant ways of dealing with this problem. EF comes with an eager loading feature built in:

In this example, the Include method is used to load related entities. This instructs the query that it should select Accounts with a matching username, and it should also select and materialize the AccountEvent and Event relations.

Inspired by that design, and with no prior experience in this area, I decided to try my hand at implementing a similar feature in Squeryl.

The Design

On the surface, the idea is fairly simple. Each selected entity needs to be joined to the relation that's specified to the left of it.

Jumping right in, I soon realized that entering an unfamiliar code base meant that I would be best served by breaking down the feature into more manageable chunks.

Step 1: Expose existing relational information in Squeryl

To generate a LEFT JOIN clause between two entities, there needs to be a way to retrieve the foreign key columns, which are used to build the ON clause.

In Squeryl, this relational information is defined up front (by you) in an object derived from the Schema base class:
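A hedged sketch of what such a schema definition might look like for the tables above - entity and relation names are illustrative, not the actual Invotes code:

    import java.sql.Timestamp
    import org.squeryl.{KeyedEntity, Schema}
    import org.squeryl.PrimitiveTypeMode._

    class Account(val firstName: String, val lastName: String, val email: String)
      extends KeyedEntity[Long] { val id: Long = 0 }

    class AccountEvent(val accountId: Long, val eventId: Long)
      extends KeyedEntity[Long] { val id: Long = 0 }

    class Event(val name: String, val startTimestamp: Timestamp, val location: String)
      extends KeyedEntity[Long] { val id: Long = 0 }

    object Database extends Schema {
      val accounts      = table[Account]("account")
      val accountEvents = table[AccountEvent]("account_event")
      val events        = table[Event]("event")

      // this is the relational information the new method needs to expose
      val accountToAccountEvents =
        oneToManyRelation(accounts, accountEvents).via((a, ae) => a.id === ae.accountId)

      val eventToAccountEvents =
        oneToManyRelation(events, accountEvents).via((e, ae) => e.id === ae.eventId)
    }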

The information existed inside this class, but it was not exposed. A new public method had to be created to retrieve the relation between two entity classes.

Looking through the Schema base class, there was already a similar-looking method called findAllTablesFor[A](c: Class[A]). Instead of finding a table for a class, the new method needed to find the foreign key relation given two classes. Luckily, this information was already stored in a private field (_oneToManyRelations) that tracks all one-to-many relations. The method is implemented as follows:

[Final source code here]

Relations can now be found between two tables by calling this method on the Database object:
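Hypothetical usage - the method name and exact signature below are illustrative; the real ones are in the branch linked at the end of the post:

    // findRelationsFor is a placeholder name for the new lookup method
    val relation = Database.findRelationsFor(classOf[Account], classOf[AccountEvent])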

Step 2: Basic Join Generation

At some point, a Squeryl query has to generate a real SQL query. Before investing a lot of time on the API side of the include feature, I wanted to figure out how a join could be introduced using a newly implemented include method.

Before we continue

One thing that's important to note is the way that Squeryl handles relations on an entity level. To create a field reference to all AccountEvents on an Account model, a reference to the relation is used:
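In Squeryl that's typically a lazy field on the entity that points back at the relation declared in the schema object - roughly (other imports as in the schema sketch above):

    import org.squeryl.dsl.OneToMany

    class Account(val firstName: String, val lastName: String, val email: String)
      extends KeyedEntity[Long] {

      val id: Long = 0

      // the field reference to the one-to-many relation declared on the schema object
      lazy val accountEvents: OneToMany[AccountEvent] =
        Database.accountToAccountEvents.left(this)
    }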

Further, to manually write a join in Squeryl, it would look like this:
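Something along these lines, using Squeryl's join syntax against the schema sketched earlier:

    val accountsAndEvents =
      join(Database.accounts, Database.accountEvents)((a, ae) =>
        select((a, ae))
        on(a.id === ae.accountId)
      )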

This returns a collection of tuples (Account, AccountEvent), meaning the AccountEvent data will not be attached to the collection Account.accountEvents.

The way that Squeryl models this underneath the covers is by attaching what is called a subqueryable instance (generally an entity table) and an associated join expression to the QueryYield class. More on this later.

Getting a naive include expression to generate expected SQL

To create a proof of concept, I leveraged the field relation to write a quick and dirty include signature on the Squeryl QueryYield definition:

[Original Source Code Here]

This allowed me to write a simple test query to validate that the correct SQL was being generated until the real API was designed.
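The test query looked roughly like this - the exact shape of the temporary include signature differed, so treat this purely as an illustration of attaching a relation to the QueryYield:

    val query =
      from(Database.accounts)(a =>
        where(a.email === "user@example.com")
        select(a)
        include(a.accountEvents)
      )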

All the pieces were in place to figure out how to generate the expected join SQL:

  • Access to foreign key information for both sides of the join
  • An understanding of how Squeryl builds joined queries through subqueryables
  • Temporary include method signature

First, the information derived from the include parameter (the table entity and foreign key information) is stored with the query (in the QueryYield class), so it can be retrieved at execution time. This is the include implementation as seen in the above example.

[Original Source Code Here]

The important lines are:

  • includeExpressions - a new subqueryable is stored and will be included as a left outer join on the specified table entity.
  • joinExpressions - the relationship between the left and the right table is stored, which is the information that will be used to generate the ON clause.

As I mentioned earlier, Squeryl already had a code model for joins, so the information attached to the QueryYield class is utilized when building the query. This is done in the AbstractQuery class:

[Original Source Code Here]

At the top of the snippet, the subqueryables found in our include clause are being appended to the collection of items to be joined. Now the join clause can generate the correct SQL. The effect can be seen by inspecting the statement on a Squeryl query:
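For example, using the query from the earlier sketch (the exact SQL will differ; the shape is what matters):

    println(query.statement)
    // Select ... From account ...
    //   left outer join account_event on (account.id = account_event.account_id)
    // Where (account.email = 'user@example.com')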

Step 3: The Include DSL

While the naive approach above works fine for simple scenarios, I wanted the feature to be capable of handling an arbitrary depth of nesting.

Consider again the original example: to include all events for an account, where account_event is the junction table, we need to be able to support an arbitrary number of nested relations. This was the initial idea for the syntax:

But I also wanted to support parent relations (many to one):

And adjacent relations:

And any arbitrary combination of the above. I realized quickly that I was looking at implementing a tree structure.

To my surprise, this turned out to be the hardest part of the entire feature.

The first difficulty was the lack of expression trees in Scala. In .NET, an expression can be created to model code in a tree structure. The tree can be traversed at runtime to extract more meaning from the code provided, rather than just the return value of the call. What this means in practical use is that the API could have been much simpler. For example, in .NET I could write this signature for the include method:

Then the method could be called with the following relation parameter value:

Inside the Include method, inspecting the value of relation would expose that we returned an Events relation by traversing from Account, to the AccountEvents relation and finally onto the Event relation. This is a concept that Scala lacks (although something like this may be possible with macros).

Instead, a more realistic approach was chosen to express relations in the include method. The idea was to create an enclosing type for each node of the expression, and to use a fluent-style API to chain each relation together, keeping the expression strongly typed. The final DSL looks like this:
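A hedged sketch of the final syntax in use - the relation field names are illustrative and the exact include signature may differ slightly; the operators themselves are described just below:

    val accountsWithEvents =
      from(Database.accounts)(a =>
        select(a)
      ).include(path => path.-*(_.accountEvents).*-(_.event))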

I'm not generally a big fan of symbols for method names, but when words were used instead of symbols, it obfuscated which relations were actually being selected. Making that information easily parseable by a human was of primary importance. Specifically, the methods used are:

  • ->> (Select adjacent relations)
  • -* (Select the many side of a relation)
  • *- (Select the one side of a relation)

Below is an abridged version of the PathBuilder base class, which highlights how this fluent chaining is accomplished:

[Final Source Code Here]

Think about an instance of PathBuilder[OneToMany[Account]], where P is OneToMany[Account]. The one-to-many method (-*) accepts a function that takes P (OneToMany[Account]) as a parameter and returns a OneToMany[A]. This means you could pass something like include(pathBuilder => pathBuilder.-*(a => a.accountEvents)), and it would return a new PathBuilder parameterized on the result - in this case a PathBuilder[OneToMany[AccountEvent]]. This is how static typing is preserved while mitigating the absence of expression trees in the language.

This allows for near-infinite flexibility: you can include as many relations as needed while leveraging the type safety Scala has to offer.

Developing this tree style syntax took a lot of trial and error. I would fix one piece, and find out that adjacent relations weren't implemented properly, then I would fix that issue and parent relations were broken. It was tedious, but with enough perseverance and a suite of valuable tests, all pieces ended up working properly. The final syntax is arguably ugly, but ultimately flexible.

On top of building this DSL, the AbstractQuery class was updated to recursively build out queries with the correct joins.

[Final Source Code Here]

Step 4: Materializing row data

It may seem that the feature would be complete at this point, but there's still one critical missing piece.

At this stage, the query was a single statement with the appropriate join clauses, but the relational row data still needed to be materialized and populated in the parent instances.

The first attempt

My first attempt to materialize the dataset into class instances was brute forced and honestly somewhat embarrassing. But for a first step, getting the correct result is more important than the most efficient result, so at least I was able to write some valuable tests to validate my work.

In simple terms, the algorithm was:

  1. Materialize all objects in each row of the data set.
  2. Loop over each object from parent to child, and populate each relation with the appropriate children.
  3. Call distinct on each relation to ensure uniqueness

This is less than ideal, but it worked and passed every test case I could think of (including all tests in the Invotes test suite).

But eventually I hit a performance issue. Relatively simple queries were taking seconds to return.

It occurred to me that there's no reason why every row would need to materialize multiple instances, especially when you think about the way SQL returns joined data:

#   account.id   account.username   account_event.id   account_event.account_id
1   100          user1              200                100
2   100          user1              201                100
3   100          user1              202                100
4   101          user2              203                101
5   101          user2              204                101

Clearly it's a waste of resources to materialize five Account instances (the number of rows) when only two are needed (the number of unique IDs). Ideally, each account_event row would be materialized and the results attached to existing account instances (if they were already materialized), rather than regenerating an account instance for each row.

I decided a better algorithm was necessary - one that makes better use of the tree structure already in place.

  1. Traverse the tree until an end node is reached (the furthest child)
  2. Read the primary key
  3. Search for a materialized instance in a hashtable with the given key
  4. If not found, materialize the object and store it, otherwise return the existing object
  5. Append the materialized object to the parent relation

This process repeats recursively until all objects have been materialized.

The result is much better - very little time is spent materializing rows...and it turned out that the biggest (but not only) culprit causing performance issues in my application was simply a missing index. Whoops.

A minor optimization

There's one other optimization I made, but I only implemented it when the include path is moving in the one-to-many direction of the tree. I never implemented the opposite direction, since I later found that the gain would be trivial in the grand scheme of things.

Look at the previous example SQL data set, and notice how the account id value is repeated - once for the account table, and once for the account_event table, as these are the foreign key columns that link the tables. Now imagine working with a dataset that's several more relations deep. If the account_id column from the account_event table has already been materialized (meaning it is found in the hashtable), then only the account_event data needs to be materialized and attached, and there's no more work to do. No more of the row data needs to be read or materialized from data on the left.

Key Takeaways

TDD is still extremely valuable to me

I know this often turns into a hotly debated subject, but I can't emphasize enough how much writing test cases before I write code helps me get to the desired result efficiently. It keeps me focused on which features are important, and lets me design code from the consumer's perspective before I start writing any logic.

It gives me the ability to freely experiment. If the test passes, I'm pretty confident that the conceptual model I have in my head is correct. If it fails, I get a stack trace or a helpful failure message that clues me in on what I did wrong.

Plus, since I'm going to have to write the test eventually, and the alternative is using a REPL or some other more temporary test harness - why wouldn't I just put it in a test class first? It feels less wasteful to me.

One thing I should mention is that I don't believe TDD is only for unit testing, or that you have to write isolated unit tests (with mocks) if you're practicing TDD. I believe that testing is most important at code boundaries, and more robust if you minimize the amount of mocking needed. In this case, that just meant writing tests for public code that could be accessed outside of the library, and no mocks were needed.

Optimizations matter in libraries

This is obvious, but I can recall only a few times I've truly needed to optimize anything in application code, because the libraries I depend on have generally already done the hard work. It was kind of fun to run into an issue where I really did need to speed up an algorithm, since the payoff was noticeable and has the potential to affect others.

Reflection in Scala is not ideal

.NET easily beats out Scala from a reflection standpoint. The lack of reified generics and expression trees was less than ideal. I recognize that there are limitations to the JVM, but it's still a point in .NET's favor.

On the positive side, it's certainly workable and I was able to accomplish everything despite missing these features.

Open Source is awesome

Huge shoutout to everyone who's worked on Squeryl - it's a great library and I'm always in awe of the sheer amount of free and open software available. Cases like this really highlight why having the ability to modify the source code benefits everyone. I get to build what I need rather than begging a volunteer for free work, and I can share it with others if they find what I built to be useful.

The Code

You can find most of the detailed commits in the original include branch on GitHub. To see the final diff, see the include2 branch, where the commits were squashed.

The code has not yet been merged into the official Squeryl master branch, but I've been using it in several personal projects and haven't encountered any issues so far. Since this is the first ORM feature I've ever built, I'd be happy to hear any constructive feedback on where I could improve things. Thanks for reading!

Tuesday, September 29, 2015

Scala for the C# Web Developer: Part 4 - Development Environment

Previously in this series, I introduced similarities and differences in language features between C# and Scala.

In this part, I'm going to show you how to set up your development environment so we can begin to write a web application.

Play Framework ~ ASP.NET MVC

There are several frameworks you could use to build a web application in Scala, but I chose Typesafe's Play Framework since it's very similar to ASP.NET MVC and it has good IDE support. The page syntax is similar to Razor and the conventions feel familiar.

Play Framework Installation

In the .NET world, you'd simply use the File->New Project menu item in Visual Studio to create a new project. Since the Java/Scala ecosystem isn't centrally controlled by a single authority (the way Microsoft controls .NET), the process isn't quite as straightforward. Specifically for Play Framework, you'll use a command line tool to create the necessary assets for a new project.

  • Download the Java 8 JDK (and make sure the bin folder is appended to your PATH environment variable)
  • Download the latest version of Play Framework from the home page. For this post I'll be using 2.4.3.
  • Unzip the Play archive to somewhere convenient, such as your home directory (I put it in C:\Users\Name\activator).

Create a new project

  • Open a Command Prompt and navigate to your activator directory (note that this is not the .activator directory - the leading dot marks a hidden directory on *nix systems, and that directory is used for framework assets)
  • Create a new project:
  • After the dependencies download and you are prompted, indicate you want a basic Scala project:
  • Type a name for your project:

A new project should now be created for you at C:\Users\Name\activator\play-intro.

Using Typesafe Activator

You don't actually need an IDE to do any development with Play Framework. You could use any text editor alongside activator to accomplish all essential tasks. Of course, coming from Visual Studio, that probably doesn't sound enticing to you, but let's demonstrate how things work with activator by compiling the project.

  • Navigate to your project directory:
  • Start activator (this will probably take a long time on the first run; it's downloading and installing everything you need to build and run your project):
  • Compile the project (this also may be slow the first time):
  • You should see a success message if everything went well.

If you've heard rumors about the speed of Scala's compiler, you may be thinking that this process took far too long. You're right in a sense – the first time you run through these steps, it takes a while, since SBT (the underlying build system that activator is based on) has to grab every dependency for the build system and the project.

However, the next time you run these steps, you'll see that each subsequent command is actually quite speedy. The slowness is not caused by the compiler; activator/sbt is just downloading and configuring all of the dependencies in the build chain on the first run.

Running Locally

Now you can run your project to access your website locally:

Open a browser and navigate to http://localhost:9000. You should see a webpage with the heading Your new application is ready. This is your website.

Configuring an IDE

We'll be using the Typesafe Scala IDE (based on Eclipse) for our development, simply because it's free. I'm actually not particularly familiar with this IDE, since I usually use IntelliJ IDEA, but the cost of IntelliJ could be a barrier to entry for some developers. If you have a copy of IntelliJ available, I would highly suggest using that instead, as it provides an experience much more similar to what you would expect with Visual Studio.

Installing the IDE

  • Download and extract the Typesafe Scala IDE to a location of your choosing. One nice thing about Eclipse is that it's fully self contained - you won't need to run an installer.
  • In order to get your new project working smoothly with Eclipse, you'll have to install a plugin for SBT. This can be done by navigating to the .sbt directory in your home folder C:\Users\Name\.sbt\0.13
  • Create a new folder at C:\Users\Name\.sbt\0.13\plugins
  • Create a new text file at C:\Users\Name\.sbt\0.13\plugins\plugins.sbt
  • Add addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0") as the first line in C:\Users\Name\.sbt\0.13\plugins\plugins.sbt
  • If you still have your activator console open (the last line in your console window starts with [play-intro] $), then reload the project:
  • If you closed your activator session, open a new console window and navigate to your project directory and open it again:
  • Create the Eclipse project files:
  • Open Eclipse. Go to File->Import->General->Existing Projects into Workspace. Browse to locate your project at C:\Users\Name\activator\play-intro and click Finish

You should now see a folder with your project name in the Package Explorer.

Setting up a Command Prompt in Eclipse

While you most certainly can set up Eclipse to run your project with a push of a button (like the Run button/F5 in Visual Studio), it's not as straightforward, and probably not worth it when working in Scala. IntelliJ has much better support for this, but because the compile times suffer, I tend to always just use an activator/sbt session in the command prompt when I'm working. It's slightly inconvenient, but since Scala supports hot code reloads and Eclipse continually checks for errors, you may not miss the push button convenience of Visual Studio anyway.

IntelliJ has Command Prompt/Terminal/Shell functionality baked in, but I was surprised to find that Eclipse does not. However, it's quite simple to add support, as found in this StackOverflow post.

  • On the top toolbar (just below File/Edit/Refactor/etc...), there should be an icon of a play button with a little red toolbox. Click on the drop down arrow and select External Tools Configurations
  • On the left side of the window, right click on Programs and select New
  • On the right side of the window, enter:
    • Name: Command Prompt
    • Location: ${env_var:ComSpec}
    • Working Directory: ${project_loc}
    • Arguments: -i
  • Click Run

You should now see a familiar command prompt in the console tab on the bottom window. If the command prompt does not show your project directory C:\Users\Name\activator\play-intro, then either:

  • change directories (using the cd command) cd C:\Users\Name\activator\play-intro, or
  • close the console tab. Click on your project folder in the Package Explorer, then click on the Run icon arrow again, and select 1 Terminal. This should start you in your project directory.

Type activator in the Console to get started.

Play Framework Project Structure

This should feel a bit familiar, although the structure is certainly a bit different from ASP.NET MVC. Below is a rough, high-level comparison of the Play Framework project structure and an ASP.NET MVC project.

Path                     ASP.NET MVC                              Play Framework
app/                     MVC Project Code                         Primary Code Directory
app/controllers          Controllers                              Controllers
app/views                Views                                    Views
conf/application.conf    Web.config/Settings                      Application Configuration
conf/routes              RouteConfig/RegisterRoutes               Route Configuration
test/                    Test Project Directory                   Test Directory
build.sbt                .SLN/.CSProj/(NuGet) packages.config     Project Configuration
project/project.sbt      MSBuild Extensions/VS Extensions         Project-specific Plugins

Write Some Code

Ok, so we have our IDE set up, we know how to run the project, and we are somewhat familiar with the project structure – let's finally write some code.

Create the Model

  • Create a new package (folder) at app/models
  • Create a new class at app/models/Student.scala
  • We're going to make this a simple case class with id, firstName and lastName fields:
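A minimal sketch of the model:

    // app/models/Student.scala
    package models

    case class Student(id: Long, firstName: String, lastName: String)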

Create the View

Now that you have a model, we can display it on a page. Create a new view students.scala.html in the views package. Right click on the package name, then select New->Play Template and enter students.scala.html as the filename.

Create the content for the view. This should look pretty familiar, as it shares a lot of similarities with Razor syntax.
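A hedged sketch of what students.scala.html might contain, using the Student fields from the model above:

    @(students: List[models.Student])

    <h2>Students</h2>

    <table>
        @for(student <- students) {
            <tr>
                <td>@student.id</td>
                <td>@student.firstName</td>
                <td>@student.lastName</td>
            </tr>
        }
    </table>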

Unfortunately, it looks like the Eclipse Scala IDE has a bug, so you aren't going to be able to verify if your view code is valid until runtime. This should be fixed soon.

Create the Controller Method

Open your Application controller (by default there's just this one, but you can create more at any time). Create a method called getStudents and pass a list of Students to your view.
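A hedged sketch of the controller method - it assumes the default Application controller generated by the Play 2.4 Scala template, plus the Student model and students view from above (the sample data is arbitrary):

    package controllers

    import play.api.mvc._
    import models.Student

    class Application extends Controller {

      def getStudents = Action {
        val students = List(
          Student(1, "John", "Doe"),
          Student(2, "Jane", "Smith")
        )
        Ok(views.html.students(students))
      }
    }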

Create the Route

Routes are quite a bit different in Play Framework, but fortunately are extremely simple. Instead of defining a rule by convention, you simply define each specific route in the routes file.

Place a new line underneath the default route, and point it to your getStudents method.
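For example, the new line in conf/routes might look like this (the URL path is up to you):

    GET     /students                    controllers.Application.getStudents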

Run

Go to your console window and start activator if you haven't already. Then type run to start your development server. Navigate to http://localhost:9000/students, and you should see a table with the student data you passed into your controller method.

Update your code

Recall that I mentioned Play Framework supports hot code reloads. This means that you don't need to stop, compile and restart your server between changes.

Try adding another student to your controller method and save the file.

Refresh the page, and you should now see the new student without compiling. It's really handy to have the flexibility of a scripting language while still preserving the advantages of static compile time guarantees.

Until Next Time

That should be enough to get you started, but we'll talk about more advanced IDE and project features next time in Part 5: Testing and Debugging.

Tuesday, September 22, 2015

Scala for the C# Web Developer: Part 3 - Common Data Types and Pattern Matching

In Part 1, we discussed some features that Scala and C# have in common to gain some familiarity with the language. Part 2 introduced additional language concepts, some with C# equivalents and some without.

In this part, we're going to talk about a couple of commonly used data types (the List and Option types), as well as pattern matching (a better, more readable alternative to an if/else statement).

Lists

The List is one of the more familiar data types in Scala. The syntax is slightly different, but overall the API is very similar to .NET.

Instantiation

You can create a populated list in one line:
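For instance:

    val numbers = List(1, 2, 3)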

Note that you do not need to declare the list type - it will be inferred from the items that are added. The line above is equivalent to declaring the type explicitly:
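That is:

    val numbers: List[Int] = List[Int](1, 2, 3)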

Filter - C# Where()

Just like in C#, you can filter a list by any criteria:
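For example (values are arbitrary):

    val numbers = List(1, 5, 10, 20)
    val bigNumbers = numbers.filter(n => n > 9)   // List(10, 20)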

Head - C# First()

Often you'll just want to grab the first item in a list:
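For example:

    List(1, 2, 3).head   // 1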

In C#, it's pretty common to grab the first item that matches specific criteria. The closest Scala equivalent is find, which returns an Option rather than the item itself; to get the First()-style behavior of returning the item directly (and failing when nothing matches), you can chain filter with the head function.
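A quick sketch of both approaches (the Person case class here is just for illustration):

    case class Person(id: Int, name: String)
    val people = List(Person(1, "Ada"), Person(2, "Grace"))

    people.find(p => p.id == 2)          // Some(Person(2,Grace)) - an Option
    people.filter(p => p.id == 2).head   // Person(2,Grace) - throws if nothing matches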

A few more mappings

I'm not going to go any further here - there hasn't been a function in C# that I haven't been able to find an equivalent for in Scala. Instead, here are a few more mappings, which I'm sure you'll be able to figure out without much hassle:

C#             Scala
Count()        count() or length
Select()       map()
SelectMany()   flatMap()
Skip()         drop()
Take()         take()
ForEach()      foreach()
Any()          nonEmpty or exists()
All()          forall()
OrderBy()      sortBy()

Full reference can be found in the Scala documentation. Note that you may want to check out Seq, Iterable, and any other classes in the inheritance chain for more restricted collection implementations. A List isn't always the right structure, but it's a good place to start.

Option - Better C# Nullable

At first, I thought this would be a new concept, but then I remembered that C# has a pretty comparable concept in Nullable<T>. The big difference is that Option works with any type (Nullable<T> is limited to value types), and Options should always be used in situations where a value would traditionally be null. This lets you avoid the dreaded NullReferenceException/NullPointerException, because you're forced to deal with both the present and absent cases when accessing the value.

You can create an optional value by wrapping it:
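For example:

    val maybeName: Option[String] = Option("Ada")   // Some("Ada")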

You can also wrap a null value:
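For example:

    val missing: Option[String] = Option(null)      // None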

Of course, these examples aren't very realistic, since you wouldn't typically wrap a value like this if you already know whether it's null. More realistically, you would wrap a value when you're calling code that could return null (typically Java library calls – Scala libraries shouldn't return null):
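For instance, System.getenv is a Java API that returns null when the variable isn't set:

    val port: Option[String] = Option(System.getenv("PORT"))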

Usage

Now that you have an Option instance, you'll likely want to transform it into something usable. If we were using .NET conventions, we would use an if/else statement:
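A sketch of the if/else style, using an illustrative Employee type (the same maybeEmployee value is reused in the examples below):

    case class Employee(firstName: String, lastName: String)

    val maybeEmployee: Option[Employee] = Option(Employee("Ada", "Lovelace"))

    if (maybeEmployee.isDefined)
      println(maybeEmployee.get.firstName + " " + maybeEmployee.get.lastName)
    else
      println("No employee found")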

Map/GetOrElse

In Scala, there are several ways to accomplish the same task. Most commonly, I use map in conjunction with the getOrElse method:
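Reusing maybeEmployee from the sketch above:

    println(
      maybeEmployee
        .map(e => s"${e.firstName} ${e.lastName}")   // runs only when a value is present
        .getOrElse("No employee found")              // runs only when the Option is empty
    )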

This example may be easier to conceptualize if you think of an Option instance as a list that is restricted to holding a maximum of one item. We're mapping a result, but that code will only be executed if there is an item in the list. Then getOrElse will do the inverse - it will only be executed if the Option instance is empty. We then surround the result in a print statement, and since both paths return a string, we'll get the same output as the C# styled example.

Also note that the map method is allowing us to transform the underlying Option instance from an Employee to a String, just as you can do with the Select method in .NET.

Fold

The fold method is actually even better suited to this particular task, although I don't find myself using it quite as often:
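Again reusing maybeEmployee from above:

    println(
      maybeEmployee.fold("No employee found")(e => s"${e.firstName} ${e.lastName}")
    )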

The first set of parentheses takes one argument, which is the result if the value is empty. The second set handles the non-empty case, and passes the underlying value to a lambda.

Pattern Matching - Better Switch Statements

In C#, I rarely ever use a switch statement, mainly because it's restricted to constants of a few simple types (integers, enums, chars and strings). If you want to do anything remotely complex, you typically end up with a pile of if statements.

Pattern matching in Scala is what switch statements wish they could be. You can match on a variety of types and values.

Match by Class

Since we just talked about the Option type, we can start there. Pattern matching is yet another way the above print statement could be written:
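Using the same maybeEmployee value from the earlier sketches:

    maybeEmployee match {
      case Some(e) => s"${e.firstName} ${e.lastName}"
      case None    => "No employee found"
    }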

Note that the above expression could instead be wrapped with the print() method as in previous examples - recall that everything returns a value in Scala.

Some and None are the underlying case classes that are returned when you wrap a value with Option(...). In the above example, we're asking it to match on whichever type was returned and to do something in each case.

  • If the value is non empty (an instance of Some), name the underlying value e, then print the full name.
  • If the value is empty (an instance of None), print that no employee was found.

Narrowing with if statements

You can further filter your matches with if statements:
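A small sketch of the idea (the values are arbitrary):

    val names = List("Ada", "Grace", "Alan")

    names match {
      case list if list.length == 3 => println("Exactly three names")
      case _                        => println("Some other number of names")
    }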

In the above example, we're checking whether our list is a certain length. If it's not, we can use the fall-through case _ to handle any scenario we're not specifically checking for. Alternatively, you can use any variable name if you'd like to reference the value in the body of that case. It's important to include this fall-through if you haven't exhaustively checked all scenarios, or your code may throw a MatchError at runtime.

Case Classes in Pattern Matching

Recall that I mentioned Some and None are case classes. There is another special attribute of case classes that I didn't mention previously – their constructor arguments can be used in pattern matching. If we use our Person case class from above, we could combine the concepts from our previous examples to check if a Person is over the age of 21:
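A sketch of the idea, assuming a simple Person case class with name and age fields (the real example from Part 2 may differ):

    case class Person(name: String, age: Int)

    val person = Person("Ada", 36)

    person match {
      case Person(name, age) if age > 21 => println(s"$name is over 21")
      case Person(name, _)               => println(s"$name is not over 21")
    }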

Advanced Usages

There are many more advanced usages that I won't be covering here. The most important thing to remember is that certain types will pass back underlying implementations that you can pattern match on, such as Option returning Some or None (another one that comes to mind is Either, which can return Left or Right). Keep an eye out for this when working in Scala, and you should be able to piece together the intended design of the data type.

Until next time

That wraps up the language comparison for now. While Scala does provide many, many more features and data constructs, these are the most common concepts that I've encountered while creating web projects, and hopefully will be enough to get you comfortable writing some code. Next time, we'll discuss how to create a web project in Part 4: Development Environment.

Tuesday, September 15, 2015

Scala for the C# Web Developer: Part 2 - More Scala Concepts

In Part 1, we discussed why you may be interested in trying out Scala, and covered some basic concepts that Scala has in common with C#.

In this post, I'm going to attempt to explain concepts that are a bit more advanced. Some have .NET equivalents, others don't. Don't sweat it if you don't fully grasp some of the Scala-only concepts; they will be reinforced in future posts.

Immutable by default


This can be a fairly jarring paradigm shift, but Scala is immutable by default. Instead of creating variables that can be reassigned at any time, you typically create variables that are equivalent to using the const keyword in C#:
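For example:

    val names = List("Ada", "Grace")
    // names = List("Alan")   // will not compile - a val cannot be reassigned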

This means you will not be able to reassign this variable in the future. You would either have to create a new list (preferred) or use the var keyword.

This concept extends further, however. Because immutability is a design goal, the default collections in the standard Scala library are immutable. For example, if you wanted to append an item to a list, you would find that there is no in-place append method. Instead, you can use the :+ method to create a copy of that list with the item appended to the end.
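A quick sketch of both styles:

    val names = List("Ada", "Grace")
    val more  = names :+ "Alan"   // a brand new List - names itself is unchanged

    // a mutable collection, for when you really need the performance
    import scala.collection.mutable.ListBuffer
    val buffer = ListBuffer("Ada", "Grace")
    buffer += "Alan"              // modifies the buffer in place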

There is a set of mutable collections that are provided, as the example shows, but this is not usually the desired solution unless you need the performance.

Traits can have bodies


Traits are Scala's answer to interfaces, which means you can implement multiple traits on a class. One major difference is that they can actually include method bodies, including methods with non-public visibility. You can almost think of them as abstract classes, except that they can't take constructor parameters.
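A small illustrative sketch (the names are arbitrary):

    trait Auditable {
      // traits can carry implemented methods, including non-public ones
      private def prefix: String = "[audit] "
      def audit(action: String): Unit = println(prefix + action)
    }

    class AccountService extends Auditable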


Importing Objects (Similar to using static in C# 6.0)


A new feature in C# 6.0 is the ability to use static methods without qualifying them with the class name by utilizing the using static keyword:

Similarly, in Scala, you can import all methods from an object (Scala's version of a static class, implemented as a singleton).

The difference is that Scala uses an underscore after the object name to indicate that you want to import all of its members.
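For example:

    object MathUtils {
      def square(x: Int): Int = x * x
    }

    // the underscore pulls every member of the object into scope
    import MathUtils._

    val nine = square(3)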


Implicit Parameters


Coming from the .NET world, dependency injection is the best use case that made this concept click for me. You can think of implicit parameters as Scala's baked in answer to an IoC container.

In .NET, you'd tend to pass dependencies to your constructor to mock them out, change the implementation, and hide the implementation details from the caller. For example, you may have a repository class in .NET:

This is nicely encapsulated so the caller has no idea how the Person instance is persisted. It also allows mocking through constructor injection. Typically in the .NET world, you'd use a DI framework like Castle Windsor, Autofac or Ninject to manage all of your dependencies. You could do it manually, but it quickly becomes very painful, especially when refactoring.

In Scala, you can use implicit parameters and objects to achieve the same effect as an IoC container. By importing all implicit parameters in an object, you can instantiate a class without explicitly providing constructor parameters:
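A hedged sketch of the idea - the class and object names are illustrative:

    class SqlConnection
    class Cache

    object Dependencies {
      implicit val connection: SqlConnection = new SqlConnection
      implicit val cache: Cache = new Cache
    }

    class PersonRepository(implicit connection: SqlConnection, cache: Cache)

    import Dependencies._

    // both dependencies are resolved from the implicit scope - no arguments needed
    val repository = new PersonRepository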

The compiler injects the dependencies automatically (note that it matches on the instance type, *not* the variable names), which would look like this if you were to handle the injection manually:

What's the advantage of the implicit approach? It's essentially the same as using a DI container - when you refactor, you don't have to change nearly as much code. If you reorder parameters or remove them altogether, your code will still compile. One nice difference is that when you add a parameter that is not implicitly available in scope, the compiler raises an error; IoC containers in .NET often don't catch that mistake until runtime.

One disadvantage to constructor injection is that it assumes each method will actually make use of all the dependencies passed in, which is rarely the case. Scala allows you to use implicit parameters on methods as well, so you can be sure you created a dependency because you're actually going to use it:
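A hedged sketch of method-level implicits, reusing the illustrative SqlConnection and Cache classes from the previous sketch:

    case class Person(id: Long, name: String)

    class PersonRepository {
      // only this method needs a database connection in implicit scope
      def get(id: Long)(implicit connection: SqlConnection): Option[Person] =
        None // run the SQL query here

      // getCached can be called with only a Cache in scope - no SqlConnection required
      def getCached(id: Long)(implicit cache: Cache): Option[Person] =
        None // look the person up in the cache here
    }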

Note that none of the calling code had to change after refactoring the PersonRepository. This refactoring is nice because it allows you to call the getCached method without providing a SqlConnection instance - this simplifies the call and reduces the resources used when calling only the getCached method.

Because you can always explicitly provide a parameter, you get all the benefits of DI without needing to use a framework. By using implicit parameters, you can safely refactor and test your objects, traits and classes with minimal effort.

Extension Methods


I'm sure these are familiar to .NET devs. Scala allows you to define similar extensions by using implicit classes inside of an object.
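For example:

    object StringExtensions {
      // an implicit class inside an object - import it wherever the extension is needed
      implicit class RichString(val value: String) extends AnyVal {
        def shout: String = value.toUpperCase + "!"
      }
    }

    import StringExtensions._

    "hello".shout   // "HELLO!"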


Reducing Verbosity


Scala tries to be more concise by creating some shortcuts in the syntax. I personally found most of these things to be quite logical once I understood the concept, but they may not be completely straightforward the first time you see them.

Case Classes


You can almost think of these as POCO classes in .NET - although just like in .NET, they are capable of having methods.

There are three major differences between a case class and a regular class. First, you do not have to use the new keyword to instantiate them. You can just use the class name:

Secondly, the parameter arguments are automatically exposed as public fields:

Lastly, the equality operator works by comparing the fields themselves, rather than the object reference:
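A quick sketch showing all three differences:

    case class Person(firstName: String, lastName: String)

    // 1. no `new` keyword required
    val ada = Person("Ada", "Lovelace")

    // 2. constructor arguments are exposed as public fields
    println(ada.firstName)                      // Ada

    // 3. equality compares the fields, not the reference
    println(ada == Person("Ada", "Lovelace"))   // true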


Tuples


Tuples are available in .NET, but aren't overwhelmingly popular - probably for good reason: while they are flexible and can carry related pieces of information without declaring a class, they tend to make code more difficult to read.

Scala has tuples baked into the language: you can return a tuple by simply surrounding your values with parentheses:
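For example:

    // two related values returned without declaring a class
    def minMax(numbers: List[Int]): (Int, Int) = (numbers.min, numbers.max)

    val (smallest, largest) = minMax(List(3, 1, 4))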

Tuples are certainly a nice feature to have available, but code can become nearly unreadable if they're overused. If you run into readability issues with a tuple, try using a case class instead.

Symbols for Method Names


Scala was created to be an extensible language, and as such, it allows developers a large amount of freedom in naming. As a result, method names can be entirely symbolic:
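A hedged sketch of the kind of thing you might run into - a symbolically named method exposed through an object and imported into scope (the DSL itself is made up):

    object QueryDsl {
      implicit class RichColumn(val column: String) extends AnyVal {
        def ===(value: String): String = s"$column = '$value'"
      }
    }

    import QueryDsl._

    val clause = "first_name" === "Ada"   // "first_name = 'Ada'"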

Note how this can make code fairly tricky to read. Keep in mind that if you see a symbol you don't recognize, it very well may be part of a custom DSL, either imported from an object (like the example above) or inherited from a trait. Using your IDE to go to the method declaration is a really handy way to figure out what's going on.

Optional Parenthesis


This is slightly tricky because of the way Scala treats the order of operations, but dots and parentheses are not always required - method calls can be written infix, separated by spaces:
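For example:

    val numbers = List(1, 2, 3)

    numbers.contains(2)   // the usual dot-and-parentheses call
    numbers contains 2    // the same call written infix, separated by spaces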

This is another piece of the puzzle when reading Scala code. Keep in mind that spaces and symbols can make nice looking DSLs, but they can also be highly confusing when first learning.

Until next time


That's certainly not everything, but if you understand most of these concepts, you can hopefully start looking at some code examples without being thoroughly confused (as I was). Next time we'll talk about Common Data Types in Part 3. Thanks for reading!

Friday, September 4, 2015

Scala for the C# Web Developer: Part 1 - Overview

Why bother?


While I've always been quite happy with the .NET ecosystem in general, there's always been a tiny voice in the back of my mind that wasn't comfortable letting Microsoft rule the destiny of my software projects.

Not that I don't trust them, but businesses need to make business decisions, and Microsoft's licensing decisions haven't always aligned with the small time/hobby developer. This isn't to say that they haven't changed immensely over the past few years (they have), or that they won't continue to make more open decisions in the future (they will).

I'm not here to persuade you into changing platforms, simply offering an alternative to those who are looking for one like I was. Personally, I just like the idea of being free from both costs and restrictions with technologies that I can use today. And if there's something out there that can make me just as productive, but it's free and open source, why not? I decided I'd never know unless I try.

Some of the nagging thoughts/issues I've had in the past include:

  • What if my SQL Database outgrows 10 GB?
  • Can I set up a test environment without worrying about licensing?
  • If I need to spin up another instance, do I need more licenses?
  • Wow, does a Windows install really require over 20 GB?
  • How in the world can I automate [X]?

A lot of this has changed now that the cloud/Azure takes care of the licensing, provisioning and storage issues for you, but I still prefer the flexibility of being able to deploy just about anywhere.

Why C#/.NET?


There are a lot of good reasons that I stayed on the .NET stack for this long:

  • IDE - Visual Studio is by far the best development experience on the market (IMO, of course). Even better now that it's free.
  • Language - I like languages with a lot of compile time guarantees. The quicker I can find errors, the better. I also like the functional features (lambdas specifically) that come with C#.
  • Toolbelt - I've been a .NET developer for so long that all the major hurdles (deployment, artifact repository, tools, quirks, etc) are a known quantity.
  • Familiarity - I know the style, commonly used patterns, and the most popular libraries to get things done.

I'm a web developer, so specifically this means I was looking to replace these technologies:

  • An OO/functional language with static type guarantees
  • Visual Studio
  • ASP.NET MVC
  • Entity Framework
  • SQL Server
  • SQL Database Project (.dbproj)
  • NuGet

Keeping these values in mind, I searched for a suitable open source full stack replacement.

Why Scala?


As I quickly found out, there aren't a ton of statically typed, managed languages that mix functional and OO paradigms with relatively large ecosystems that are ready for production. Scala felt the most familiar out of everything I looked at, and there are a lot of big name companies that have deployed it in production. I'm more about trying something than spending time reading up on every facet of the language up front, so I decided to give it a shot and let it succeed or fail based on my experience.

At this point, I'm guessing you've probably at least heard of Scala, but you may not know much about it. Let's start with some comparisons to get you familiar with the techniques you're already using.

Note that I'll be writing these examples with a C# style to start out with, and then show you the common Scala style at the end.

Basic C# to Scala Examples


Scala is a statically typed language that runs on the JVM. At first glance, it kind of looks like a cross between Ruby and C#:

Note that the biggest difference is that the default constructor is declared with the class name. If you want to do any work during the construction of this class, you'd do that directly in the class body (similar to Javascript):
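A hedged reconstruction of the kind of example described here, written in a deliberately C#-flavoured style (semicolons, return, braces everywhere):

    class Person(val firstName: String, val lastName: String) {
      // anything in the class body runs as part of the default constructor
      println("Constructing " + firstName + " " + lastName);

      def fullName(): String = {
        return firstName + " " + lastName;
      }
    }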


Inheritance

It supports class inheritance, just like C#. Note that the call to base is actually set directly on the base class declaration, since this is the default constructor:
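Building on the Person sketch above:

    // the call to the base constructor sits directly on the extends clause
    class Employee(firstName: String, lastName: String, val title: String)
      extends Person(firstName, lastName)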


Interfaces

You can declare an interface by using traits:
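For example (a trait with only abstract members behaves like a C# interface):

    trait CanWalk {
      def walk(): Unit
    }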


Static Classes

One interesting feature of Scala is that there are no static classes. Instead, you would use an object, which is instantiated automatically at runtime as a singleton:

Objects can be accessed from anywhere by calling them by name (similar to a static class, you do not need to instantiate it):
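For example:

    object StringUtils {
      def isNullOrEmpty(value: String): Boolean = value == null || value.isEmpty
    }

    // call it by name - no instantiation needed, just like a static class
    StringUtils.isNullOrEmpty("")   // true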


Properties

You can also create properties with getters and setters:
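For example:

    class Person {
      private var _name: String = ""

      def name: String = _name                          // getter - no parentheses
      def name_=(value: String): Unit = _name = value   // setter
    }

    val person = new Person
    person.name = "Ada"
    println(person.name)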

Note that we don't actually need parentheses on method declarations. This is up to you to decide each time you create a new member. You can think of omitting the parentheses as creating a getter property, and adding parentheses as creating a normal method. Very similar to C#.

Style


No Return Keyword

Another important thing to note is that everything evaluates to a value in Scala. This basically means that every method returns something - there is no void (returning Unit is the equivalent of void). Because the value of the last expression in a method is what gets returned, it's redundant to write the return keyword at the bottom of each method.

No Semicolons

Like Ruby, Javascript and a myriad of other languages, Scala has semicolons but they are rarely used.

Omitting brackets

Since everything evaluates to a value, you don't actually need curly braces for one-liner methods.

String Interpolation

Similar to Ruby and C# 6.0, Scala has a really nice string interpolation feature:
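For example:

    val firstName = "Ada"
    val lastName  = "Lovelace"

    val greeting = s"Hello, $firstName $lastName"   // "Hello, Ada Lovelace"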


Updating our Style

Tying this all together, we can update the Person example to match the conventional Scala style:
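A sketch of the same Person class in conventional Scala style:

    class Person(val firstName: String, val lastName: String) {
      // no return keyword, no semicolons, and no braces needed for a one-liner
      def fullName = s"$firstName $lastName"
    }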


Of course there's tons more to learn, but we'll get to that as we move onto Part 2: More Scala Concepts. Thanks for reading!