Using Scala for something I’d normally use a “scripting” language for

From time to time, a repetitive task comes up that I can finish more quickly by writing a script than by doing it manually, especially if it’s something I may need to do again in the future.

Usually I’d turn to a “scripting” language like Python, Groovy, or, back in the day, Perl for this type of thing.

Today such a need came up, and I decided to try tackling it with Scala since it has many of the features that make the above dynamic languages good for this:

  • first-class support for map and list data structures
  • an interactive shell
  • minimal overhead to write a program, compile, and run it
  • support for functional programming
  • good regular expression support

The problem:
A small performance test program ran a large number of tests and measured the elapsed time for each execution. The program output a line like “request completed in 10451 msecs” for each test. I needed to parse the output, collect the elapsed time measurements, and get some basic statistics on them: average, minimum, and maximum.

I used a Scala 2.8 snapshot, and fleshed out the code using the Scala interactive shell. First, define a value with the raw output to be processed:

scala> val rawData = """request completed in 10288 msecs
     | request completed in 10321 msecs
     | request completed in 10347 msecs
     | request completed in 10451 msecs
     | request completed in 10953 msecs
     | request completed in 11122 msecs
... hundreds of lines ...
     | request completed in 11672 msecs"""

The above uses Scala’s support for multi-line string literals.

The next thing I needed to do was parse the above output, using a regular expression to extract just the milliseconds. There are several ways to create a regular expression in Scala. This is the one I like:

val ReqCompletedRE = """\s*request completed in (\d+) msecs"""r

There’s a bit of magic in how the above string literal actually ends up becoming a regular expression. There’s an implicit conversion in the Scala Predef object which turns a Java String into a RichString, and RichString provides an ‘r’ method that returns a regular expression object. The members of the Predef object are automatically imported into every Scala source file, so the compiler will apply any conversion it finds in Predef when trying to resolve the ‘r’ method. So the above expression creates a RichString from the String via the implicit conversion, then calls the ‘r’ method on it, which returns the regular expression.
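If the implicit conversion seems too magical, the same regular expression can be constructed explicitly. Here’s a rough sketch of the equivalent code, building the Regex directly:

import scala.util.matching.Regex

// equivalent to calling 'r' on the string literal: construct the Regex explicitly
val ReqCompletedRE = new Regex("""\s*request completed in (\d+) msecs""")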

To apply the regular expression to a line of the output and to extract the milliseconds, we can use an expression like:

scala> val ReqCompletedRE(msecs) = " request completed in 10451 msecs"
msecs: String = 10451

msecs gets bound to the first group in the regular expression (the part that matches (\d+)). This works via Scala’s extractor feature: the Scala regular expression class (scala.util.matching.Regex) defines an extractor which extracts the group results.
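For comparison, here’s a rough sketch of the same extraction done without the extractor sugar, using the Regex API directly:

// find the first match in the string and read group 1 explicitly
val m = ReqCompletedRE.findFirstMatchIn(" request completed in 10451 msecs").get
val msecs = m.group(1)  // "10451"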

The next step is to iterate over the lines of the output, extract the milliseconds, and turn the results into a list.

scala> val msecsVals = rawData.lines.map { line => val ReqCompletedRE(msecs) = line; Integer.parseInt(msecs);} toList
msecsVals: List[Int] = List(10288, 10321, 10347, 10451, 10953, 11122, ..., 11672)

The above code uses the RichString lines method and Iterator.map, along with a Scala closure (the block passed to map).

Finally, to get the simple statistics:

scala> (msecsVals.sum / msecsVals.length, msecsVals.min, msecsVals.max)
res21: (Int, Int, Int) = (10736,10288,11672)

Putting the whole script together (with rawData defined as above):

val ReqCompletedRE = """\s*request completed in (\d+) msecs"""r
val msecsVals = rawData.lines.map { line => val ReqCompletedRE(msecs) = line; Integer.parseInt(msecs);} toList
(msecsVals.sum / msecsVals.length, msecsVals.min, msecsVals.max)
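To make this a standalone script that works on the raw log instead of a pasted string literal, something like the following sketch should do (the file name is made up, and every line is assumed to match the pattern):

import scala.io.Source

val ReqCompletedRE = """\s*request completed in (\d+) msecs""".r

// read the performance test output from a file ("perf-output.log" is a hypothetical name)
val msecsVals = Source.fromFile("perf-output.log").getLines
  .map { line => val ReqCompletedRE(msecs) = line.trim; msecs.toInt }
  .toList

println((msecsVals.sum / msecsVals.length, msecsVals.min, msecsVals.max))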

New change management tool: Redmine

I was starting a new project and needed a change management tool. I wasn’t really satisfied with the tools available the last time I looked, so I took a fresh look at what’s out there. One option I came across that hadn’t been on my radar before was Redmine.

Redmine has many of the features I like from Trac, such as the Roadmap view, version control integration, an Activity view, and a built-in wiki. It also has a bunch of plugins, one of which looks like it’ll generate burndown charts, which I’m planning to use.

It looked like Redmine was fairly well supported on Windows (the box I have available), so I decided to try it out. Redmine is based on Ruby on Rails, so the first step of the installation is to get Ruby and Rails installed.

I started off trying to install Ruby and Rails under Cygwin, but ran into problems building the MySQL support from source (no Cygwin port for this!).

So instead I went down the route of installing everything outside of Cygwin. Here’s a summary of the steps:

  • Install the Windows Ruby binary, along with the RubyGems package system.
  • Install the Rails 2.1.2 gem, which Redmine 0.8.X depends on
  • Download and unpack the Redmine distribution
  • Set up the database (running on MySQL in my case) and edit database.yml per the instructions.
  • I had to install an older MySQL client library due to a Rails/MySQL library incompatibility (described here).
  • Create the database schema for Redmine via:
    rake db:migrate RAILS_ENV="production"
  • Populate a default configuration via:
    rake redmine:load_default_data RAILS_ENV="production"
  • You can then run Redmine via:
    ruby script/server webrick -e production
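For reference, once Ruby and RubyGems are installed and config/database.yml has been edited, the install boils down to roughly this command sequence (the rake and server commands are run from the unpacked Redmine directory):

# install Rails, create the schema, load the default data, then start the server
gem install rails -v 2.1.2
rake db:migrate RAILS_ENV="production"
rake redmine:load_default_data RAILS_ENV="production"
ruby script/server webrick -e production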

Scala, Maven and IntelliJ

I’m evaluating Scala for a project I’ve recently started.  The project requires processing a large number of data structures.  Between Scala’s good support for parallelism and functional programming, it seems like a good fit.  I’ve heard Scala brings many of the development efficiencies and expressiveness of dynamic languages, while being statically typed.  It does this through a number of features, including smart type inferencing, great functional programming support, traits, and a bunch of other tricks I’m in the process of learning.

We’ve seen several dynamically typed languages gain widespread adoption in the last eight years or so: Python, Ruby, Groovy, and going further back, JavaScript, Perl, PHP, etc. What’s the last statically typed language that’s gained a wide following? C#? I’ve pretty much come to assume that any new language that was highly expressive, concise, and good for DSLs would be a dynamic language. Dynamic languages do have their drawbacks: performance is usually not up to par with statically typed languages (although that rarely makes a difference), and the lack of type information at compile time makes it very difficult for IDEs to provide features such as autocomplete, symbol lookup, and early error detection. So if a language like Scala manages to deliver high development efficiency while retaining static typing, that’s all the better in my eyes.

Here’s what I did to get a minimal project set up in Maven and IntelliJ IDEA:

Create a shell of a project using a Maven archetype:

mvn org.apache.maven.plugins:maven-archetype-plugin:1.0-alpha-7:create \
-DarchetypeGroupId=org.scala-tools.archetypes \
-DarchetypeArtifactId=scala-archetype-simple \
-DarchetypeVersion=1.1 \
-DremoteRepositories=http://scala-tools.org/repo-releases \
-DgroupId=com.rps -DartifactId=scalatest

This created a project with a single Scala class, a test, and a Maven POM containing the minimal dependencies and plugins.

The Scala version the archetype puts in the POM isn’t the most current. Just update the version property in the POM:

<scala.version>2.6.1</scala.version>

to 2.7.6, or whatever version you want.
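For reference, the relevant fragments of the generated POM look roughly like this (the exact layout the archetype produces may differ slightly):

<properties>
  <scala.version>2.7.6</scala.version>
</properties>

<dependencies>
  <dependency>
    <!-- the Scala standard library, picking up the version from the property above -->
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
</dependencies>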

I then imported the project into IntelliJ IDEA using the excellent IDEA Maven integration. I had to add the Scala facet to the project in IDEA myself, but otherwise everything is working: editing, unit tests, debugging, etc.

The IntelliJ IDEA editor’s autocomplete support seems to work pretty well, but it doesn’t seem to do much error detection as you’re editing yet.


Maven and skinny wars

Maven is a great build system, but sometimes limitations come up that can drive you crazy.  One such limitation is the lack of direct support for building “skinny war” files.  Granted, I don’t think it’s the most common configuration, but for certain types of applications it’s the preferred way to package your application.

By default, war files are packaged with all of the libraries they use embedded within them.  This is fine, unless your application uses a large number of libraries and you have more than one war file within your enterprise application archive (ear file).  In that case, you’re causing the JVM to load the same set of classes multiple times, once for each war, which can use up a significant amount of memory and bloats the size of your ear file.

In a skinny war configuration, the libraries are packaged at a higher level, within the ear file itself, and the war files are made to reference the libraries packaged within the ear file within their MANIFEST.MF file.  The classes are loaded once, and can be used in multiple web applications bundled in the ear.

Unfortunately, this isn’t directly supported in Maven, although there are workarounds.  As this page states, “The Maven WAR and EAR Plugins do not directly support this mode of operation but we can fake it through some POM and configuration magic”.  The workaround is to list the jars in each war, but tell Maven to exclude them from WEB-INF/lib and to add references to them in the MANIFEST.MF file.
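Concretely, the war module’s POM ends up with something like the following sketch, based on the WAR plugin’s documented recipe (dependency declarations omitted):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <!-- keep the dependency jars out of the war's WEB-INF/lib -->
    <packagingExcludes>WEB-INF/lib/*.jar</packagingExcludes>
    <archive>
      <manifest>
        <!-- add Class-Path entries to MANIFEST.MF pointing at the jars bundled in the ear -->
        <addClasspath>true</addClasspath>
        <classpathPrefix>lib/</classpathPrefix>
      </manifest>
    </archive>
  </configuration>
</plugin>

On the ear side, the maven-ear-plugin’s defaultLibBundleDir setting (lib/ in this sketch) controls where the shared jars are placed, so the manifest references resolve.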

The ear Maven project then needs to list every library it will package as a dependency.  This means that common libraries will be listed both in the ear project and in each war project that uses them, causing quite a bit of bloat in the Maven project files.

This wiki page has a good description of the situation, some alternate solutions, and requirements for a long term solution.

Ideally, I’d like to be able to take the same Maven war project file, and build for standalone deployment (fat war), or deployment within an ear project where the common libraries are included once in the ear file (skinny war), without having to change the Maven war project file.

For this to work, I think the ear project would have to control whether and which war project libraries are pushed up to the ear project.  The ear project file could explicitly list those libraries that should be bundled in the ear, or it could have an option where it calculates the common libraries across the contained war files, and automatically bundles them in the ear file.  As the link above indicates, the ear project would have to have the ability to rewrite the manifest entries of the contained war projects.



Public mashup and data feed APIs

I’m doing some research for a possible venture.  A part of this venture would fall into the mashup category – pulling information from other sites and services, and combining it in a unique way.  Following are my initial notes on some interesting APIs and information feeds:

Programmable Web is a great resource for those building mashups – it has a directory of APIs and Mashups that make use of those APIs.  The blog post “Top Twenty Open APIs and Mashup Resources for Web Developers” lists some of the most popular APIs.

Google Transit Feed Specification

  • Provides information on stops, routes, calendar, fares, frequencies, usage policies
  • Unfortunately, only available for a limited set of metro areas

Google Maps AJAX API:

  • AJAX API
  • Display a map (road, satellite, hybrid, or terrain view) by latitude & longitude
  • Can respond to user events within the map – e.g. clicking, zooming in/out, panning, etc
  • Can manipulate map via JavaScript – adding markers, changing location, zoom, etc
  • Can create various types of overlays on the map
  • Includes Google Street View

Google AJAX Feed API

  • Include any RSS or ATOM feed on your site, with various presentation options

OpenSocial

  • Common API for multiple social networking sites, including hi5, LinkedIn, MySpace, Netlog, Ning, orkut, and Yahoo.  Unfortunately Facebook is not one of them.
  • JavaScript APIs for client side and REST/RPC APIs for server side
  • Can be used to create apps that are embedded in social networking sites, as well as use user social information (user profile, friend lists, events) from social networks in your site.
  • OAuth is used to allow users to authorize use of their social network information

YouTube APIs

  • Integrate YouTube functionality into your site: video searches, upload videos, etc
  • Integrate video player into your site
  • Allow users to see/manage their favorites on your site
  • RSS/ATOM data feeds from YouTube

YELP APIs

  • REST APIs
  • Retrieve business info and reviews for a geographic region and business category.  This includes each business’s location and categories.
  • Retrieve neighborhood name and info by location
  • Retrieve reviews for a particular business
  • Retrieve business info and pictures
  • It looks like this isn’t a complete directory – only those businesses and features that have been rated are listed

Zillow APIs

  • “Neighborhood and city affordability statistics: Zillow Home Value Index, Zestimate distribution, median single family home and condo values, average tax rates, and percentage of flips.”
  • Demographic data at the city and neighborhood level – local market data, affordability, household income, average age, commute time, etc.
  • Home Valuation: “Search results list, Zestimate® home valuations, home valuation charts, comparable houses, and market trend charts.”
  • Property Details: “Property-level data, including historical sales price and year, taxes, beds/baths, etc.”
  • Lists of counties, cities, ZIP codes, and neighborhoods, as well as latitude and longitude data for these areas so you can put them on a map.
  • “boundaries for nearly 7,000 neighborhoods and 150 cities” are available under a Creative Commons license.
  • The license prohibits sites from charging money for Zillow data

Yahoo! Local Web Services

  • REST APIs
  • Provides access to public collections created with Yahoo! Local Collections
  • Can perform local searches based on location, radius, route, categories: returns results with location info, ratings, categories, etc
  • Yahoo Local search web site: http://local.yahoo.com/ – You can see some of the information that’s available

Walkscore API:

  • Get walk score by location

JSR 303 & Hibernate validation framework

I was recently looking for a validation framework, and came across the work that has been done lately for JSR 303 (latest version of the spec here). JSR 303 defines a standard meta-data model and API for validation of JavaBeans/POJOs. Basically, it’s a standard way to describe constraints for Java POJOs, and an API to access those constraints.

From the JSR:

“Validating data is a common task that is copied in many different layers of an application, from the presentation tier to the persistence layer. Many times the exact same validations will have to be implemented in each separate validation framework, proving time consuming and error-prone. To prevent having to re-implement these validations at each layer, many developers will bundle validations directly into their classes, cluttering them with copied validation code that is, in fact, meta-data about the class itself.”

I definitely agree – the validation metadata belongs with the domain class. This has been a hole in the Java space for quite a while. We’ve had validation frameworks such as Commons Validator (previously part of Struts) for many years, but we haven’t had something that could be used across layers in a widespread manner. If you look at a typical web application with XML schemas, a persistence framework, and a Web UI, you can easily see where you can end up re-implementing the same constraints multiple times.

public class Address {
    @NotNull private String line1;
    private String line2;
    private String zip;
    private String state;
 
    @Length(max = 20)
    @NotNull
    private String country;

    @Range(min = -2, max = 50, message = "Floor out of range")
    public int floor;

        ...
}

As can be seen from the Hibernate Validator example above, the JSR allows you to specify the validation message as part of the metadata. I think it’s good to have the option, but I prefer to define validation messages externally. I don’t want to have to change my domain classes every time someone wants different wording on a validation message, and if you need to support internationalization, you pretty much have no option but to define the messages externally.  From looking at the JSR, it looks like defining messages externally is also supported.

Like other recent JSRs, it supports annotations but still allows overriding/extension via XML, which I like. I’m also glad this JSR works on Java SE; in the past, too many JSRs were restricted to Java EE.

Hibernate Validator 4 (currently beta 1) is the reference implementation. Since it’s a Hibernate project, you can guess that Hibernate core will be able to use the constraints to generate table definitions, etc. But what about other layers? I’d really like to see UI frameworks take advantage of this, generating browser-side validation JavaScript as well as enforcing the constraints server side.
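For a sense of what the programmatic side looks like, here’s a minimal sketch of validating the Address example above using the javax.validation bootstrap (as I read the spec; Address is assumed to have a no-arg constructor):

import java.util.Set;
import javax.validation.ConstraintViolation;
import javax.validation.Validation;
import javax.validation.Validator;

public class AddressValidationExample {
    public static void main(String[] args) {
        // bootstrap the default provider (Hibernate Validator here) and obtain a Validator
        Validator validator = Validation.buildDefaultValidatorFactory().getValidator();

        Address address = new Address(); // line1 and country are null, so their @NotNull constraints fail
        Set<ConstraintViolation<Address>> violations = validator.validate(address);

        for (ConstraintViolation<Address> violation : violations) {
            System.out.println(violation.getPropertyPath() + ": " + violation.getMessage());
        }
    }
}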

Here’s what I was able to find in the JSF world:

  • MyFaces Extensions Validator
    JSR 303 support is planned, but not available yet. See this page.
  • RichFaces BeanValidator
    RichFaces 3.2.2 supports constraints defined in Hibernate Validator. Presumably they’ll switch to JSR 303 in the future. Since Hibernate Validator 4 is the reference implementation, hopefully switching from Hibernate Validator 3.X to JSR 303 will be little or no work.

Grails security

I’m working on a Grails-based project which requires security, as pretty much every web application does. My high-level requirements are:

  • Role-based access control
  • Database-based authentication (passwords stored in database)
  • Simple to use
  • Good documentation
  • Ability to model permissions, for finer-grained authorization than roles (nice to have)
  • Captcha support (nice to have)
  • OpenID support (nice to have)
  • Facebook Connect support (nice to have)

From browsing the list of Grails plugins, it looks like there are two that fit the bill, each based on well-established Java security frameworks. Here are my notes on each:

JSecurity plugin

  • Based on JSecurity framework (now Apache Ki)
  • API includes classes for user, roles, and permissions.
  • Support for role and permission-based authorization, which I prefer to use
  • Quick Start Guide has example of users and roles being created
  • Access control is declaratively configured, pointing to the controller & action
  • An AuthController is responsible for common auth functions (login, logout) and the login page
  • Different authentication schemes (e.g. LDAP, database based auth) supported via realms
  • Supports database-based authentication (passwords stored in database)
  • OpenID support: not directly supported in JSecurity yet, but people have gotten it working at the Grails level by integrating with the OpenID plugin
  • Documentation looks good, but there’s not as much of it available as for the Spring Security plugin

Spring Security plugin

  • Based on Spring Security (Acegi security) framework
  • Supports database-based authentication (passwords stored in database)
  • Supports OpenID and Facebook connect for authentication
  • Also supports LDAP, Kerberos, CAS, NTLM for authentication
  • Support for role-based authorization
  • User and Role Groovy classes are generated. These may be customized after generation (e.g. to add attributes).
  • Generates a simple registration page with password confirmation and CAPTCHA support
  • Page and action security mappings (which pages/actions should be access controlled) can be stored in the database, declared as annotations in the controller, or defined using the standard URL string mappings supported by Spring Security
  • Good documentation

Both plugins look very capable and meet my core requirements. Support for OpenID is a big plus for me so I went with Spring Security. I’ve been using it for about a week now. I may jot down some notes on it in a future post.

BTW, this to me is one of the huge advantages of dynamic language frameworks on the JVM: the ability to tap into mature, very full-featured existing Java frameworks, libraries, and drivers. This is particularly true for Grails, since it so heavily leverages existing frameworks (e.g. Spring, Hibernate).


IDEA 8.1 & Grails 1.1

I’ve been working with Grails again lately after not using it for a while. The last time I used Grails, it was a pre-1.0 release about a year ago. Even then it was an extremely productive framework, but it was rough around the edges.

The main obstacle I saw was that exception messages and stack traces were not descriptive of the root problem. It wasn’t that the error messages were poorly written; rather, the reported error was nowhere near the root cause – it was a far-removed side effect of the actual problem. If you had messed up the definition of a GORM domain class, the application might start up fine and then complain that some dynamic save method was undefined for the domain class when you tried to use it. You then had to do a good deal of trial and error until you isolated which change caused the problem. Having a robust set of unit tests for your application helped, but it still took a while to diagnose the problem.

I’ve been using Grails 1.1 betas lately, and it looks like the error reporting has gotten much better. When I’ve gotten errors, it was usually pretty clear what the cause was.

Officially, IDEA only supports Grails 1.0, but it’s working fine for me with some tweaks. Here’s what I did:

  • Created the Grails 1.1 project outside of IDEA, using the grails command (not the Maven plugin)
  • Redefined the GROOVY and GRAILS global libraries in IDEA to point to the latest versions of each
  • Imported the existing Grails project into IDEA
  • After installing some Grails plugins (specifically, the google-chart plugin), I found that they weren’t on the classpath and the grails launcher no longer worked. Apparently Grails 1.1 moved where the plugins are stored and IDEA hasn’t been updated for this yet. To get around it, I added <home dir>/.grails/1.1/projects/<my project> to the module’s content root for my project, and added the plugin’s source directory (plugins/google-chart-0.4.8/src/groovy in my case) as a source folder within the added content root.

The Grails app launcher is working for me, and I’m not getting the compile problems I was getting before.


Setting up a virtual private server

I’ve had a dedicated server at a hosting provider that I’ve used to host applications and sites for years. I got a great price on the hosting package and it’s worked well, but the server’s growing long in the tooth and needs an OS upgrade. I’ve also had some hardware failures in the past which caused some downtime.

In looking for a replacement to my current server, I’ve been looking into the virtual private server options. A virtual private server will allow me to start small and scale up as needed, minimize outages due to hardware problems, and should be more economical than dedicated hosting.

Most of the VPS providers I looked at were using the open source Xen virtualization software, including Amazon EC2 and Slicehost. Amazon EC2 has some nifty features, such as pre-configured virtual server images (e.g. JBoss stack image, PHP stack image, etc), and a web services interface to manipulate your server instances, creating or removing instances as needed.

Amazon EC2 charges by the amount of time your instance is running, which for me basically means the time your instance is available to serve traffic. This is a good feature for people who need to spin up instances to handle large loads of traffic or execute some processing-intensive task. I was hoping that I wouldn’t get charged for time the instance is effectively idle, but unless I want my instance to be unavailable for some period, I’d get charged. Amazon charges $0.10 per instance-hour for the smallest instance, which, if you want a server available all the time (roughly 744 hours in a 31-day month, i.e. 744 × $0.10), works out to ~$74/month – more than I was paying for my dedicated server.

Slicehost lets you add and remove “slices” (server instances) via their control panel, as well as resize slices. They don’t provide a web service interface to control your instances as Amazon does. Also, I don’t see a way to upload pre-built images, such as an Apache/Tomcat/MySQL or an Apache/PHP/MySQL image. These would be nice features, but definitely not must-haves for me. Slicehost charges on a monthly basis, with a 256MB RAM instance costing $20/month and a 512MB instance costing $38/month. I signed up for the 256MB instance running Ubuntu Linux to try it out, and was surprised that it was up and running with shell access within 5 minutes of submitting the request.

Since I got set up with Slicehost, they’ve been acquired by Rackspace, the largest hosting provider in the U.S. I see this as largely positive – it gives them access to Rackspace’s data centers and economies of scale. Hopefully Slicehost’s excellent operations, web control panel, and pricing will continue to impress me.


Mercurial and Subversion : What’s working for me

One popular use of DVCS systems such as Mercurial and Git is as “super clients” to Subversion, at least until more projects get on the DVCS bandwagon. You get most of the benefits of the DVCS and can still work with the rest of the team using Subversion.

I’ve been using Mercurial to work on a couple of projects I’m involved with that have Subversion repositories. Git has a good bridge to Subversion built in. Unfortunately, Mercurial is a little bit behind on this front, but it looks like things are getting better quickly.

I initially tried out Tailor, a general purpose version control bridge tool, which supports Mercurial and Subversion among a bunch of others. I had difficulty getting Tailor to work with the Subversion repository I was working on, and it didn’t seem like Tailor was being used much for Mercurial-Subversion.

I’ve been using the hgsvn package lately and it’s worked out pretty well. The only caveat is that hgsvn doesn’t directly support pushing changes from Mercurial back up to Subversion. It does a good job of downloading changesets from Subversion to Mercurial, though, and there are a couple of solutions for the other direction. I’ve been pushing changes back using the excellent Mercurial MQ extension.

A new Mercurial-Subversion bridge, hgsubversion, appears to support changes in both directions between Mercurial and Subversion, and looks very promising. I’m just waiting for it to support importing starting at a particular Subversion revision before I can start using it.

Until I can use hgsubversion, here is how I’ve been using hgsvn. I installed hgsvn as a Python egg package via easy_install:

sudo easy_install hgsvn

The hgimportsvn command is used to import changesets from an existing Subversion repository into a local Mercurial repository that it creates. You can optionally have it start at a particular Subversion revision if you don’t need the full history imported:

hgimportsvn [-r svn rev] <svn URL> <local directory name>
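For example, to import starting at a particular revision (the URL and revision number here are made up):

hgimportsvn -r 1200 http://svn.example.org/myproject/trunk myproject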

This creates a combined Subversion working directory and Mercurial repository in the local directory you give it. Once this is done you can update the Mercurial repository with the latest Subversion changesets by executing the following in the local directory:

hgpullsvn

You run hgpullsvn any time you want to refresh the Mercurial repository with the latest from Subversion. You can then do anything you would normally do with a Mercurial repository, except commit changes of your own. Cloning, MQ, revision histories, diffs, etc. all work. I tried committing a change to the Mercurial repository, then committing the same changes to Subversion, then doing an hgpullsvn to update both from the Subversion repository. This worked at first, but the Mercurial repository got out of sync after a while, so I wouldn’t recommend it.

What I’ve been doing is managing the upstream changes via Mercurial Queues (MQ). This also helps me manage patches I want to apply on top of the latest code.

The first time you use Mercurial Queues in a repository, you have to initialize the queue repository:

hg qinit -c

The -c option makes the patch queue repository itself version controlled, so you can see what older versions of a patch looked like in case you need to roll back. I’m basically using patches as changesets, so it’s useful for me to be able to version control them and have a history.

Once you have the queue repository, you can create a patch before making any changes to files:

hg qnew -g <patch name>

When you create a new patch, it automatically becomes the active patch at the top of the queue. You can then edit files as you normally would. The ‘hg add’ and ‘hg remove’ commands mark files to be added or removed in the patch.

Once you’re ready to commit any changed/added/removed files to the patch, do a commit:

hg qcommit

Once committed, you can look in your .hg/patches/ directory and see the patch file.

Whenever I want to update from SVN, I pop everything off the patch queue so the Mercurial repository & Subversion working directory are pristine, and do the update:

hg qpop -a

hgpullsvn

I can then push the patches I’m working on back on top:

hg qpush -a

Whenever I’m ready to push changes into Subversion, I export the patch, then apply it to a separate Subversion working directory for commit:

hg export -g <patch name> > ../my.patch

then, in the separate Subversion working directory:

patch -p1 < ../my.patch
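and then, after reviewing the applied changes, commit them to Subversion as usual (svn add-ing any files the patch created first):

svn add <any new files>
svn commit -m "<commit message>"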
