Apache Spark Experiments

I’m in the process of learning Apache Spark for processing and transforming large data sets, as well as machine learning. As I dig into different facets of Spark, I’m compiling notes and experiments in a series of Jupyter notebooks.

I published these notebooks to a GitHub repo, spark-experiments. Right now it has some basic and Spark SQL-based experiments. I’ll be adding more as I go.

Rather than setting up Jupyter, Spark, and everything else needed locally, I found an existing Docker image, pyspark-notebook, that contains everything I needed, including matplotlib to visualize the data as I get further along. If you have Docker installed, you just run the Docker container via a single command, and you’re off and running. See the spark-experiments installation instructions for details.
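From what I remember, launching it really is a one-liner along these lines (8888 is the notebook server's default port; check the pyspark-notebook docs for the current invocation):

```shell
# Pull (if needed) and start the Jupyter + Spark container,
# exposing the notebook server on http://localhost:8888
docker run -it --rm -p 8888:8888 jupyter/pyspark-notebook
```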

Initially, I was going to create my own sample data sets for the experiments. I’m mostly interested in learning the operations and process rather than executing with a large data set across a cluster of servers, so it’s ok to use a small data set. But I hit on the idea of using publicly available data sets such as those from data.cms.gov instead. Maybe we’ll turn up something interesting, and it’ll be more real-worldish.

Posted in Python, Scala

Migrating Drupal and WordPress sites using Docker

There are several sites I host for family and friends in addition to this one. It’s a mix of WordPress, Drupal, and static sites, all running on a Linux virtual host at Rackspace. The Linux host is pretty old at this point and really should be upgraded. Additionally, I wanted to give DigitalOcean a try, since I can get a virtual server there for less.

Although I kept the installations for each site pretty well organized in different directories, migrating them over the traditional way would still be time consuming and error prone: copying over all the directories and databases, migrating users, making sure permissions are right, and making sure to bring along any service scripts and configurations. It’s all a very manual process. If (when) I get something wrong, I’d have to troubleshoot it on the target server, and the whole process isn’t repeatable or version controlled. I wasn’t looking forward to it.

While working on our Pilot product at Encanto Squared, a new tool came on our radar: Docker. We adopted it, and it greatly simplified and streamlined our deployment and server provisioning process at Encanto.

Naturally, I decided to use Docker to migrate my sites to another server, and to generally improve how they’re being managed.

The overall configuration looks like this:


The above diagram was inspired by the dockerboard tool. The tool works, but the diagram required some style tweaking, so I redid it in OmniGraffle.

Each of the rounded rectangles above is a separate docker container, and all of the containers are orchestrated by docker compose. The blue lines between the containers are docker compose links, which connect the two containers at a network level, and create an entry in the source’s host file pointing to the target container. Each docker container runs with its own network, layered filesystem, and process space. So for one container to be able to communicate with another it has to be specifically enabled, via links in the case of docker compose.
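A quick way to see the linking mechanism in action is to inspect the hosts file inside a running container (the container name below is a placeholder; `docker ps` shows the real ones):

```shell
# Open the hosts file inside the proxy container; each linked service
# shows up as an entry pointing at that container's IP
docker exec -it sites_nginx_1 cat /etc/hosts
```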

Following is a breakdown of each container and its configuration:

nginx – front-end reverse proxy

  • I’m using this as a reverse proxy into the other docker containers.
  • This is the only container with an exposed port, 80.
  • It has a link to each of the individual site containers to be able to proxy HTTP requests to them.
  • In the future, I may have this serve up the static sites rather than proxying to another nginx process. It’ll still be needed to proxy the WordPress and Drupal sites.
  • This image is based on the official nginx image, with the addition of the Nginx configuration files into the Docker image. Dockerfile:
FROM nginx
COPY conf.d /etc/nginx/conf.d
  • Each of the sites gets a separate Nginx configuration file under conf.d. They proxy to the specific site by name (mfywordpress in the example below). Here’s what one of them looks like:
server {
  listen 80;
  server_name www.mfyah.org mfyah.org;

  location / {
    proxy_pass http://mfywordpress:80;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}

latinchristmas – static site

  • This is a static site served up by its own nginx process.
  • This is an image that is based on the official nginx image. The only thing it does in addition is add the static content to /usr/share/nginx/html
  • Dockerfile:
FROM nginx

COPY WWW /usr/share/nginx/html

mfy – WordPress-based site

  • This image is based on the official WordPress image, with some additional packages installed.
  • The official WordPress image uses Apache.
  • This container maps the directory /var/www/html to /var/lib/cont/mfywp on the host to store the WordPress site files. Having the site files on the host makes it easier to backup and ensures any changes to the site survive a restart.
  • Dockerfile:
FROM wordpress

RUN apt-get update && apt-get install -y libcurl4-openssl-dev

RUN docker-php-ext-install curl

I won’t go into the other WordPress-based containers. They’re essentially the same.

DB – MariaDB

  • This is the database for all of the WordPress sites.
  • This container maps the directory /var/lib/mysql to /var/lib/cont/db on the host to store the database files so they survive restarts & can be backed up easily.
  • It is running the official MariaDB Docker image.

Docker compose and usage

As mentioned above, all of this is managed by Docker Compose. Following is a portion of the Docker Compose configuration file.

latinchristmas:
  image: somedockerrepo/someuser/latinchristmaswebsite:latest
  restart: always

mfywordpress:
  image: somedockerrepo/someuser/mfy
  restart: always
  links:
    - db:mysql
  volumes:
    - /var/lib/cont/mfywp:/var/www/html

db:
  image: mariadb
  restart: always
  environment:
    MYSQL_ROOT_PASSWORD: PutSomethingHere
  volumes:
    - /var/lib/cont/db:/var/lib/mysql

nginx:
  build: nginx
  restart: always
  ports:
    - "80:80"
  links:
    - latinchristmas
    - mfywordpress

The WordPress-based site images are stored on a Docker repository. The proxy nginx image is built locally by Docker Compose.

The steps I took to get this all working on the server were roughly:

  • Install Docker if it’s not already there: sudo apt-get install lxc-docker
  • Create the directories for the individual sites (e.g. /var/lib/cont/mfywp) and copy the site files over to them
  • Create the directory for the database under /var/lib/cont/db, empty
  • Copy the Docker Compose file and the nested nginx Dockerfile and configuration files over to the server. This is in a git repository, so I packaged it up as a tar file to send: git archive --format=tar --prefix=rpstechServer/ HEAD > send.tar
  • If you’re hosting your images in a private Docker repository, create a .dockercfg file on the server containing the credentials to your private Docker repository. Docker Compose will use this on the server when pulling the images from the Docker repository. If your images are all in a public repository, this isn’t needed. You can remove the .dockercfg after the next step to avoid having the credentials on the server.
  • Run docker-compose up -d

Everything should be running at this point.
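To sanity-check the result, a couple of standard Docker Compose commands help (run from the directory containing the compose file; the nginx service name matches the build: nginx entry above):

```shell
# List the containers; each should show an "Up" state
docker-compose ps

# Tail the proxy's logs to confirm requests are being routed
docker-compose logs nginx
```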

I haven’t converted over the Drupal sites yet, but the approach will be the same as the WordPress sites.

The benefits to this setup are:

  • Each site is largely self contained and easy to migrate to a different server
  • The sites are independent of each other. I can install new and upgrade packages of one site without affecting other sites.
  • I’m able to make changes and run the sites locally and test them out before pushing out any changes.

Future improvements:

  • Avoid having the MariaDB password in the Docker Compose or any other file
  • Combine some of the lines in the Dockerfiles, reducing the number of Docker layers that are created
  • Consider running the WordPress sites using a lighter weight process rather than requiring Apache. Maybe this isn’t a problem at all.
Posted in Software Development, Tools

Type-checked JavaScript: TypeScript and Flow

The last couple systems I’ve been working on have been almost completely JavaScript, with a bit of Python thrown in where it made sense.

Working in a dynamic language like JavaScript, small mistakes like mistyping a symbol name don’t get caught by a compiler as they are in statically typed languages. Instead they show up at runtime when that code is executed, or worse, they don’t fail right away, leading to incorrect results or a failure somewhere else. To mitigate this, extensive unit testing becomes even more important. If you have a set of unit tests that verifies almost every line of code, it’ll catch these syntax/typing bugs in addition to functional bugs.

But verifying almost every line of code with unit tests is very difficult, and I’ve rarely seen it done. Also, it’d be nice to get more immediate feedback of a syntax error, in the IDE/editor, even before running unit tests. Additionally, static typing serves as a form of documentation in the code, and enables IDEs to more accurately auto-suggest completions, which cuts down on the amount of time you spend looking up function and variable names from other modules.

That’s not to say the answer is to only use statically-typed languages. There are many benefits to dynamic languages and reasons we’re using them in the first place.

Ideally, I’d like to have an optional typing system where typing can be specified where it makes sense, and not where it doesn’t add enough value or is impossible due to the dynamic nature of the code. Additionally, the system should be smart, using type inference to cut down on the amount of type annotations that need to be made.

Lucky for us, JavaScript has a couple excellent options that aim to do just that.

One option is TypeScript, backed by Microsoft. TypeScript supports React via a plugin, and is used by Angular 2. TypeScript has been around for several years, and has a rich set of type definitions available for popular JavaScript libraries.

TypeScript is a separate language that transpiles to JavaScript. It’s a superset of JavaScript, so anything that works in JavaScript should work in TypeScript, and the team has kept pace with JavaScript, including support for ES6 features.
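As a small illustration (my own example, not from either project's docs), an annotated function both documents its contract and turns a typo into a compile-time error:

```typescript
// A type annotation documents the shape of the data and lets the
// compiler catch mistakes like a misspelled property at build time.
interface User {
  name: string;
  email: string;
}

function greeting(user: User): string {
  // Typing `user.nmae` here would fail to compile instead of
  // producing "Hello, undefined" at runtime.
  return `Hello, ${user.name}`;
}

console.log(greeting({ name: "Ada", email: "ada@example.com" }));
```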

Another option is Flow, backed by Facebook. Coming from Facebook, it has good support for React. Flow is a relatively new option, released in 2014, so it doesn’t have as much of an ecosystem as TypeScript and doesn’t have many type definitions for 3rd party libraries, although supporting TypeScript’s definitions is on their roadmap.

Flow makes more extensive use of type inference, so it’s able to infer types and detect errors without requiring as many explicit type annotations.

Flow has a different philosophy than TypeScript. The philosophy behind Flow is to make the minimal amount of additions to JavaScript to facilitate type checking. Rather than being a separate language, Flow is based on JavaScript, only extending the language with type annotations. These type annotations are stripped out by a simple transformer, or via a transpiler such as Babel if you’re using that already. It’s also easier to gradually adopt Flow in an existing codebase: you can enable it module by module, use a ‘weak’ mode for adapting existing modules, and gradually add annotations.

Our project is starting with a significant ES6 code base. We’re pretty happy with ES6 as it is, so the main thing we’re looking for is to add type checking rather than a new language. Based on these factors, we decided to try out Flow.

In a future post I’ll write about our experience with trying out Flow, and the steps to adopt it in an existing codebase.

Posted in Software Development

Converting Maven APT docs to Markdown

In a project I worked on many moons ago we were writing documentation in the APT format, and publishing to HTML and PDF using a Maven-based toolchain.

APT served us well, but it hasn’t been supported or improved by the community in a long time. When the time came to update the documentation for a major release, we decided to switch to using Markdown, which was a format everyone was already familiar with, and allowed the team to take advantage of all the tools, such as Sublime plugins, that support Markdown.

Converting APT documents to Markdown is a two-step process, APT -> XHTML -> Markdown, using the Doxia converter and the excellent Swiss-army document format conversion tool Pandoc:

# Converting over the existing APT docs to XHTML via Doxia converter
> java -jar ~/Downloads/doxia-converter-1.2-jar-with-dependencies.jar -in your_doc.apt \
  -from apt -out ./ -to xhtml

# Convert resulting XHTML to Markdown
> pandoc -o your_doc.md your_doc.apt.xhtml

The end result will require a bit of manual fixing up, but in my experience it was pretty minimal and beats doing it manually or writing your own converter.

Posted in General, Software Development, Tools

Encanto Squared

I’ve been working with Encanto Squared lately, and will be posting on the Encanto Squared Engineering site, with more of a focus on Node.js, Polymer, AngularJS, and other technologies we’re using.

Speaking of which, Encanto Squared is hiring. If you’re passionate about solving interesting problems, creating products that are key to our customers, and enjoy working with new technologies, drop us a note.

Posted in General

Sculptor point release and documentation

This post is just a couple quick updates on the Sculptor Generator project.

Hot on the heels of the major 3.0 release, release 3.0.1 is out with additional improvements and examples. Kudos to Torsten, who’s been on fire cranking out code and documentation.

I made my own small contributions to the documentation, with a blog post on the shipping example project, which shows how to override Sculptor templates in your own project, and documentation on the Sculptor overrides and extension mechanism.

Posted in MDSD

Profiling Maven Sculptor execution using YourKit

The latest version of the Sculptor Generator is based on Xtend 2, which is compiled to Java bytecode rather than interpreted, as Xtend 1 and XPand were. This should bring large performance improvements to the code generation cycle, and it certainly feels faster for my projects. Of course, since code generation is part of the development cycle, we’d always like the performance to be better. To improve it, we first need to know what the bottlenecks are, which is where a profiler comes in. Specifically, I’ll describe using YourKit to profile code generation for one of the Sculptor sample projects.

The first step is to start the YourKit profiler. YourKit will start up with the welcome view, and will show any running Java processes, ready to attach to one of them.


Now we need to execute the Sculptor generator with YourKit attached. Sculptor code generation is typically executed as part of a Maven build, via the Sculptor Maven plugin. Since Maven isn’t a long-running process, and we want to make sure to profile the entire code generation cycle, the best way to attach it is at Maven startup via JVM command line arguments: -agentpath to load the YourKit agent and enable profiling, plus YourKit startup options that can be used to enable different types of profiling, take snapshots, etc.

To pass these arguments to Maven, we can use the MAVEN_OPTS environment variable. I already had some JVM arguments to set the maximum memory. So on my Mac, I ended up with:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'

The above will enable the tracing profiling method (vs sampling), and instruct YourKit to record a snapshot that may later be inspected on process exit.

You can control how YourKit performs tracing via Settings -> CPU Tracing… The only tracing setting I changed was to disable adaptive tracing, which omits some small frequently called methods from profiling. This lessens the profiling overhead, but I’m not really concerned about that and want to make sure I’m getting complete results.


Now that the options are set up, run Maven in the Sculptor project to be profiled. In my case, the library-example project:

mvn sculptor:generate -Dsculptor.generator.force=true

Once it’s done executing, we can open the previously recorded snapshot via File->Open Snapshot.., and look at the different reports and views. This is what the call tree view looks like:


These results are fine, but the trace results are cluttered with many methods we’re not interested in, since the entire Maven execution has been traced. The best option I found to only trace those methods we’re interested in was to initially disable tracing, then use a YourKit Trigger to enable tracing on entry and exit of the main Sculptor generation method, org.sculptor.generator.SculptorGeneratorRunner.doRun.

In YourKit, you can add a trigger via the “Trigger action on event” button.


The problem is this button seems to only be enabled if YourKit is actively profiling an application, and since the Maven execution isn’t a long-running process, you can’t configure it in time. The solution I used was to start Maven suspended in debug mode, configure the trigger, then kill Maven. Again, this can be done by adding some JVM arguments to MAVEN_OPTS, and running Maven again:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y'

Once YourKit is attached to the running Maven process, we can add the trigger:


To be able to use this trigger each time Sculptor is executed via Maven, we have to export the trigger configuration into a file, then when running Maven, specify the trigger file via another YourKit argument. We can export the trigger via Popup menu->Export Triggers…

Following is the exported trigger configuration. The above steps are just a means to end up with this configuration, so you can skip them and simply copy the following into a triggers.txt file.

MethodListener methodPattern=org.sculptor.generator.SculptorGeneratorRunner\s:\sdoRun\s(\sString\s) instanceOf= fillThis=true fillParams=true fillReturnValue=true maxTriggerCount=-1
  onenter: StartCPUTracing
  onreturn: StopCPUProfiling
  onexception: StopCPUProfiling

To specify the trigger file that should be used, use the ‘triggers’ command line argument. Since tracing will now be enabled via the trigger, I also removed the ‘tracing’ argument so tracing wouldn’t be enabled on startup:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=triggers=triggers.txt,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'
Posted in Java, MDSD, Tools

Working with Geospatial support in MongoDB: the basics

A project I’m working on requires storage of and queries on Geospatial data. I’m using MongoDB, which has good support for Geospatial data, at least good enough for my needs. This post walks through the basics of inserting and querying Geospatial data in MongoDB.

First off, I’m working with MongoDB 2.4.5, the latest. I initially tried this out using 2.2.3 and it wasn’t recognizing the 2dsphere index I set up, so I had to upgrade.

MongoDB supports storage of Geospatial types, represented as GeoJSON objects, specifically the Point, LineString, and Polygon types. I’m just going to work with Point objects here.

Once Geospatial data is stored in MongoDB, you can query for:

  • Inclusion: Whether locations are included in a polygon
  • Intersection: Whether locations intersect with a specified geometry
  • Proximity: Querying for points nearest other points

You have two options for indexing Geospatial data:

  • 2d : Calculations are done based on flat geometry
  • 2dsphere : Calculations are done based on spherical geometry

As you can imagine, 2dsphere is more accurate, especially for points that are further apart.
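To get a rough feel for the difference, here’s a back-of-the-envelope comparison in plain Python (my own sketch, not MongoDB’s internals; the Earth radius and meters-per-degree constants are approximations) using the Boston and Chicago coordinates from the port examples below:

```python
import math

def spherical_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters (haversine), the kind of
    calculation a 2dsphere index is based on."""
    R = 6371000.0  # mean Earth radius, meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * R * math.asin(math.sqrt(a))

def flat_m(lon1, lat1, lon2, lat2):
    """Naive flat-plane distance, treating degrees as a uniform grid,
    to illustrate what flat geometry gets wrong over long distances."""
    m_per_deg = 111320.0  # meters per degree of latitude
    return math.hypot((lon2 - lon1) * m_per_deg, (lat2 - lat1) * m_per_deg)

# Boston vs Chicago, in (longitude, latitude) order as MongoDB expects
print(spherical_m(71.0603, 42.3583, 87.65, 41.85))
print(flat_m(71.0603, 42.3583, 87.65, 41.85))
```

Here the flat-grid number overshoots the spherical one by roughly a third, because it ignores how longitude lines converge away from the equator.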

In my example, I’m using a 2dsphere index, and doing proximity queries.

First, create the collection that’ll hold a point. I’m planning to work this into the Sculptor code generator so I’m using the ‘port’ collection which is part of the ‘shipping’ example MongoDB-based project.

> db.createCollection("port")
{ "ok" : 1 }

Next, insert records into the collection, each including a GeoJSON Point. According to the MongoDB docs, location data must be stored as GeoJSON types in order to be indexed.

> db.port.insert( { name: "Boston", loc : { type : "Point", coordinates : [ 71.0603, 42.3583 ] } })
> db.port.insert( { name: "Chicago", loc : { type : "Point", coordinates : [ 87.6500, 41.8500 ] } })

> db.port.find()

{ "_id" : ObjectId("51e47b4588ecd4e8dedf7185"), "name" : "Boston", "loc" : { "type" : "Point", "coordinates" : [  71.0603,  42.3583 ] } }
{ "_id" : ObjectId("51e47ee688ecd4e8dedf7187"), "name" : "Chicago", "loc" : { "type" : "Point", "coordinates" : [  87.65,  41.85 ] } }

The coordinates above, as with all coordinates in MongoDB, are in longitude, latitude order.

Next, we create a 2dsphere index, which supports geolocation queries over spherical spaces.

> db.port.ensureIndex( { loc: "2dsphere" })

Once this is set up, we can issue location-based queries, in this case using the ‘geoNear’ command:

> db.runCommand( { geoNear: 'port', near: {type: "Point", coordinates: [87.9806, 42.0883]}, spherical: true, maxDistance: 40000})

{
    "ns" : "Shipping-test.port",
    "results" : [
        {
            "dis" : 38110.32969523317,
            "obj" : {
                "_id" : ObjectId("51e47ee688ecd4e8dedf7187"),
                "name" : "Chicago",
                "loc" : {
                    "type" : "Point",
                    "coordinates" : [ 87.65, 41.85 ]
                }
            }
        }
    ],
    "stats" : {
        "time" : 1,
        "nscanned" : 1,
        "avgDistance" : 38110.32969523317,
        "maxDistance" : 38110.32969523317
    },
    "ok" : 1
}

A similar query using ‘find’ and the ‘$near’ operator failed for me. Looking at the error, the query filters on a field named ‘port’ rather than the indexed ‘loc’ field, which is why MongoDB complains that there’s no geospatial index:

> db.port.find( { "port" : { $near : { $geometry : { type : "Point", coordinates: [87.9806, 42.0883] } }, $maxDistance: 40000 } } )

error: {
    "$err" : "can't find any special indices: 2d (needs index), 2dsphere (needs index),  for: { port: { $near: { $geometry: { type: \"Point\", coordinates: [ 87.9806, 42.0883 ] } }, $maxDistance: 40000.0 } }",
    "code" : 13038
}
Posted in General, MDSD

Easy Grails Hosting: Cloud Foundry


These are some (old) notes on my experience looking for an easy Grails hosting solution, continuing an earlier post where I explored Heroku for Grails hosting.


Wikipedia defines Cloud Foundry as

“Cloud Foundry is an open source cloud computing platform as a service (PaaS) software developed by VMware released under the terms of the Apache License 2.0. It is primarily written in Ruby. The source and development community for this software is available at cloudfoundry.org”

Cloud Foundry is also a hosted service provided by VMware, the principal company behind the Cloud Foundry platform. In addition to VMware, several other companies provide Cloud Foundry hosting services.

How does Cloud Foundry stack up against my original requirements?

  • Free or cheap to get started

    • appfog offers unlimited apps with 2 GB RAM and 100 MB storage, using their sub-domain
    • Cloudfoundry.com is currently free. Actually, they’re in a sort of beta mode, and there is no information on a non-free account. Currently the limits are 2 GB of memory and 2 GB of storage.
  • No vendor lock-in. If I want to move to another provider, no code changes necessary

    The Cloud Foundry platform is open source (Apache License 2.0), and there are multiple providers as I previously noted, so check.

  • Ability to scale up the number of instances and amount of memory easily

    Instances can be scaled up on-demand via their VMC command line utility.

  • Minimal effort to set up a Grails application

Cloud Foundry has excellent Grails support, which isn’t surprising considering VMware is behind both Grails and Cloud Foundry. The Cloud Foundry Grails plugin makes it easy to deploy, update, and overall manage your Cloud Foundry-based Grails application. This post is a good getting started guide.

  • Support for MySQL or PostgreSQL, and MongoDB

    All of these are supported by Cloud Foundry as services. The getting started guide lists the available services (see left menu).

  • HTTPS support

    Cloud Foundry supports HTTPS out of the box, but it sounds like that terminates at their load balancer, so communication between the load balancer and your instance is unencrypted. Not a big deal for me, at least when starting out. Other providers like appfog may more fully support SSL.

Another interesting point on Cloud Foundry is that you can run your own instance via Micro Cloud Foundry on VMware. It looks like it’s also possible to run it on VirtualBox.

I deployed my Grails 1.4 based app to cloudfoundry.com, using the MySQL and MongoDB services. It pretty much worked as advertised and was a breeze to get started with. I had to increase my instance’s memory limit from the default 512MB, but that was easy to do via VMC.

I also ran into this problem with Grails and Spring Security on Cloud Foundry. The solution of adding the following to BuildConfig.groovy worked for me.

compile ":webxml:1.4.1"

I was running with Grails 1.4, so the latest Grails 2.* may not have this problem.

Posted in Grails

XText2: Starting a project for a GWT GUI DSL

This is the first in a series of posts where I’ll explore XText2 by porting an existing XText1-based DSL to XText2. The DSL to be ported is a GUI-description language based on the Sculptor DSL that generates Google Web Toolkit UIs using the Activities and Places framework.

The first step is to set up a new XText2 project and prepare it to start adding elements from the existing DSL. We can create the XText 2 project via the steps in the First Five Minutes Tutorial.


Once the project is set up, let’s replace the starter DSL in Guixdsl.xtext with a couple of elements of the real DSL: a minimal View and Gui Module definition:

grammar org.guixdsl.Guixdsl with org.eclipse.xtext.common.Terminals

generate guixdsl "http://www.guixdsl.org/Guixdsl"

DslAbstractGuiElement :
    DslGuiModule | DslView;

DslGuiModule :
    'Module' name=ID '{'
        ("hint" "=" hint=STRING)?
    '}';

DslView :
    'View' name=ID '{'
    '}';

The language artifacts, including the Eclipse editor, can be generated by selecting the Guixdsl.xtext file and selecting context->Run As->Generate XText Artifacts.


The first time you generate language artifacts, the MWE2 workflow prompts you as to whether you want to download the ANTLR 3 parser generator (recommended), which it has to do due to a license conflict of some sort. Say yes and proceed.


Once the language artifacts have been generated, we can take the DSL out for a spin by selecting the org.guixdsl project, and selecting context->Run As->Eclipse Application.  This will launch a new Eclipse instance with our DSL plugins installed.  Once in the new Eclipse instance, we can create a new test Java project, and create a test DSL file in the project source folder.


If you get a dialog that asks whether you want to add the Xtext nature to the test project, be sure to say Yes. I thought this was only needed to edit the language grammar or code generators. I found out the hard way that your custom code generator won’t get executed unless you have the XText nature enabled, but oddly enough, the JVM model inferrer will. But more about the code generator and inferrer below.


We’ll come back to this test project throughout the development of the new DSL to test out different features. For now, we just want to make sure our minimal DSL and generated language artifacts worked ok.

One of the new features of XText2 is the ability to map DSL concepts directly to Java types, which XText then directly generates code for. This mapping is defined by implementing an inferrer class that extends AbstractModelInferrer.

Of course, generating code via templates is still supported, and done by defining a class that implements IGenerator. When you create a new XText project, by default it’s set up to use the generator strategy rather than the inferrer. Our goal is to use both. The existing XText1-based project uses XPand templates, which can be converted to XTend2 templates. At the same time, we’d like to start using the Java types inferrence approach for some classes.

It wasn’t obvious how to switch to using the inferrer strategy. Close inspection of the “Five simple steps to your JVM language” tutorial showed what needed to be changed to start using the inferrer. In your xtext grammar, replace:

grammar org.Guixdsl with org.eclipse.xtext.common.Terminals

with:

grammar org.Guixdsl with org.eclipse.xtext.xbase.Xbase

Run the GenerateGuixdsl.mwe2 MWE2 workflow to generate everything, including the code generation infrastructure, based on the DSL.


This yields a JVM model inferrer class ready to go:

package org.guixdsl.jvmmodel

import com.google.inject.Inject
import org.eclipse.xtext.xbase.jvmmodel.AbstractModelInferrer
import org.eclipse.xtext.xbase.jvmmodel.IJvmDeclaredTypeAcceptor
import org.eclipse.xtext.xbase.jvmmodel.JvmTypesBuilder
import org.guixdsl.guixdsl.DslModel

/**
 * <p>Infers a JVM model from the source model.</p>
 * <p>The JVM model should contain all elements that would appear in the Java code
 * which is generated from the source model. Other models link against the JVM model rather than the source model.</p>
 */
class GuixdslJvmModelInferrer extends AbstractModelInferrer {

	/**
	 * convenience API to build and initialize JVM types and their members.
	 */
	@Inject extension JvmTypesBuilder

	/**
	 * The dispatch method {@code infer} is called for each instance of the
	 * given element's type that is contained in a resource.
	 * @param element
	 *            the model to create one or more
	 *            {@link org.eclipse.xtext.common.types.JvmDeclaredType declared
	 *            types} from.
	 * @param acceptor
	 *            each created type should be passed to the acceptor
	 */
	def dispatch void infer(DslModel element, IJvmDeclaredTypeAcceptor acceptor, boolean isPreIndexingPhase) {
	}
}
This works great, but switching to the Xbase base grammar caused the generator strategy to get disabled. For this DSL, we need to be able to generate some classes via JVM inference and some via template-based code generation. Thanks to this post on RCP Vision for info on this and how to re-enable the generator.

What happened is XText switched the generator from our generator, GuixdslGenerator, to org.eclipse.xtext.xbase.compiler.JvmModelGenerator, which generates code based on inferred types.

You can see the binding in the AbstractGuixdslRuntimeModule generated module in the org.guixdsl project where both the generator and the inferrer are bound.

	// contributed by org.eclipse.xtext.generator.xbase.XbaseGeneratorFragment
	public Class<? extends org.eclipse.xtext.generator.IGenerator> bindIGenerator() {
		return org.eclipse.xtext.xbase.compiler.JvmModelGenerator.class;
	}

	// contributed by org.eclipse.xtext.generator.xbase.XbaseGeneratorFragment
	public Class<? extends org.eclipse.xtext.xbase.jvmmodel.IJvmModelInferrer> bindIJvmModelInferrer() {
		return org.guixdsl.jvmmodel.GuixdslJvmModelInferrer.class;
	}

To re-enable the generator class, update the binding in GuixdslRuntimeModule, which extends AbstractGuixdslRuntimeModule and is generated only once (can be modified whereas AbstractGuixdslRuntimeModule cannot).

    /**
     * Avoid using the default org.eclipse.xtext.xbase.compiler.JvmModelGenerator
     * when using xbase.
     * @see org.guixdsl.AbstractGuixdslRuntimeModule#bindIGenerator()
     */
    public Class<? extends IGenerator> bindIGenerator() {
        return GuixdslGenerator.class;
    }

Now the generator is back and getting invoked, but generation isn’t happening for any inferred types. This is because although the inferrer is getting called, the JvmModelGenerator is the class that actually generates the code, and we just switched it off above. We want to have both our custom generator, and generation based on inferred types. To enable both, modify our generator to extend JvmModelGenerator, and delegate to it.

class GuixdslGenerator extends JvmModelGenerator {
	override void doGenerate(Resource resource, IFileSystemAccess fsa) {
		super.doGenerate(resource, fsa)
		// template-based generation for non-inferred artifacts goes here
	}
}


Now both will work:

  • Generation of any inferred types we define in GuixdslJvmModelInferrer
  • Custom, template-based generators defined in GuixdslGenerator

In a future post, we’ll add support for package definitions to the language, and start generating GWT code.

Posted in Java, MDSD