Automatically installing and cleaning up software on Kubernetes hosts

I needed to automatically install software on each node in a Kubernetes cluster; in my case, security scanning software. Kubernetes can start new nodes to scale up automatically, destroy nodes when they are no longer needed, and create/destroy nodes as part of automatic Kubernetes upgrades. For this reason, the mechanism to install this software has to be integrated into Kubernetes itself, so that when Kubernetes creates nodes, it automatically installs whatever additional software is needed.

I came across a clever solution using Kubernetes DaemonSets and the Linux nsenter command, described here. The solution consists of:
  • A Kubernetes DaemonSet which ensures that each server in the cluster (or some subset of them you specify) runs a single copy of an installer pod.
  • The installer pod runs an installer Docker image which copies the installer and other needed files onto the node, and runs the installer script you provide via nsenter so the script runs within the host namespace instead of the Docker container.
The DaemonSet runs a given pod, in our case the installer pod which runs the installer script, automatically on each Kubernetes server, including any new servers created as part of horizontal scaling or upgrades.

Shekhar Patnaik has implemented and packaged this pattern up into a Docker image and sample DaemonSet. The project is here (AKSNodeInstaller).

There are a couple of additional things I needed which the above project doesn’t do:
  • The ability to clean up installed software before a Kubernetes node is destroyed; in my case, uninstalling packages and de-registering agents
  • Support for copying files onto the node for installation (e.g. debian package files)
To support this, I extended AKSNodeInstaller with the above features, and a sample of how to test in VirtualBox/Minikube. The forked github repo is at https://github.com/rcodesmith/KubeNodeInstaller and the installer docker image is at rcodesmith/kubenodeinstaller.

Please read the original blog post from Shekhar Patnaik to understand how the DaemonSet and installer Docker image work together.
To support registering a cleanup script to be called before a node is destroyed, I use a Container preStop hook in the DaemonSet. The preStop hook lets you specify a command to be run before a container is stopped. Since the DaemonSet pod and its containers are started when a node is created, and stopped before a node is destroyed, the preStop hook lets us run a cleanup shell script just before the Kubernetes node is destroyed.

The fragment of the sample DaemonSet manifest showing the preStop hook and the install and cleanup scripts volume mount looks like this:

apiVersion: v1
kind: Namespace
metadata:
  name: node-installer
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: installer
  namespace: node-installer
spec:
  selector:
    matchLabels:
      job: installer
  template:
    metadata:
      labels:
        job: installer
    spec:
      hostPID: true
      restartPolicy: Always
      containers:
      - image: rcodesmith/kubenodeinstaller:1.1
        name: installer
        securityContext:
          privileged: true
        volumeMounts:
        - name: install-cleanup-scripts
          mountPath: /tmp
        - name: host-mount
          mountPath: /host
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh","-c","./runCleanup.sh"]
      volumes:
      - name: install-cleanup-scripts
        configMap:
          name: sample-installer-config
      - name: host-mount
        hostPath:
          path: /tmp/install
The runCleanup.sh script will run a cleanup.sh script you provide on the host via nsenter. You supply the cleanup.sh script via a ConfigMap that is mounted into the pod as a volume, same as the install.sh script. Following is an example ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: sample-installer-config
  namespace: node-installer
data:
  install.sh: |
    #!/bin/bash
    # Test that the install file we provided in Docker image is there
    if [ ! -f /vagrant/files/sample_install_file.txt ]; then
        echo "sample_install_file not found on host!"
        exit 0
    fi
    # Update and install packages
    sudo apt-get update
    sudo apt-get install cowsay -y
    touch /vagrant/samplefile.txt
  cleanup.sh: |
    #!/bin/bash
    sudo apt-get remove cowsay -y
    rm /vagrant/samplefile.txt
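
To make the wrapper step more concrete, here is a minimal sketch of what a runCleanup.sh-style script could look like. This is not the actual script from KubeNodeInstaller. The paths are assumptions taken from the sample manifest above (ConfigMap mounted at /tmp, host directory /tmp/install mounted at /host), and the DRY_RUN variable is a hypothetical addition so the nsenter command can be inspected without a privileged container.

```shell
#!/bin/bash
# Hypothetical sketch of a runCleanup.sh-style wrapper; not the real script.
# Paths below are assumptions based on the sample DaemonSet manifest.
SCRIPT_SRC="${SCRIPT_SRC:-/tmp/cleanup.sh}"   # script as mounted from the ConfigMap
HOST_MOUNT="${HOST_MOUNT:-/host}"             # host directory as seen inside the container
HOST_DIR="${HOST_DIR:-/tmp/install}"          # the same directory as seen on the host

# Run a script in the host's mount namespace. PID 1 is the host's init
# process, which is visible because the pod sets hostPID: true.
run_on_host() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Only print the command, so it can be checked without privileges.
        echo "nsenter --target 1 --mount -- /bin/bash $1"
    else
        nsenter --target 1 --mount -- /bin/bash "$1"
    fi
}

run_cleanup() {
    if [ "${DRY_RUN:-0}" != "1" ]; then
        # Copy the script onto the host through the hostPath mount.
        cp "$SCRIPT_SRC" "$HOST_MOUNT/cleanup.sh" || return 1
    fi
    run_on_host "$HOST_DIR/cleanup.sh"
}

# In the container, the preStop hook would end up calling: run_cleanup
```

The key point is that the script is copied through the /host mount but executed by its host-side path, because after nsenter the process sees the host's filesystem rather than the container's.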
I also had a need to install a package from a file that wasn’t in a repository. To support this, I add whatever files are needed to a custom installer Docker image, then copy those files onto the node. The install script you supply can then make use of those files. To use this, supply your own Docker image which copies whatever additional install files you need in a files/ directory. For example:

FROM rcodesmith/kubenodeinstaller
COPY files /files
Then use the docker image in your DaemonSet manifest instead of rcodesmith/kubenodeinstaller.

Finally, you can make use of whatever files you copied in your install script. The files will be copied onto the host in whatever directory you mounted into /host in your DaemonSet.

In summary, to use this solution:
  1. Create a ConfigMap with the installer script, named install.sh, with whatever install commands you want. They’ll be executed on the node whenever a new server is added.
  2. If you need some additional files for your install script, such as debian package files, create a custom Docker Image and include those files in the image via the Docker COPY command. Then use the Docker image in your DaemonSet manifest.
  3. If you have some cleanup steps to execute, provide a cleanup.sh script in the same ConfigMap. The script will be executed on the node before a server is destroyed.
Testing in VirtualBox and Minikube

Initially, I was testing out the solution and my install script by creating / destroying Kubernetes node pools in GKE. This wasn’t ideal, so I wanted a faster, local way to test. Following is a way to test this out locally using Vagrant, VirtualBox and Minikube.

VirtualBox is a free machine virtualization product from Oracle that runs on Mac, Linux, and Windows. We’ll use VirtualBox to run an Ubuntu VM locally on top of which Minikube will run. Essentially, the VM will be our Kubernetes host.

Minikube is a Kubernetes implementation suitable for running locally on Mac, Linux, or Windows.

Vagrant is a tool that can automate the creation and setup of machines, and supports multiple providers including VirtualBox. We’ll use it to automate the creation of and setup of the VirtualBox Ubuntu VM and Minikube.

Following are install instructions for Mac using Homebrew, but you can also use Windows and Linux:

Install VirtualBox, extensions, and Vagrant:
brew install Caskroom/cask/virtualbox
brew install Caskroom/cask/virtualbox-extension-pack
brew install vagrant
vagrant plugin install vagrant-vbguest
Install whatever Vagrant box you need, corresponding to what you’ll use for your Kubernetes nodes:

You can find boxes at: https://app.vagrantup.com/boxes/search

I’m using this Ubuntu box.

To get started with a Vagrant box:
vagrant init ubuntu/focal64
The above command will generate a Vagrantfile in the current directory which describes the VM to be created, and steps to provision it. The Vagrantfile I used is here. You might need to add more memory for the VM in the Vagrantfile:

  config.vm.provider "virtualbox" do |vb|
    # Display the VirtualBox GUI when booting the machine
#    vb.gui = true
  
    # Customize the amount of memory on the VM:
    vb.memory = "2024"
  end

In the Vagrantfile, use the Vagrant shell provisioner to install Minikube, Docker, and kubectl. We’re using the Minikube ‘none’ driver which will cause it to run Kubernetes in the current server (the Vagrant VM). And finally, start minikube.
  # Enable provisioning with a shell script. Additional provisioners such as
  # Ansible, Chef, Docker, Puppet and Salt are also available. Please see the
  # documentation for more information about their specific syntax and use.
  config.vm.provision "shell", inline: <<-SHELL
    sudo apt update
    sudo curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64   && sudo chmod +x minikube
    sudo mv minikube /usr/local/bin/minikube
    sudo apt install -y conntrack
    sudo minikube config set vm-driver none
    sudo sysctl fs.protected_regular=0
    sudo apt install -y docker.io
    sudo apt-get install -y apt-transport-https
    curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
    echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee -a /etc/apt/sources.list.d/kubernetes.list
    sudo apt-get update
    sudo apt-get install -y kubectl
    sudo minikube start --driver=none
  SHELL
To verify Minikube is running in the VM:
> sudo minikube status
minikube
type: Control Plane
host: Running
kubelet: Running
apiserver: Running
kubeconfig: Configured
To start Minikube if it isn’t running:
sudo minikube start --driver=none
Now that Minikube is running, you can interact with the Kubernetes cluster using Kubectl.

> sudo kubectl get nodes
 
NAME STATUS ROLES AGE VERSION
ubuntu-focal Ready control-plane,master 10d v1.21.2
Now, apply your ConfigMap and DaemonSet. Following is an example from https://github.com/rcodesmith/KubeNodeInstaller

# Change to project directory mounted in VM
cd /vagrant

# Apply ConfigMap and DaemonSet
sudo kubectl apply -f k8s/sampleconfigmap.yaml
sudo kubectl apply -f k8s/daemonset.yaml

# The DaemonSet's pods should be running, one per server (1 here). Check:
sudo kubectl get pods -n node-installer

# Look at pod logs, look for errors:
sudo kubectl logs daemonset/installer -c installer -n node-installer
My DaemonSet and Docker image had an install file which should have been copied to the VM.
Additionally, the install script wrote to /vagrant/samplefile.txt. Check for these:
> ls -l /vagrant/files/sample_install_file.txt
> ls -l /vagrant/samplefile.txt
The cleanup script should delete /vagrant/samplefile.txt. Let’s test this by deleting the DaemonSet, then verifying the file is deleted.

> sudo kubectl delete -f k8s/daemonset.yaml
> ls -l /vagrant/samplefile.txt
ls: cannot access '/vagrant/samplefile.txt': No such file or directory
Now that we’ve tested everything, to destroy the VM and everything in it, run the following back on your workstation:
vagrant destroy
Posted in Software Development, Tools

Apache Spark Experiments

I’m in the process of learning Apache Spark for processing and transforming large data sets, as well as machine learning. As I dig into different facets of Spark, I’m compiling notes and experiments in a series of Jupyter notebooks.

I published these notebooks to a github repo, spark-experiments. Right now it has some basic and spark-sql based experiments. I’ll be adding more as I go.

Rather than setting up Jupyter, Spark, and everything else needed locally, I found an existing Docker image, pyspark-notebook, that contains everything I needed, including matplotlib to visualize the data as I get further along. If you have Docker installed, you just run the Docker container via a single command, and you’re off and running. See the spark-experiments installation instructions for details.

Initially, I was going to create my own sample data sets for the experiments. I’m mostly interested in learning the operations and process rather than executing with a large data set across a cluster of servers, so it’s ok to use a small data set. But I hit on the idea of using publicly available data sets such as those from data.cms.gov instead. Maybe we’ll turn up something interesting, and it’ll be more real-worldish.

Posted in Python, Scala

Migrating Drupal and WordPress sites using Docker

There are several sites I host for family and friends in addition to this site. It’s a mix of WordPress, Drupal, and static sites, all running on a Linux virtual host hosted by Rackspace. The Linux host is pretty old at this point, and really should be upgraded. Additionally, I wanted to give DigitalOcean a try as I can get a virtual server there for less.

Although I kept the installations for each site pretty well organized in different directories, migrating them over the traditional way would still be time consuming and error prone, involving copying over all the directories and databases that are needed, migrating users, making sure permissions are right, and making sure to get any service scripts and configurations that need to come along. This is all a very manual process. If (when) I get something wrong, I’d have to troubleshoot it on the target server, and the whole process isn’t very repeatable nor version controlled. I wasn’t looking forward to it.

While working on our Pilot product at Encanto Squared, a new tool came on our radar: Docker. We adopted it, and it greatly simplified and streamlined our deployment and server provisioning process at Encanto.

Naturally, I decided to use docker to migrate my sites to another server, and to generally improve how these are being managed.

The overall configuration looks like this:

[Diagram: rpsSitesDocker]

The above diagram is inspired by this dockerboard tool. The tool works but the diagram required some style tweaking so I did it in OmniGraffle.

Each of the rounded rectangles above is a separate Docker container, and all of the containers are orchestrated by Docker Compose. The blue lines between the containers are Docker Compose links, which connect the two containers at a network level and create an entry in the source container’s hosts file pointing to the target container. Each Docker container runs with its own network, layered filesystem, and process space, so for one container to be able to communicate with another, it has to be specifically enabled, via links in the case of Docker Compose.

Following is a breakdown of each container and its configuration:

nginx – front-end reverse proxy

  • I’m using this as a reverse proxy into the other docker containers.
  • This is the only container with an exposed port, 80
  • It has a link to each of the individual site containers to be able to proxy HTTP requests to them.
  • In the future, I may have this serve up the static sites rather than proxying to another nginx process. It’ll still be needed to proxy the WordPress and Drupal sites
  • This image is based on the official nginx image, with the addition of the Nginx configuration files into the Docker image. Dockerfile:
FROM nginx
COPY conf.d /etc/nginx/conf.d
  • Each of the sites gets a separate Nginx configuration file under conf.d. They proxy to the specific site by name (mfywordpress in the example below). Here’s what one of them looks like:
server {
  listen 80;
  server_name www.mfyah.org mfyah.org;

  location / {
    proxy_pass http://mfywordpress:80;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}
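
Since every proxied site gets a near-identical file under conf.d, the files could be generated rather than written by hand. The following is a small sketch of that idea; the site_conf function and the conf.d/mfy.conf file name are hypothetical and not part of the original setup.

```shell
#!/bin/bash
# Hypothetical helper: print an nginx reverse-proxy config for one site.
# $1 = upstream container name (the Docker Compose link name)
# $2 = space-separated list of server names
site_conf() {
    local upstream="$1" names="$2"
    # nginx variables are escaped so the shell leaves them untouched.
    cat <<EOF
server {
  listen 80;
  server_name ${names};

  location / {
    proxy_pass http://${upstream}:80;
    proxy_set_header Host \$host;
    proxy_set_header X-Real-IP \$remote_addr;
    proxy_set_header X-Forwarded-For \$proxy_add_x_forwarded_for;
  }
}
EOF
}

# Example: regenerate the config shown above.
# site_conf mfywordpress "www.mfyah.org mfyah.org" > conf.d/mfy.conf
```

The upstream name has to match the container's link name in the Docker Compose file, because that is the host name nginx resolves inside its own container.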

latinchristmas – This is a static site hosted by nginx

  • This is a static site served up by its own nginx process.
  • This is an image that is based on the official nginx image. The only thing it does in addition is add the static content to /usr/share/nginx/html
  • Dockerfile:
FROM nginx

COPY WWW /usr/share/nginx/html

mfy – WordPress-based site

  • This image is based on the official WordPress image, with some additional packages installed.
  • The official WordPress image uses Apache.
  • This container maps the directory /var/www/html to /var/lib/cont/mfywp on the host to store the WordPress site files. Having the site files on the host makes it easier to backup and ensures any changes to the site survive a restart.
  • Dockerfile:
FROM wordpress

RUN apt-get update && apt-get install -y libcurl4-openssl-dev

RUN docker-php-ext-install curl

I won’t go into the other WordPress-based containers. They’re essentially the same.

DB – MariaDB

  • This is the database for all of the WordPress sites.
  • This container maps the directory /var/lib/mysql to /var/lib/cont/db on the host to store the database files so they survive restarts & can be backed up easily.
  • It is running the official MariaDB Docker image.

Docker compose and usage

As mentioned above, all of this is managed by Docker Compose. Following is a portion of the Docker Compose configuration file.

latinchristmas:
  image: somedockerrepo/someuser/latinchristmaswebsite:latest
  restart: always

mfywordpress:
  image: somedockerrepo/someuser/mfy
  restart: always
  links:
    - db:mysql
  environment:
    WORDPRESS_DB_NAME: mfy
  volumes:
    - /var/lib/cont/mfywp:/var/www/html

db:
  image: mariadb
  restart: always
  environment:
    MYSQL_ROOT_PASSWORD: PutSomethingHere
  volumes:
    - /var/lib/cont/db:/var/lib/mysql

nginx:
  build: nginx
  restart: always
  ports:
    - "80:80"
  links:
    - latinchristmas
    - mfywordpress

The WordPress-based site images are stored on a Docker repository. The proxy nginx image is built locally by Docker Compose.

The steps I took to get this all working on the server were roughly:

  • Install Docker if it’s not already there: sudo apt-get install lxc-docker
  • Create the directories for the individual sites (e.g. /var/lib/cont/mfywp) and copy the site files over to them
  • Create the directory for the database under /var/lib/cont/db, empty
  • Copy the Docker Compose file and the nested nginx Dockerfile and configuration files over to the server. This is in a git repository, so I packaged it up as a tar file to send: git archive --format=tar --prefix=rpstechServer/ HEAD > send.tar
  • If you’re hosting your images in a private Docker repository, create a .dockercfg file on the server containing the credentials to your private Docker repository. Docker Compose will use this on the server when pulling the images from the Docker repository. If your images are all in a public repository, this isn’t needed. You can remove the .dockercfg after the next step to avoid having the credentials on the server.
  • Run docker-compose up -d

Everything should be running at this point.

I haven’t converted over the Drupal sites yet, but the approach will be the same as the WordPress sites.

The benefits to this setup are:

  • Each site is largely self contained and easy to migrate to a different server
  • The sites are independent of each other. I can install new and upgrade packages of one site without affecting other sites.
  • I’m able to make changes and run the sites locally and test them out before pushing out any changes.

Future improvements:

  • Avoid having the MariaDB password in the Docker Compose or any other file
  • Combine some of the lines in the Dockerfiles, reducing the number of Docker layers that are created
  • Consider running the WordPress sites using a lighter weight process rather than requiring Apache. Maybe this isn’t a problem at all.
Posted in Software Development, Tools

Type-checked JavaScript : TypeScript and Flow

The last couple systems I’ve been working on have been almost completely JavaScript, with a bit of Python thrown in where it made sense.

Working in a dynamic language like JavaScript, small mistakes like mistyping a symbol name don’t get caught by a compiler as they do in statically typed languages. Instead they come up during runtime when that code is executed, or worse, they won’t fail right away, leading to incorrect results or failure somewhere else. To mitigate this, it becomes even more important to use unit testing extensively. If you have an extensive set of unit tests that verify almost every line of code, they’ll catch these syntax/typing bugs in addition to functional bugs.

But verifying almost every line of code with unit tests is very difficult, and I’ve rarely seen it done. Also, it’d be nice to get more immediate feedback of a syntax error, in the IDE/editor, even before running unit tests. Additionally, static typing serves as a form of documentation in the code, and enables IDEs to more accurately auto-suggest completions, which cuts down on the amount of time you spend looking up function and variable names from other modules.

That’s not to say the answer is to only use statically-typed languages. There are many benefits to dynamic languages, and reasons we’re using them in the first place.

Ideally, I’d like to have an optional typing system where typing can be specified where it makes sense, and not where it doesn’t add enough value or is impossible due to the dynamic nature of the code. Additionally, the system should be smart, using type inference to cut down on the amount of type annotations that need to be made.

Lucky for us, JavaScript has a couple excellent options that aim to do just that.

One option is TypeScript, backed by Microsoft. TypeScript supports React via a plugin, and is used by Angular 2. TypeScript has been around for several years, and has a rich set of type definitions available for popular JavaScript libraries.

TypeScript is a separate language that transpiles to JavaScript. It’s a superset of JavaScript, so anything that works in JavaScript should work in TypeScript, and they’ve worked to keep up with JavaScript and supporting ES6 features.

Another option is Flow, backed by Facebook. Coming from Facebook, it has good support for React. Flow is a relatively new option, released in 2014, so it doesn’t have as much of an ecosystem as TypeScript and doesn’t have many type definitions for third-party libraries, although supporting TypeScript’s definitions is on their roadmap.

Flow makes more extensive use of type inference, so it’s able to infer types and detect errors without requiring as many explicit type annotations.

Flow has a different philosophy than TypeScript. The philosophy behind Flow is to make the minimal amount of additions to JavaScript to facilitate type checking. Rather than being a separate language, Flow is based on JavaScript, only extending the language with type annotations. These type annotations are stripped out by a simple transformer or via a transpiler such as Babel if you’re using that already. Also, it’s easier to gradually adopt Flow for an existing codebase as you can enable it module by module, use a ‘weak’ mode for adapting existing modules, and gradually add annotations.

My project is starting with a significant ES6 code base. We’re pretty happy with ES6 as it is, so the main thing I’m looking for is to add type checking rather than a new language. Based on these factors, we decided to try out Flow.

In a future post I’ll write about our experience with trying out Flow, and the steps to adopt it into an existing codebase.

Posted in Software Development

Converting Maven APT docs to Markdown

In a project I worked on many moons ago we were writing documentation in the APT format, and publishing to HTML and PDF using a Maven-based toolchain.

APT served us well, but it hasn’t been supported or improved by the community in a long time. When the time came to update the documentation for a major release, we decided to switch to using Markdown, which was a format everyone was already familiar with, and allowed the team to take advantage of all the tools, such as Sublime plugins, that support Markdown.

Converting APT documents to Markdown is a two-step process of APT -> XHTML -> Markdown, using the Doxia converter, which can be downloaded here, and the excellent swiss-army document format conversion tool Pandoc:

# Converting over the existing APT docs to XHTML via Doxia converter
> java -jar ~/Downloads/doxia-converter-1.2-jar-with-dependencies.jar -in your_doc.apt \
  -from apt -out ./ -to xhtml

# Convert resulting XHTML to Markdown
> pandoc -o your_doc.md your_doc.apt.xhtml

The end result will require a bit of manual fixing up, but in my experience it was pretty minimal and beats doing it manually or writing your own converter.
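
For a directory with many APT files, the two commands above can be wrapped in a loop. The sketch below only prints the commands so they can be reviewed before running. The function name is hypothetical, the jar location is assumed to be the same Downloads path used above, and file names are assumed to contain no spaces.

```shell
#!/bin/bash
# Sketch: print the Doxia and Pandoc commands for every .apt file in a directory.
# DOXIA_JAR is an assumed location; point it at wherever the jar was downloaded.
DOXIA_JAR="${DOXIA_JAR:-$HOME/Downloads/doxia-converter-1.2-jar-with-dependencies.jar}"

# $1 = directory containing the .apt files
apt_to_md_commands() {
    local dir="$1" apt base
    for apt in "$dir"/*.apt; do
        [ -e "$apt" ] || continue    # no .apt files found: print nothing
        base="$(basename "$apt" .apt)"
        # Step 1: APT -> XHTML (Doxia writes <name>.apt.xhtml into the output dir)
        echo "java -jar $DOXIA_JAR -in $apt -from apt -out $dir/ -to xhtml"
        # Step 2: XHTML -> Markdown
        echo "pandoc -o $dir/$base.md $dir/$base.apt.xhtml"
    done
}

# Review the output first, then pipe it to a shell to run the conversions:
# apt_to_md_commands docs | sh
```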

Posted in General, Software Development, Tools

Encanto Squared

I’ve been working with Encanto Squared lately, and will be posting on the Encanto Squared Engineering site, with more of a focus on Node.js, Polymer, AngularJS, and other technologies we’re using.

Speaking of which, Encanto Squared is hiring. If you’re passionate about solving interesting problems, creating products that are key to our customers, and enjoy working with new technologies, drop us a note.

Posted in General

Sculptor point release and documentation

This post is just a couple quick updates on the Sculptor Generator project.

Hot on the heels of the major 3.0 release, release 3.0.1 is out with additional improvements and examples. Kudos to Torsten, who’s been on fire cranking out code and documentation.

I made my own small contributions to the documentation, with a blog post on the shipping example project, which shows how to override Sculptor templates in your own project, and documentation on the Sculptor overrides and extension mechanism.

Posted in MDSD

Profiling Maven Sculptor execution using YourKit

The latest version of the Sculptor Generator is based on Xtend 2, which is compiled to Java/bytecode rather than interpreted as Xtend 1 and Xpand were. This should bring large performance improvements to the code generation cycle, and it certainly feels faster for my projects. Of course, since code generation is part of the development cycle, we’d always like the performance to be better. In order to improve the performance, we first need to know what the bottlenecks are, which is where a profiler comes in; specifically I’ll describe using YourKit to profile code generation for one of the Sculptor sample projects.

The first step is to start the YourKit profiler. YourKit will start up with the welcome view, and will show any running Java processes, ready to attach to one of them.

[Image: yourkitWelcome]

Now we need to execute the Sculptor generator, first attaching it to the YourKit process. Sculptor code generation is typically executed as part of a Maven build, via the Sculptor Maven plugin. Since Maven isn’t a long-running process, and we want to make sure to profile all of the code generation cycle, the best way to attach YourKit to Maven is to do it at Maven startup via JVM command line arguments: specifically, -agentpath to attach the process to YourKit and enable profiling, plus YourKit startup options that can be used to enable different types of profiling, taking snapshots, etc.

To pass these arguments to Maven, we can use the MAVEN_OPTS environment variable. I already had some JVM arguments to set the maximum memory. So on my Mac, I ended up with:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'

The above will enable the tracing profiling method (vs sampling), and instruct YourKit to record a snapshot that may later be inspected on process exit.

You can control how YourKit performs tracing via Settings -> CPU Tracing… The only tracing setting I changed was to disable adaptive tracing, which omits some small frequently called methods from profiling. This lessens the profiling overhead, but I’m not really concerned about that and want to make sure I’m getting complete results.

[Image: yourKitTracingSettings]

Now that the options are set up, run Maven in the Sculptor project to be profiled. In my case, the library-example project:

mvn sculptor:generate -Dsculptor.generator.force=true

Once it’s done executing, we can open the previously recorded snapshot via File->Open Snapshot.., and look at the different reports and views. This is what the call tree view looks like:

[Image: yourKitCallTree]

These results are fine, but the trace results are cluttered with many methods we’re not interested in, since the entire Maven execution has been traced. The best option I found to only trace those methods we’re interested in was to initially disable tracing, then use a YourKit Trigger to enable tracing on entry to, and disable it on exit from, the main Sculptor generation method, org.sculptor.generator.SculptorGeneratorRunner.doRun.

In YourKit, you can add a trigger via the “Trigger action on event” button.

[Image: triggerActionOnEventSelection]

The problem is this button seems to only be enabled if YourKit is actively profiling an application, and since the Maven execution isn’t a long-running process, you can’t configure it in time. The solution I used was to start Maven suspended in debug mode, configure the trigger, then kill Maven. Again, this can be done by adding some JVM arguments to MAVEN_OPTS, and running Maven again:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=tracing,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m -Xrunjdwp:transport=dt_socket,address=8000,server=y,suspend=y'

Once YourKit is attached to the running Maven process, we can add the trigger:

[Image: triggerActionOnEventDialog]

To be able to use this trigger each time Sculptor is executed via Maven, we have to export the trigger configuration into a file, then when running Maven, specify the trigger file via another YourKit argument. We can export the trigger via Popup menu->Export Triggers…

Following is the exported trigger configuration. The above steps are just a means to end up with this configuration, so you can skip them and simply copy the following into a triggers.txt file.

MethodListener methodPattern=org.sculptor.generator.SculptorGeneratorRunner\s:\sdoRun\s(\sString\s) instanceOf= fillThis=true fillParams=true fillReturnValue=true maxTriggerCount=-1
  onenter: StartCPUTracing
  onreturn: StopCPUProfiling
  onexception: StopCPUProfiling

To specify the trigger file that should be used, use the ‘triggers’ command line argument. Since tracing will now be enabled via the trigger, I also removed the ‘tracing’ argument so tracing wouldn’t be enabled on startup:

export MAVEN_OPTS='-agentpath:/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib=triggers=triggers.txt,onexit=snapshot -Xmx1424m -XX:MaxPermSize=1024m'
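
The MAVEN_OPTS values used in this post differ only in the YourKit agent options and the optional debug flags, so a small helper can keep them consistent. This is just a sketch: the function name and argument layout are hypothetical, while the agent path and memory flags are the ones used above.

```shell
#!/bin/bash
# Sketch: assemble MAVEN_OPTS for a YourKit-profiled Maven run.
# The agent path and memory flags are the ones from this post; the function
# name and its arguments are hypothetical.
YJP_AGENT="${YJP_AGENT:-/Applications/YourKit_Java_Profiler_2013_build_13046.app/bin/mac/libyjpagent.jnilib}"
JVM_MEMORY_OPTS="-Xmx1424m -XX:MaxPermSize=1024m"

# $1 = comma-separated YourKit agent options, e.g. "tracing,onexit=snapshot"
# $2 = optional extra JVM arguments (for example the JDWP debug flags)
yourkit_maven_opts() {
    local opts="-agentpath:${YJP_AGENT}=$1 ${JVM_MEMORY_OPTS}"
    if [ -n "${2:-}" ]; then
        opts="$opts $2"
    fi
    echo "$opts"
}

# Trace the whole Maven run:
# export MAVEN_OPTS="$(yourkit_maven_opts tracing,onexit=snapshot)"
# Trace only what the trigger file enables:
# export MAVEN_OPTS="$(yourkit_maven_opts triggers=triggers.txt,onexit=snapshot)"
```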
Posted in Java, MDSD, Tools

Working with Geospatial support in MongoDB: the basics

A project I’m working on requires storage of and queries on Geospatial data. I’m using MongoDB, which has good support for Geospatial data, at least good enough for my needs. This post walks through the basics of inserting and querying Geospatial data in MongoDB.

First off, I’m working with MongoDB 2.4.5, the latest. I initially tried this out using 2.2.3 and it wasn’t recognizing the 2dsphere index I set up, so I had to upgrade.

MongoDB supports storage of Geospatial types, represented as GeoJSON objects, specifically the Point, LineString, and Polygon types. I’m just going to work with Point objects here.

Once Geospatial data is stored in MongoDB, you can query for:

  • Inclusion: Whether locations are included in a polygon
  • Intersection: Whether locations intersect with a specified geometry
  • Proximity: Querying for points nearest other points

You have two options for indexing Geospatial data:

  • 2d : Calculations are done based on flat geometry
  • 2dsphere : Calculations are done based on spherical geometry

As you can imagine, 2dsphere is more accurate, especially for points that are further apart.

In my example, I’m using a 2dsphere index, and doing proximity queries.

First, create the collection that’ll hold a point. I’m planning to work this into the Sculptor code generator so I’m using the ‘port’ collection which is part of the ‘shipping’ example MongoDB-based project.

> db.createCollection("port")
{ "ok" : 1 }

Next, insert records into the collection including a GeoJSON type, point. According to MongoDB docs, in order to index the location data, it must be stored as GeoJSON types.

> db.port.insert( { name: "Boston", loc : { type : "Point", coordinates : [ 71.0603, 42.3583 ] } })
> db.port.insert( { name: "Chicago", loc : { type : "Point", coordinates : [ 87.6500, 41.8500 ] } })

> db.port.find()

{ "_id" : ObjectId("51e47b4588ecd4e8dedf7185"), "name" : "Boston", "loc" : { "type" : "Point", "coordinates" : [  71.0603,  42.3583 ] } }
{ "_id" : ObjectId("51e47ee688ecd4e8dedf7187"), "name" : "Chicago", "loc" : { "type" : "Point", "coordinates" : [  87.65,  41.85 ] } }

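The coordinates above, as with all coordinates in MongoDB, are in longitude, latitude order. Note that longitudes west of the prime meridian are negative, so the real longitudes for Boston and Chicago are -71.0603 and -87.65. The examples here omit the minus sign, which mirrors the points into the eastern hemisphere but doesn’t change the distances between them.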
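Since most people say "latitude, longitude", the order is easy to get backwards. A small helper can make it explicit; this is a sketch in plain JavaScript, and `toGeoJsonPoint` is a hypothetical name, not part of any MongoDB API.

```javascript
// Hypothetical helper: build a GeoJSON Point from a latitude/longitude pair,
// placing longitude first as GeoJSON expects.
function toGeoJsonPoint(latitude, longitude) {
  if (latitude < -90 || latitude > 90) {
    throw new RangeError("latitude must be between -90 and 90");
  }
  if (longitude < -180 || longitude > 180) {
    throw new RangeError("longitude must be between -180 and 180");
  }
  return { type: "Point", coordinates: [longitude, latitude] };
}

// Boston, using the same unsigned values as the insert above.
const boston = toGeoJsonPoint(42.3583, 71.0603);
console.log(JSON.stringify(boston));
```

The range checks only catch a swapped pair when the value passed as latitude falls outside -90 to 90, so they are a safety net rather than a guarantee.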

Next, we create a 2dsphere index, which supports geolocation queries over spherical spaces.

> db.port.ensureIndex( { loc: "2dsphere" })

Once this is set up, we can issue location-based queries, in this case using the ‘geoNear’ command:

> db.runCommand( { geoNear: 'port', near: {type: "Point", coordinates: [87.9806, 42.0883]}, spherical: true, maxDistance: 40000})

{
    "ns" : "Shipping-test.port",
    "results" : [
        {
            "dis" : 38110.32969523317,
            "obj" : {
                "_id" : ObjectId("51e47ee688ecd4e8dedf7187"),
                "name" : "Chicago",
                "loc" : {
                    "type" : "Point",
                    "coordinates" : [
                        87.65,
                        41.85
                    ]
                }
            }
        }
    ],
    "stats" : {
        "time" : 1,
        "nscanned" : 1,
        "avgDistance" : 38110.32969523317,
        "maxDistance" : 38110.32969523317
    },
    "ok" : 1
}
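With a GeoJSON point, both ‘maxDistance’ and the reported ‘dis’ are in meters. As a sanity check, the distance can be approximated outside the database with the haversine formula. This is a sketch in plain JavaScript; the function names are hypothetical, and the 6378.1 km Earth radius is an assumption based on the figure quoted in the MongoDB docs.

```javascript
// Assumed Earth radius in meters (about 6378.1 km).
const SPHERE_RADIUS_M = 6378100;

function toRadians(degrees) {
  return degrees * Math.PI / 180;
}

// Great-circle distance in meters between two [longitude, latitude] pairs,
// computed with the haversine formula.
function haversineMeters(from, to) {
  const [lon1, lat1] = from;
  const [lon2, lat2] = to;
  const dLat = toRadians(lat2 - lat1);
  const dLon = toRadians(lon2 - lon1);
  const a = Math.sin(dLat / 2) ** 2 +
            Math.cos(toRadians(lat1)) * Math.cos(toRadians(lat2)) *
            Math.sin(dLon / 2) ** 2;
  return 2 * SPHERE_RADIUS_M * Math.asin(Math.sqrt(a));
}

const queryPoint = [87.9806, 42.0883]; // the ‘near’ point from the command
const chicago = [87.65, 41.85];        // the stored Chicago point
console.log(haversineMeters(queryPoint, chicago));
```

Under that radius assumption the result should land close to the roughly 38110 meters that geoNear reported, which is inside the 40000 meter limit.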

A similar query using ‘find’ and the ‘$near’ operator fails as I wrote it. Note that it names the field ‘port’ (the collection name) rather than ‘loc’, the field that was indexed:

> db.port.find( { "port" : { $near : { $geometry : { type : "Point", coordinates: [87.9806, 42.0883] } }, $maxDistance: 40000 } } )

error: {
"$err" : "can't find any special indices: 2d (needs index), 2dsphere (needs index),  for: { port: { $near: { $geometry: { type: \"Point\", coordinates: [ 87.9806, 42.0883 ] } }, $maxDistance: 40000.0 } }",
"code" : 13038
}
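The error message echoes the query back with ‘port’ as the field name, while the 2dsphere index was created on ‘loc’. Keyed on ‘loc’ instead, the query document might look like the sketch below. It is untested here, and nesting ‘$maxDistance’ inside ‘$near’ is an assumption based on current MongoDB docs; the placement a given server version accepts may differ.

```javascript
// Same search as above, but keyed on `loc`, the field that carries the
// 2dsphere index, instead of `port`.
// Assumption: $maxDistance is nested inside $near next to $geometry.
const nearQuery = {
  loc: {
    $near: {
      $geometry: { type: "Point", coordinates: [87.9806, 42.0883] },
      $maxDistance: 40000 // meters
    }
  }
};

// In the mongo shell this would be passed to find():
//   db.port.find(nearQuery)
console.log(JSON.stringify(nearQuery, null, 2));
```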
Posted in General, MDSD

Easy Grails Hosting: Cloud Foundry


These are some (old) notes on my search for an easy Grails hosting solution. This is a continuation of this post where I explored Heroku for Grails hosting.

Wikipedia describes Cloud Foundry as follows:

“Cloud Foundry is an open source cloud computing platform as a service (PaaS) software developed by VMware released under the terms of the Apache License 2.0. It is primarily written in Ruby. The source and development community for this software is available at cloudfoundry.org”

Cloud Foundry is also a hosted service provided by VMware, the principal company behind the Cloud Foundry platform. In addition to VMware, several other companies provide Cloud Foundry hosting services.

How does Cloud Foundry stack up against my original requirements?

  • Free or cheap to get started

    • AppFog offers unlimited apps with 2 GB RAM and 100 MB storage, using their sub-domain
    • Cloudfoundry.com is currently free. Actually, they’re in a sort of beta mode, and there is no information on a non-free account. Currently the limits are 2 GB of memory and 2 GB of storage.
  • No vendor lock-in. If I want to move to another provider, no code changes necessary

    The Cloud Foundry platform is open source (Apache License 2.0), and there are multiple providers as I previously noted, so check.

  • Ability to scale up the number of instances and amount of memory easily

    Instances can be scaled up on-demand via their VMC command line utility.

  • Minimal effort to set up a Grails application

    Cloud Foundry has excellent Grails support. Not surprising considering VMware is behind both Grails and Cloud Foundry. The Cloud Foundry Grails plugin makes it easy to deploy, update, and generally manage your Cloud Foundry-based Grails application. This post is a good getting started guide.

  • Support for MySQL or PostgreSQL, and MongoDB

    All of these are supported by Cloud Foundry as services. The getting started guide lists the available services (see left menu).

  • HTTPS support

    Cloud Foundry supports HTTPS out of the box, but it sounds like that terminates at their load balancer, so communication between the load balancer and your instance is unencrypted. Not a big deal for me, at least when starting out. Other providers like AppFog may more fully support SSL.

Another interesting point on Cloud Foundry is that you can run your own instance of Cloud Foundry via Micro Cloud Foundry and VMware. It looks like it’s also possible to run it on VirtualBox.

I deployed my Grails 1.4 based app to cloudfoundry.com, using the MySQL and MongoDB services. It pretty much worked as advertised and was a breeze to get started with. I had to increase my instance’s memory limit from the default 512MB, but that was easy to do via VMC.

I also ran into this problem with Grails and Spring Security on Cloud Foundry. The solution of adding the following plugin dependency to the plugins block of BuildConfig.groovy worked for me.

compile ":webxml:1.4.1"

I was running with Grails 1.4, so the latest Grails 2.* may not have this problem.

Posted in Grails