Slacker Elves in the Amazon Cloud

One of my current projects has been building a framework for automatically provisioning and deploying servers into the Amazon EC2 cloud. So, I have scripts that ask Amazon to start EC2 instances, wait for Amazon to report that they are 'running', and then try to SSH into the instances to do things to them.

One vexing issue I ran into is that the SSH connection would often fail while the script was running, yet I had no problem ssh'ing in manually afterwards. I spent some time googling the problem and found some people with similar mysterious failures, but no concrete solutions.

The failure mode for me during script execution looked exactly like what you see when you've screwed up your keys somehow: running with -v, you can see it tries public-key, has no luck, and moves on to ask for a password. That definitely is not going to work because a) it's not interactive and b) there is no root password on a fresh EC2 instance.

So, what is the problem? Well, I have not really figured it out, but I have come to realize that there is a period of time after Amazon says the instance is 'running' during which it will still reject all SSH connections. That period seems to vary between 30 and 60 seconds, and seems to increase during what I would assume are peak usage hours.

I speculate that somewhere in the cloud is a little elf who runs around distributing Amazon's half of the keypairs to all newly started EC2 instances. The lag in SSH connectivity is due to the elf getting tired and/or busy, and the monitoring machinery is not designed to wait for him to finish before it declares that an instance is 'running'. I've not been able to confirm that this is the case, but I'm sure Occam would favor explanations involving lazy elves.

A crude way around the problem is to simply sleep for a minute or so before trying to establish a connection. A slightly less crude way is to repeatedly attempt to SSH in after the instance starts.
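A minimal sketch of the "slightly less crude" option in Java, assuming the OpenSSH `ssh` client is on the PATH. The `SshRetry` class, its method names, the 12 attempts, and the 10-second delay are illustrative assumptions. `BatchMode=yes` makes ssh exit with an error instead of falling back to the password prompt, so each failed attempt is detected rather than hanging; `StrictHostKeyChecking=no` accepts the fresh instance's unknown host key, which you may want to handle more carefully.

```java
import java.io.IOException;
import java.time.Duration;
import java.util.function.BooleanSupplier;

class SshRetry {
    // Calls attempt until it succeeds or maxAttempts is used up, sleeping between tries.
    static boolean retry(BooleanSupplier attempt, int maxAttempts, Duration delay) {
        for (int i = 1; i <= maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(delay.toMillis());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and give up
                    return false;
                }
            }
        }
        return false;
    }

    // One non-interactive SSH attempt that runs `true` remotely; exit code 0 means we got in.
    static boolean sshOnce(String keyFile, String userAtHost) {
        try {
            Process p = new ProcessBuilder(
                    "ssh", "-i", keyFile,
                    "-o", "BatchMode=yes",            // fail instead of prompting for a password
                    "-o", "ConnectTimeout=10",        // don't hang on an unreachable host
                    "-o", "StrictHostKeyChecking=no", // accept the new instance's host key
                    userAtHost, "true")
                .inheritIO()
                .start();
            return p.waitFor() == 0;
        } catch (IOException e) {
            return false; // e.g. the ssh binary could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        if (args.length == 2) {
            // args: <key file> <user@host>, e.g. the .pem for the instance's keypair
            boolean ok = retry(() -> sshOnce(args[0], args[1]), 12, Duration.ofSeconds(10));
            System.out.println(ok ? "connected" : "gave up");
        } else {
            // Demo without a real instance: an attempt that fails twice, then succeeds.
            int[] calls = {0};
            boolean ok = retry(() -> ++calls[0] >= 3, 5, Duration.ofMillis(10));
            System.out.println(ok + " after " + calls[0] + " attempts");
        }
    }
}
```

With no arguments it prints `true after 3 attempts`. With a key file and `user@host`, it keeps trying for roughly two minutes, which comfortably covers a 30 to 60 second window.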

lodi 0.1.0 released

Along with SqlSheet, I'm open sourcing another library that I've written: lodi (Local Dispatch).

A lot of my work lately has relied on generating XML using templates (I'm using Velocity). The XML is mostly used either to drive an ETL process, to specify some XHTML to be rendered, or to configure some internal subsystem.

It gets complicated, though:

  • Sometimes that XML gets consumed outside the VM via HTTP, and sometimes it gets consumed inside the VM.
  • In either case, I need to be able to embed template parameters in a URL (i.e., in the query string).
  • I want a uniform method for addressing the templates, whether from inside or outside the VM.
  • I want a uniform method for processing and organizing the templates in the webapp.
  • Some of the templates I want locked down from the outside - I don't want them served out over HTTP
  • ...except that sometimes I do. For debugging, it turns out to be really handy to be able to peek at template instantiations in my browser.

My solution was to create a backdoor URL dispatcher for internal clients. So, from outside I can access a template as

http://localhost/foo/bar/baz.vm?name=roger

whereas from inside the VM (and only inside the VM) I can access it as

myapp:/foo/bar/baz.vm?name=roger

One nice thing about this is that the myapp URL bypasses the container security. Internal calls don't have to authenticate, and I don't have to do unnatural things to my web.xml security configuration to lock the templates down from the outside.

Another benefit is that the servlet request gets processed in the caller's thread. This lets me propagate any exceptions encountered while processing the template directly back to the caller, which makes debugging and error handling a lot simpler when there are problems in a template.
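lodi's own API isn't shown here, so the following is only a minimal sketch of the idea in plain Java: a hypothetical `LocalDispatcher` that parses `myapp:` URLs with `java.net.URI`, hands the decoded query parameters to an in-process renderer registered for that path, and returns the result. A lambda stands in for a Velocity template. Because `dispatch` runs in the caller's thread and never goes through the servlet container, there is no container security in the way, and anything the renderer throws surfaces directly at the call site.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class LocalDispatcher {
    // Hypothetical routing table: template path -> renderer taking the query parameters.
    private final Map<String, Function<Map<String, String>, String>> routes = new HashMap<>();
    private final String scheme;

    LocalDispatcher(String scheme) {
        this.scheme = scheme;
    }

    void register(String path, Function<Map<String, String>, String> renderer) {
        routes.put(path, renderer);
    }

    // Renders in the caller's thread; exceptions from the renderer propagate unchanged.
    String dispatch(String url) {
        URI uri = URI.create(url);
        if (!scheme.equals(uri.getScheme())) {
            throw new IllegalArgumentException("Not a " + scheme + ": URL: " + url);
        }
        Function<Map<String, String>, String> renderer = routes.get(uri.getPath());
        if (renderer == null) {
            throw new IllegalArgumentException("No template at " + uri.getPath());
        }
        return renderer.apply(parseQuery(uri.getRawQuery()));
    }

    // Splits "a=1&b=2" into a map, URL-decoding keys and values.
    static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        LocalDispatcher d = new LocalDispatcher("myapp");
        d.register("/foo/bar/baz.vm", p -> "<greeting name=\"" + p.get("name") + "\"/>");
        System.out.println(d.dispatch("myapp:/foo/bar/baz.vm?name=roger"));
    }
}
```

Running it prints `<greeting name="roger"/>`. The same path and query string could be served over HTTP by a servlet, which is what keeps the addressing uniform inside and outside the VM.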


SqlSheet 0.1.0 released

There are a number of JDBC-Excel drivers out there, but most seem to be nearly abandoned. Moreover, they all either use native pieces, are read-only, or do disconcerting things with in-memory databases, none of which work for me.

So, what is a self-respecting hacker to do besides write their own? Here's the nutshell:

SQLSheet is a JDBC driver which allows you to interact with Microsoft Excel spreadsheets using SQL statements.

Features:

  • Pure Java (no native components)
  • Fast (operates directly on the spreadsheet, does not rely on an in-memory database)
  • Read and write operations
  • PreparedStatement support

Now, if all you want is to manipulate spreadsheets, use Apache POI - it's great. Why is SqlSheet useful, then? Mainly for manipulating spreadsheets inside other tools that are database-oriented. You can just give them a jdbc:xls URL and start doing whatever you normally do. In my case, I specifically wanted to be able to manipulate spreadsheets from inside the Scriptella ETL framework.
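Since SqlSheet is a JDBC driver, using it should look like ordinary JDBC. The sketch below uses only standard `java.sql` calls and assumes the driver jar is on the classpath. It takes the `jdbc:xls` URL as a command-line argument, because the exact URL syntax is whatever SqlSheet's documentation specifies. It also assumes each worksheet is exposed as a table whose header row supplies the column names; `SHEET1`, `NAME`, and `AGE` are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

class XlsJdbcDemo {
    // Builds a parameterized INSERT such as "INSERT INTO T (A, B) VALUES (?, ?)".
    static String buildInsert(String table, List<String> columns) {
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES (" + placeholders + ")";
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 1) {
            System.out.println("usage: XlsJdbcDemo <jdbc:xls URL>");
            return;
        }
        // Plain JDBC from here on; nothing below is specific to SqlSheet.
        try (Connection conn = DriverManager.getConnection(args[0])) {
            // Hypothetical sheet and column names.
            try (PreparedStatement ps = conn.prepareStatement(buildInsert("SHEET1", List.of("NAME", "AGE")))) {
                ps.setString(1, "roger");
                ps.setInt(2, 42);
                ps.executeUpdate();
            }
            try (PreparedStatement ps = conn.prepareStatement("SELECT NAME, AGE FROM SHEET1");
                 ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString("NAME") + " " + rs.getInt("AGE"));
                }
            }
        }
    }
}
```

A database-oriented tool like Scriptella does essentially the same thing on your behalf once it is given the driver and URL.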


Switched to WordPress

While I can't think of anything much more boring than blogging about blogging, I'll do it briefly because there might be a few useful bits of information here...

I just switched from MovableType 4 to WordPress and am overall very pleased. It's faster, cleaner, simpler, and seems to have a stronger community. It's PHP-based, which I much prefer to MT's weird template tags. A couple of pitfalls:

If you are like me and like to write your entries in raw HTML, there are a couple of things you need to do:

  • Go to 'Users', click 'Edit' on your username, and uncheck 'Use the visual editor when writing'
  • Download the Raw HTML plugin.
  • Or, better yet, use my hacked version which is always on (i.e., doesn't require the magic 'raw' comment delimiters).

Also, don't mess around with symlinks or .htaccess - this seems to somehow cause problems with deleting comments and posts. If you come from MT, it's tempting to do this because WordPress wants to live under your document root, whereas MT publishes static pages into a separate directory. My advice is to just put WordPress in your public_html dir where it wants to be - it feels weird but it's ok.

Finally, I highly recommend downloading the Persistent Styles Plugin so that you can non-destructively tweak WordPress templates.

PropJoe

I have long had this irritating problem with .properties files. It goes something like this:

  • I like to use .properties files for configuration unless there is a compelling reason to use something more complicated.
  • I'm pretty uptight about using constants in my code. This includes using constants to refer to the names of properties in a .properties file.
  • Over time, the constants and the .properties file can drift apart - you have to keep them both in sync.
  • Similarly, properly documenting them both becomes painful.
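To make the drift concrete, here is a minimal sketch of one common way to attack it in plain Java. It is not PropJoe's API: the `PropertyKeysDemo` class, its `Key` enum, and the `db.url`/`pool.size` keys are all hypothetical. Each enum constant holds a key name and its description in one place, and two checks report keys in the file that no constant declares and constants whose key is missing from the file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;
import java.util.TreeSet;

class PropertyKeysDemo {
    // One constant per property: key name and documentation live side by side.
    enum Key {
        DB_URL("db.url", "JDBC URL of the main database"),
        POOL_SIZE("pool.size", "Maximum number of pooled connections");

        final String key;
        final String description;

        Key(String key, String description) {
            this.key = key;
            this.description = description;
        }

        String get(Properties props) {
            return props.getProperty(key);
        }
    }

    // Keys in the file that no constant declares (typos or stale entries).
    static Set<String> unknownKeys(Properties props) {
        Set<String> unknown = new TreeSet<>(props.stringPropertyNames());
        for (Key k : Key.values()) {
            unknown.remove(k.key);
        }
        return unknown;
    }

    // Constants whose key does not appear in the file.
    static Set<Key> missingKeys(Properties props) {
        Set<Key> missing = EnumSet.noneOf(Key.class);
        for (Key k : Key.values()) {
            if (props.getProperty(k.key) == null) {
                missing.add(k);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader("db.url=jdbc:example:db\npool.sise=5\n"));
        System.out.println("unknown: " + unknownKeys(props));
        System.out.println("missing: " + missingKeys(props));
        // The same enum can generate the documentation for the .properties file.
        for (Key k : Key.values()) {
            System.out.println("# " + k.key + ": " + k.description);
        }
    }
}
```

Running it prints `unknown: [pool.sise]` and `missing: [POOL_SIZE]`, catching the misspelled key, followed by one documentation line per key generated from the enum.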

So I finally did something about it. I spent about 15 minutes writing the code and 45 minutes writing the documentation; go figure. Maybe it's because The Wire is over and I just needed to do something to say goodbye. At any rate, it has made my life just a little bit easier.

With that, I give you PropJoe.


Create a custom .EXE to launch Tomcat

In developing data integration applications for Windows users, it's really important to give them as Windows-like an experience as possible when it comes to installing, starting, and stopping the server. This has long been a painful topic for Java developers. (Don't even get me started about Java WebStart... :) )


Scriptella 1.0 Beta Released

Fyodor Kupolov has released a 1.0 beta of his Scriptella ETL framework. I downloaded it and integrated it into my applications - so far so good.

I blogged about Scriptella a while ago. It's a simple and elegant framework for declaratively manipulating result-set style data (typically but not necessarily from a database). Fyodor also seems to have updated the reference section with some nice diagrams that explain the concepts very clearly.