Add new instances to your Jackrabbit cluster – the non-time-consuming way

And here's another blog post about Jackrabbit clusters and how to make your life better.

Adding a new instance to a Jackrabbit cluster is very easy, in the beginning. Just provide a proper repository.xml that points to the central sources, add a new cluster id and start it. Everything is taken care of from then on. The problems start when your data grows larger and larger.

When you add a new instance to a Jackrabbit cluster, the new instance starts up and begins to read all the content in order to build up its Lucene search index. It also reads the journal and rebuilds everything it needs from there. You can imagine that this can take quite some time if a lot of content has built up.

Furthermore, as the journal can get huge pretty fast, Jackrabbit introduced a janitor, which cleans the journal log daily. That's great if you have the same instances running all the time (they don't need the log from days or months ago), but not so great if you want to add new instances (and the wiki entry linked above warns you about exactly this).

But there's a solution to this very problem, and it's not that complicated:

Shut down one of your instances
Get the revision number that instance was at from your database
Copy your whole Jackrabbit repository directory to another server/location
Start your original Jackrabbit instance again
In the copy, change repository.xml to use a new node name in the cluster config
Add that node name to your DB in JOURNAL_LOCAL_REVISIONS with the revision number from the original instance
Start your new Jackrabbit instance (or keep it for backup purposes)
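
The two database steps in the list above (reading the revision of the stopped instance and registering the clone with it) could look roughly like the following Python sketch. This is not taken from the scripts mentioned below. The column names JOURNAL_ID and REVISION_ID are an assumption about the table layout, the function name clone_revision is made up for illustration, and an in-memory SQLite database stands in for the real journal database so the example runs on its own. With a MySQL driver the placeholder style would typically be %s instead of ?.

```python
import sqlite3

# Table name as used in the post; the column names are assumed.
TABLE = "JOURNAL_LOCAL_REVISIONS"


def clone_revision(conn, source_id, clone_id):
    """Register clone_id at the revision of source_id and return that revision."""
    cur = conn.cursor()

    # Look up the revision the (stopped) source instance was at.
    cur.execute(f"SELECT REVISION_ID FROM {TABLE} WHERE JOURNAL_ID = ?", (source_id,))
    row = cur.fetchone()
    if row is None:
        raise LookupError(f"no revision recorded for node {source_id!r}")

    # Refuse to overwrite a node name that is already in use.
    cur.execute(f"SELECT 1 FROM {TABLE} WHERE JOURNAL_ID = ?", (clone_id,))
    if cur.fetchone() is not None:
        raise ValueError(f"node {clone_id!r} is already registered")

    # Register the new node name with the same revision.
    cur.execute(
        f"INSERT INTO {TABLE} (JOURNAL_ID, REVISION_ID) VALUES (?, ?)",
        (clone_id, row[0]),
    )
    conn.commit()
    return row[0]


# Demo against an in-memory SQLite stand-in for the journal database.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE {TABLE} (JOURNAL_ID TEXT NOT NULL, REVISION_ID INTEGER NOT NULL)")
conn.execute(f"INSERT INTO {TABLE} VALUES ('node1', 4711)")
print(clone_revision(conn, "node1", "node2"))
```

Refusing to reuse an existing node name is a deliberate choice here: two instances sharing one cluster id would also share one revision entry, which is exactly the kind of mistake that is easy to make when cloning by hand.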

With this approach, we can be sure that everything is in a consistent state (the Lucene indexes, for example), so we can safely start the copy of this instance in another place and it should pick up where the original left off without losing anything (as long as the janitor didn't run between the backup and starting the new clone).
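
The janitor caveat can be checked before starting a clone that has been lying around for a while. The sketch below is a conservative guess, not something Jackrabbit provides: it assumes the journal records live in a table called JOURNAL_JOURNAL with a REVISION_ID column, and that the janitor only ever removes the oldest records (everything below some threshold). Under those assumptions, if a record at or below the clone's revision still exists, nothing newer than that revision can have been removed. The function name journal_covers is hypothetical, and SQLite again stands in for the real database.

```python
import sqlite3


def journal_covers(conn, clone_revision):
    """Return True if a journal record at or below clone_revision still exists.

    Conservative: an empty journal, or one whose oldest record is newer than
    the clone's revision, is reported as not safe.
    """
    cur = conn.cursor()
    # Table and column names are assumptions about the journal schema.
    cur.execute("SELECT MIN(REVISION_ID) FROM JOURNAL_JOURNAL")
    oldest = cur.fetchone()[0]
    return oldest is not None and oldest <= clone_revision


# Demo: a journal holding revisions 100..105, then a simulated janitor run.
journal = sqlite3.connect(":memory:")
journal.execute("CREATE TABLE JOURNAL_JOURNAL (REVISION_ID INTEGER NOT NULL)")
journal.executemany(
    "INSERT INTO JOURNAL_JOURNAL VALUES (?)", [(r,) for r in range(100, 106)]
)
before = journal_covers(journal, 102)  # records up to 102 are still present

journal.execute("DELETE FROM JOURNAL_JOURNAL WHERE REVISION_ID < 104")
after = journal_covers(journal, 102)   # oldest record is now 104
print(before, after)
```

If the check fails, the safe option is to take a fresh copy rather than start the old one.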

As a little proof of concept I wrote two little scripts which do exactly what I described above. They can be found on GitHub at github.com/chregu/Jackrabbit-clone-scripts. They are not used in production (yet) and handle one specific setup (we use MySQL as the persistence store, for example), but it should be easy to adjust them to your needs. They include some checks to avoid mistakes, and the scripts stop when one fails, but I'm sure I missed some not-so-obvious ones. They will help us a lot in adding new instances to a cluster in a decent amount of time. I'm sure some of you out there can make use of them, too (if only to learn how this works in Jackrabbit). The README has some more info.
