Moving Confluence from Windows to (Ubuntu) GNU/Linux

Recently, my company’s corporate wiki started to sputter. It isn’t that Atlassian Confluence itself is a dog. But load it with more than 100K pages, hundreds of simultaneous users, and many GB of attachments stored in the database, then try to run it on Windows Server 2003 Standard with SQL Server 2000, and things get rather dicey. (Add the fact that the same hardware hosts a half dozen other production apps, including our enterprise JIRA install, and you have a recipe for failure.) So, for various reasons, we decided to move Confluence to its own dedicated environment: Ubuntu 8.04 with a MySQL backend.

I don’t intend to make this a detailed HOWTO — I can’t imagine that any two migrations will ever be identical. Instead, I’ll point out some of the gotchas I ran into along the way and what the resolution ended up being.

Installing Confluence on Ubuntu (non-standalone version)

I simply can’t top Bing Zou’s post from Sept 2007, so I’ll link to it: Install Confluence on Ubuntu Server Virtual Machine. If you follow those instructions to the letter, you’ll be up and running in no time. The virtual machine part is optional, of course; skip to Step 4 if it doesn’t apply to you. Also note that the sun-java6-jre package is now available and offers a significant performance boost.

A quick word on the Confluence Installation and Confluence Home directories: if you’re new to Linux or Unix, note that certain types of files go in specific places by convention (much like the Program Files or system32 directories in Windows). While you could put Confluence’s installation and home directories anywhere you like, /usr/share/confluence-2.x.x and /srv/confluence/data are good choices because that’s where most other people would expect to find them. If you’re interested, the group that maintains the Filesystem Hierarchy Standard (FHS) has written an exhaustive article on it.
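Whichever home directory you pick also has to be recorded in the webapp. The sketch below assumes your Confluence build reads it from a `confluence.home=` line in its confluence-init.properties file (check your own copy; that file and key are assumptions here). `set_confluence_home` is a hypothetical helper, and both paths are parameters so you can try it on a scratch copy first.

```shell
#!/bin/sh
# Sketch only: the confluence-init.properties file and the confluence.home key
# are assumptions about your Confluence build; set_confluence_home is a made-up name.

# Usage: set_confluence_home PROPERTIES_FILE HOME_DIR
set_confluence_home() {
    props="$1"
    home="$2"

    [ -f "$props" ] || { echo "no such file: $props" >&2; return 1; }

    # Create the data directory and any missing parents, e.g. /srv/confluence/data.
    mkdir -p "$home" || return 1

    # Remove any active confluence.home line (comments are left alone), then append
    # the new one, so running this twice still leaves exactly one setting.
    tmp="$props.tmp.$$"
    # grep exits 1 when it selects no lines; that is not an error here.
    grep -v '^[[:space:]]*confluence\.home[[:space:]]*=' "$props" > "$tmp" || true
    echo "confluence.home=$home" >> "$tmp"
    mv "$tmp" "$props"
}
```

Run it as root against the properties file inside your installation directory, passing /srv/confluence/data as the second argument.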

So now you have a fresh install of Confluence up and running. That works great if the content you need to migrate is less than a few GB and can be exported to an XML file via the Confluence backup feature: you can just follow Migrate to Another Database. But if you’re unfortunate enough to be in the same boat as me, you have many times that much content. Which leads to the first migration obstacle …

Migrating More than 2GB of Confluence Data

Hands-down, the best way to do this is the MySQL Migration Toolkit. Basically, it’s an ETL-style tool, and it allowed me to move my Confluence DB from SQL Server 2000 to MySQL 5.0 in just a few hours.

Atlassian posted a decent article on this topic: Migrating from HSQLDB to MySQL. No matter the RDBMS you’re migrating from, the process is probably about the same. But here are a few tidbits to help you out:

  • You *really* want to have the toolkit installed on the same physical machine as SQL Server. Installing the toolkit on a workstation acting as a “slingshot” to send the data over will take forever and increase the likelihood that Bad Things will happen. The amount of data is the reason you opted for a DB migration in the first place, right?
  • On the Object Type Mapping step, don’t forget to click Show Details on both sections. For Migration Method for Type Schema, choose Multilanguage. For Migration Method for Type Table, choose Data Consistency/Multilanguage. Any other combination of settings caused failures for me; your mileage may vary.
  • You can also select Enable Detailed Mappings in Next Step under the Advanced options. This allows you to specify the name of the DB you already created when you first installed Confluence on your Ubuntu machine. But if you do this, think carefully about what you choose to do with the user data — you don’t have to migrate it. I simply let the toolkit create a new DB for me since we use LDAP for our user data and pointing the fresh Confluence install at the new DB is easy enough.
  • One reason your site might be so large is that you keep attachments in the DB. If so, you almost certainly have BLOBs larger than 4MB. In that case, don’t forget to enable either the Ignore BLOBs >4MB option or the Stream BLOBs option under the Advanced section of the Data Mapping Options screen. Before I knew about this, the migration would hang and the entire Ubuntu box would require a hard reset. Zoinks.

Moving the Confluence Data Directory

Just as it sounds, this is straightforward. Atlassian has a good article on the process: Migrating Confluence Between Servers. However, as you may have guessed, here are a few pointers:

  • When you move the directory from Windows to Linux, the permissions are likely to come totally unhinged (depending on the method you use). This is easy to fix, though: just use the chown command to give the tomcat55 user ownership of the entire directory structure. (Bing Zou showed you how to do this earlier …)
  • REMEMBER THAT confluence.cfg.xml CONTAINS DB CONNECTION INFO! I stress this because if you happen to start Confluence with this file as-is and the Linux server has a route to your OLD database server (in live production use), you will render your production wiki inoperable because Hibernate will get wholly confused and think it’s in a cluster. (You’ll need to restart your production Confluence instance if this happens … speaking from experience here …)
  • Make a backup of your confluence.cfg.xml file before you overwrite it with the data copied from the Windows box — the MySQL driver references and connection syntax in it come in handy when you need to change the old config file to point at its new MySQL database. Renaming it to confluence.cfg.sql2k.xml worked quite nicely for me.
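Before that first start on the Linux box, a quick check of confluence.cfg.xml can save you from the cluster mix-up described above. This sketch assumes the JDBC URL sits in a `<property name="hibernate.connection.url">` element (it finds nothing if you use a JNDI datasource instead) and that the new database is MySQL. It also wraps the chown step from the first bullet. `db_url`, `check_db_url`, and `fix_ownership` are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: assumes the JDBC URL is stored as
#   <property name="hibernate.connection.url">...</property>
# in confluence.cfg.xml. The helper names are hypothetical.

# Print the JDBC URL from a confluence.cfg.xml file (empty if none is found).
db_url() {
    sed -n 's|.*<property name="hibernate.connection.url">\(.*\)</property>.*|\1|p' "$1"
}

# Succeed only if the config points at MySQL. A leftover SQL Server URL
# (e.g. jdbc:jtds:sqlserver://...) means the new box could reach the live DB.
check_db_url() {
    url=$(db_url "$1")
    case "$url" in
        jdbc:mysql:*) echo "OK: $url" ;;
        "") echo "STOP: no hibernate.connection.url in $1" >&2; return 1 ;;
        *) echo "STOP: $1 still points at $url" >&2; return 1 ;;
    esac
}

# Hand the copied data directory to the Tomcat user (tomcat55 unless overridden).
fix_ownership() {
    chown -R "${2:-tomcat55}" "$1"
}
```

Chained as `fix_ownership /srv/confluence/data && check_db_url /srv/confluence/data/confluence.cfg.xml && /etc/init.d/tomcat5.5 start`, Tomcat only starts once the URL begins with jdbc:mysql:.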

Recently Updated Macro Not Displaying

This one was strange because it appeared in only a single test run and then again in our final Linux-based build-out. Basically, the right side of the Confluence Dashboard was missing the Recently Updated content. A quick look at the logs revealed the following error:

Caused by: java.lang.IllegalArgumentException:
  The datetime zone id is not recognised: SystemV/EST5EDT
    at org.joda.time.DateTimeZone.forTimeZone(DateTimeZone.java:310) [...]

This is a known defect in the Sun JRE on Debian and Ubuntu; Sun’s bug report has more detail.

There are two ways to fix this: recreate a broken symlink at /etc/localtime, or append a timezone assignment to JAVA_OPTS. I elected to use the second method as it is permanent.
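If you would rather take the first route, here is a sketch. It assumes (my reading of the error, not something confirmed by the bug report) that the JRE names its default zone after wherever /etc/localtime points, and can land on a SystemV alias when /etc/localtime is a plain file or links into SystemV/. Only in those two cases does it relink the file to a regular zone. `zone_of` and `relink_localtime` are hypothetical names, America/New_York is an assumed default matching the JAVA_OPTS value used in this section, and the paths are parameters so you can test on a scratch directory.

```shell
#!/bin/sh
# Sketch only: assumes the JRE names its default zone after the /etc/localtime
# link target. zone_of and relink_localtime are hypothetical helper names.

# Print the zone a localtime symlink points at, e.g. "SystemV/EST5EDT".
# Fails (prints nothing) if the file is not a symlink.
zone_of() {
    [ -L "$1" ] || return 1
    target=$(readlink "$1")
    echo "${target#*zoneinfo/}"
}

# Relink LOCALTIME to ZONEINFO_DIR/ZONE when it is a plain file or names a
# SystemV zone; leave any other symlink alone.
relink_localtime() {
    localtime="$1"
    zoneinfo="${2:-/usr/share/zoneinfo}"
    zone="${3:-America/New_York}"

    current=$(zone_of "$localtime") || current=""
    case "$current" in
        "" | SystemV/*) ln -sf "$zoneinfo/$zone" "$localtime" ;;
        *) echo "leaving $localtime alone: $current" ;;
    esac
}
```

As root, `relink_localtime /etc/localtime` would make the real change; restart Tomcat afterwards, since the JVM reads its default time zone at startup.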

In /etc/init.d/tomcat5.5, add this timezone value to the very end of the JAVA_OPTS line:

JAVA_OPTS="$JAVA_OPTS -Djava.endorsed.dirs=$CATALINA_HOME/common/endorsed
  -Dcatalina.base=$CATALINA_BASE -Dcatalina.home=$CATALINA_HOME
  -Djava.io.tmpdir=$CATALINA_BASE/temp
  -Duser.timezone=America/New_York"
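If you manage more than one server, the same edit can be scripted. This sketch assumes the JAVA_OPTS line is a single physical line in the init script (the excerpt above appears to be wrapped) that ends in a closing double quote, and that nothing else in the script already sets user.timezone. `add_timezone_opt` is a hypothetical name, and `sed -i` is the GNU in-place option.

```shell
#!/bin/sh
# Sketch only: assumes the JAVA_OPTS line is one physical line that starts with
#   JAVA_OPTS="$JAVA_OPTS -Djava.endorsed.dirs
# and ends with a closing double quote. add_timezone_opt is a hypothetical name.

# Usage: add_timezone_opt INIT_SCRIPT [ZONE]
add_timezone_opt() {
    script="$1"
    zone="${2:-America/New_York}"

    # Do nothing if the script already sets a timezone (keeps the edit idempotent).
    if grep -q -e '-Duser\.timezone=' "$script"; then
        return 0
    fi

    # On that JAVA_OPTS line only, insert the flag just before the final quote.
    sed -i '/^[[:space:]]*JAVA_OPTS="\$JAVA_OPTS -Djava\.endorsed\.dirs/ s|"$| -Duser.timezone='"$zone"'"|' "$script"
}
```

Running `add_timezone_opt /etc/init.d/tomcat5.5` twice still leaves a single -Duser.timezone flag. Back up the script first, since it is edited in place.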

Summary

The migration is still ongoing for us; we plan to launch the new config sometime in the next week or so. In the meantime, I’ll update this post with anything else that seems useful.

If you’re charged with doing something like this, feel free to drop me a line … I’ll gladly save you a few headaches if I can.

UPDATE: So, it was decided today to punt on the move to a Linux platform. This decision has nothing to do with technology; it’s all about the peeps, man. Our IT department is a Windows-only shop, and as such, we opted to protect their right to not have to learn anything new. (After all, they prefer Windows, mind you.) I offered to introduce them to something called a “book” to ease the burden of an expanding set of skills, but to no avail. :)
 
But none of this is lost — I plan to post a detailed HOW-TO on moving Confluence from Windows to Linux in the near future. I might even go with Ubuntu 8.10. And I’ll make it so easy, an IT guy could do it.
 
UPDATE TO THE UPDATE: After years of performance problems on Windows and Confluence 2.X.X, we finally migrated to GNU/Linux and Confluence 3.2.1_01. But it wasn’t easy. More info at HOW-TO: Fix your Atlassian Confluence database schema.

This entry was posted in Geeky Stuff.

3 Comments

  1. Posted October 21, 2008 at 10:36 PM

    Hallo Ricky, Great blog post! It will be interesting to hear the outcome of the migration. I have put a link to your blog post on the Confluence docs. I hope that’s OK with you. If not, let me know and I’ll remove the link.
    Cheers, Sarah

  2. mike
    Posted March 11, 2009 at 1:20 PM

    I am certain that if anyone knew just how convoluted administering confluence is, nobody would ever buy this product.

    All xml database backups over 2 GB are worthless. Good luck in getting Atlassian to help you with it no matter how much money you spend on the program.

  3. Posted February 3, 2010 at 2:15 PM

    One important point to note when moving platforms like this: it is best to blow away the contents of the PLUGINDATA table. It contains binary data, and moving between platforms can sometimes mess it up.

    You will need to reinstall your plugins after doing this, so print a list of them from this table.

    regards,

    -Rob

    Atlassian Professional Services
    http://www.customware.net
