HowTo Setup Nexenta OS as a HA-NAS

This goes into detail on how to set up a highly available NAS appliance with Nexenta Core Platform (the free edition of the various Nexenta OS releases). Feel free to edit this document as you go along. I chose Nexenta over OpenSolaris because Nexenta had the driver for my LSI 9200-16e SAS HBA adapter. I've been surprised at the quality of even the beta versions.

My setup:

  • 2 Quad Core Intel Xeons with 24GB RAM in 2 SuperMicro servers with dual gigabit and dedicated IPMI port (important!).
  • 2 Xtore XJ-SA24-316R 3U 16X 3.5 Bay Dual Expander Scalable SAS - always choose a redundant expander option
  • 25 2TB SAS hard drives (Constellation ES ST32000444SS)
  • 2 160GB Intel X-25-M
  • 2 32GB Intel X-25-E
  • 4 Western Digital Caviar Black 7200RPM SATA hard drives (for the servers to boot off)
  • Cisco 3560 Gigabit switch
  • Serial link (RS232) between the servers - the serial cable has to be a null-modem cable (important!)
  • 6 SFF-8088 to SFF-8088 cables (they are pricey). I used 1 meter cables but yours might have to be longer depending on your configuration.

Setup the hardware

I recommend rack mounting. These machines make a lot of noise, so they are not suited to datacenter-in-office environments. If you have a hardware RAID adapter for the boot drives, I would not use it, so don't spend much money on one - ZFS handles errors much better than most of those adapters do.

Install Nexenta Core Platform

Download the latest NexentaCP ISO and burn it to a disc. Then proceed going through the installation, no gimmicks here. Select the two internal hard drives to install the OS on. Nexenta automatically configures them as a ZFS mirror (nice!). Make sure you don't forget your passwords!

Give your two machines different names (I used nfs-1 and nfs-2) in the same domain and the same IP range. I really recommend fixed IP addresses for everything related to the machines (including IPMI/remote management).

First, log in to both computers. Set the date using ntpdate time.nist.gov or use your own timeserver. Make sure NTP has started: svcs ntp will show you its status, and svcadm enable ntp will start it. Check that SSH is running, as well as any other services you want. Use ssh-keygen -t dsa to generate keys and put the public keys in ~/.ssh/authorized_keys on both machines - that way you don't have to enter passwords to get to the other machine. In /etc/ssh/sshd_config (using vi or pico) set PermitRootLogin yes.
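
The key exchange can be scripted so that it is safe to repeat. This is only a minimal sketch, not something from the original setup: the helper name add_key and its arguments are made up for illustration. It appends a public key to an authorized_keys file unless that exact line is already present.

```shell
# add_key PUBKEY_FILE AUTH_FILE
# Hypothetical helper: append a public key to an authorized_keys file
# unless that exact line is already there.
add_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    chmod 600 "$auth_file"
    # -x: whole-line match, -F: fixed string, -q: no output
    if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
}
```

For example, after copying the ~/.ssh/id_dsa.pub generated on nfs-2 to /tmp/nfs-2.pub on nfs-1 (the /tmp path is just an example), run add_key /tmp/nfs-2.pub ~/.ssh/authorized_keys on nfs-1, and do the same in the other direction.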

Connecting the external hard drives

The Xtore disk array and the LSI card simply present the connected hard drives as raw disks. This is good for ZFS but not good for management (if a disk fails, you need to know which one to pull). You really should take out all hard drives and insert them one by one. If you run tail -f /var/adm/messages and then push a hard drive in, a moment later you'll get a message like this on the console:

Jun  7 05:33:53 nfs-1 genunix: [ID 408114 kern.info] /scsi_vhci/disk@g50015179591dabe1 (sd26) online
Jun  7 05:33:53 nfs-1 genunix: [ID 483743 kern.info] /scsi_vhci/disk@g50015179591dabe1 (sd26) multipath status: degraded: path 8 mpt_sas2/disk@w50015179591dabe1,0 is online

The 50015179591dabe1 part of the message is the WWN (World-Wide Name), or SAS address, of the drive. It should be globally unique in your SAN. Write down the WWN and the slot you put the drive in. Hint: to get a quick overview of all WWNs, push in all the drives, type "format" and copy and paste all the WWNs. Then remove all drives again and push them in one by one, so that you only have to find each WWN in your list and write down its slot number. Make multiple copies of that document once it is finished. Nexenta will present your disks as c0t
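
Collecting the WWNs can be partly automated. The sketch below is an illustration, not part of the original procedure: both function names are made up, and the c0t...d0 naming (with controller c0 and the WWN in upper case) is taken from the zpool commands in this document and may differ on your system.

```shell
# wwn_from_messages: read syslog lines on stdin and print the WWN of every
# disk that came online (matches the "(sdNN) online" lines shown above).
wwn_from_messages() {
    sed -n 's/.*disk@g\([0-9a-fA-F]*\) (sd[0-9]*) online$/\1/p'
}

# wwn_to_dev WWN: print the device name as it appears in the zpool
# commands in this document (c0t, the WWN in upper case, then d0).
wwn_to_dev() {
    printf 'c0t%sd0\n' "$(printf '%s' "$1" | tr 'abcdef' 'ABCDEF')"
}
```

Running tail -f /var/adm/messages | wwn_from_messages while inserting drives one by one prints one WWN per drive, which you can note next to the slot number.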

Depending on the number of drives you have and the space you want, your setup might differ from mine. I recommend 8-12 drives per ZFS RAIDZ2 group and 5-8 drives per RAIDZ1 group. You can also use mirrors, pairing the drives two by two. There are various recipes to be found (look for a Sun Thumper setup for a good overview). Never do JBOD (no redundancy). You will lose a drive (in a 3 month period we had 1 DOA, 1 with serious read/write errors and 1 that just died) and without redundancy you lose your data. RAIDZ2 is the best option if you want to waste the least space while keeping decent protection, but its performance lags behind mirrors (as does RAID5/RAID6).
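
To compare layouts, a rough capacity estimate helps. This is a back-of-the-envelope sketch with a made-up function name: it ignores ZFS metadata overhead and the TB/TiB difference. With the same 24 2TB drives it prints 36 for three 8-drive RAIDZ2 groups and 24 for twelve mirrors.

```shell
# raidz_usable VDEVS DRIVES_PER_VDEV PARITY DRIVE_TB
# Rough usable capacity in TB: each vdev loses PARITY drives to redundancy.
# For a two-way mirror, use DRIVES_PER_VDEV=2 and PARITY=1.
raidz_usable() {
    echo $(( $1 * ($2 - $3) * $4 ))
}

raidz_usable 3 8 2 2    # three 8-drive RAIDZ2 vdevs of 2TB drives
raidz_usable 12 2 1 2   # the same 24 drives as twelve 2-way mirrors
```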

Here is my zpool setup:

zpool create Volumes raidz2 c0t5000C50020C6188Fd0 c0t5000C50020C6131Bd0 c0t5000C50020C7B77Bd0 c0t5000C5001046C077d0 c0t5000C500104A350Bd0 c0t5000C500104A9A6Fd0 c0t5000C50010466FB3d0 c0t5000C500104A3D47d0

zpool add Volumes raidz2 c0t5000C500104A1367d0 c0t5000C500104A615Bd0 c0t5000C50020C61433d0 c0t5000C500104A447Fd0 c0t5000C5001049F927d0 c0t5000C500104A1523d0 c0t5000C50020C61447d0 c0t5000C5001046D813d0

zpool add Volumes raidz2 c0t5000C500104A9213d0 c0t5000C50020C7B71Bd0 c0t5000C5001043B3EBd0 c0t5000C50020C60C37d0 c0t5000C500104A5D33d0 c0t5000C500104371C3d0 c0t5000C500103F3423d0 c0t5000C50010445AD7d0

# These are my 32GB Intel X-25E's (mirrored)
zpool add Volumes log mirror c0t50015179591D9F4Ad0 c0t50015179591DABE1d0
# These are my 160GB Intel X-25M's (not mirrored, not necessary)
# The device names below repeat the log devices above; use the WWNs of your own cache SSDs here
zpool add Volumes cache c0t50015179591D9F4Ad0 c0t50015179591DABE1d0
# This is my spare
zpool add Volumes spare c0t5000C500104A1FABd0

As you can see, I have mirrored the log devices (the write cache) - if your power fails while writes are pending and the power comes back on with a dead log drive, you will lose those writes. I have not mirrored the read cache - cache devices can fail without a problem, ZFS just reverts to using the disks. Add as many spares as you have.
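
Once the log, cache and spare devices are added, it is worth checking that the pool reports as healthy, and checking again regularly since drives do fail. A minimal sketch, assuming the pool name Volumes from above. The two function names are made up. The zpool list options used are -H (no headers) and -o health.

```shell
# pool_health POOL: print the pool's health (ONLINE, DEGRADED, FAULTED, ...).
pool_health() {
    zpool list -H -o health "$1"
}

# pool_ok POOL: succeed only when the pool reports ONLINE.
pool_ok() {
    [ "$(pool_health "$1")" = "ONLINE" ]
}
```

For example, pool_ok Volumes || echo "Volumes needs attention" could run from cron. zpool status Volumes shows which device is at fault.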

Making the setup redundant

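First and foremost, we must prevent both systems from importing the same pool at the same time (it WILL destroy your zpool). We can do this by using cachefiles: a pool whose cachefile is not the default one is not imported automatically at boot, so a rebooted host will not grab a pool the other host is using. Set the cachefile, then export the pool.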

zpool set cachefile=/var/tmp/VolumesCachefile Volumes
zpool export Volumes

On the other host copy the cachefile over from the primary host using scp (see how SSH keys save you time here?)

scp nfs-1:/var/tmp/VolumesCachefile /var/tmp/VolumesCachefile
zpool import -c /var/tmp/VolumesCachefile -o cachefile=/var/tmp/VolumesCachefile Volumes

You might have to specify -f because the pool records which host last used it. Make sure the pool is exported on the other system first, though!
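
A planned hand-over, with both hosts up, can be wrapped in a small function. This is a sketch under assumptions, not the author's failover scripts: the function names are made up, it relies on the passwordless SSH set up earlier, and it skips -c so that zpool scans the devices itself. It refuses to import unless the export on the peer succeeded.

```shell
POOL=Volumes
CACHEFILE=/var/tmp/VolumesCachefile

# pool_imported: succeed if $POOL is currently imported on this host.
pool_imported() {
    zpool list -H -o name 2>/dev/null | grep -qx "$POOL"
}

# take_over PEER: export the pool on PEER, then import it here.
# Stops if the export on the peer fails, so the pool is never imported twice.
take_over() {
    peer=$1
    if pool_imported; then
        echo "$POOL is already imported here" >&2
        return 0
    fi
    ssh "$peer" "zpool export $POOL" || {
        echo "export on $peer failed; not importing" >&2
        return 1
    }
    # Keep the pool out of the default cache file so it is not
    # imported automatically at boot.
    zpool import -o cachefile="$CACHEFILE" "$POOL"
}
```

On nfs-2, take_over nfs-1 moves the pool from nfs-1 to nfs-2. It does not cover the case where the peer is dead. That case needs fencing first.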

Automatic HA failover

Since heartbeat doesn't compile on Nexenta and getting Sun Cluster to work is also a problem, I wrote my own scripts to do the heartbeat and failover. You can download them and follow the instructions from here

If you do use my scripts you will need IPMItool, which needs GCC (apt-get install gcc) to compile and might need a couple of libraries, all of which are available through apt-get. I also installed Webmin from source.

Alternatively, write your own scripts. Always keep this in mind: shared volumes must never be imported on two hosts at the same time - make sure you kill the other host first (even if you have to manually pull the plug).
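
If you write your own scripts, the "kill the other host first" step could look roughly like this. It is an illustrative sketch, not taken from the scripts mentioned above: fence_peer and FENCE_WAIT are made-up names, and -I lanplus assumes an IPMI 2.0 management port (older ones need -I lan). The password is read from the IPMI_PASSWORD environment variable through ipmitool's -E option.

```shell
# Seconds to wait between the power-off request and the status check.
FENCE_WAIT=${FENCE_WAIT:-5}

# fence_peer BMC_HOST BMC_USER
# Power off the peer through its IPMI port and confirm it is really off.
# Succeeds only if the status check reports the chassis power as off.
fence_peer() {
    bmc=$1
    user=$2
    ipmitool -I lanplus -H "$bmc" -U "$user" -E chassis power off || return 1
    sleep "$FENCE_WAIT"
    ipmitool -I lanplus -H "$bmc" -U "$user" -E chassis power status \
        | grep -q 'is off'
}
```

Import the pool only when the function succeeds, for example fence_peer nfs-2-ipmi admin && zpool import -f -o cachefile=/var/tmp/VolumesCachefile Volumes, where nfs-2-ipmi and admin are placeholders for your own IPMI address and user.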

NFS & SMB & AFP

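NFS file serving is easy to set up: zfs set sharenfs=sec=sys,root=host1.yourdomain.net:host2.yourdomain.net,rw=@128.151.188.0/24 followed by the name of the file system to share. Because the share settings are stored in the pool, the other host will share the NFS file systems correctly when it imports the zpool.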
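
As a complete command, the share setting could be wrapped like this. It is a sketch: share_nfs is a made-up helper and Volumes/home in the usage line is a hypothetical file system name. The function sets the property and then prints it back for checking.

```shell
# share_nfs DATASET ROOT_HOSTS RW_NETWORK
# ROOT_HOSTS is a colon-separated host list, RW_NETWORK a network such as
# @128.151.188.0/24.
share_nfs() {
    zfs set "sharenfs=sec=sys,root=$2,rw=$3" "$1"
    # Print the stored value so it can be checked.
    zfs get -H -o value sharenfs "$1"
}
```

Usage: share_nfs Volumes/home host1.yourdomain.net:host2.yourdomain.net @128.151.188.0/24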

SMB requires you to install Samba: apt-get install samba. Then configure your shares manually using Webmin or SWAT on both hosts. DO NOT USE zfs set sharesmb: it doesn't work well and it appears to cause kernel panics on my machines.

AFP requires Netatalk - you need to compile Netatalk from source, as the one from the apt repository doesn't authenticate. You will also need the Sun openssl packages and libdb (Berkeley DB): apt-get install sunwopenssl-include sunwopenssl-libraries libdb-dev.

Bonjour advertisements are built in using dns/multicast. See here: how to enable Bonjour advertisements
