Almost forgot – talks:

Hmm, I don’t know why I forgot to post this – I just haven’t cared that much about my blog recently 😉

I’ve been at the Libre Software Meeting – for the French-speaking people, “réunion mondiale du logiciel libre” – in Geneva this year. I initially planned to only speak a little about illumos, but later also joined a second talk when Walid Nouh (@wawax) asked me whether I’d share some real-life experience with GLPI and FusionInventory, which I have been using at work since roughly 2008. Both talks targeted French-speaking audiences, and both have been recorded!

Check them out – I know this blog entry is a bit short on love, but nonetheless 🙂

illumos

The illumos userland: http://video.rmll.info/videos/the-illumos-userland/

Additionally I’ve been happy to see – and afterwards talk with – Max Bruning! (Joyent, involved in the KVM port to illumos)

FusionInventory (and GLPI)

Walid has uploaded the slides:

September 8, 2012


Short note: Time providers in virtual environments

Correct time in VMs unfortunately has been and will continue to be somewhat of an issue – some virtualization environments provide clock sources so you don’t have to rely on ntpd (which fails if the drift gets too high). So, for the short note: here is how you can check whether you are using a time provider of your VM platform when running Linux on:

KVM (e.g. Proxmox VE, bare Debian/Ubuntu/RHEL…):

root@proxvm:~$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
kvm-clock tsc hpet acpi_pm 

root@proxvm:~$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
kvm-clock
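
If the guest has selected a different source, you can switch at runtime through the same sysfs node – a minimal sketch, assuming a kernel with kvm-clock support (the change takes effect immediately but does not persist across reboots):

root@proxvm:~$ echo kvm-clock > /sys/devices/system/clocksource/clocksource0/current_clocksource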

Additional resources:

  • http://pve.proxmox.com/wiki/Guest_Time_drift
  • http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Host_Configuration_and_Guest_Installation_Guide/chap-Virtualization_Host_Configuration_and_Guest_Installation_Guide-KVM_guest_timing_management.html

Bonus question: Is a Linux guest also able to use kvm_clock when running inside KVM on illumos, like on SmartOS? 🙂


MS Hyper-V:

From what I know, the time provider is also enabled by default per VM at the Hyper-V level, but you might want to disable this for specific guests like (Windows) domain controllers or Kerberos KDCs (5 minutes of time drift will break Kerberos authentication). This requires at least a kernel from late 2011, but you’d better look for a kernel with more stable Hyper-V driver integration – 3.4 will likely make you happy, and 3.2 is also reasonably stable.

root@hvtux:~$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
hyperv_clocksource tsc acpi_pm jiffies

root@hvtux:~$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hyperv_clocksource
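
To see how far a guest clock has actually drifted, a one-shot query against an NTP server is enough – a sketch assuming the ntpdate utility is installed (-q only queries and never sets the clock):

root@hvtux:~$ ntpdate -q pool.ntp.org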

When using the time provider of the virtualization platform, I had fewer issues with time drift (none, to be correct).

May 1, 2012


Managing certificate hell – the GLPI way

I’ve been an irregular user and admin of a GLPI server at work – GLPI is an open source IT asset management software that enables you to automate lots of asset management work (also thanks to FusionInventory and its agents) while providing a UI that non-admins can work with too – e.g. your financial auditor.

While I really advocate the use of HTTPS, managing the certificates is still a pain. I used to be the guy deploying and managing the SSL certificates, but I realized that things would go south if I was away or had to hand that over to a colleague (which was the case…). It had also happened that I forgot to renew certificates early enough. – The solution I found won’t fit a massive-scale company, but it does the job for our purpose:

I didn’t want to continue the “yet another spreadsheet” idea without any useful warning before certificate expiration. That’s where the certificate inventory plugin came into play. Once activated, it gives you a “Certificates” tab under Plugins. This way you can enter your cert data, expiration dates and issuing CA – and you can link a computer against the cert so you know where the certificate is deployed:

On the summary page you can sort them by expiration date and know when you have to take care of one of them. – yay!

How to enable and configure notifications

Since I was going to pass the duty over to a colleague, I created a GLPI group “Certificate Managers” and put us both in it (Administration -> Groups). Next I enabled the notifications (Setup -> Notifications -> Notifications):

Select them both and enable them via the bulk operations at the bottom.

Next you need to edit both notifications (expired and expiring certificates) to add the people you want to notify (the group name is in German since I use the interface in that language most of the time):

You also need to enable the checker for certificates under the automated actions – I use the CLI method (glpi/front/cron.php needs to run via cron), as sketched below.
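
A minimal sketch of such a cron entry in /etc/cron.d format – the GLPI path /var/www/glpi and the www-data user are assumptions, adjust them to your setup:

# run the GLPI automated actions every 5 minutes
*/5 * * * * www-data /usr/bin/php /var/www/glpi/front/cron.php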

The next time one of your certificates is about to expire, you will get notified well in advance, and you can always check when your certificates are going to expire.

P.S. The current 1.7.0 for GLPI 0.80 has some bugs in the English and German translations; these are fixed in trunk and will most likely appear in the version for GLPI 0.83.

February 19, 2012


Taking gear out of service

This is a story about good practice and “don’ts” in sysadmin work, inspired by Bryan’s (@bdha) blog entry about the first law of systems administration. Recently I got some used gear for my lab – the story of this gear, a set of FC switches, reminded me of a lesson about what can go wrong and why I was once told to do things a certain way when administering systems in production. – I have also been bitten by similar, though never as painful, experiences.

Expect the unexpected (say hello to Murphy)

Once in the ol’ days my current boss (and former sysadmin) told us: “If you migrate a service and power off the old server or whatever important gear: do NOT disassemble or throw/give away the old thing too soon. One never knows if you forgot a tiny detail, and you might be happy to have the old box back up and running within a short period.”

Today we have virtualization, configuration management and revision control, which can help you track config changes and migrations. Nonetheless it can still be very dangerous.

What he also told me was: “Plan for the unexpected, you really never know what’s going to happen in a migration if the system is complex. Also plan your time and resources and don’t squeeze too much into that time window.” (you might need a little bit of sleep though?)

Reality…

Now came day X when the customer (me) wanted to pick up his used gear. I was kindly asked for an additional week due to their migration – fine. When I finally went there, I had to realize that they were still in the process of migrating the SAN traffic to the remaining fabric switches. And guess what? Boom – that’s when the “unexpected situation” happened, even with highly available virtualization and clustering and a multipathed, multi-controller FC SAN. I had to come back a couple of hours later while they fixed their production environment. I was only grumpy because I had to wait, but they were quite exhausted after this exercise…

Conclusion

Doing massive changes on your critical production environment without any time reserve is not sane. I may sometimes try to rush into changes, but not this way. Now I have had the chance to experience such a situation as an outsider. Lots of sysadmin wisdom gets outdated quickly, but some doesn’t – this rule seems to be part of that wisdom. Stick to it; by not following it you hurt not only yourself.

January 30, 2012


Experimental Hyper-V branch

Here is an experimental branch of the mainline kernel with some backported patches from staging-next, the HID and the network areas, where some of the Hyper-V drivers have left staging as of December 2011. So, with the release of 3.2 sooner or later, you can fetch a kernel that has:

  • SCSI hot plug / remove
  • netvsc network promiscuous mode
  • netvsc and hid out of staging
  • A couple of bugfixes

Get my branch on gitorious.org

I plan to backport most of the later patches until 3.3 is finished, as they don’t break the build – I’m quite positive that with 3.4 most of the Hyper-V code will have left staging and reached a maturity that lets other distributions enable the drivers in their default kernel builds. If you’re an admin who has to manage Linux on Hyper-V – other than SLES/RHEL or CentOS – you should now be asking your distribution to enable these modules (at least the Hyper-V bus with 3.2); see the sketch below for a quick check.
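
A quick sketch of how to check whether a distribution kernel already ships them, assuming the config is available under /boot (the exact option names vary between kernel versions):

grep -i hyperv /boot/config-$(uname -r)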

December 12, 2011


OpenStorage: An approach to a definition?

NexentaStor is an OpenSolaris-based (not yet illumos-based) storage appliance OS that I use at work. Nexenta Systems builds a ZFS storage appliance OS that can be installed on whiteboxes, which may also have been validated previously to run it. They sell their appliance software as an ‘Open Storage appliance’. So is OpenStorage = OSS? I actually prefer open source over proprietary software, so let me try to develop a possible definition out of the experience I have had so far with NexentaStor:

Why NexentaStor?

Back at that time Oracle had just stopped selling OpenSolaris support, and the future of OpenSolaris was too uncertain for a new production deployment. The features were appealing: ZFS and COMSTAR offered the stability and functionality I wanted to have, and the Solaris kernel provided Unix stability. Linux didn’t offer anything as well integrated – btrfs is still not considered stable enough for production use today (maybe SUSE and Oracle will jump in soon?), the LIO target in Linux now has stable iSCSI target support (which wasn’t true back in early 2010), and FC support is still not complete to my knowledge.

What about the license?

But what if you look at the license of the free-as-in-beer NexentaStor Community Edition? – No. Their Enterprise Edition? No; both are traditional end-user license agreements that don’t grant access to the source code of the product – only to the parts of the Nexenta OS that require them to make source available.

Originally NexentaCore was understood and promoted as the building block for NexentaStor, the commercial operating system. This distribution was not only freely available; source packages were made available as well, along with (most of?) the patches Nexenta applied on top of the OpenSolaris source release they used to build NCP. Some of them were not part of OpenSolaris; most consisted of backported patches from later bi-weekly code drops from Sun. Currently, after not really maintaining NexentaCore or keeping it in sync with the NexentaStor OS base, it is planned to deprecate it and stop the availability of this GNU-like OpenSolaris OS.

Open core

NexentaStor could be considered NexentaCore with a value-added storage management system on top. Nexenta decided to keep that development effort in house and not open source it. Thus the current NexentaStor product as sold is ‘open core’, because “specific functionality (are blocked) from being in [the] open source ‘core’”.* As such I agree that open core doesn’t differ from traditional commercial closed source software in those terms.

So why not use a closed solution directly if OSS doesn’t fit (yet)?

Actually the basic functions like CIFS, NFS and iSCSI are available quite equivalently in other open source distributions (excluding some tuning that Nexenta may have added). Paying for a friendly GUI, bundled hardware and performance monitoring looks worthwhile to me – essentially, most of that monitoring and tracing can be achieved from the console or by using the available code to extend it, as Nexenta did.

What is kept away from the open are enterprise features like HA, continuous replication etc. As long as their ZFS implementation and the NFS, CIFS, iSCSI and FC (COMSTAR) implementations stay compatible, so that the exact same data on the compatible ZFS pool can be accessed with at least one other open source solution, I’d consider the term Open Storage not wrongly used.

Trying to find a definition:

Currently I understand (and would be happy to keep understanding) the term ‘OpenStorage’, as coined and used by Nexenta, as follows:

  • There is at least one other openly available operating system where you can import your (pool) data without the requirement of copying your data to a new filesystem
  • There is at least one other openly available operating system where core functionalities keep your data accessible (NFS, CIFS, iSCSI, FC…) without conversion (except some configuration settings that may need to be converted or re-worked)
  • Full feature parity or compatibility outside of the core functionalities to import data and provide data access is not required or guaranteed (HA features, continuous replication etc.)
  • Hardware compatibility for moving off NexentaStor to the open OS doesn’t necessarily mean entire hardware compatibility, but if Nexenta started making their OS work with too much hardware that has no open equivalent, the first point could not be fulfilled.*

To my understanding, currently OpenIndiana, StormOS and, to some limited extent, SmartOS (not easily installable) and FreeBSD (no fully equivalent target mode) fulfill these requirements. The illumos project, where Nexenta upstreams lots of their efforts, shows they seem serious about staying compatible with the open source (Update: in fact Nexenta has also strengthened their investment in the distro area with illumian). So this unofficial definition would stay valid.

*Puppet Labs has a good definition of open core, see http://puppetlabs.com/puppet/faq/

October 22, 2011


State of Linux on Hyper-V

Update 02.2012 – I got some feedback on this blog post and updated some of the content. But since lots of things have changed since then, regard this as a snapshot of the state back in autumn 2011.

I’m still interested in the progress the Hyper-V modules from Microsoft are making in the mainline Linux kernel. I have continued to test them with each new kernel release as well as following what patches go in. Consider this an outside reflection on how the drivers evolve, from a sysadmin’s perspective. Here are some of my current experiences and notable modifications that went upstream which I think are worth mentioning. Due to kernel.org being absent for a couple of weeks, lots of proposed updates were only merged into the staging tree after kernel.org came back online.

Overall stability

Starting with 2.6.39 the state of the drivers has largely improved, from really annoyingly unstable to ‘usable’ – I don’t warrant their stability for critical use – at least I’ve not seen a single crash due to these modules on my small webserver running Ubuntu lucid with custom-built kernels since then. Before that, the only modules that worked stably were those in LIC 2.1 provided by Microsoft for RHEL5 and SLES10.

Since then Microsoft has published LIC 3.1, adding official support for RHEL 6 and CentOS 6, which seems to be based on the codebase in mainline 2.6.39 / 3.0 (when comparing the code) – so MS themselves apparently consider the stability acceptable enough to publish supported, pre-packaged binaries.

Goodbye /dev/hdX, welcome /dev/sdX and > 2TB LUN support

If you looked at a Linux virtual machine running on Hyper-V with a distribution newer than RHEL5 or SLES10, or a kernel newer than 2.6.21, you may have noticed the following uncommon behaviour for Hyper-V VMs with the hv modules: the IDE drives still showed up as /dev/hdX instead of /dev/sdX like almost everywhere else. Additionally – when in paravirtualized mode – the IDE controller was handled by ‘hv_blkvsc’ and the paravirtualized-only SCSI controller by ‘hv_storvsc’. This will also remain so with the release of Linux 3.1.

But with the merge window for 3.2 you can expect a bunch of changes already present in staging-next that (after a bunch of other patches) move the handling of IDE drives to hv_storvsc and drop hv_blkvsc completely. Meanwhile, a patch addresses the 2TB limitation for those who need it.

Shortly before writing this blog entry I built a 3.1-rc4 kernel out of the staging-next branch, and I can confirm the results for storvsc:

sim@sojus-1:~$ uname -a
Linux sojus-1 3.1.0-rc4-stagingnext+ #3 SMP Thu Oct 20 14:44:55 CEST 2011 x86_64 GNU/Linux

sim@sojus-1:~$ lsmod
Module                  Size  Used by
xt_multiport            1925  1
iptable_filter          1578  1
ip_tables              17884  1 iptable_filter
x_tables               24072  3 xt_multiport,iptable_filter,ip_tables
sit                     9942  0
tunnel4                 2925  1 sit
joydev                 10705  0
hv_mouse                4193  0
hid                    87780  1 hv_mouse
psmouse                61215  0
serio_raw               4720  0
hv_timesource           1111  0 [permanent]
i2c_piix4               8879  0
floppy                 63708  0
hv_storvsc              9152  3
hv_utils                5405  0
hv_netvsc              15931  0
hv_vmbus               27112  4 hv_mouse,hv_storvsc,hv_utils,hv_netvsc,[permanent]

As you can see: no more hv_blkvsc. But I consider hv_storvsc’s behaviour of attaching IDE drives a bit weird: normally the first disk on Linux would be named sda and partitions mounted as /dev/sdaX. This VM has 2 IDE and 1 SCSI disks attached, so I’d guess we should have sda to sdc, right?

sim@sojus-1:~$ sudo fdisk -l

Disk /dev/sda: 1073 MB, 1073479680 bytes
[...]

Disk /dev/sdb: 10.7 GB, 10737377280 bytes
[...]

Disk /dev/sdc: 1073 MB, 1073741824 bytes
[...]

Disk /dev/sdd: 10.7 GB, 10737418240 bytes
[...]

Disk /dev/sde: 48.3 GB, 48318382080 bytes

The 2 IDE disks appear twice while the SCSI-attached disk appears once. Since I use LVM on all disks here except for /boot, mount shows /dev/mapper for the root partition etc., but here is how /boot gets mounted:

administrator@sojus-1:~$ mount
[...]
/dev/sdc1 on /boot type ext4 (rw)

Interesting… albeit I haven’t found a negative impact, I still think this has to be handled differently before storvsc leaves the staging directory.

Update: It seems one can avoid this by having only /boot and GRUB on the bootable IDE disk – thanks Victor, who has more details on how to avoid it.
Actually it’s not a technical problem as such; MS handles this via modprobe rules in their Linux ICs for sanity, roughly as sketched below.
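
A sketch of how such a rule can look – the file name is hypothetical and the exact rule MS ships may differ: if hv_storvsc loads successfully, ata_piix is skipped; otherwise the guest falls back to the native IDE driver.

# /etc/modprobe.d/hyperv-storage.conf (hypothetical file name)
install ata_piix /sbin/modprobe hv_storvsc 2>&1 || /sbin/modprobe --ignore-install ata_piix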

Mouse support (not yet tested)

As some might remember, the Satori project at Xen provided a semi-open inputvsc module that allowed e.g. RHEL5 to have mouse support as a Hyper-V guest, but it was never part of the Microsoft-provided LIC 2.1 or 3.1, nor of the mainline Linux kernel. Now the originally Citrix-owned code has been renamed to hv_mouse, expanded, merged into staging fully open source, and is expected to be functional with 3.2. As I don’t use Linux with X on servers, I didn’t really care about that; nonetheless it’s great to have a fully open source driver for this component now. If you have tested hv_mouse I’d be happy to hear about it!

Update: I have since tested the drivers; 3.2 seems to work well, and 3.3 contains some memory leak bugfixes – others have had issues with them in 3.0/3.1. If you don’t run X.org on a server VM, you can quietly ignore this module anyway. – I’ll let Puppet do the work or SSH into the box first 😉

Starting with 3.3 the driver has been renamed to hid_hyperv and moved out of staging.

Partly leaving the staging area

Greg Kroah-Hartman has put a patch in staging-next that moves the VMBus part out of drivers/staging/hv – read his commit message. It means the VMBus driver – the paravirtualized bus of Hyper-V where paravirt devices connect to the hypervisor backend – is now considered cleaned up enough to be considered stable.

Update: With Linux 3.3, only storvsc will be / is left in the staging area. This means the kernel hackers believe the overall quality of the kernel modules is good enough for daily use. For sure there are still bugs, and hopefully they will be fixed as well.

Missing features

While I don’t have an urgent need for dynamic memory on Linux guests (rather, I’d like performance equal to Windows guests), this feature – available to Windows guests since Hyper-V 2008 R2 SP1 – is still missing for Linux guests. I hope Microsoft considers making Linux a first-class citizen on Hyper-V, not just a good second-class one. That said: BSD and other UNIX-based OSes still lack even the support Linux already has on Hyper-V.

Update (shortly after this blog post was made)

I have been keen enough to point K.Y. to this article (he’s actually the guy doing most of the work on this code at Microsoft), as he had already given a positive answer to earlier feedback. Let me write down what he answered when I asked about the current IDE behaviour:

We are committed to supporting Linux as a first class guest on Hyper-V. What you are currently seeing is what is to be expected: What is happening is that the IDE disks are being discovered by both the native IDE driver in Linux (ata_piix) as well as the storvsc driver. The way we deal with this problem on RHEL6/6.1 and SLES is to have a modprobe rule that ensures that the hyperv driver loads first and if the load succeeds, the ata_piix driver will not be loaded. This will result in the IDE disks being discovered only once. […]

Mouse driver is now functional and fully open sourced; I got it working a few weeks ago.

Later on:

The solution of using the modprobe rules is only the current solution. We are exploring other options to deal with this problem

This sounds great!

October 22, 2011


Nexenta / illumos as FC target: LUN mapping (3)

While you can also do quite some LUN mapping at the FC switch level, I’d like to write here about the possibilities at the level of the OpenSolaris / illumos STMF:

Last time I simply added a view to a LUN for everybody, which isn’t a good idea if you are not using a cluster-aware filesystem (e.g. GFS, OCFS, or NTFS as a cluster-shared volume). Now we need to restrict access to a LUN: for this, STMF allows us to create host groups (hg) and target groups (tg).

Last time I mapped a LUN to everyone; now I want to restrict access to this LUN to a node called ‘akebono’, so let’s create a host group for all the HBAs installed in akebono:

# stmfadm create-hg akebono
# stmfadm list-hg
Host Group: akebono

In the FC world each HBA card has a unique WWNN (world wide node name), and since there are HBAs with more than one port, each port has its own WWPN (world wide port name) – a 64-bit value, like the MAC address that every network controller has. stmfadm allows adding different types of names to a host or target group, be it IQNs (iSCSI qualified names) or WWNs (world wide names, also used in SAS). We have to know the WWPNs of the initiating HBAs. There are several ways to get them (and possibly others):

  • Use vendor-provided tools for each platform (in this case this might be the QLogic SANsurfer CLI)
  • (Linux/BSD/Unix) Look up dmesg when the HBA driver gets loaded, or query sysfs (see the sketch after this list)
  • Sometimes it’s written on a label on the HBA
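
On a Linux initiator with a reasonably recent kernel, the fc_host sysfs class is usually the quickest way – a small sketch (the host numbers will differ on your system):

cat /sys/class/fc_host/host*/port_name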

But if you have a small FC environment, then you can cheat a little:

# stmfadm list-target -v
Target: wwn.50060B0000655664
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt3,0
    Protocol          : Fibre Channel
    Sessions          : 1
        Initiator: wwn.210000E08B9BE2DF
            Alias: -
            Logged in since: Sat Sep  3 02:15:56 2011
Target: wwn.2101001B323FE743
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt2,0
    Protocol          : Fibre Channel
    Sessions          : 0
Target: wwn.2100001B321FE743
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt1,0
    Protocol          : Fibre Channel
    Sessions          : 1
        Initiator: wwn.210000E08B9BF0E1
            Alias: -
            Logged in since: Sat Sep  3 02:15:56 2011
Target: wwn.50060B000065566E
    Operational Status: Online
    Provider Name     : qlt
    Alias             : qlt0,0
    Protocol          : Fibre Channel
    Sessions          : 1
        Initiator: wwn.210000E08B9BF0E1
            Alias: -
            Logged in since: Sat Sep  3 02:15:56 2011

The Nexenta box has 4 HBAs (3 are connected, 2 of them to the same switch), so what we can now see are the WWNs of the targets and those of the (so far) single initiating node. Now we can add them to the host group – don’t forget to prefix them as wwn.<yourWWPN>, because that’s how STMF distinguishes between iSCSI IQNs, FC & SAS WWNs and EUIs (Extended Unique Identifiers):

# stmfadm add-hg-member -g akebono wwn.210000E08B9BF0E1
# stmfadm add-hg-member -g akebono wwn.210000E08B9BE2DF
# stmfadm list-hg -v
Host Group: akebono
        Member: wwn.210000E08B9BF0E1
        Member: wwn.210000E08B9BE2DF

Now we can delete our mapped LUN and re-map it properly so only HBAs in the host group akebono will see this LUN and be able to mount it:

# stmfadm list-view -l 600144f098680b0000004e632dc60004
View Entry: 0
    Host group   : All
    Target group : All
    LUN          : 2
# stmfadm remove-view -a -l 600144f098680b0000004e632dc60004
# stmfadm list-view -l 600144f098680b0000004e632dc60004
stmfadm: 600144f098680b0000004e632dc60004: no views found

# stmfadm add-view -h akebono 600144f098680b0000004e632dc60004
# stmfadm list-view -l 600144f098680b0000004e632dc60004
View Entry: 0
    Host group   : akebono
    Target group : All
    LUN          : 2

Voilà – well, that’s it. If you want to further restrict a node to use only, let’s say, 2 out of 4 HBAs, you can create target groups too – currently akebono will be able to connect to this LUN over every reachable target path (be it FC or any other target, e.g. iSCSI). There is also the possibility to group all FC ports together, but be aware that in order to add any target to a target group, you will have to offline it for a short period (this is no problem if you have fully working multipathing):

# stmfadm create-tg fc-ports
# stmfadm add-tg-member -g fc-ports wwn.50060B000065566E
stmfadm: STMF target must be offline

# stmfadm offline-target  wwn.50060B000065566E
# stmfadm add-tg-member -g fc-ports wwn.50060B000065566E

# stmfadm online-target wwn.50060B000065566E

This offline-online procedure seems to be mandatory for every target added to a target group. Later on you can (if you want) add a view to a LUN by also passing ‘-t <targetGroupName>’ in addition to the host group like before, as sketched below. – It might also be a good thing if you want to manually balance the load across all of your target-mode HBAs.
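
A short sketch of such a mapping, re-using the GUID and the groups from above (remember that an existing view has to be removed before the LUN can be re-mapped with different groups):

# stmfadm remove-view -a -l 600144f098680b0000004e632dc60004
# stmfadm add-view -h akebono -t fc-ports 600144f098680b0000004e632dc60004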

Next up: Setting up multipathing on Linux (Debian and Scientific Linux) and Windows (2008 R2).


September 5, 2011


Nexenta / illumos as FC target (2)

The first article was a bit specific to NexentaOS 3.x, but most of the things I’ll write here can currently be used 1:1 on OpenIndiana and SmartOS Live (once you get it installed on disk). I’d not recommend using the UNIX shell within NexentaStor, the appliance OS that comes with a storage management system (NMS) and its respective shell tools (NMC) and web UI (NMV) – you risk temporarily messing up the management interface.

At this point I expect that you have configured basic zoning on your FC switches. Actually, I simply created a zone for each virtualization node and its target node, containing the respective WWNs of these ports (I did not create zones on a port basis). – Maybe I’ll wrap up the story of updating my SilkWorm 200Es from FabOS 5.1 to 6.2 😉

OK, let’s enumerate the disks and create a pool:

root@kodama:~# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
 0. c0t10d0 <DEFAULT cyl 4497 alt 2 hd 255 sec 63>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@a,0
 1. c0t12d0 <SEAGATE-ST3300656SS-0005 cyl 22465 alt 2 hd 6 sec 636>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@c,0
 2. c0t13d0 <SEAGATE-ST3146855SS-0001-136.73GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@d,0
 3. c0t14d0 <SEAGATE-ST3146356SS-0006-136.73GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@e,0
 4. c0t15d0 <SEAGATE-ST3146855SS-0001-136.73GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@f,0
 5. c0t19d0 <SEAGATE-ST3300655SS-0001-279.40GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@13,0
 6. c0t20d0 <SEAGATE-ST3300655SS-0001-279.40GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@14,0
 7. c0t21d0 <SEAGATE-ST3300655SS-0001-279.40GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@15,0
 8. c0t22d0 <SEAGATE-ST3300655SS-0001-279.40GB>
 /pci@0,0/pci8086,25f7@2/pci8086,3500@0/pci8086,3514@1/pci8086,3478@0/sd@16,0
Specify disk (enter its number): ^C
# zpool create sasmirror1 mirror c0t19d0 c0t20d0 mirror c0t21d0 c0t22d0 spare c0t12d0

Now we can have a look at the newly created pool:

# zpool list
NAME           SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
sasmirror1     556G   249K   556G     0%  1.00x  ONLINE  -
syspool       34.2G  9.17G  25.1G    26%  1.00x  ONLINE  -

# zpool status sasmirror1
  pool: sasmirror1
 state: ONLINE
 scan: scrub repaired 0 in 0h0m with 0 errors on Wed Aug 31 13:06:45 2011
config:

        NAME         STATE     READ WRITE CKSUM
        sasmirror1   ONLINE       0     0     0
          mirror-0   ONLINE       0     0     0
            c0t19d0  ONLINE       0     0     0
            c0t20d0  ONLINE       0     0     0
          mirror-1   ONLINE       0     0     0
            c0t21d0  ONLINE       0     0     0
            c0t22d0  ONLINE       0     0     0
        spares
          c0t12d0    AVAIL   

errors: No known data errors

After creating a data pool we can start carving out ZFS volumes (zVol):

# zfs create -V 20G sasmirror1/akebono-scratchvol
# zfs list
NAME                            USED  AVAIL  REFER  MOUNTPOINT
sasmirror1                      433G   114G    31K  /sasmirror1
sasmirror1/akebono-scratchvol  20.6G   135G    16K  -
syspool                        10.2G  23.5G  36.5K  legacy
syspool/dump                   7.00G  23.5G  7.00G  -
syspool/rootfs-nmu-000         1.46G  23.5G  1.03G  legacy
[...]

sbdadm – SCSI block device administration CLI

Albeit we have our zVol available, we still have to tell STMF that this is a volume that can be mapped as a LUN – this is what sbdadm is for:

# sbdadm create-lu /dev/zvol/rdsk/sasmirror1/akebono-scratchvol
Created the following LU:

              GUID                    DATA SIZE           SOURCE
--------------------------------  -------------------  ----------------
600144f098680b0000004e632dc60004  21474836480          /dev/zvol/rdsk/sasmirror1/akebono-scratchvol

(Update 2012: You can also use stmfadm to create a LUN; it’s up to you which one you want to use – I think the output of sbdadm is still better.)

At this point I’d like to recall the flexibility the original engineers at Sun built into the ‘SCSI target mode framework’ (STMF):

You can map not only zVols but also single (image) files on a filesystem, or even whole disks. The latter might make sense when you have a hardware RAID controller where the OS only sees one virtual disk. But zVols tend to be the most integrated way (also in terms of performance) to work with STMF – in fact, the appliance OS from Nexenta only allows mapping zVols as SCSI LUNs.
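
For illustration, a file-backed LU can be created the same way – a sketch, assuming mkfile is available and a dataset is mounted at /sasmirror1/images (the path is hypothetical):

# mkfile 10g /sasmirror1/images/testlun.img
# sbdadm create-lu /sasmirror1/images/testlun.img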

stmfadm – SCSI target mode framework CLI

The GUID you saw previously is what we will finally map in STMF – this time I will just map the LUN to every initiator and every target we have:

# stmfadm add-view 600144f098680b0000004e632dc60004
# stmfadm list-view -l 600144f098680b0000004e632dc60004
View Entry: 0
    Host group   : All
    Target group : All
    LUN          : 2

In the next post I will write about how creating target and host groups allows you to map LUNs precisely to a node and its HBAs. You should now see the newly mapped LUN from any FC-connected host. – You might need to rescan the bus (e.g. use a vendor-specific script on Linux, or refresh the Disk Management on Windows), as sketched below.
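
Without vendor tools, a generic rescan on a Linux host can also be triggered via sysfs – a sketch to be run as root (the three wildcards stand for channel, target and LUN):

for h in /sys/class/scsi_host/host*/scan; do echo "- - -" > "$h"; done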

2012.02: Updated a comment and fixed some errors.


September 4, 2011


Nexenta / illumos as FC target (1)

I have been lucky to get some elderly 4 Gbit FC hardware, step by step, from used-hardware resellers for a reasonably low price – this originated in my interest after deploying an iSCSI SAN at work using the OpenSolaris-based (soon to be illumos-based) NexentaStor appliance OS. Thankfully my employer allowed me to install and test the gear at work – because I don’t have a home setup like @tschokko’s.

Since the free-as-in-beer Community Edition doesn’t allow managing FC targets other than via the native OS shell, I went with Nexenta Core 3, their free-as-in-speech distribution of OpenSolaris snv_134 plus a ton of patches from later ONNV and illumos. I chose it because I wanted the kernel most similar to what is used in the commercial edition. In this series I’d like to wrap up how the ride went (i.e. self-documentation…).

Preparing the OS:

Installing the Nexenta Core OS is pretty much straightforward if your hardware is supported. I won’t comment on it other than that you should assign a static IP: if you are not a daily Solaris admin, you will otherwise have to do some googling on how to disable NWAM and do manual network configuration. 😉

After installing NexentaOS 3.0.1 I’d recommend upgrading to the latest bits, but before that you should install sunwlibc, because without it the STMF won’t run (this is currently mostly equivalent to what went into NexentaStor 3.1.1):

apt-get update
apt-get install sunwlibc
apt-clone upgrade

You can then reboot into the clone of the updated OS; the original kernel in 3.0.1 has a couple of bugs that were squashed later on – but most importantly you will get the latest open source ZFS filesystem (v5) and pool version (v28).

Enabling STMF and switching HBAs to target mode:

root@kodama:~# svcadm enable stmf
root@kodama:~# svcs -v stmf
STATE          NSTATE        STIME    CTID   FMRI
online         -             Aug_31        - svc:/system/stmf:default

Afterwards we have to switch the HBAs into target mode – assuming you have a 4G or 8G FC HBA, the driver we need is called ‘qlt’. – There is also a driver for Emulex HBAs, where things are a bit different. An important side note first: check which driver is currently bound to your HBAs:

root@kodama:~# mdb -k
> ::devbindings -q qlc
ffffff03597fe030 pciex1077,2432, instance #0 (driver name: qlc)
ffffff03597fb2c0 pciex1077,2432, instance #1 (driver name: qlc)
ffffff03597fb038 pciex1077,2432, instance #2 (driver name: qlc)
ffffff03597f6ce8 pciex1077,2432, instance #3 (driver name: qlc)
> $q

You can use a command to tell the OS to bind qlt instead of qlc (see the sketch after the file listing below) – but you can also edit /etc/driver_aliases and replace the occurrence of qlc where your HBA’s PCI ID (here pciex1077,2432) appears:

/etc/driver_aliases:

[...]
qlt "pciex1077,2432"
qlc "pciex1077,2532"

After you have done this, you will have to reboot the system one last time. Enabling STMF (the SCSI target mode framework) is important, since it handles the upload of the QLogic target mode firmware to all your HBAs. Without this firmware your HBAs will keep blinking (~ no link) and stay non-operational. If you did everything right, you should see something like this:

root@kodama:~# fcinfo hba-port
HBA Port WWN: 50060b000065566e
        Port Mode: Target
        Port ID: 10000
        OS Device Name: Not Applicable
        Manufacturer: QLogic Corp.
        Model: HPAE311 (-> This is a HP-branded QLE2460)
        Firmware Version: 5.2.1
        FCode/BIOS Version: N/A
        Serial Number: not available
        Driver Name: COMSTAR QLT
        Driver Version: 20100505-1.05
        Type: F-port
        State: online
        Supported Speeds: 1Gb 2Gb 4Gb
        Current Speed: 4Gb
        Node WWN: 50060b000065566f
HBA Port WWN: 2100001b321fe743
        Port Mode: Target
        Port ID: 10400
[...]
root@kodama:~# stmfadm list-target
Target: wwn.50060B0000655664
Target: wwn.2101001B323FE743
Target: wwn.2100001B321FE743
Target: wwn.50060B000065566E

Congratulations, you have a working Fibre Channel target box! – You might also re-run the same mdb -k command and search for devbindings of qlt and qlc.

Next up: Carving out volumes and doing LUN mapping

September 2, 2011
