NetApp Premium AutoSupport Detailed Health Check
Hooray! It is now time to drill into the Health Summary and Detailed Health Check!
For those who remember the last Dashboard snapshot, we had a number of warnings and notices on this filer!
So, we’re going to drill down into the 12 warnings and 6 notices!
The Health Check Details page is split into a number of “sections”, each with its own bits of information. (You may see one or more of the areas I’ll be calling out.)
It all starts with the Health Check Analysis:
Very basic and straightforward, it gives you the timeline the recommendations are based upon. This is especially useful when you say “I fixed that!”: you know what period the report is referencing, so there’s no need to freak out!
Next comes System Level Warnings:
You’ll need to zoom in to see what kind of cool stuff there is, but I’ll zoom in for you in some areas I want to make sure you recognize.
That little bad boy there, oh my god, so cool if you happen to want to actually find out WHY there is an error instead of just knowing there is one.
So, if we click on that lun.offline message to figure out “WTF” is going on, we get this!
For simplicity’s sake, it shows us the exact error in the logs, gives an indication as to why this happens and provides corrective actions on what to do!
There are other, better examples of corrective actions (such as replace a disk, unmap/remap LUNs, collect a trace and so on) that help you not only manage your system better, but also come to terms with its ‘warnings’, so you can be in better command and control of your own operations.
Next up is System Level Notices:
This is cool because it tells you something outright and then provides a link to the Bug Report on it, so you can follow the status, especially if it applies to you!
Now this is cool as well, Volume Related Notices:
Note the disclaimer: These are based on conservative guidelines and may or may not be applicable to this system, but you should definitely know about it if you weren’t aware!
What I like about this is that it calls out specific volumes and discusses their snapshot usage, snapshot reserve and snapshot schedules (3 areas I often find accidentally misconfigured and a hotbed of things to clean up!)
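To make the snapshot usage vs. snapshot reserve idea concrete, here is a minimal Python sketch of that kind of check. Everything in it is made up for illustration: the `VolumeSnapInfo` structure, the volume names and the numbers are hypothetical, it talks to no NetApp API, and it is not how the Health Check actually computes its notices.

```python
from dataclasses import dataclass


@dataclass
class VolumeSnapInfo:
    """Hypothetical per-volume numbers, typed in by hand for this example."""
    name: str
    size_gb: float           # total volume size
    snap_reserve_pct: float  # snapshot reserve, as a percent of volume size
    snap_used_gb: float      # space currently consumed by snapshots


def snapshot_notices(volumes):
    """Return a list of human-readable notices for volumes worth a second look."""
    notices = []
    for vol in volumes:
        reserve_gb = vol.size_gb * vol.snap_reserve_pct / 100.0
        if vol.snap_used_gb > reserve_gb:
            # Snapshots have outgrown the reserve and are eating into data space.
            notices.append(
                f"{vol.name}: snapshots use {vol.snap_used_gb:.1f} GB, "
                f"over the {reserve_gb:.1f} GB reserve"
            )
        elif vol.snap_reserve_pct > 0 and vol.snap_used_gb == 0:
            # Space is set aside for snapshots, but nothing is using it.
            notices.append(
                f"{vol.name}: {reserve_gb:.1f} GB reserved for snapshots "
                f"but none in use"
            )
    return notices


if __name__ == "__main__":
    sample = [
        VolumeSnapInfo("vol_a", 100, 20, 35),  # over its reserve
        VolumeSnapInfo("vol_b", 200, 20, 0),   # reserve set, no snapshots
        VolumeSnapInfo("vol_c", 50, 20, 5),    # fine
    ]
    for line in snapshot_notices(sample):
        print(line)
```

The thresholds here are deliberately simple; like the real notices, treat the result as a conservative hint to go look, not a verdict.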
And last but not least, another favorite area, Summary of Disks requiring Firmware Upgrade:
I personally hate playing guessing games of “Hmm, are my disks on the right rev? Do I need to upgrade?” etc, etc, etc… that same old story. No matter how many different types of disks you have in the system, this Health Check will give you the skinny.
The only thing that would make the disk firmware summary cooler is Shelf Firmware Upgrade:
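If you wanted to answer the “are my disks on the right rev?” question by hand, the logic boils down to something like the Python sketch below. The model names, firmware revisions, disk IDs and the `LATEST_FIRMWARE` table are all placeholders I invented for the example; the real Health Check pulls this from your AutoSupport data, and this sketch does not.

```python
# Hypothetical table: disk model -> firmware revision considered current.
LATEST_FIRMWARE = {
    "MODEL_A": "NA03",
    "MODEL_B": "NA07",
}


def disks_needing_upgrade(disks, latest=LATEST_FIRMWARE):
    """Group out-of-date disks by (model, installed rev, wanted rev).

    `disks` is an iterable of (disk_id, model, installed_rev) tuples.
    Models missing from the table are skipped rather than guessed at.
    """
    summary = {}
    for disk_id, model, installed_rev in disks:
        wanted_rev = latest.get(model)
        if wanted_rev is not None and installed_rev != wanted_rev:
            summary.setdefault((model, installed_rev, wanted_rev), []).append(disk_id)
    return summary


if __name__ == "__main__":
    inventory = [
        ("0a.16", "MODEL_A", "NA01"),  # behind
        ("0a.17", "MODEL_A", "NA03"),  # current
        ("0b.20", "MODEL_B", "NA05"),  # behind
        ("0b.21", "MODEL_C", "XX01"),  # unknown model, skipped
    ]
    for (model, have, want), ids in disks_needing_upgrade(inventory).items():
        print(f"{model}: {len(ids)} disk(s) on {have}, should be {want}: {', '.join(ids)}")
```

The sketch only checks “same or different” and makes no attempt to order revisions, since I don’t want to assume anything about how revision strings sort.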
(I had to go to another filer to get this screen capture, because all these filers are current ;))
One of my favorite parts of this Health Check tool is that after I’ve upgraded and updated a system and I want to feel all warm and fuzzy about the work, I’ll go look at these details and make sure I’m not seeing warnings, notices, backrevved disks or modules, etc.
This not only saves me and my customers time and money in the moment, it also pays dividends in the long term: less work to manage and maintain, and less chance of unexpected downtime, because you know the EXACT state of your system at any given point.
The Premium AutoSupport toolset (this being just one part of it) opens the door to not only self-management but self-control, and I’ll tell you: any system I’ve felt comfortable and confident in after building it has never gone down.
It’s not magic, a special arcane craft or art form. It’s clear, conscious best practices and using the tools available to you. An ounce of prevention is worth an hour of downtime! :)
Hopefully you’ve enjoyed this segment. I’ll be hitting up Visualizations next, so look forward to my delivery of that!
Disclaimer: The information above reflects only some of the types of notices you can receive. If you do not see any of these notices on your system, that is great! If you encounter additional ones (Aggregate Level Notices, Volume Related Warnings, etc.), it is not a problem, just an education on the current health of that particular system, and they should be reviewed.