The Storage System Recovery and Troubleshooting class covers advanced, in-depth topics on offline storage systems. Where previous classes in the new-hire OnRamp program covered ONTAP diagnostic commands, this class covers what those commands do in the background. This deeper understanding enables TSEs to form action plans based on feedback from the offline system rather than following a prewritten template. The class is intended to give students the knowledge to resolve the majority of unavailable systems without escalation.
Training Units: 48
This course focuses on enabling you to do the following:
• Explain the ONTAP boot process and common boot issues
• Describe the features of the special boot menu
• Demonstrate the steps required to perform an ONTAP upgrade and reversion
• Explain how to diagnose storage controller problems
• Describe how to upload core files
• Explain the volume recovery process
• Discuss disk and shelf errors
Role
• Administrator, operator
• Support engineer, implementation engineer, professional services
Prerequisites
• ONTAP Cluster Administration
• ONTAP Hardware Component Troubleshooting and Analysis
• ONTAP Software and Configuration Troubleshooting and Analysis
Module 1: The boot process
• Explain the boot process
• Get started with troubleshooting
• Troubleshoot boot issues
Exercises
• Netboot a system at the loader prompt
Module 2: Troubleshooting multiple disk failures
• Troubleshoot systems with multiple failed disks
Exercises
• Troubleshoot a disk issue
• Troubleshoot a storage issue
Module 3: ONTAP upgrades and reverts
• Review ONTAP upgrade types
• Examine ONTAP reversion rules
• Discuss firmware updates
Exercises
• Perform an ONTAP upgrade
Module 4: Storage controller problems
• Define basic terms in hardware faults
• Explain error and recovery flow types
• Identify logs that are used for troubleshooting
• Describe methods of analyzing data
Exercises
• Planning replacements
• Panic strings
Module 5: Performing triage for a failure
• Investigate problem determination
• Examine AutoSupport files
• Manage core files
• Discuss log files
• Explore aggregate and volume management
• Discuss takeover and giveback
Exercises
• Access a core file
• Relocate an aggregate
• Recover a deleted volume
Module 6: Troubleshooting storage issues
• Identify disk and media errors
• Describe how RAID manages disk errors
• Locate and use SCSI codes
Exercises
• Power-cycle a shelf
• Troubleshoot a storage issue