Overview
The Problem Management process for Technology Services provides standardized, repeatable handling and documentation of Problems affecting the services that Technology Services delivers. The process involves gathering Problem data, analyzing that data, and implementing a solution that prevents recurrence. Problem Management helps Technology Services make informed business decisions about upgrading or retiring services, training or adding personnel, and adjusting processes and procedures to become more efficient and error-resistant.
When to Use Problem Management
The Technology Services Problem Management process applies only to services designated as unit "TAMU" or "TAMU-IT" within ServiceNow. Services in other units fall under the Problem Management processes for those units. Additionally, a Problem should be recorded only if documenting it is likely to have future value. This excludes issues stemming from such things as data entry errors, simple one-time misconfigurations, or Incidents that are "help requests".
Some Incidents should always be recorded as Problems. Issues that become Outages or Degradations and are published to status.it.tamu.edu should always be recorded as Problems. Incidents that have a High or Critical priority, or that affect three or more campus community members, should also be evaluated for a corresponding Problem Record. Service Providers and Fulfillers should also open Problem Records for any issue that requires long-term Root Cause Analysis (RCA), any issue that requires documentation of a long-term or complex Workaround, and any Known Error.
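The criteria above can be read as a small decision rule. The following Python sketch is illustrative only and is not part of ServiceNow or any Technology Services tooling: the `Incident` fields, the `Recommendation` values, and the `recommend` function are hypothetical names, and the order in which the checks run is one possible interpretation of the policy.

```python
from dataclasses import dataclass
from enum import Enum


class Recommendation(Enum):
    OUT_OF_SCOPE = "handled by another unit's Problem Management process"
    OPEN_PROBLEM = "open a Problem Record"
    EVALUATE = "evaluate for a corresponding Problem Record"
    NO_PROBLEM = "no Problem Record needed"


# Units covered by this process, as named in the policy text.
IN_SCOPE_UNITS = {"TAMU", "TAMU-IT"}


@dataclass
class Incident:
    # All field names are hypothetical; they are not ServiceNow column names.
    unit: str
    priority: str
    affected_members: int
    published_outage_or_degradation: bool = False
    needs_long_term_rca: bool = False
    needs_complex_workaround: bool = False
    is_known_error: bool = False
    low_future_value: bool = False  # e.g. data entry error, "help request"


def recommend(incident: Incident) -> Recommendation:
    """Apply the policy criteria in one assumed order of precedence."""
    if incident.unit not in IN_SCOPE_UNITS:
        return Recommendation.OUT_OF_SCOPE
    # Published Outages and Degradations are always recorded.
    if incident.published_outage_or_degradation:
        return Recommendation.OPEN_PROBLEM
    # Long-term RCA, long-term or complex Workarounds, and Known Errors
    # also call for a Problem Record.
    if (incident.needs_long_term_rca
            or incident.needs_complex_workaround
            or incident.is_known_error):
        return Recommendation.OPEN_PROBLEM
    # Documentation with no likely future value is excluded.
    if incident.low_future_value:
        return Recommendation.NO_PROBLEM
    # High or Critical priority, or three or more affected members.
    if incident.priority in {"High", "Critical"} or incident.affected_members >= 3:
        return Recommendation.EVALUATE
    return Recommendation.NO_PROBLEM
```

For example, under these assumptions `recommend(Incident(unit="TAMU-IT", priority="Low", affected_members=4))` returns `Recommendation.EVALUATE`, because the Incident affects three or more campus community members. The result for an Incident that needs evaluation is a prompt for human judgment, not an automatic decision.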
Problem Management Terms
- Problem - The cause of one or more Incidents or potential Incidents.
- Root Cause - The underlying cause of a disruption to service. Searching for, analyzing, rectifying, and documenting Root Causes are major activities within Problem Management.
- Known Error - A Problem that has been diagnosed but for which no resolution currently exists.
- Workaround - A stated procedure that circumvents the impact of a service disruption in order to restore service to users.
Process Overview
- Service Providers will open a Problem when they identify one. The Operations Center will open a Problem for any Outage or Degradation that does not already have one assigned. If the Service Desk and Operations Center identify a Problem before the Service Provider does, and agree that the issue is a Problem, they will open a Problem Record.
- The Problem Record will be assigned to one of the following: the Service Provider who discovered it, the technician contact for the Outage or Degradation, or the Assignment Group Manager for the assignment group believed to be best able to analyze and remediate the Problem. In the last case, the Assignment Group Manager will then assign the Problem Record to a group member for analysis.
- The Problem will undergo Root Cause Analysis to determine the underlying cause. Information on the analysis should be recorded within the Problem Record.
- Remediation should be implemented when the Root Cause is known. If necessary, a Change Request should be drafted. Information about the remediation should be recorded within the Problem Record.
- If the issue is a Known Error, the Problem should be set to the "Known Error" state. Workarounds for the Problem should be documented as Knowledge Articles.
- Once resolved, the Problem should be set to "Closed/Resolved".
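The steps above can be sketched as a simple state machine for a Problem Record. This is a minimal illustration under stated assumptions: only the "Known Error" and "Closed/Resolved" states are named in this process, while "Open", "Root Cause Analysis", and "Remediation" are hypothetical labels chosen for the sketch and may not match the states configured in ServiceNow. The `ProblemRecord` class and its `move_to` method are likewise hypothetical.

```python
from dataclasses import dataclass, field

# Allowed transitions. Only "Known Error" and "Closed/Resolved" come from
# the process text; the other state names are assumptions for illustration.
ALLOWED_TRANSITIONS = {
    "Open": {"Root Cause Analysis"},
    "Root Cause Analysis": {"Remediation", "Known Error"},
    "Known Error": {"Remediation"},
    "Remediation": {"Closed/Resolved"},
    "Closed/Resolved": set(),
}


@dataclass
class ProblemRecord:
    assigned_to: str
    state: str = "Open"
    notes: list = field(default_factory=list)

    def move_to(self, new_state: str, note: str) -> None:
        """Change state and record a note, as the process asks that
        analysis and remediation details be kept in the Problem Record."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state!r} to {new_state!r}")
        self.state = new_state
        self.notes.append(f"{new_state}: {note}")


# Usage: a Problem diagnosed as a Known Error, later remediated and closed.
record = ProblemRecord(assigned_to="group member")
record.move_to("Root Cause Analysis", "analysis started")
record.move_to("Known Error", "diagnosed; Workaround documented as a Knowledge Article")
record.move_to("Remediation", "Change Request drafted")
record.move_to("Closed/Resolved", "remediation implemented")
```

Requiring a note on every transition mirrors the expectation that analysis and remediation information is recorded in the Problem Record at each step.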