Network and application troubleshooting can be one of the highest-profile and most aggravating activities in which IT engages. Pressure on IT personnel escalates as problem resolution time increases, since longer resolution times translate directly into more network and application slowness and downtime.
According to the Enterprise Management Associates report, Network Management Megatrends 2016, IT teams already spend around 36 percent of their daily effort on reactive troubleshooting. As a result, considerable effort has been directed over the years at reducing mean time to repair (MTTR) to free up this precious IT admin and engineering time. Unfortunately, according to ZK Research, 85 percent of MTTR is spent just trying to figure out that there is indeed a problem.
What if there were a better way? A way that would allow you to:
- Potentially reduce MTTR by up to 80 percent
- Expose hidden network blind spots
- Eliminate unnecessary technical roadblocks for troubleshooting
- Eliminate unnecessary process roadblocks for troubleshooting
Here are five of the top activities IT professionals can implement to improve their company’s troubleshooting efforts:
- Insert taps between the network and monitoring tools (or network packet broker) to improve the quality of monitoring data and time to data acquisition
- Deploy network packet brokers (NPBs) between those taps and the security and monitoring tools to optimize the data sent to the tools
- Deploy NPBs that support floating filters to further decrease the time to data acquisition
- Use NPBs that support adaptive monitoring, which speeds up the data filter deployment process by using automation to replace manual intervention
- Implement proactive troubleshooting with application intelligence to create a macroscopic troubleshooting approach that reduces fault localization time
Once taps and NPBs are inserted into the network, network-affecting changes (to collect troubleshooting data) are all but eliminated, assuming the deployment was done correctly. Taps are passive devices and will not materially affect network traffic after they are inserted into the network. Unlike SPAN ports, they provide a complete copy of all network data, including corrupted and malformed packets (which can aid in troubleshooting activities). Security and monitoring tools can then be connected to the NPB at will. This can dramatically speed up troubleshooting, as many Change Board approvals can be eliminated. Change Boards typically govern the production network and oversee what activities can and cannot be implemented on the network, and for good reason: such changes often cause network disruptions and outages. With the new tap and NPB configuration, the IT department can often start troubleshooting activities immediately without affecting the network. There is no need to wait minutes, hours, or days for approval to connect diagnostic equipment to the network, because it is already connected and ready to go.
A second consequence of adding a packet broker is the ability to optimize data filtering and reduce (or eliminate) the need for crash carts. Data filtering removes the “junk data”, which speeds up time to resolution because the monitoring tool(s) have less irrelevant traffic to process. In addition, the NPB gives you instant access to the data and tools that you do need. This can dramatically speed up troubleshooting, as crash carts (special-purpose carts with a collection of triage and troubleshooting tools) are no longer required. The tools are now pre-connected to the NPB. This eliminates the time spent locating the cart, moving it to the correct place, and inserting it into the network, and it reduces configuration time for the network tools. This can be especially pertinent if troubleshooting needs to be conducted on links and equipment in remote locations. MTTR reductions of up to 80 percent are possible simply due to the elimination of Change Board approvals and crash carts.
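To see how removing approval and setup steps could add up to a reduction of that size, here is a minimal arithmetic sketch. The phase names and durations are purely illustrative assumptions, not measurements from any report cited here.

```python
# Hypothetical phase durations (in minutes) for a single incident.
# These numbers are illustrative assumptions, not measured data.
baseline = {
    "change_board_approval": 180,
    "locate_and_connect_crash_cart": 60,
    "capture_and_analyze": 45,
    "apply_fix": 15,
}

# With tools pre-connected to an NPB, the first two phases are assumed to drop out.
eliminated = {"change_board_approval", "locate_and_connect_crash_cart"}
with_npb = {phase: mins for phase, mins in baseline.items() if phase not in eliminated}


def mttr_reduction(before: dict, after: dict) -> float:
    """Return the fractional reduction in total repair time."""
    total_before = sum(before.values())
    total_after = sum(after.values())
    return (total_before - total_after) / total_before


reduction = mttr_reduction(baseline, with_npb)
print(f"MTTR: {sum(baseline.values())} min -> {sum(with_npb.values())} min "
      f"({reduction:.0%} reduction)")
```

The size of the reduction depends entirely on how much of your own repair time is spent waiting for approvals and equipment, so substitute figures from your own incident records.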
A third activity you can implement, depending upon your choice of NPB, is to make use of floating filters. These are NPB filters created specifically for troubleshooting that can be pre-staged and connected to standby troubleshooting tools (e.g., analyzers, Wireshark, Snort). Once implemented, these pre-staged filters can dramatically cut data collection times, as the troubleshooting filter simply needs to be connected to an incoming network port on the NPB. This can be done remotely using a drag-and-drop interface on the NPB. Once the connection is made, the tool can start capturing critical data in less than a minute, dramatically reducing troubleshooting time and costs.
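The idea of a pre-staged filter can be modeled in a few lines. The sketch below is a conceptual illustration only: the `FloatingFilter` class, its field names, and the port labels are hypothetical and do not correspond to any vendor's NPB interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FloatingFilter:
    """A pre-staged filter already wired to a standby tool port (hypothetical model)."""
    name: str
    criteria: dict                      # e.g. {"protocol": "tcp", "dst_port": 443}
    tool_port: str                      # NPB port where the analyzer is already connected
    network_port: Optional[str] = None  # stays unset until an incident occurs

    def attach(self, network_port: str) -> None:
        """Connect the filter to an incoming network port: the only step left at incident time."""
        self.network_port = network_port

    def matches(self, packet: dict) -> bool:
        """True when the filter is attached and every criterion equals the packet's field."""
        if self.network_port is None:
            return False
        return all(packet.get(key) == value for key, value in self.criteria.items())


# Pre-stage the filter long before any incident.
https_filter = FloatingFilter("https-triage", {"protocol": "tcp", "dst_port": 443}, tool_port="T1")
packet = {"protocol": "tcp", "dst_port": 443, "src_ip": "10.0.0.5"}

before = https_filter.matches(packet)  # filter is floating, so nothing is forwarded
https_filter.attach("N7")              # incident: connect to the affected network port
after = https_filter.matches(packet)   # matching traffic now reaches the tool
```

The point of the model is that the criteria and the tool connection exist ahead of time, so the incident-time work shrinks to a single `attach` step.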
A fourth activity to consider is the deployment of NPBs that support adaptive monitoring. Adaptive monitoring is the ability of the NPB to respond to commands from other systems and make configuration changes automatically. This automation improves monitoring response times because the NPB can react to network incidents in near real time. Commands can be received over a representational state transfer (REST) interface from network management systems (NMS), orchestration systems, security information and event management (SIEM) systems, and so on. Faster responses to problems result in a shorter mean time to diagnosis and a correspondingly faster MTTR.
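As a rough illustration of this kind of automation, the sketch below builds the REST request that a SIEM hook might send to an NPB when a high-severity alert arrives. The base URL, the endpoint path, the JSON field names, and the alert format are all hypothetical assumptions; a real NPB defines its own REST API, so consult the vendor's documentation.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical NPB address and endpoint layout; not a real vendor API.
NPB_BASE_URL = "https://npb.example.net/api"


def build_attach_request(filter_name: str, network_port: str) -> urllib.request.Request:
    """Build a POST request asking the NPB to attach a pre-staged filter to a port."""
    payload = json.dumps({"filter": filter_name, "network_port": network_port}).encode("utf-8")
    return urllib.request.Request(
        url=f"{NPB_BASE_URL}/filters/{filter_name}/attach",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def on_siem_alert(alert: dict) -> Optional[urllib.request.Request]:
    """React only to high-severity alerts; the alert fields are assumed for illustration."""
    if alert.get("severity") != "high":
        return None
    return build_attach_request("https-triage", alert["network_port"])


request = on_siem_alert({"severity": "high", "network_port": "N7"})
# Sending is left out on purpose: urllib.request.urlopen(request) would transmit it,
# and a production hook would also need authentication and error handling.
```

Separating request construction from sending keeps the decision logic easy to test without touching a live device.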
A fifth way to improve MTTR is to implement proactive troubleshooting using application intelligence within an NPB. Application intelligence uses application-related data to provide additional information about network traffic. User geolocation, device type, browser type, Border Gateway Protocol autonomous system (BGP AS) information, and application traffic change information can be used to help pinpoint problems. Looking at this information in conjunction with trouble incident reports can often shorten troubleshooting time. For instance, is the problem affecting all devices or operating systems or just specific ones? Are the incidents being reported from a specific geographic area? Is the incident related to a specific carrier or Internet service provider? These data points can be very useful in diagnosing problems.
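The questions above amount to checking whether incident reports cluster on one attribute value. Here is a minimal sketch of that check, assuming incident reports have already been enriched with application intelligence metadata; the field names, the sample records, and the 80 percent threshold are hypothetical.

```python
from collections import Counter

# Hypothetical incident reports enriched with application intelligence metadata.
incidents = [
    {"device": "phone",  "region": "east", "isp": "ISP-A"},
    {"device": "laptop", "region": "west", "isp": "ISP-A"},
    {"device": "phone",  "region": "east", "isp": "ISP-A"},
    {"device": "tablet", "region": "east", "isp": "ISP-A"},
    {"device": "laptop", "region": "west", "isp": "ISP-A"},
]


def dominant_values(reports, attributes, threshold=0.8):
    """For each attribute, return the value shared by at least `threshold` of reports, if any."""
    findings = {}
    for attr in attributes:
        counts = Counter(report[attr] for report in reports)
        value, count = counts.most_common(1)[0]
        if count / len(reports) >= threshold:
            findings[attr] = value
    return findings


findings = dominant_values(incidents, ["device", "region", "isp"])
print(findings)
```

With this sample data, devices and regions are mixed while every report shares one ISP, which points the investigation toward the carrier. A dominant value is only a hint: it should be compared against the normal traffic mix before drawing conclusions, since a value that dominates all traffic will also dominate the incidents.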
Implementing these five activities can significantly improve troubleshooting efforts by reducing the amount of time spent on problem resolution. A lower problem resolution time reduces network downtime and helps IT personnel meet or exceed their problem resolution time KPIs.