MICROSOFT WINDOWS KERNEL DEBUGGING FOR DRIVERS AND APPLICATIONS
(5 days)

Who Should Attend
Driver developers, system engineers, software developers, support engineers, and systems administrators who need to know how to set up for, use, and analyse OS crash dumps, how to debug Windows crashes, and how to identify the reasons for the crashes they are seeing.

Prerequisites

Participants are expected to understand operating system concepts in general and the basic concepts of the Windows operating system, and to have some (moderate) application-level or driver-level development experience.

Overview

When Windows crashes, it displays the “Blue Screen of Death” and writes the contents of memory at the moment of the crash, including information about drivers, programs and system utilities, into a crash dump file.

Participants will learn how to find, read and understand crash dumps in order to achieve better system stability when developing system software and applications for Windows 2000/XP.


Contents

Windows Operating System Overview

Windows internals overview
OS Subsystems and the decomposition of the kernel into separate Executive Subsystems
Introduction to WDM architecture
Differences between kernel mode and user mode


Windows 2000/XP Kernel Internals

Win32 API
Services, Functions, and Routines
Processes and Threads
Virtual Memory
Kernel Mode vs. User Mode
Unicode
Registry

Tools, SDK and DDK

Perfmon
Windows 2000/XP Support Kit and tools
Windows 2000/XP Resource Kits
Kernel Debugging Tools
Platform SDK
Driver Development Kit (DDK)
Third-party tools


Introduction to WinDBG

What is WinDBG
How WinDBG works
Symbol file trees
Setup and use of a symbol store
How to set up an environment for kernel debugging
How to set up WinDBG for “after-crash” debug sessions
Introduction to WinDBG commands
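
As an illustration of the symbol store topic above, the following minimal sketch builds a symbol path string of the form srv*<local cache>*<server>, which the Microsoft debuggers accept through the _NT_SYMBOL_PATH environment variable or the .sympath command. The helper name make_symbol_path and the cache directory C:\Symbols are hypothetical choices for this example. The default server is Microsoft's public symbol server.

```python
# Minimal sketch: compose a symbol-server path for the debugger.
# make_symbol_path and the cache directory used below are illustrative only.
MS_PUBLIC_SYMBOLS = "https://msdl.microsoft.com/download/symbols"

def make_symbol_path(local_cache, server=MS_PUBLIC_SYMBOLS):
    """Return 'srv*<local_cache>*<server>': symbols are downloaded from
    the server on demand and cached in the local directory."""
    return "srv*%s*%s" % (local_cache, server)

if __name__ == "__main__":
    # The printed value can be assigned to _NT_SYMBOL_PATH before starting
    # the debugger, or passed to .sympath inside a session.
    print(make_symbol_path(r"C:\Symbols"))
```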

Laboratory Session

Setting up WinDBG
Examining the state of a running system: participants will open each of a series of sample crash dump files with the debugger and learn how to recognise and solve debugger setup problems (symbol and executable image paths, etc.). For each file, participants will then examine the debugger output for clues as to the cause of the crash and the software component that was executing at the time.

Basic System Management

Trap Dispatching
Interrupt Dispatching
Exception Dispatching
System Service Dispatching
Object Manager
Executive Objects
Object Structure
Synchronization
Kernel Synchronization
Executive Synchronization
System Worker Threads
Windows 2000 Global Flags
Local Procedure Calls (LPCs)


Error Causes and Data Structures

How to perform a post-mortem on a crashed system
Reading crash dumps, examining thread stacks, traps
How to read and interpret the BSOD
How internal mechanisms work
Processes, threads, system calls, interrupts, IRQLs, memory management, and various types of I/O operations
Debugger commands that display each of these components
How to use support tools and resource kits

Laboratory Session

Using the tools and debugger commands presented to collect information from the running systems and from the sample dump files
Initial fault isolation
Looking at the state of all system threads
Finding I/O operations
Examining common crash scenarios ("bad pointer" being the most common, as it generates PAGE_FAULT_IN_NONPAGED_AREA, KMODE_EXCEPTION_NOT_HANDLED and IRQL_NOT_LESS_OR_EQUAL)
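
For orientation, the three stop codes named above have fixed numeric values, and the first bug check argument has a documented meaning for each. The sketch below is a small lookup of the kind a participant might write while triaging dumps. The table layout and the function describe_bugcheck are illustrative assumptions, not part of any debugger API.

```python
# Minimal sketch: decode the three bug check codes discussed in this lab.
# The numeric values are the standard ones; the helper itself is hypothetical.
BUGCHECKS = {
    0x0A: ("IRQL_NOT_LESS_OR_EQUAL", "memory address referenced"),
    0x1E: ("KMODE_EXCEPTION_NOT_HANDLED", "exception code"),
    0x50: ("PAGE_FAULT_IN_NONPAGED_AREA", "memory address referenced"),
}

def describe_bugcheck(code, arg1):
    """Format a stop code and its first argument for a triage note."""
    name, arg1_meaning = BUGCHECKS.get(code, ("UNKNOWN", "argument 1"))
    return "STOP 0x%08X (%s): %s = 0x%08X" % (code, name, arg1_meaning, arg1)

if __name__ == "__main__":
    # A NULL-plus-offset pointer dereferenced at elevated IRQL.
    print(describe_bugcheck(0x0A, 0x00000008))
```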


Introduction to Kernel Debugging

How the bugcheck works and how to obtain information from it
Debugger commands for detailed crash analysis
How to set up for minidumps and how to use them
How to do live debugging
How to break into the kernel
How to troubleshoot system hangs by using live debugging

Laboratory Session

Opening a crash dump, searching in it and finding the exact cause of the crash and the software component which crashed the system
Diagnosing system hangs caused by faulty applications and device drivers

System and Driver Deadlocks

Defining deadlock and livelock
Examining their causes
Common deadlock causes, such as file system reentrancy and worker thread exhaustion
Techniques for determining causes of deadlock
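
One common technique for the last item is to build a "wait-for" map from debugger output (which thread is blocked, and which thread owns the resource it is blocked on) and look for a cycle. The sketch below assumes each blocked thread waits on exactly one owner. The thread names and the function find_wait_cycle are hypothetical.

```python
# Minimal sketch: detect a deadlock as a cycle in a wait-for map.
# waits_for[t] is the thread that owns the resource thread t is blocked on.
def find_wait_cycle(waits_for):
    """Return the list of threads forming a wait cycle, or None."""
    for start in waits_for:
        seen = []
        thread = start
        # Follow the chain of owners until it ends or repeats.
        while thread in waits_for and thread not in seen:
            seen.append(thread)
            thread = waits_for[thread]
        if thread in seen:
            return seen[seen.index(thread):]
    return None

if __name__ == "__main__":
    # T1 waits on a lock held by T2 and vice versa; T3 is merely stuck behind T1.
    print(find_wait_cycle({"T1": "T2", "T2": "T1", "T3": "T1"}))
```

A thread that appears in the map but not in the returned cycle (T3 here) is a victim of the deadlock rather than a participant, which helps narrow down which code paths to inspect.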

Laboratory Session

Deadlock analysis using live systems and post-mortem dumps
Using WinDbg to examine deadlocks and identify their causes


Debugging Under Pressure

How to handle debugging "field problems" where the system is not available, including remote debugging using WinDBG, OSR's DBGMon utility or other third-party utilities
How to collect crash dumps along with relevant files
Enabling internal Windows checks in the free build
Techniques for building new debugging tools


Laboratory Session

Handling WinDBG problems
Alternatives to WinDBG (i386kd, SoftICE)
Ensuring crash dumps match symbols
Debugging across the network


Disassembly

Understanding assembly language
Common x86 operation codes and addressing modes
How each is displayed by the debugger
Using the debugger's disassembly and other display utilities to disassemble the code causing a system crash
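
As a small worked example of the addressing-mode topic, a 32-bit x86 memory operand such as [ebx+esi*4+8] resolves to base + index * scale + displacement, with scale restricted to 1, 2, 4 or 8. The sketch below computes that value. The function name and the register values in the usage lines are assumptions chosen for illustration.

```python
# Minimal sketch: compute the effective address of a 32-bit x86 operand
# of the form [base + index*scale + disp]. Register values are made up.
def effective_address(base=0, index=0, scale=1, disp=0):
    if scale not in (1, 2, 4, 8):
        raise ValueError("x86 scale factor must be 1, 2, 4 or 8")
    # Wrap to 32 bits, as the processor does in 32-bit mode.
    return (base + index * scale + disp) & 0xFFFFFFFF

if __name__ == "__main__":
    # mov eax, [ebx+esi*4+8] with ebx = 0x1000, esi = 3
    print(hex(effective_address(base=0x1000, index=3, scale=4, disp=8)))
    # mov eax, [ebp-4] with ebp = 0x0012FF00 (a typical local variable slot)
    print(hex(effective_address(base=0x0012FF00, disp=-4)))
```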


Laboratory Session

Examining several provided samples of assembly code
Associating each sample with corresponding C code fragments
Interpreting a small assembly language program in order to understand what it is doing

Call Stack

Understanding the procedure calling conventions used by the Windows operating systems
Details of CALL and RET instructions
Argument passing mechanisms (stdcall vs. cdecl, etc.)
Standard call frames
Frame pointer optimisation (FPO) and other optimisations
Fastcall mechanism
Details of debugger commands used to display stack information, including commands and techniques that allow users to display what might otherwise be "lost stacks"
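
To make the standard call frame concrete: in 32-bit code compiled without frame pointer omission, [EBP] holds the caller's saved EBP and [EBP+4] holds the return address, so a stack can be reconstructed by following that chain. The sketch below walks such a chain over a simulated memory image. The addresses, the dictionary representation and the function walk_ebp_chain are hypothetical, and the approach does not apply to FPO-optimised frames.

```python
# Minimal sketch: recover return addresses by following saved-EBP links.
# memory maps an address to the 32-bit value stored there (simulated dump).
def walk_ebp_chain(memory, ebp, max_frames=16):
    """Return the return addresses found, innermost frame first."""
    returns = []
    while ebp in memory and ebp + 4 in memory and len(returns) < max_frames:
        returns.append(memory[ebp + 4])   # return address sits above saved EBP
        next_ebp = memory[ebp]            # caller's frame pointer
        if next_ebp <= ebp:               # callers live at higher addresses
            break
        ebp = next_ebp
    return returns

if __name__ == "__main__":
    stack = {
        0x0012FF00: 0x0012FF20, 0x0012FF04: 0x00401050,
        0x0012FF20: 0x0012FF40, 0x0012FF24: 0x00401020,
        0x0012FF40: 0x00000000, 0x0012FF44: 0x00401000,
    }
    print([hex(addr) for addr in walk_ebp_chain(stack, 0x0012FF00)])
```

The check that each saved EBP is higher than the current one is a simple guard against corrupted or looping chains, which is the kind of situation the "lost stack" techniques are meant to address.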

Laboratory Session

Identifying calling conventions used in several provided samples of disassembled code
Examining areas of memory that include call frames
Identifying return addresses and arguments passed to procedures

Advanced Crash Dump Analysis

Using information on disassembly and call stacks to analyse crashes much more accurately and in much greater detail
Additional debugger mechanisms and "convenience" facilities
Source mode debugging

Laboratory Session

Examining each of the sample crash dumps seen previously
For cases previously identified as "needing further analysis," participants will perform the additional levels of examination now possible, based on disassembly and stack trace interpretation
Verifying that each of the solutions found previously was in fact correct and, if not, determining the true cause of each failure

Advanced Kernel Debugging

Using the additional information presented in the disassembly and call stack modules to expand on the previously introduced principles of live kernel debugging. Remote debugging and the use of Driver Verifier are included.

Laboratory Session

Using a kernel mode live debugging environment to identify and, in some cases, correct various sample problems in running systems
Using Driver Verifier to confirm a previously identified problem in a sample dump
Setting up and demonstrating a remote debugging environment using three machines (equipment permitting)

User Mode Debugging

How the principles of debugger and crash dump analysis are applied to troubleshoot user mode problems, i.e. problems in applications and services
Default application failure reporting mechanism (DrWtsn32.exe) and its relationship to debugging
Generation and analysis of user mode dump files
Using the character-mode, user-mode debugger (NTSD) to debug the CSRSS and WinLogon system processes

Laboratory Session

Attaching the debugger to a running process
Starting a buggy application under the debugger
Identifying the cause of the problem
Creating and analysing a user mode dump file from a "hung" application

Driver Related Problems

Finding and isolating driver-related problems, including memory corruption, buffer overruns and DPC latency issues (where a driver misbehaves and causes other drivers to malfunction)
Discussing methods for analysing and determining system configuration
Finding problems related specifically to interrupt and DPC handling
Identifying whether a problem lies in the HAL, the kernel, or the driver


Laboratory Session

Debugging drivers - examining crash scenarios where a driver was at fault
Analysing ISR/DPC behavior on a crashed system