Graph Traversal (BFS, DFS, Topological Sort)
Fundamental techniques for systematically exploring graph structures, forming the foundation for virtually all graph algorithms. Breadth-First Search (BFS) explores level-by-level, Depth-First Search (DFS) explores as deep as possible before backtracking, and Topological Sort orders directed acyclic graphs respecting dependencies. These aren’t just algorithms—they’re ways of thinking about structured exploration.
Introductory to Intermediate
- Basic BFS and DFS: Introductory
- Applications (connectivity, cycle detection): Introductory to Intermediate
- Topological sort and dependency resolution: Intermediate
- Advanced DFS applications (strongly connected components): Intermediate to Advanced
The algorithms that map the internet: Graph traversal isn’t academic—it’s how we understand networks, dependencies, and relationships. These techniques power:
- Web crawling (how Google indexes the internet)
- Social network analysis (friend recommendations, influence detection)
- Circuit design (timing analysis, verification)
- Task scheduling (build systems, project management)
War Story 1 - Web Crawling at Scale: Google’s original web crawler used BFS to discover and index web pages. Starting from seed URLs, BFS naturally discovers:
- Pages by “distance” from seed (number of clicks)
- High-quality pages first (connected to many known pages)
- The structure of the web itself
This simple algorithm processes billions of pages and forms the foundation of the most valuable internet company. The insight: BFS on the web graph naturally prioritizes important pages.
War Story 2 - The Build System Crisis: A major software company’s build system became unmaintainably slow. Builds took 8+ hours. Problem: circular dependencies and poor ordering meant rebuilding components unnecessarily.
Solution: Topological sort of the dependency graph:
- Detected all circular dependencies (revealed design flaws)
- Computed optimal build order
- Enabled parallel builds of independent components
- Reduced build time to under 1 hour
A graph algorithm saved hundreds of developer-hours daily. The lesson: dependency structures are everywhere in software engineering.
- Breadth-First Search (BFS) - Level-by-level exploration, shortest paths in unweighted graphs
- Depth-First Search (DFS) - Recursive deep exploration, discover/finish times
- Connected Components - Finding connected pieces of graphs
- Cycle Detection - Determining if graph contains cycles
- Topological Sort - Linear ordering of DAG vertices respecting dependencies
- Bipartite Testing - Determining if graph is 2-colorable (using BFS/DFS)
- Strongly Connected Components (SCC) - Kosaraju’s and Tarjan’s algorithms
- Articulation Points and Bridges - Finding critical vertices/edges (DFS-based)
- Directed Cycle Detection - For dependency analysis
- Tarjan’s SCC Algorithm - One-pass DFS with lowlink values
- DFS Tree Classification - Tree edges, back edges, forward edges, cross edges
- 2-SAT Solution - Using SCC to solve satisfiability
- Planarity Testing - DFS-based algorithms for graph embedding
Required:
- Graph terminology (vertices, edges, directed/undirected, paths, cycles)
- Basic data structures (queues, stacks, adjacency lists)
- Recursion (for DFS)
- Big-O notation
Helpful:
- Trees and tree traversals (natural introduction to DFS)
- Some proof techniques (for correctness arguments)
- Understanding of stacks and recursion relationship
Setup: You’re trapped in a maze. At each intersection you can go left, right, or straight. There’s an exit somewhere. How do you explore systematically to guarantee finding the exit?
Naive approaches:
- Random walking — might never find exit, might revisit same places forever
- “Always turn right” — works for some mazes, fails for others (gets stuck in loops)
Guided discovery:
“What if you mark where you’ve been?” “When you reach a dead end, how do you backtrack?” “If there are multiple paths, which do you explore first?”
Two strategies emerge:
Strategy 1: “Explore as far as possible before backtracking”
- Go deep down one path until dead end
- Backtrack to last unexplored turn
- Recurse
- This is DFS!
Strategy 2: “Explore all neighbors before going deeper”
- Mark current position
- Explore all adjacent intersections
- Then explore their neighbors
- Level-by-level like ripples in water
- This is BFS!
The insight: Different exploration strategies have different properties:
- BFS finds shortest path to exit (minimum turns)
- DFS uses less memory (only stack depth, not full level)
- Both guarantee finding exit if one exists (completeness)
Real-world connection: Roomba vacuum cleaners use variants of these algorithms to ensure complete floor coverage.
Setup: You’re registering for courses. The course catalog shows:
- Algorithms requires Data Structures
- Data Structures requires Programming
- Machine Learning requires Algorithms and Linear Algebra
- Linear Algebra requires Calculus
- …and 50 more courses
Question: What order should you take courses to satisfy all prerequisites?
The challenge: Can’t take a course until all prerequisites are complete. Need systematic way to order courses.
Discovery path:
- “Start with courses that have no prerequisites”
- “Once you’ve taken a course, new courses become available”
- “What if there’s a cycle?” (e.g., A requires B, B requires A) — impossible!
The algorithm emerges:
- Find all courses with no prerequisites (in-degree 0)
- “Take” one such course (add to order)
- Remove that course and its outgoing edges
- Repeat until all courses scheduled
This is topological sort!
Correctness insights:
- If no course has in-degree 0, there’s a cycle (can’t complete degree!)
- Every DAG has at least one vertex with in-degree 0
- The order guarantees prerequisites are met
Real applications:
- Build systems (Make, Bazel, CMake)
- Package managers (npm, pip, apt)
- Task scheduling in operating systems
- Spreadsheet formula evaluation
Setup: You have a social network with millions of users. Given two users, find:
- Are they connected (even through mutual friends)?
- What’s the shortest “friendship path” between them?
- Can we split the network into independent communities?
The tools:
BFS for shortest paths:
BFS from person A:
Level 0: A (distance 0)
Level 1: A's friends (distance 1)
Level 2: Friends-of-friends (distance 2)
...
First time we see B, that's the shortest path!
Why BFS, not DFS? BFS explores by distance—first path found is shortest. DFS might find a long winding path first.
DFS for community detection:
- Run DFS from arbitrary person
- All reachable people form one connected component (community)
- Pick unvisited person, repeat
- Number of DFS runs = number of disconnected communities
The revelation: Graph traversal algorithms answer fundamental questions about network structure:
- Connectivity (can we reach?)
- Distance (shortest path)
- Components (communities)
- Bridges (critical connections)
Real-world scale:
- Facebook: billions of users, hundreds of billions of friendships
- BFS run in milliseconds on optimized infrastructure
- These algorithms make social networks useful
Systematic exploration: When facing complex problems with many options (career paths, apartment hunting, research directions), BFS and DFS represent different search strategies:
- BFS approach: Explore all immediate options before committing deeply (breadth of experience)
- DFS approach: Commit deeply to one path before trying another (depth of expertise)
Neither is always better—the right choice depends on context.
Dependency management in life: Topological sort formalizes the insight that some tasks must precede others:
- Home renovation: demolition before construction
- Learning: fundamentals before advanced topics
- Project planning: dependencies constrain schedule
Finding critical points: Articulation points (vertices whose removal disconnects graph) model:
- Key employees in organizations (single points of failure)
- Critical infrastructure (bridges, power stations)
- Vulnerabilities in systems
- Skiena §7 - Graph traversal with war stories and applications
- Erickson Chapter 6 - Excellent intuition building for DFS properties
- CLRS Ch 22 - Comprehensive treatment with formal proofs
- Kleinberg & Tardos §3 - Clean presentation with BFS/DFS applications
- DPV §3-4 - Concise treatment focusing on key insights
- Sedgewick & Wayne “Algorithms” Ch 4 - Practical implementations with visualizations
- Jeff Erickson’s notes - Emphasis on DFS properties and applications
- VisuAlgo (visualgo.net) - Interactive BFS/DFS animations
- Algorithm Visualizer - Step-by-step graph traversal
- GraphOnline - Draw graphs and run algorithms
- MIT 6.006 (Demaine) - Classic graph traversal lectures
- Stanford CS161 - Problem-motivated approach
- William Fiset YouTube series - Clear animated explanations
- Divide-and-Conquer: DFS naturally divides graph into subproblems (subtrees)
- Dynamic Programming: Many graph DP algorithms use DFS/BFS as foundation
- Shortest Paths: BFS is shortest path for unweighted graphs; weighted versions (Dijkstra, Bellman-Ford) build on BFS insights
- Network Flows: Often use BFS to find augmenting paths
- NP-Complete Problems: Many reductions use graph reachability (computed via BFS/DFS)
- Randomized Algorithms: Random walks on graphs use traversal ideas
- Parallel Algorithms: Level-synchronous BFS naturally parallelizes
Web Crawler Simulator
- Implement BFS-based web crawler
- Visualize crawl frontier and discovered pages
- Add politeness policies (rate limiting, robots.txt)
- Measure discovery rate on sample web graph
Social Network Analyzer
- Load real social network data (Twitter, academic collaborations)
- Compute shortest paths between users (Six Degrees of Separation)
- Find connected components (communities)
- Identify bridges (key connectors)
- Visualize network structure
Build System Dependency Resolver
- Parse dependency files (package.json, Makefile, etc.)
- Detect circular dependencies
- Compute valid build order via topological sort
- Identify independent tasks for parallel builds
- Visualize dependency graph
Maze Generator and Solver
- Generate random mazes using DFS
- Solve mazes using BFS (shortest path) and DFS
- Visualize exploration process
- Compare path length and exploration efficiency
- Add obstacles and multiple exits
Graph Property Tester
- Implement suite of graph tests: bipartite checking, cycle detection, connectivity
- Visualize DFS tree and edge classification
- Find articulation points and bridges
- Compute strongly connected components
- Apply to real-world networks (power grid, road networks)
2-4 lectures depending on depth
- 2 lectures: BFS, DFS basics, connectivity, cycle detection, topological sort
- 3 lectures: + Bipartite testing, DFS applications, SCC (Kosaraju)
- 4 lectures: + Tarjan’s algorithm, articulation points, advanced DFS properties
“BFS always uses less time than DFS”
- False! Both are O(V + E) in worst case. BFS may use more space (full level in queue vs. stack depth).
“DFS finds shortest paths”
- False! DFS finds some path, not necessarily shortest. BFS guarantees shortest path in unweighted graphs.
“Topological sort requires DFS”
- False! Can use BFS (Kahn’s algorithm with in-degrees) or DFS (finish time ordering). Both work.
“Graphs must be connected for traversal to work”
- False! Just need to handle multiple components—run traversal from each unvisited vertex.
“DFS discover/finish times aren’t useful”
- False! Finish times enable topological sort, SCC computation, cycle detection, and more. They’re incredibly powerful.
“Back edges mean cycles”
- True for undirected graphs (excluding parent edges) and directed graphs. But forward/cross edges don’t necessarily imply cycles.
Traditional (banking model):
“BFS uses a queue. DFS uses a stack or recursion. Here’s the pseudocode. BFS runs in O(V+E). Memorize this. Apply to homework graphs.”
Problem-posing approach:
“You’re exploring a maze. What strategy ensures you find the exit without infinite loops? Let’s try going as far as possible first—what data structure tracks where we’ve been? What about exploring nearby options before going far—does that give us anything special? Ah! This finds the shortest path! Now we’ve discovered BFS and DFS. When would you use each one?”
The difference: Students discover why different traversal strategies exist by exploring what each guarantees. They understand the tradeoffs (space, path length, structure discovery) and develop intuition for which to use when. The algorithms emerge as natural solutions to exploration problems, not arbitrary procedures.
Additional problem-posing element: Present broken code or counterintuitive results:
“This BFS code finds paths but they’re not shortest—why?” (Answer: Didn’t mark vertices as visited immediately upon enqueueing)
Learning from debugging reinforces understanding.
Curriculum placement:
- Week 3-5 for graduate courses (after basic algorithms, before advanced graph topics)
- Week 4-6 for advanced undergraduate
Prerequisites check: Students should understand:
- Graph terminology and representations
- Recursion (for DFS)
- Queues and stacks
- Big-O analysis
Why this timing: Graph traversal must come before:
- Shortest paths (Dijkstra, Bellman-Ford)
- Network flows
- Most advanced graph algorithms
It’s foundational—virtually every graph algorithm uses BFS or DFS as a subroutine.
Pedagogical note: Start with BFS (easier to visualize, more intuitive queue structure). Then DFS (requires recursion/stack understanding). Show concrete applications (connectivity, shortest paths) before moving to abstract properties (discover/finish times, edge classification). Build to topological sort as a capstone application showing the power of DFS properties.
Ready for students? ✓ Yes - fundamental algorithms with clear motivation, strong real-world connections, visual intuition, and natural progression to advanced applications.
