Introduction
Depth-First Search or DFS in short, is a common graph traversal algorithm over graphs.
Imagine that you are digging a hole in the ground. You could dig deep first and then widen the hole or do the opposite. Unlike the BFS algorithm, DFS uses the “depth-first” approach, meaning we dig deep first, and then try to widen.
If we try to think in terms of graphs, this means we are recursively iterating deeper and deeper into a vertexes neighbor, then it’s child, and so on, and only after going the full depth, we are backtracking and moving on to a different descendant.
Algorithm
DFS can be implemented recursively by calling a neighbor of a vertex in each iteration, or implemented iteratively using a stack.
In the end, both approaches use a stack under the hood. The recursive approach has the recursive calls in the stack trace, where we have one function call for each vertex in our current traversal path. The iterative approach will at any given time, keep the vertexes in the traversed path in the self-allocated stack.
Both approaches also keep a visited set to know which vertices have already been traversed in order to prevent an infinite loop.
Code
Recursive Approach
def dfs(graph: List[List[int]], start_vertex: int, visited):
visited.add(start_vertex)
for neighbor in graph[start_vertex]:
if neighbor not in visited:
dfs(graph, neighbor, visited)
We assume the recursive function gets an empty visited set as a parameter.
Iterative Approach
def dfs_iterative(graph: List[List[int]], start_vertex: int):
visited = set()
stack = [start_vertex]
while stack:
curr_vertex = stack.pop()
for neighbor in enumerate(graph[curr_vertex]):
if neighbor not in visited:
visited.add(neighbor)
stack.append(neighbor)
Runtime Complexity
Runtime Complexity: O(V+E)
The algorithm goes over every vertex in the graph, and to do so uses every edge in the graph.
Algorithm Output
A Forest of Trees
The output of the DFS algorithm is a forest (a collection of tree graphs) that represent the the graph’s exploration during runtime. We run DFS starting from some vertex in the graph, and this run discovers a set of nodes, which we consider the first tree. (If the entire graph is reachable this will be the only tree and the algorithm is done).
Then the algorithm will run again but from a vertex that it didn’t encounter yet, and get another set of discovered vertices. This is the second tree.
This process is done over and over until the entire graph is discovered, and the result is the forest.
Different Edge Types
We categorize every edge in the forest into one of 4 edge types:
Tree Edge
An edge that “discovered” a vertex for the first time during the run. If v_1 \rightarrow v_2 is an edge such that v_1 discovered v_2, we say the edge v_1 \rightarrow v_2 is a tree edge.
Forward Edge
An edge from a vertex to a descendant which we already discovered while still running on the same output tree.
To understand how this happens, let’s consider the following example:
G=(V,E)
V=\{v_1,v_2,v_3\}
E=\{v_1 \rightarrow v_2, v_1 \rightarrow v_3, v_2 \rightarrow v_3\}
- We traverse through the path v_1 \rightarrow v_2 \rightarrow v_3.
- We backtrack back to v_1.
- We discover v_3 through the edge v_1 \rightarrow v_3, which is now a forward edge.
Back Edge
An edge from a descendant of an ancestor vertex back to the ancestor, while still running on the same output tree.
Let’s look at an example:
G=(V,E)
V=\{v_1,v_2,v_3\}
E=\{v_1 \rightarrow v_2, v_2 \rightarrow v_3, v_3 \rightarrow v_1\}
- We traverse through the path v_1 \rightarrow v_2 \rightarrow v_3
- katex]v_3[/katex] reaches v_1 again through the edge v_3 \rightarrow v_1
- Because we already encontered v_1 and are still part of the same DFS tree, the edge v_3 \rightarrow v_1 is considered a back edge.
Cross Edge
Edges that encounter a previously visited vertex during the run of a different DFS tree. They are called cross edges because they cross trees.
Detecting Cycles with Back Edges
How would you detect a cycle in a directed graph using DFS? We already learned one way to do this using topological sort but classic DFS can also work.
First we have to understand that just tracking all the vertices that we encountered and checking if we encounter a vertex twice is not enough. Why is that?
Consider the forward edge example we showed above, we reach v_3 a second time after the backtracking but there is no cycle in the graph.
This leads us to an interesting property about back edges:
A graph G has a cycle if and only if it has a back edge when running DFS on it.
How do we detect back edges? Here’s a code example:
def has_cycle(graph: List[List[int]], curr_vertex, visited, stack):
visited.add(curr_vertex)
stack.add(curr_vertex)
for neighbor in graph[curr_vertex]:
if neighbor in stack:
stack.remove(curr_vertex)
return True
if neighbor not in visited:
if has_cycle(graph, neighbor, visited, stack):
stack.remove(curr_vertex)
return True
stack.remove(curr_vertex)
return False
We are keeping 2 set data structures:
- A visited set – This is only to make sure we are not running an infinite loop.
- “stack” set – This set has all the vertices that are currently in the recursion stack. Notice that we are adding to it at the start of the function but removing from it every time we return from it. This tracks all the vertices that are in the path that we went through to the current vertex.
If we encounter a vertex that we already have in the stack, this means we encountered a back edge. The stack tells us the vertex was part of the path “up to this point” and then the edge to it (the back edge) closes the cycle.
Example Interview Questions
Find Path with Sum in a Tree
Given a binary tree graph and a target sum number, return whether there is a path from the root to a leaf with the target sum.
def is_leaf(node):
return node.left is None and node.right is None
def has_path_with_sum(root, target_sum):
if root is None:
return False
if is_leaf(root):
return target_sum == root.val
return has_path_with_sum(root.left, target_sum - root.val) or \
has_path_with_sum(root.right, target_sum - root.val)
Solution:
We can use DFS to traverse the tree (Moving to a left or right child each time) and keep track of the summary that still need to encounter in entire path from the root “up to this point”.
We do this by reducing the value in the target sum variable with each recursive call.
Once we get to a leaf, we can check if we have reached the exact needed number.
Runtime Complexity: O(V)
We know the traversal has a time complexity of O(V+E). Because we are dealing with a tree in this question E=O(V).
Can Tasks With Dependencies Be Completed?
We are given a list of tasks that need to be done T_1, T_2, T_3, ..., T_n with m dependencies between them. We are given the dependencies as a list of tuples. The tuple (T_i, T_j) means T_i has to be completed before completing T_j.
Return whether the tasks can be completed given the dependencies.
Solution:
This question is an easier variation of the question we saw in the Topological Sort article, but this time we only need to check if the tasks can possibly be completed, without finding the completion order.
We suggest you go over the question in the article, but the solution is almost the same. We build a graph where the tasks are the vertexes and the edges are the dependencies between them. This means an edge from T_i to T_j exists only if and only if T_i needs to be completed before completing T_j.
When can the tasks be completed? Only if there is no cycle in the graph. We can run the DFS algorithm and check for back edges to detect a cycle. We return true if and only if there is no cycle.
Note that this doesn’t give us the actual order needed for the completion (unlike the topological sort) but only checks if it is possible.
def build_graph(n: int, dependencies: List[List[int]]):
graph = [[] for _ in range(n)]
for source, dest in dependencies:
graph[dest].append(source)
return graph
def can_complete_tasks(num_tasks: int, dependencies: List[List[int]]) -> bool:
graph = build_graph(num_tasks, dependencies)
visited = set()
stack = set()
for i in range(num_tasks):
if i in visited:
continue
if has_cycle(graph, i, visited, stack):
return False
return True
We run DFS over the entire graph and check for cycles. Note that we are marking every visited vertex so there is no repeated unnecessary run for the algorithm.
Runtime Complexity: O(V+E) (For the traversal)