Despite its significance in the world of data science, many are still wondering what role do graph traversal and its algorithms play in solving computing problems.
Data science is considered one of the most complex fields of studies today. On the Internet alone, there is tons of information stored in different data centers across the globe.
Just imagine the amount of effort needed to search or sort through these databases.
Have you ever wonder how Facebook and Google come up with a set of recommendations when you’re using their platform?
Did it ever cross your mind how social media sites choose the list of people that they suggest for you to add or connect with?
Or, have you wondered how search engines choose the sites to show you after you submit a query?
These recommendations are made possible by algorithms that work together to sort, search, and rank data.
They scour the Internet or the platform’s database to find connections between different information. After sorting and classifying tons of stored data based on their relevance to the query, the recommendations will be formulated.
If you’re a social-media-type person, you must have noticed that Facebook, LinkedIn, Instagram, and other social networking platforms have these built-in recommendation features.
Now and then, you will see these suggestions:
- who to follow,
- who to add as a friend, or
- what pages to like
popping up in your news feeds.@Facebook and other social media networks turn to graph traversal and algorithms to provide better recommendations to their users. This includes people to add or pages to follow.Click To Tweet
Primarily, these suggestions are based on the mutual friends and likes of the people within a network. For instance, LinkedIn may recommend that you connect with user ‘A’ even if you don’t know him personally because you’re both connected with user ‘B’.
Social media networks utilize sophisticated algorithms to determine the degree of connection between you and other people.
In LinkedIn’s case, your 1st-degree connections are those people to whom you are already connected. This means that you have already accepted their invitation to connect or vice versa.
Then, your 2nd-degree connections refer to people who are not directly related to you. However, they are related to one or more of your 1st-degree connections.
This approach makes it easier to determine the degree of connection between different users in a social media platform.
If you want to know the degree of connections between a few people within your family or close circle of friends, you may do so using a pen and paper. However, when you’re dealing with billions of people in one network, it is virtually impossible to do it manually.
This is where algorithms and graph traversal comes into play.
In computer science, a graph is defined as a data structure that links a set of vertices or nodes by a set of edges. They are usually used in modeling real-world problems. For instance, a graph may represent Facebook or LinkedIn’s billions of users that are connected by friends, pages, or groups.
This is how a simple friend connection may be represented in a graph.
In this indirect graph, the vertices labeled with letters A to G represent the people within a social networking circle. The lines or the edges represent the connections between the vertices. To recommend people that you can add or follow, the social platform must know your degree of connections with these people through paths.
Let’s say we want to know the degree of connection between A and C in the graph. As you can see, it will provide two different paths:
- A -> B -> C
- A -> G -> F -> E -> D -> C
Now, the purpose of graph traversal is to allow algorithms to find these paths and explore the vertices. That means from the source vertex to the destination vertices. In our simple graph, we were able to pinpoint the two paths quickly. Easy, right?
Will this work when we’re dealing with billions of people? Of course!
However, it would require a more complex graph that may contain multiple paths between the source vertex and the destination vertex. So, if Facebook has around 2 billion users, that means 2 billion vertices and 2 billion edges.
Can’t wrap your head around how huge that is? Just look at the image below showing up to a hundred Facebook connections.
What is a Depth First Search Algorithm?
Depth First Search (DFS) is a tree-based graph traversal algorithm that is used to search a graph or data structure. As mentioned earlier, most problems in computer science can be thought of in terms of graphs where a DFS algorithm can be used to analyze and solve them.
When traversing a graph, a DFS algorithm always starts at the source vertex of a tree then goes down a single path to reach the destination vertex. To give you a better view of how it works, imagine that you’re trying to solve a maze like the one below.
Technically, there’s only one entry and one exit point in a maze like this. However, from the entrance to the exit point, there could be multiple routes that you can take to solve it. Typically, the most practical approach to solve the problem is to try all the paths, right?
So, from the starting point, you will take one path until you reach the end. If it’s a dead end, you would then try another route. You would backtrack to a point where you made one of the choices earlier and try another one.
You will do this several times until you reach the exit point, thus solving the problem.
From an algorithmic perspective, the processes you used to solve the maze are commonly referred to as backtracking and recursion. These are two significant components of the graph traversal algorithm, Depth First Search.
Going back to social media’s recommendation feature, the DFS algorithm can be used to know if a connection does or does not exist between people or vertices. It does this by going down a single path from the source vertex. When it reaches a dead end after exploring all vertices, it will backtrack to a point in the graph where it can choose another path to take.
This will continue until the DFS algorithm has traversed all vertices and paths. In data science, this is called recursion.
During graph traversal, the Depth First Search algorithm keeps track of the vertices that it has already visited by marking them. This avoids redundancy and re-visiting vertices.
In essence, a DFS algorithm is not really helpful in finding the shortest path between vertices in a graph. However, if the graph is massive or deep enough, it is considered the perfect choice. This is because this approach won’t require you to store the massive process in a memory storage.
Not only is it applicable in social media networks, it plays a vital role in search engines as well.
Still having a hard time understanding DFS algorithm? Don’t fret!
On our next article for this three-part series, we will be diving deeper into Depth First Search. Stay tuned as we will be sharing more about this graph traversal algorithm and its sibling, the Breadth First Search!