Three-Vertex Induced Subgraph Isomorphism in DAGs


Problem Setting

This blog post is about a simple observation I stumbled upon recently. It contrasts the time complexity of two, at first glance, very similar computational tasks on directed acyclic graphs (DAGs). On the one hand, the problem of checking whether a DAG is transitive, i.e., whether abca \rightarrow b \rightarrow c implies aca \rightarrow c and, on the other hand, the problem of checking whether a DAG is moral1, that is abca \rightarrow b \leftarrow c implies that aa and cc are adjacent. The latter notion stems from the graphical models literature. For an example, consider the following graphs:

Basic examples of moral and transitive graphs.

The graph on the left is not transitive, for example due to acda \rightarrow c \rightarrow d, while acda \rightarrow c \leftarrow d implies non-morality of the right graph, in both cases because aa and dd are non-adjacent.

As the example illustrates, the two problems can be expressed as induced subgraph isomorphism problems, that is detecting whether a graph GG contains another (in this case fixed) graph HH as an induced subgraph2: We have non-transitivity iff Htrans=abcH_{\text{trans}} = a \rightarrow b \rightarrow c and non-morality iff Hmoral=abcH_{\text{moral}} = a \rightarrow b \leftarrow c is an induced subgraph. Surprisingly, and the main point of this article is that computationally these two problems behave differently:

Main Observation:
Checking morality of a graph is (easily) possible in linear-time in the size of the graph, while no linear-time algorithm is known for checking transitivity (and such an algorithm appears firmly out-of-reach).

In the following, I will give evidence for this statement. Before we start a short disclaimer: Most (if not all) results I present here are already known, though somewhat spread out over the literature. I give references whenever possible, in case I missed something relevant, feel free to contact me :)

Undirected Graphs

Before we dive into it, let’s briefly revisit the more widely studied problem of induced subgraph isomorphism for undirected graphs. Here, the phenomenon, that different choices of HH with the same number of vertices, lead to different time complexities is well-documented. Consider the following two graphs: Triangle graph. Here, the subgraph isomorphism problem is hard in case of HtriangleH_{\text{triangle}} in the sense that no linear-time algorithm is known. In fact, the only known subcubic algorithms are based on fast matrix multiplication, which imply that, at least in theory, the problem is solvable in time O(n2.373)O(n^{2.373}) (however these algorithms are not practical). Generally, it is conjectured that there can be no O(m4/3δ)O(m^{4/3-\delta}) algorithm for any δ>0\delta > 0, which is known as the sparse triangle hypothesis (see Wikipedia for a more detailed overview). Conversely, for HpathH_{\text{path}} the problem is almost trivially solvable in linear-time O(n+m)O(n+m) due to the following lemma.

Lemma (Folklore)
Let GG be a connected undirected graph. Then, GG contains an induced subgraph abca - b - c if, and only if, it is not complete (i.e. fully connected/a clique).

Proof.
We show two directions. If GG contains an induced subgraph abca - b - c, then it is not complete. Conversely, assume GG is not complete. Hence, there exist xx and yy, which are non-adjacent. Note that there exists a shortest path from xx to yy by connectedness of GG. Clearly, this path is induced, else it would not be shortest. Hence, either we have xzyx - z - y for some zz or there exists such a triple on the path. \square

It immediately follows that, to detect whether a graph contains an induced subgraph abca - b - c, it suffices to find the connected components of GG and check whether each of them is a clique (this can be done by counting edges).

Indeed, this observation completes the picture for three-vertex undirected graphs HH as the remaining graphs are analogues of the discussed cases obtainable by changing edges into non-edges and vice versa (complement graphs). Interestingly, similar (more nuanced) effects also appear on four vertices as discussed in detail by Williams, Wang, Williams, Yu (2015), which studies all possible cases in this setting.

We will now get back to DAGs. The two induced subgraphs HtransH_{\text{trans}} and HmoralH_{\text{moral}}, on the surface, both look similar to the case Hpath=abcH_{\text{path}} = a - b - c for undirected graphs, which would be obtained when ignoring edge directions. The following shows that directions matter.

Triangle Hardness of Checking Transitivity

We begin by showing that detecting induced subgraph abca \rightarrow b \rightarrow c is at least as hard as detecting a triangle in an undirected graph. This is possible by giving a reduction from the latter to the former problem. This reduction has been described e.g. in this StackExchange post.

We give a version of the reduction, which is a bit easier to illustrate, starting with the triangle detection problem on tripartite graphs. This is justified by the fact that it is as hard to find a triangle in a general graph, as it is to find one in a tripartite one (the reduction is straightforward and works by creating three copies of the initial vertex set).

Hence, let G=(V1V2V3,E)G = (V_1 \cup V_2 \cup V_3, E) be an instance of the triangle detection problem with partitions V1,V2,V3V_1, V_2, V_3. Then, construct a directed graph DD with the same vertex set and edges

  • u1v2u_1 \rightarrow v_2 if u1v2Eu_1 - v_2 \in E,
  • u2v3u_2 \rightarrow v_3 if u2v3Eu_2 - v_3 \in E and
  • u1v3u_1 \rightarrow v_3 if u1v3∉Eu_1 - v_3 \not\in E,
    where uiu_i, respectively viv_i, are assumed to be from partition ViV_i. Here is an illustration of this reduction:

Reduction from Triangle Detection to Transitivity.

Observe that the resulting graph is acyclic by definition. Moreover, if GG contains a triangle on vertices uu, vv and ww, then by construction DD contains u1v2w3u_1 \rightarrow v_2 \rightarrow w_3 with u1u_1 and w3w_3 non-adjacent. Conversely if DD contains an induced subgraph uvwu \rightarrow v \rightarrow w, then observe that u,v,wu, v, w are in different copies of VV (this is crucial). It follows that GG contains a triangle and thus:

Lemma
If there is an O(n2+k)O(n^{2+k}) algorithm for checking DAG transitivity, then triangle detection in undirected graphs can be solved in time O(n2+k)O(n^{2+k}).

Proof. Construct graph DD in time O(n2)O(n^2). Then, run the O(n2+k)O(n^{2+k}) algorithm for DAG transitivity to obtain the answer (as argued above). \square

Linear-Time Algorithm for Checking Morality

In contrast, checking whether a graph is moral is possible in linear-time. The algorithm given below is based on the one for checking whether an ordered graph has zero fill-in3 given here. Simplified and slightly adapted to our setting, it proceeds as follows:

  1. Compute a topological ordering of the vertices of GG (that is a linear ordering of the vertices such that for every edge uvu \rightarrow v it holds that uu precedes vv in the ordering).
  2. For each vertex vv, compute the vertex uu with uvu \rightarrow v such that uu is closest to vv in the topological ordering. Denote this vertex as C(v)C(v).
  3. For each vertex vv and every uvu \rightarrow v, check that uu and C(v)C(v) are adjacent (for uC(v)u \neq C(v)). If this is not the case, terminate and return “not moral”.
  4. Return “moral”.

Lemma [Rose, Tarjan, Lueker (1976); Tarjan, Yannakakis (1984); Whitten (1978, unpublished)]
Let GG be a DAG. Testing whether GG is moral can be done in linear time in the size of GG (that is O(n+m)O(n+m), when GG has nn vertices and mm edges).

Proof.
For the time complexity, observe that the algorithm performs at most one adjacency check per edge. We assume here that adjacency tests are possible in constant time4.

We now show correctness. Assume the algorithm returns not moral. Then, there exists uvC(v)u \rightarrow v \leftarrow C(v) with uu and C(v)C(v) non-adjacent in GG and thus the output is correct. Conversely, assume GG is not moral. Then, there exist non-adjacent vertices uu and ww (w.l.o.g. assume uu precedes ww in the topological ordering computed by the algorithm) such that there is a vertex vv with uvu \rightarrow v and wvw \rightarrow v. Choose a triple such that the vertices ww and vv are closest of all “non-moral” triples. Clearly, the algorithm outputs not moral if w=C(v)w = C(v). For this not to be the case, there would need to be a different vertex w=C(v)w' = C(v). It would be adjacent to uu and ww by assumption that ww is closest to vv of all vertices participating in immoral triples. However, then uwwu \rightarrow w' \leftarrow w would be a non-moral triple with ww closer to ww' than to vv. A contradiction. \square

Another line of argument is that, when imagining that the algorithm iterates over the topological ordering from left to right, we can assume by induction, when handling vertex vv and assuming the algorithm has not terminated yet, that the parents of C(v)C(v) are moral. Then, it is clear that it suffices to check whether all other parents of vv are connected to C(v)C(v) to certify that the parents of vv are fully connected as, by induction hypothesis, the parents of C(v)C(v) are. This is illustrated in the following picture where p1p_1, p2p_2, p3p_3 are fully connected by induction as parents of C(v)C(v).

Visualization of Induction Argument for Correctness.

Further Remarks

To reflect why checking morality is easier than checking transitivity, let us try to apply the same reduction from triangle detection. Hence, we construct, in similar spirit as above, for undirected graph G=(V1V2V2,E)G = (V_1 \cup V_2 \cup V_2, E), a DAG DD with the same vertex set and edges

  • u1v2u_1 \rightarrow v_2 if u1v2Gu_1 - v_2 \in G,
  • u2v3u_2 \leftarrow v_3 if u2v3Gu_2 - v_3 \in G and
  • u1v3u_1 \rightarrow v_3 if u1v3∉Gu_1 - v_3 \not\in G.

By definition the graph is acyclic and, as before, a triangle in GG will lead to a uvwu \rightarrow v \leftarrow w in DD. The issue lies in the reverse direction. If we find uvwu \rightarrow v \leftarrow w in DD, it might be that uu and ww are both in V1V_1. Then, this does not imply a triangle in GG. See the following example as illustration:

Incorrect Reduction from Triangle Detection to Morality.

Here, GG does not contain a triangle, however DD contains an immorality (marked in orange). As the algorithm above suggests, it is likely not possible to get such a reduction to work. Intuitively, the reason is that abca \rightarrow b \rightarrow c allows us to distinguish between aa and cc as one is the child and the other the parent of bb, whereas for abca \rightarrow b \leftarrow c vertices aa and cc cannot be distinguished which makes the problem combinatorially easier. It would be interesting to further formalize this argument, e.g., for larger graphs HH.

Another question is what happens when dropping the acyclicity requirement? The linear-time algorithm for checking morality explicitly uses the topological ordering of the vertices and thus cannot be used anymore. If the graph is cyclic (let’s say there is still only one edge between any pair of vertices, such graphs are known as oriented graphs), it is unclear whether it is possible to solve the problem in linear-time. I tend to think it has similar hardness to triangle-detection, but it is not directly clear how to prove this by reduction. I would be very interested if you have any ideas how to tackle this question :)

Citation

You can cite this blog post as:

@article{wienoebst2023subgraphisomorphism,
  title = {Three-Vertex Induced Subgraph Isomorphism in DAGs},
  author = {Wien{\"o}bst, Marcel},
  journal = {Blog post at mwien.github.io},
  year = {2023},
  howpublished = {\url{mwien.github.io/blog/subgraph-isomorphism-dags/}}
}

References

Rose, D. J.; Tarjan, R. E.; and Lueker, G. S. 1976. Algorithmic Aspects of Vertex Elimination on Graphs. In SIAM Journal on Computing 5(2).
Tarjan, R.E.; and Yannakakis, M. 1984. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. In SIAM Journal on computing, 13(3).
Whitten, G.. 1978. Presentation at the SIAM Symposium on Sparse Matrix Computations.
Williams, V. V.; Wang, J. R.; Williams, R.; and Yu, H. 2015. Finding Four-Node Subgraphs in Triangle Time. In Proceedings of the 2015 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA).

Footnotes

Footnotes

  1. The name moral graph alludes to the fact that parents of a vertex need to be adjacent, that is married. As often, this historical name is less-than-ideal in hindsight. I still decided to keep it so that this blog post is google-able.

  2. Induced means that the subgraph of GG needs to have the same non-edges as HH as well. See this article as a reference for the considered problem.

  3. We will not go into more details here, but essentially, for a DAG GG, the undirected graph obtained by ignoring edge directions of GG has zero fill-in with regard to any topological ordering of GG if, and only if, GG does not contain abc a \rightarrow b \leftarrow c as induced subgraph. Hence, any algorithm for testing zero fill-in directly translates to one for checking morality.

  4. For sparse graphs, adjacency checks can be performed in expected constant time using O(n+m)O(n+m) space when storing the neighbors of each vertex in a hash-table. Adjacency tests can also be avoided altogether, yielding worst-case O(n+m)O(n+m) time (see e.g. the “Test For Zero Fill-In” algorithm in Tarjan, Yannakakis [1984]). I chose the presentation above for clarity.