On the Computational Complexity of Graph Moralization


Introduction

This blog post discusses the computational complexity of graph moralization. The main point is to show that this problem is computationally equivalent to Boolean matrix multiplication (BMM). This refutes previous claims that moralization of directed acyclic graphs (DAGs) can be performed in time O(n2)O(n^2), with nn being the number of vertices, as this would necessitate a major algorithmic breakthrough. The only truly subcubic algorithms known for BMM are based on fast matrix multiplication, which is still firmly away from quadratic time.

Graph moralization is an important primitive in the field of probabilistic and causal graphical models. Given a DAG, it adds an edge between non-adjacent vertices xx and yy if they have a common child zz and afterwards makes all edges undirected.

Basic examples of moralization.

Boolean matrix multiplication is defined for matrices P{0,1}n1×n2,Q{0,1}n2×n3P \in \{0,1\}^{n_1 \times n_2}, Q \in \{0,1\}^{n_2 \times n_3} as

(PQ)ij=k=1n2(PikPkj), (P\cdot Q)_{ij} = \bigvee_{k = 1}^{n_2} (P_{ik} \land P_{kj}),

that is it can obtained from the standard matrix product by replacing addition with logical or and multiplication with logical and. A basic example is:

(101010110)(100010011)=(111010110)\begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix} \cdot \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \end{pmatrix}

Moralization transforms a directed to an undirected graphical model and has two main applications:

  • To perform inference in Bayesian networks, it is common to use moralization to obtain an undirected graph, which is then amenable to the toolbox of inference algorithm for this model class, e.g., the junction tree algorithm. This undirected graph has the property that it encodes the largest possible subset of conditional independencies of the directed model.
  • Moralizing a certain subgraph of a DAG allows for testing d-separation (i.e., implied conditional independence statements of a Bayesian network) through the conceptually simpler (vertex) separation in undirected graphs. This fact has been used for finding (minimal and minimum) d-separating sets as it allows to use well-studied algorithms available for undirected graphs.

The time complexity of computing the moralized graph has not been discussed in much depth, likely due to its apparent simplicity and because, e.g., when used in exact inference routines, which have exponential run-time on general graphs, it is usually not the bottleneck. However, in the context of testing and finding d-separators in DAGs, it usually is the most expensive part of the algorithm as the remaining steps are simple to perform on undirected graphs and often take linear time in the number of vertices and edges.

In the corresponding literature, the time complexity of moralization is predominantly stated as O(n+m)O(n+m'), with mm' being the number of edges in the output graph, or as O(n2)O(n^2) in terms of only nn. The first such reference is given in Geiger, Verma, Pearl (1990) and later repeated in Tian, Paz, Pearl (1998), Javidian, Valtorta (2018a), Javidian, Valtorta (2018b) and Javidian, Valtorta, Jamshidi (2020), which use moralization for finding minimal d-separators, as well as Textor, Liskiewicz (2011), van der Zander, Liskiewicz, Textor (2014), van der Zander, Liskiewicz, Textor (2019) and Jeong, Tian, Bareinboim (2022), which use it in the context of computing back-door respectively front-door adjustment sets. However, this run-time analysis is incorrect and off by a factor of nn.1

It should be noted that, for the majority of these applications, the state-of-the-art algorithms given by van der Zander, Liskiewicz (2020) and Wienöbst, van der Zander, Liskiewicz (2022) do not rely on moralization and are hence not affected.

Moreover, there are works which correctly analzye the moralization time-complexity such as Xia, Prasanna (2007), An, Cercone (2009), Xiang (2002) and Heisterkamp (2009). In the last work, it is shown that graph moralization can be directly solved by BMM. Let’s start by revisiting this result.

Solving Graph Moralization by BMM

Heisterkamp (2009) states that graph moralization is possible through BMM. More formally:

Theorem 1 [Heisterkamp (2009)]
Let GG be a DAG and there exist an O(T(n))O(T(n)) time algorithm for BMM for square matrices of dimension nn. Then, the graph GG can be moralized in time O(T(n))O(T(n)).

Proof.
Let AA be the adjacency matrix of GG, i.e., A[i,j]=1A[i,j] = 1 if v1vjv_1 \rightarrow v_j in GG. Add all edges vivjv_i - v_j to GG, where the matrix product C=AATC = A \cdot A^T has C[i,j]=1C[i,j] = 1 and viv_i and vjv_j were previously non-adjacent (for iji \neq j). Afterwards, make the directed edges undirected to obtain the moralization of GG.

The correctness follows as C[i,j]=1C[i,j] = 1 precisely if there is a kk such that vivkv_i \rightarrow v_k and vkvjv_k \leftarrow v_j, as this corresponds to A[i,k]=1A[i,k] = 1 and AT[k,j]=1A^T[k,j] = 1.

Observe that T(n)Ω(n2)T(n) \in \Omega(n^2). Then, it is easy to see that (i) forming the adjacency matrix and its transpose, (ii) computing the matrix product and (iii) constructing the resulting graph can all be performed in O(T(n))O(T(n)). \square

For an example, consider the following illustration of this reduction.

Illustration of the Turing reduction from moralization to BMM as given by Heisterkamp (2009).

Computational Equivalence of Graph Moralization and BMM

To show computational equivalence, it hence remains to prove that BMM can be solved by graph moralization. The underlying reduction is straightforward, relying on the well-known interpretation of BMM in terms of tripartite graphs:

Theorem 2
Let P{0,1}nA×nB,Q{0,1}nB×nCP \in \{0,1\}^{n_A \times n_B}, Q \in \{0,1\}^{n_B \times n_C} be two boolean matrices and let there exist an O(T(n))O(T(n)) worst-case time algorithm for moralizing an nn vertex graph. Then, the boolean matrix product R=PQR = P \cdot Q can be computed in worst-case time O(T(nA+nB+nC))O(T(n_A + n_B + n_C)).

Proof.
Construct a graph GG with vertex set ABCA \cup B \cup C consisting of nAn_A, nBn_B and nCn_C vertices, respectively. Insert edges aibja_{i} \rightarrow b_{j} if P[i,j]=1P[i,j] = 1 and edges bicjb_{i} \leftarrow c_{j} if Q[i,j]=1Q[i,j] = 1. Note that this graph is by definition acyclic. Then, moralization will add an edge aicja_{i} - c_{j} if, and only if, R[i,j]=1R[i, j] = 1 as this only happens in case there is kk such that P[i,k]=1P[i, k] = 1 and Q[k,j]=1Q[k, j] = 1.

For the run-time analysis, consider the three steps of the sketched algorithm: (i) building graph GG, (ii) performing moralization of graph GG and (iii) constructing the resulting matrix RR.

As GG has nA+nB+nCn_A + n_B + n_C vertices, (ii) takes time O(T(nA+nB+nC))O(T(n_A+n_B+n_C)) by assumption. The time complexity of (i) can be bounded by O(nAnB+nBnC)O(n_A \cdot n_B + n_B \cdot n_C), which in turn is in O((nA+nB+nC)2)O((n_A + n_B + n_C)^2). Because, as before, O(T(n))O(T(n)) is necessarily in Ω(n2)\Omega(n^2) (every algorithm for moralization has to add Ω(n2)\Omega(n^2) edges in the worst-case), it follows that O((nA+nB+nC)2)O((n_A + n_B + n_C)^2) is in O(T(nA+nB+nC))O(T(n_A+n_B+n_C)). For (iii), similar arguments apply as it can be bounded by O(nAnC)O(n_A \cdot n_C). \square

The following figure illustrates this approach:

Illustration of the Turing reduction from BMM to moralization.

These techniques can be used and generalized to tackle related problems as well:

  • The problem of moralizing ancestral graphs, also known as the augmented graph (see, e.g. Section 3.2 in van der Zander, Liskiewicz, Textor (2019)), is also equivalent to BMM as can easily be proved by generalizing Theorem 1.
  • The same holds for the problem of computing the latent projection of a DAG with unobserved variables. This can be shown by reduction to and from the problem of computing the transitive closure of a DAG (which is well-known to be computationally equivalent to BMM). The reductions are straightforward and a nice exercise.

In contrast, the problem of detecting whether a graph is moral (that is whether the moralization procedure adds no edges) is possible in linear-time O(n+m)O(n+m) as discussed in this previous blog post.

Citation

You can cite this blog post as:

@article{wienoebst2023moralization,
  title = {On the Computational Complexity of Graph Moralization},
  author = {Wien{\"o}bst, Marcel},
  journal = {Blog post at mwien.github.io},
  year = {2023},
  howpublished = {\url{mwien.github.io/blog/complexity-moralization/}}
}

Acknowledgement

I would like to thank Benito van der Zander for his help in the literature search on the run-time of moralization.

References

An, X.; and Cercone, N. 2009. Compiling Multiply Sectioned Bayesian Networks: A Comparative Study. In Proceedings of the Eighth Mexican International Conference on Artificial Intelligence.
Geiger, D.; Verma, T.; and Pearl, J. 1990. d-separation: From theorems to algorithms. In Machine Intelligence and Pattern Recognition (10).
Heisterkamp, S. H. 2009. Directed acyclic graphs and the use of linear mixed models. Technical Report.
Javidian, M. A.; and Valtorta, M. 2018. Finding minimal separators in ancestral graphs. In Proceedings of the Seventh Causal Inference Workshop at the 34th Conference on Artifical Intelligence (UAI’18).
Javidian, M. A.; and Valtorta, M. 2018. Finding Minimal Separators in LWF Chain Graphs. In Proceedings of the Ninth International Conference on Probabilistic Graphical Models.
Javidian, M. A.; Valtorta, M.; and Jamshidi, P. 2020. AMP Chain Graphs: Minimal Separators and Structure Learning Algorithms. Journal of Artificial Intelligence Research 69.
Pearl, J.; and Verma, T. S. 1995. A theory of inferred causation. In Studies in Logic and the Foundations of Mathematics (134).
Richardson, T.; and Spirtes, P. 2002. Ancestral graph Markov models. The Annals of Statistics (30(4)).
Textor, J.; and Liśkiewicz, M. 2011. Adjustment criteria in causal diagrams: an algorithmic perspective. In Proceedings of the Twenty-Seventh Conference on Uncertainty in Artificial Intelligence (UAI’11).
Tian, J.; Paz. A.; and Pearl, J. 1998. Finding Minimal D-separators. Technical Report.
van der Zander, B.; Textor, J.; and Liśkiewicz, M. 2014. Constructing Separators and Adjustment Sets in Ancestral Graphs. In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI’14).
van der Zander, B., Liśkiewicz, M., and Textor, J. 2019. Separators and adjustment sets in causal graphs: Complete criteria and an algorithmic framework. Artificial Intelligence (270).
van der Zander, B.; and Liśkiewicz, M. 2020. Finding minimal d-separators in linear time and applications. In Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence (UAI’20).
Jeong, H.; Tian, J.; and Bareinboim, E. 2022. Finding and listing front-door adjustment sets. In Proceedings of the Thirty-Fifth Conference on Advances in Neural Information Processing Systems (NeurIPS’22).
Wienöbst, M.; van der Zander, B.; and Liśkiewicz, M. 2022. Linear-Time Algorithms for Front-Door Adjustment in Causal Graphs. Preprint.
Xia, Y.; and Prasanna, V. K. 2007. Parallel Exact Inference. In Advances in Parallel Computing (15).
Xiang, Y. 2002. Probabilistic Reasoning in Multiagent Systems: A graphical models approach. Cambridge University Press.

Footnotes

Footnotes

  1. A naive implementation of moralization has worst-case run-time O(n3)O(n^3), despite being claimed as O(n2)O(n^2) in these works, hence the missing factor nn. Being more precise, because moralization is computationally equivalent to BMM, the time-complexity of moralization can be stated, at least theoretically, as O(n2.37286)O(n^{2.37286}), the current run-time of fast matrix multiplication, which is off from n2n^2 by a factor of merely n0.37286n^{0.37286}.