Abstract

Recent advancements in Graph Neural Networks have led to state-of-the-art performance on graph representation learning. However, the majority of existing works process directed graphs by symmetrization, which causes a loss of directional information. To address this issue, we introduce the magnetic Laplacian, a discrete Schrödinger operator with a magnetic field, which preserves edge directionality by encoding it into a complex phase governed by an electric charge parameter. By adopting a truncated variant of PageRank named LinearRank, we design and build a low-pass filter for homogeneous graphs and a high-pass filter for heterogeneous graphs. In this work, we propose a complex-valued graph convolutional network named Magnetic Graph Convolutional network (MGC). With the corresponding complex-valued techniques, we ensure that our model degenerates into a real-valued network when the charge parameter takes specific values. We test our model on several graph datasets, including directed homogeneous and heterogeneous graphs. The experimental results demonstrate that MGC is fast, powerful, and widely applicable.

Machine Learning, ICML

High-dimensional data are ordinarily represented as graphs in a variety of areas, such as citation, energy, sensor, social, and traffic networks. Consequently, Graph Neural Networks (GNNs), also known as CNNs on graphs, have shown tremendous promise.

Since the emergence of GCN (Kipf & Welling, 2017), numerous GNNs have been developed to generalize convolution operations on graphs through the graph Fourier transform. Graph convolution is based on graph signal processing (Shuman et al., 2013, 2016), where the graph filter is a crucial component. A graph filter is a tensor which processes a graph signal by amplifying or attenuating its corresponding graph Fourier coefficients. Generally, a graph filter is a combination of different powers of a normalized adjacency matrix or a normalized Laplacian matrix.

The graph Fourier transform, which is based on computing the eigendecomposition of a graph filter, cannot be directly applied to a directed graph, since the adjacency matrix is asymmetric (Tremblay et al., 2018; Stanković et al., 2019). A straightforward method is to symmetrize the adjacency matrix, which treats a directed graph as an undirected graph. This operation is adopted by most existing GNNs. Unfortunately, it discards the asymmetric structure, so such GNNs fail to capture the directional information of directed graphs. An alternative solution is Chung’s directed Laplacian (Chung, 2005), which is based on the transition probability matrix of out-degree and the corresponding Perron vector. It is a directed Laplacian that takes advantage of the directional information hidden in the transition probability matrix of out-degree.

Figure 1: Visualization of the magnetic eigenmaps with the electric charge parameter $q=\frac{1}{3}$ on the Texas dataset. Each node represents an eigenvector.

In this paper, we introduce another directed Laplacian named the magnetic Laplacian (Fanuel et al., 2017, 2018), which utilizes the directional information from the adjacency matrix. As an extension of the combinatorial Laplacian (Chung, 1997), the magnetic Laplacian preserves edge directionality by encoding it into a complex phase. We adopt the magnetic Laplacian for node classification in directed graphs. Figure 1 exhibits the visualization of the magnetic eigenmaps (Fanuel et al., 2018), which are eigenvectors of the magnetic Laplacian, with the electric charge parameter $q=\frac{1}{3}$ on Texas (Lu & Getoor, 2003).

Recently, enlightened by spectral ranking (Vigna, 2016), GNNs such as PPNP, APPNP (Klicpera et al., 2019a), GDC (Klicpera et al., 2019b), and S$^2$GC (Zhu & Koniusz, 2021) demonstrate strong performance in node classification. In this research, we introduce a truncated variant of PageRank named LinearRank (Baeza-Yates et al., 2006). By combining LinearRank with the renormalized adjacency matrix, we design a low-pass filter for homogeneous graphs. Furthermore, by associating LinearRank with the negative renormalized adjacency matrix, we design a high-pass filter for heterogeneous graphs. In short, we propose a complex-valued graph convolutional network named Magnetic Graph Convolutional network (MGC). The experimental results demonstrate that MGC is fast, powerful, and widely applicable.

Graph Filters and Spectral Ranking. Based on graph signal processing, ChebyNet (Defferrard et al., 2016) proposes a fast localized graph filter and is the first GNN with graph filter modification, utilizing the Chebyshev polynomials of the first kind. GCN (Kipf & Welling, 2017) then improves ChebyNet by proposing the renormalization trick, which has inspired several GNNs to design filters. As a simplified version of multi-layer GCN, SGC (Wu et al., 2019) eliminates nonlinear activation functions and Dropout while achieving the same results as GCN.

Since graph convolution is related to graph signal processing, several researchers have tried to design GNNs from the perspective of devising graph filters. One common approach focuses on spectral ranking (Vigna, 2016). Personalized PageRank (Jeh & Widom, 2003) is a well-known spectral ranking algorithm adopted by PPNP and APPNP (Klicpera et al., 2019a). Subsequently, GDC (Klicpera et al., 2019b) utilizes Personalized PageRank and Heat Kernel PageRank (Chung, 2007) to construct a general graph convolution framework. Afterwards, S$^2$GC (Zhu & Koniusz, 2021) adopts the Markov Diffusion kernel (Fouss et al., 2006) as its graph filter.

Chung’s Directed Laplacian. Chung’s directed Laplacian (Chung, 2005) is a Hermitian matrix based on the transition probability matrix of out-degree and the corresponding Perron vector. This Laplacian applies to strongly connected graphs. DGCN (Ma et al., 2019) adopts this directed Laplacian and combines it with the renormalization trick for directed graphs. The work in (Tong et al., 2020) inherits DGCN and generalizes it to all kinds of graphs with PageRank (Page et al., 1999) and Personalized PageRank.

Magnetic Laplacian. The magnetic Laplacian is first proposed in (Shubin, 1994). As a discrete Hamiltonian of a charged particle on a graph, the magnetic Laplacian is widely used in mathematics (de Verdière, 2013) and physics (Olgiati, 2017). (Biamonte et al., 2019) indicates that the magnetic Laplacian is closely related to quantum walks. Since it is a complex Hermitian matrix, its eigenvalues are real and its eigenvectors are orthonormal. Due to these properties, the magnetic Laplacian can be employed in graph representation learning for both directed and undirected graphs.

We are aware of a concurrently developed work: MagNet (Zhang et al., 2021), which adopts the renormalized magnetic adjacency matrix as the graph shift operator (Stanković et al., 2019) to build a graph filter. It is composed of two layers of graph convolution for node classification. MagNet separates complex-valued tensors into real and imaginary parts, each of which is processed by an independent real-valued neural network (RVNN). In the last layer, a concatenation operation turns the complex-valued tensor into a real-valued one. To determine the electric charge parameter $q$ of the magnetic Laplacian, binary search is adopted.

Our model differs from MagNet in several ways. First, we utilize Johnson’s algorithm (Johnson, 1975) to determine $q$. Second, we adopt LinearRank with the renormalized magnetic adjacency matrix for homogeneous graphs and with the negative renormalized magnetic adjacency matrix for heterogeneous graphs. Third, we adopt a complex-valued graph convolution with the corresponding weight initialization (Trabelsi et al., 2018). Fourth, inspired by the gating mechanism (Dauphin et al., 2017), we design a complex-valued activation function that turns a complex-valued tensor into a real-valued one. Notice that when $q=0$ or $q=\frac{1}{2}$, our model degenerates from a complex-valued neural network (CVNN) to an RVNN.

A directed graph $G$ is represented as $G=\{\mathcal{V},\mathcal{E}\}$, where $\mathcal{V}=\{v_{0},v_{1},\dots,v_{n-1}\}$ is a finite set of vertices with $|\mathcal{V}|=n$, and $\mathcal{E}\subseteq\mathcal{V}\times\mathcal{V}$ is the set of edges. Let $\mathbf{A}\in\mathbb{R}^{n\times n}$ denote the directed adjacency matrix of $G$, where $\mathbf{A}(u,v)=1$ if there is an edge from node $u$ to node $v$, and $\mathbf{A}(u,v)=0$ otherwise.

In graph theory, directed graphs can be divided into two categories: directed acyclic graphs and directed cyclic graphs. We define a cycle in a directed graph as a closed chain of distinct edges that connects a sequence of distinct nodes. If all the edges on a cycle are oriented in the same direction, it is called a directed cycle. If a directed graph has no directed cycle, we name it a directed acyclic graph. Otherwise, the directed graph is called a directed cyclic graph.

We use $G_{s}=\{\mathcal{V},\mathcal{E}_{s}\}$ to represent an undirected graph. The undirected adjacency matrix can be defined either as $\mathbf{A}_{s}=\frac{1}{2}(\mathbf{A}+\mathbf{A}^{\mathrm{T}})$ or as $\mathbf{A}_{s}(u,v)=\max(\mathbf{A}(u,v),\mathbf{A}(v,u))$, so that $\mathbf{A}_{s}$ is symmetric. For an undirected graph, the combinatorial Laplacian (Chung, 1997) is defined as $\mathbf{L}=\mathbf{D}_{s}-\mathbf{A}_{s}$, where $\mathbf{D}_{s}\in\mathbb{R}^{n\times n}$ is the diagonal degree matrix of $G_{s}$.

Graphs can be either homogeneous or heterogeneous. The homophily and heterophily of a graph describe the relation of labels among neighboring nodes: in a homogeneous (homophilic) graph, connected nodes tend to share the same label, whereas in a heterogeneous (heterophilic) graph, connected nodes tend to have different labels. The node homophily index of a graph (Pei et al., 2020) is $\mathcal{H}_{\mathrm{node}}(G)=\frac{1}{|\mathcal{V}|}\sum_{u\in\mathcal{V}}\frac{\left|\{v\,|\,v\in\mathcal{N}_{u},\,\mathcal{Y}_{v}=\mathcal{Y}_{u}\}\right|}{|\mathcal{N}_{u}|}$, where $\mathcal{N}_{u}$ is the neighbor set of node $u$ and $\mathcal{Y}_{u}$ is the label of node $u$. Note that $\mathcal{H}_{\mathrm{node}}(G)\rightarrow 1$ indicates strong homophily and $\mathcal{H}_{\mathrm{node}}(G)\rightarrow 0$ corresponds to strong heterophily.
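The node homophily index can be computed directly from an adjacency matrix and a label vector. Below is a minimal NumPy sketch; the function name and toy graph are our own illustration, not from the paper:

```python
import numpy as np

def node_homophily(adj, labels):
    # H_node(G): for each node, the fraction of its neighbors sharing
    # its label (Pei et al., 2020), averaged over all nodes.
    ratios = []
    for u in range(adj.shape[0]):
        nbrs = np.nonzero(adj[u])[0]
        if len(nbrs) == 0:
            continue  # skip isolated nodes
        ratios.append(np.mean(labels[nbrs] == labels[u]))
    return float(np.mean(ratios))

# Toy 4-cycle with alternating labels: every neighbor disagrees.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])
print(node_homophily(A, y))  # 0.0 (strong heterophily)
```

With a uniform label vector the same function returns 1.0, the strong-homophily extreme.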

In graph signal processing (GSP), a graph signal (Stanković et al., 2019) $x$ is a map from the set of vertices $\mathcal{V}$ into the set of complex numbers $\mathbb{C}$, $x:\mathcal{V}\rightarrow\mathbb{C};\ v_{n}\mapsto x_{n}$. For convenient mathematical representation, we denote a graph signal as a column vector $\mathbf{x}=[x_{0},x_{1},\dots,x_{n-1}]^{\mathrm{T}}$.

A graph shift operator (GSO) (Stanković et al., 2019) is a matrix $\mathbf{S}\in\mathbb{C}^{n\times n}$ with $\mathbf{S}(u,v)\neq 0$ if and only if $u=v$ or $(u,v)\in\mathcal{E}$. A GSO defines how to shift a graph signal from one node to its neighbors based on the graph topology. More specifically, a GSO is a local operator that replaces the signal value at each node with a linear combination of the values at its one-hop neighbors. In graph signal processing, it is common to take a normalized adjacency matrix or a normalized Laplacian matrix as the graph shift operator.

A graph filter $\mathbf{H}\in\mathbb{C}^{n\times n}$ (Stanković et al., 2019) is a function $h(\cdot)$ of a graph shift operator, denoted as $\mathbf{H}=h(\mathbf{S})$. A graph shift operator itself is the simplest graph filter. In graph signal processing, it is common to utilize a polynomial graph filter, defined as $h(\mathbf{S})=\sum^{K}_{k=0}\zeta_{k}\mathbf{S}^{k}$. This kind of graph filter is named a Moving-Average (MA) filter (Isufi et al., 2017), also known as a Finite Impulse Response (FIR) filter. The capacity of a GNN with an FIR filter is determined by the degree $K$ of the polynomial; we denote it as an MA$_K$ or FIR$_K$ filter. Compared with an FIR filter, an Infinite Impulse Response (IIR) filter is better suited to capturing global structure on a graph. A well-known IIR filter is the Auto-Regressive (AR) filter (Isufi et al., 2017), defined as $h(\mathbf{S})=\left(\mathbf{I}_{n}+\sum^{K}_{k=1}\eta_{k}\mathbf{S}^{k}\right)^{-1}$. We denote it as an AR$_K$ filter.

By multiplying an AR$_K$ filter with an MA$_K$ filter, we obtain an Auto-Regressive Moving-Average (ARMA) filter (Isufi et al., 2017) of order $K$, denoted ARMA$_K$. It is defined as $h(\mathbf{S})=\left(\sum^{K}_{k=0}\zeta_{k}\mathbf{S}^{k}\right)\left(\mathbf{I}_{n}+\sum^{K}_{k=1}\eta_{k}\mathbf{S}^{k}\right)^{-1}$. For the same order $K$, an ARMA$_K$ filter performs better than an AR$_K$ or an MA$_K$ filter, since it is composed of a feedforward term (MA) and a feedback term (AR). An ARMA$_K$ filter is a special case of the ARMA$_{F,B}$ filter $h(\mathbf{S})=\left(\sum^{F}_{f=0}\zeta_{f}\mathbf{S}^{f}\right)\left(\mathbf{I}_{n}+\sum^{B}_{b=1}\eta_{b}\mathbf{S}^{b}\right)^{-1}$.
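To make the MA and ARMA forms concrete, the following NumPy sketch applies both filter types to a signal on a toy path graph; the coefficients are arbitrary illustrative values, not the paper's:

```python
import numpy as np

def ma_filter(S, zeta):
    # MA_K (FIR) filter: h(S) = sum_k zeta_k S^k
    H = np.zeros_like(S)
    P = np.eye(S.shape[0])
    for z in zeta:
        H += z * P
        P = P @ S
    return H

def arma_filter(S, zeta, eta):
    # ARMA filter: feedforward MA term times the inverted AR (feedback) term
    n = S.shape[0]
    ar = np.eye(n)
    P = np.eye(n)
    for e in eta:
        P = P @ S
        ar += e * P
    return ma_filter(S, zeta) @ np.linalg.inv(ar)

# GSO: symmetric normalized adjacency of a 3-node path graph
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
d = A.sum(1)
S = A / np.sqrt(np.outer(d, d))
x = np.array([1., 0., 0.])
print(ma_filter(S, [0.5, 0.5]) @ x)       # one-hop smoothing of the signal
print(arma_filter(S, [1.0], [-0.5]) @ x)  # ARMA_1 filter with one feedback term
```

With an empty feedback list the ARMA filter reduces to the pure MA filter, matching the factorized definition above.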

A function $h(\cdot)$ applied to the set of a graph filter’s eigenvalues $\lambda$, $h(\lambda)$, is called the graph frequency response function (GFRF), which extends the convolution theorem from digital signal processing to graphs. For a normalized Laplacian matrix, the GFRF is $h(\lambda)=\lambda$; for a normalized adjacency matrix, it is $h(\lambda)=1-\lambda$. In graph signal processing, a normalized Laplacian matrix is chosen as a graph filter because it is the simplest high-pass filter; likewise, a normalized adjacency matrix is the simplest low-pass filter.

For an undirected graph $G_{s}$, its graph filter can be eigendecomposed as $\mathbf{H}=\mathbf{U}\mathbf{\Lambda}\mathbf{U}^{*}$, where $\mathbf{U}\in\mathbb{R}^{n\times n}$ is a matrix of orthonormal eigenvectors, $\mathbf{\Lambda}=\mathrm{diag}\left([h(\lambda^{(0)}),\dots,h(\lambda^{(n-1)})]\right)\in\mathbb{R}^{n\times n}$ is a diagonal matrix of filtered eigenvalues, and ${}^{*}$ denotes the conjugate transpose. Since $\mathbf{U}$ is real-valued, we have $\mathbf{U}^{*}=\mathbf{U}^{\mathrm{T}}$.

Based on the theory of graph signal processing, the graph Fourier transform of a signal vector $\mathbf{x}$ on an undirected graph $G_{s}$ is defined as $\hat{\mathbf{x}}=\mathbf{U}^{\mathrm{T}}\mathbf{x}$, and the inverse graph Fourier transform is $\mathbf{x}=\mathbf{U}\hat{\mathbf{x}}$. The graph convolution operator $*_{\mathcal{G}}$ is defined as $\mathbf{h}*_{\mathcal{G}}\mathbf{x}=\mathbf{U}\left((\mathbf{U}^{\mathrm{T}}\mathbf{h})\odot(\mathbf{U}^{\mathrm{T}}\mathbf{x})\right)=\mathbf{U}h(\mathbf{\Lambda})\mathbf{U}^{\mathrm{T}}\mathbf{x}=\mathbf{H}\mathbf{x}=\sum_{i=0}^{n-1}h(\lambda^{(i)})\mathbf{u}_{i}\mathbf{u}_{i}^{\mathrm{T}}\mathbf{x}$, where $\mathbf{h}$ is the vector form of the GFRF, $h(\mathbf{\Lambda})$ is its matrix form, and $\mathbf{u}_{0},\mathbf{u}_{1},\dots,\mathbf{u}_{n-1}$ are the columns of $\mathbf{U}$.
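The chain of equalities above can be checked numerically: filtering in the spectral domain with the GFRF $h(\lambda)=1-\lambda$ of the normalized adjacency matrix must agree with applying that matrix directly in the vertex domain. A small NumPy sketch on a toy triangle graph (our own example):

```python
import numpy as np

A = np.array([[0., 1., 1.],
              [1., 0., 1.],
              [1., 1., 0.]])                  # undirected triangle
Dinv = np.diag(A.sum(1) ** -0.5)
L = np.eye(3) - Dinv @ A @ Dinv               # normalized Laplacian
lam, U = np.linalg.eigh(L)                    # real eigenvalues, orthonormal U
h = 1.0 - lam                                 # GFRF of the normalized adjacency
x = np.array([1., 0., 0.])                    # graph signal
x_hat = U.T @ x                               # graph Fourier transform
y = U @ (h * x_hat)                           # filter, then inverse transform
print(np.allclose(y, Dinv @ A @ Dinv @ x))    # True: same as H x in the vertex domain
```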

For a directed graph, its adjacency matrix is asymmetric. Consequently, the combinatorial Laplacian is not straightforwardly applicable. To process a directed graph while preserving edge directionality, we introduce the magnetic Laplacian (Fanuel et al., 2017 , 2018 ) , which is a complex positive semi-definite Hermitian matrix. As a deformation of the combinatorial Laplacian, the magnetic Laplacian is defined as:

$$\mathbf{L}_{q}=\mathbf{D}_{s}-\mathbf{A}_{s}\odot\mathbf{T}_{q},\qquad\mathbf{T}_{q}(u,v)=\exp\left(i2\pi q\left(\mathbf{A}(u,v)-\mathbf{A}(v,u)\right)\right),$$

where $\odot$ denotes the Hadamard product, $\mathbf{T}_{q}$ is a complex, unitary parallel transporter which represents the direction of magnetic flux among nodes, and $q$ is an electric charge parameter. The symmetric normalized magnetic Laplacian is defined as

$$\mathcal{L}_{q}=\mathbf{D}^{-\frac{1}{2}}_{s}\left(\mathbf{D}_{s}-\mathbf{A}_{s}\odot\mathbf{T}_{q}\right)\mathbf{D}^{-\frac{1}{2}}_{s}=\mathbf{I}_{n}-\mathbf{D}^{-\frac{1}{2}}_{s}\mathbf{A}_{s}\mathbf{D}^{-\frac{1}{2}}_{s}\odot\mathbf{T}_{q}.$$
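These definitions translate directly into code. A minimal NumPy sketch constructing $\mathcal{L}_{q}$ (dense matrices for clarity; a practical implementation would use sparse ones) and checking the Hermitian, positive semi-definite properties claimed above:

```python
import numpy as np

def magnetic_laplacian(A, q):
    # L_q = I - D^{-1/2} A_s D^{-1/2} ⊙ T_q for a directed adjacency matrix A.
    As = 0.5 * (A + A.T)                      # symmetrized adjacency
    T = np.exp(2j * np.pi * q * (A - A.T))    # unitary parallel transporter
    d = As.sum(1)
    dis = np.zeros_like(d)
    dis[d > 0] = d[d > 0] ** -0.5             # guard against isolated nodes
    An = dis[:, None] * As * dis[None, :]
    return np.eye(A.shape[0]) - An * T

A = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)        # directed 3-cycle
Lq = magnetic_laplacian(A, q=1/3)
print(np.allclose(Lq, Lq.conj().T))           # Hermitian: True
print(np.linalg.eigvalsh(Lq).min() >= -1e-9)  # positive semi-definite: True
```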

Since $q$ strongly influences the magnetic Laplacian, we should choose its value prudently. Unfortunately, this is a difficult issue, because we only know that $q$ is closely related to the reciprocity and directed cycles of a graph. To address this issue, (Fanuel et al., 2017) proposes setting $q=\frac{1}{m}$ if there is a directed $m$-cycle (with $m\geq 2$) in the graph. Note that a reciprocal link is treated as a directed 2-cycle, and $q=0$ for both directed acyclic graphs and undirected graphs. Besides, to avoid overemphasizing reciprocal links, $q$ should be restricted to $[0,\frac{1}{2}]$.

We introduce Johnson’s algorithm (Johnson, 1975) to determine the directed cycles of a directed cyclic graph. The time complexity is $O\left(|\mathcal{V}|^{2}\log|\mathcal{V}|+|\mathcal{V}|\,|\mathcal{E}|\right)$. Once all cycles are detected, we successively choose $q$ from the set of reciprocals of the cycle lengths. Notice that there are three special cases in which the magnetic Laplacian degenerates, depending on the value of the electric charge parameter: (1) $q=0$, the combinatorial Laplacian, meaning the graph is an undirected graph or a directed acyclic graph; (2) $q=\frac{1}{4}$, the imaginary combinatorial Laplacian; (3) $q=\frac{1}{2}$, the signed Laplacian (Kunegis et al., 2010), meaning the graph is composed of both positive and negative edges.
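For illustration, the candidate values of $q$ can be read off from the lengths of simple directed cycles. The paper uses Johnson's algorithm for this; the brute-force enumeration below is only an illustrative stand-in that is feasible on toy graphs:

```python
from itertools import permutations

def directed_cycle_lengths(edges, n):
    # Brute-force enumeration of simple directed cycle lengths.
    # (Johnson's algorithm does this efficiently on real graphs.)
    E = set(edges)
    lengths = set()
    for m in range(2, n + 1):
        for nodes in permutations(range(n), m):
            if all((nodes[i], nodes[(i + 1) % m]) in E for i in range(m)):
                lengths.add(m)
    return sorted(lengths)

# A directed 3-cycle plus a reciprocal link (treated as a directed 2-cycle).
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 3)]
lengths = directed_cycle_lengths(edges, 5)
candidate_q = [1 / m for m in lengths]
print(lengths)      # [2, 3]
print(candidate_q)  # [0.5, 0.333...]
```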

Chung’s directed Laplacian (Chung, 2005) is defined as $\mathbf{L}_{+}=\mathbf{\Phi}-\frac{1}{2}\left(\mathbf{\Phi}\mathbf{P}_{\mathrm{out}}+\mathbf{P}^{*}_{\mathrm{out}}\mathbf{\Phi}\right)$, where $\mathbf{P}_{\mathrm{out}}$ is the transition probability matrix of out-degree, $\mathbf{\Phi}$ is the diagonal matrix with entries $\mathbf{\Phi}(v,v)=\phi(v)$, and $\phi$ is the Perron vector of $\mathbf{P}_{\mathrm{out}}$. For an undirected graph, $\mathbf{\Phi}=\mathbf{D}$, so $\mathbf{L}_{+}$ degenerates to $\mathbf{L}$. The normalized Chung’s directed Laplacian is defined as $\mathcal{L}_{+}=\mathbf{I}_{n}-\frac{1}{2}\left(\mathbf{\Phi}^{\frac{1}{2}}\mathbf{P}_{\mathrm{out}}\mathbf{\Phi}^{-\frac{1}{2}}+\mathbf{\Phi}^{-\frac{1}{2}}\mathbf{P}^{*}_{\mathrm{out}}\mathbf{\Phi}^{\frac{1}{2}}\right)$.

Compared with Chung’s directed Laplacian, the magnetic Laplacian has several advantages. First, the normalized magnetic adjacency matrix is likely to be sparse unless the graph itself is fully connected. This is because the normalized magnetic adjacency matrix is the element-wise product of the normalized adjacency matrix and the parallel transporter matrix, and the normalized adjacency matrix, computed from the unnormalized adjacency matrix and the degree matrix, is sparse whenever the graph is not fully connected. As a consequence, the normalized magnetic adjacency matrix is likely to be sparse regardless of whether the parallel transporter matrix is sparse.

Second, applying Chung’s directed Laplacian assumes that the graph is strongly connected, which means the graph can contain no isolated nodes and cannot be bipartite. This is because the calculation of Chung’s directed Laplacian requires a Perron vector of the transition probability matrix in which every element is positive; otherwise, the transition probability matrix is not irreducible and aperiodic, and it may not have a unique stationary distribution. Although (Tong et al., 2020) proposes two approaches to guarantee the calculation of Chung’s directed Laplacian, the resulting normalized directed adjacency matrix is inevitably dense due to their calculation method.

$$\mathbf{H}=\sum^{\infty}_{k=0}\theta_{k}\mathbf{P}^{k},$$

where $\theta_{k}$ is a damping factor with $\sum^{\infty}_{k=0}\theta_{k}=1$, and $\mathbf{P}$ is the transition probability matrix used as a GSO. This diffusion filter belongs to a family of spectral ranking algorithms called Generalized PageRank (GPR) (Baeza-Yates et al., 2006; Gleich, 2015; Li et al., 2019). By defining the damping factors through a damping function $\theta_{k}=f(k)$, we can utilize a variety of series to obtain convergent diffusion filters.

4.3.1 Personalized PageRank, Heat Kernel PageRank, and Markov Diffusion

A well-known example is Personalized PageRank (PPR) (Jeh & Widom, 2003). Given the damping function $f(k)=(1-\alpha)\alpha^{k}$, the diffusion filter of PPR is defined as

$$\mathbf{H}_{\mathrm{PPR}}=\sum^{\infty}_{k=0}(1-\alpha)\alpha^{k}\mathbf{P}^{k}=(1-\alpha)\left(\mathbf{I}_{n}-\alpha\mathbf{P}\right)^{-1},$$

where $\alpha\in(0,1)$ is the restart probability, $\mathbf{I}_{n}-\alpha\mathbf{P}$ is non-singular, and $\mathbf{P}\neq\frac{1}{\alpha}\mathbf{I}_{n}$. It is an ARMA$_{0,1}$ filter.
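As a sanity check on the closed form, the geometric series defining the PPR filter can be truncated at a large order and compared against $(1-\alpha)(\mathbf{I}_{n}-\alpha\mathbf{P})^{-1}$. The transition matrix below is a toy example of our own, not from the paper:

```python
import numpy as np

P = np.array([[0.0, 1.0],
              [0.5, 0.5]])                         # row-stochastic toy GSO
alpha = 0.85
n = P.shape[0]
H_closed = (1 - alpha) * np.linalg.inv(np.eye(n) - alpha * P)
H_series = np.zeros((n, n))
Pk = np.eye(n)
for k in range(200):                               # geometric series: converges since alpha < 1
    H_series += (1 - alpha) * alpha ** k * Pk
    Pk = Pk @ P
print(np.allclose(H_closed, H_series, atol=1e-8))  # True
```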

Another common instance is Heat Kernel PageRank (HKPR) (Chung, 2007). Given the damping function $f(k)=\frac{\exp(-t)t^{k}}{k!}$, we obtain the filter

$$\mathbf{H}_{\mathrm{HKPR}}=\sum^{\infty}_{k=0}\frac{\exp(-t)t^{k}}{k!}\mathbf{P}^{k}=\exp\left(-t(\mathbf{I}_{n}-\mathbf{P})\right),$$

where $t\in(0,\infty)$ is the diffusion time. Because the range of $t$ is an open interval, HKPR is hard to apply to realistic datasets unless $t$ is constrained to a reasonable closed interval.

In practice, variants based on infinite series have to be truncated. It is common to approximate the filter of an infinite-series variant, via a Taylor series, Chebyshev polynomials (Hammond et al., 2011), Bernstein polynomials (He et al., 2021), or Padé approximants (Perotti & Wojtylak, 2018), in the form of an MA$_K$ filter $\sum^{K}_{k=0}\gamma_{k}\mathbf{P}^{k}$ with

$$\left\lVert\sum^{\infty}_{k=0}\theta_{k}\mathbf{P}^{k}-\sum^{K}_{k=0}\gamma_{k}\mathbf{P}^{k}\right\rVert_{p}\leq\epsilon,$$

where $\epsilon$ is a round-off error. Instead of adopting a diffusion filter of infinite order, we focus on kernels that are truncated variants of GPR, since they are approximation-free. A simple truncated diffusion filter is the Markov Diffusion (MD) kernel (Fouss et al., 2006), defined as

$$\mathbf{H}_{\mathrm{MD}}=\sum^{K}_{k=1}\frac{1}{K}\mathbf{P}^{k}=\frac{1}{K}\mathbf{P}\left(\mathbf{I}_{n}-\mathbf{P}^{K}\right)\left(\mathbf{I}_{n}-\mathbf{P}\right)^{-1},$$

where $\mathbf{I}_{n}-\mathbf{P}$ is non-singular and $\mathbf{P}\neq\mathbf{I}_{n}$. It is an ARMA$_{K+1,1}$ filter.

However, the Markov Diffusion kernel is not appropriate because of its constant damping function: there is no decay between each vertex’s $k$-hop neighbors and $(k+1)$-hop neighbors. To improve on the Markov Diffusion kernel, we introduce LinearRank (Baeza-Yates et al., 2006). The LinearRank (LR) filter is defined as

$$\mathbf{H}_{\mathrm{LR}}=\sum^{K-1}_{k=0}\frac{2(K-k)}{K(K+1)}\mathbf{P}^{k}=\frac{2}{K(K+1)}\left(K\mathbf{I}_{n}-(K+1)\mathbf{P}+\mathbf{P}^{K+1}\right)(\mathbf{I}_{n}-\mathbf{P})^{-2},$$

where $\mathbf{I}_{n}-\mathbf{P}$ is non-singular and $\mathbf{P}\neq\mathbf{I}_{n}$. It is an ARMA$_{K+1,2}$ filter. For homogeneous graphs, we adopt the symmetric renormalized magnetic adjacency matrix as the GSO: $\mathbf{P}=\widetilde{\mathcal{A}}_{q}=\widetilde{\mathbf{D}}^{-\frac{1}{2}}_{s}\widetilde{\mathbf{A}}_{s}\widetilde{\mathbf{D}}^{-\frac{1}{2}}_{s}\odot\mathbf{T}_{q}$, where $\widetilde{\mathbf{A}}_{s}=\mathbf{A}_{s}+\mathbf{I}_{n}$ and $\widetilde{\mathbf{D}}_{s}(u,u)=\sum_{v}\widetilde{\mathbf{A}}_{s}(u,v)$. For heterogeneous graphs, we adopt the negative symmetric renormalized magnetic adjacency matrix as the GSO: $\mathbf{P}=-\widetilde{\mathcal{A}}_{q}$.
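Implementing the LR filter from its truncated-sum form is straightforward. A NumPy sketch over the renormalized magnetic adjacency of a toy directed 3-cycle (illustrative only, not the paper's implementation):

```python
import numpy as np

def linearrank_filter(P, K):
    # H_LR = sum_{k=0}^{K-1} 2(K-k) / (K(K+1)) * P^k
    n = P.shape[0]
    H = np.zeros((n, n), dtype=P.dtype)
    Pk = np.eye(n, dtype=P.dtype)
    for k in range(K):
        H += (2.0 * (K - k) / (K * (K + 1))) * Pk
        Pk = Pk @ P
    return H

# GSO: symmetric renormalized magnetic adjacency, q = 1/3, directed 3-cycle.
A = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
As = 0.5 * (A + A.T) + np.eye(3)               # renormalization trick: A_s + I_n
T = np.exp(2j * np.pi * (1 / 3) * (A - A.T))   # parallel transporter
d = As.sum(1)
P = (As / np.sqrt(np.outer(d, d))) * T
H = linearrank_filter(P, K=4)
# The linearly decaying damping factors sum to one (a convex combination of P^k).
print(abs(sum(2 * (4 - k) / (4 * 5) for k in range(4)) - 1.0) < 1e-12)  # True
```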

$$\operatorname*{min}_{\mathbf{x}}\left\{\mu\left\lVert\bar{\mathbf{x}}-\mathbf{x}\right\rVert^{2}_{2}+\mathcal{S}_{p}(\mathbf{x})\right\},$$

where $\mu>0$ is a trade-off coefficient and $\mathcal{S}_{p}(\mathbf{x})$ is the Dirichlet energy (Shuman et al., 2013; Cai & Wang, 2020) over an undirected graph:

$$\mathcal{S}_{p}(\mathbf{x})=\frac{1}{p}\sum_{u\in\mathcal{V}}\left\lVert\nabla_{u}\mathbf{x}\right\rVert^{p}_{2}=\frac{1}{p}\sum_{u\in\mathcal{V}}\left[\sum_{v\in\mathcal{N}_{u}}\mathbf{A}_{s}(u,v)[x(u)-x(v)]^{2}\right]^{\frac{p}{2}}=\frac{1}{p}\sum_{(u,v)\in\mathcal{E}}\mathbf{A}_{s}^{\frac{p}{2}}(u,v)[x(u)-x(v)]^{p}.$$

When $p=1$, $\mathcal{S}_{1}(\mathbf{x})=\sqrt{\mathbf{A}_{s}(u,v)}\,[x(u)-x(v)]=[\nabla\mathbf{x}](u,v)$ is the graph gradient. When $p=2$, $\mathcal{S}_{2}(\mathbf{x})=\frac{1}{2}\sum_{(u,v)\in\mathcal{E}}\mathbf{A}_{s}(u,v)[x(u)-x(v)]^{2}=\mathbf{x}^{*}\mathcal{L}\mathbf{x}$ is the graph Laplacian quadratic form. Besides, the $p$-Laplacian (Lindqvist, 2019) can be derived from $\mathcal{S}_{p}(\mathbf{x})$.

Since the normalized Laplacian is frequently adopted as a GSO in graph signal processing, and the normalized Laplacian is a special case of the normalized magnetic Laplacian, we discuss the case $p=2$. When $p=2$, $\mathcal{S}_{2}(\mathbf{x})=\mathbf{x}^{*}\mathcal{L}\mathbf{x}$, and we can replace the Laplacian regularization with the magnetic Laplacian regularization. Equation 9 can then be rewritten as

\[
\min_{\mathbf{x}}\left\{\mu\left\lVert\bar{\mathbf{x}}-\mathbf{x}\right\rVert^{2}_{2}+\mathbf{x}^{*}\mathcal{L}_{q}\mathbf{x}\right\}.
\]

From Equation 11, we can derive two equivalent closed-form solutions: (1) the Personalized PageRank filter (Jeh & Widom, 2003): $\bar{\mathbf{x}}=\beta(\mathbf{I}_{n}-\alpha\mathcal{A}_{q})^{-1}\mathbf{x}$, where $\mathcal{A}_{q}$ is the normalized magnetic adjacency matrix, $\alpha=\frac{1}{\mu+1}\in(0,1)$, and $\beta=\frac{\mu}{\mu+1}\in(0,1)$; (2) the von Neumann kernel (Kandola et al., 2003): $\bar{\mathbf{x}}=(\mathbf{I}_{n}+\frac{1}{\mu}\mathcal{L}_{q})^{-1}\mathbf{x}$. These results indicate that an AR$_K$ filter, or the feedback term of an ARMA$_{F,B}$ filter, of the form $(\mathbf{I}_{n}-\alpha\mathcal{A}_{q})^{-K}$ or $(\mathbf{I}_{n}+\frac{1}{\mu}\mathcal{L}_{q})^{-K}$, has a denoising ability.
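The equivalence of the two closed-form solutions can be verified numerically. The sketch below uses a small hypothetical undirected graph (the $q=0$ case, where the normalized magnetic matrices reduce to their real-valued counterparts):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Hypothetical random symmetric adjacency of a small undirected graph.
A = (rng.random((n, n)) < 0.5).astype(float)
A = np.triu(A, 1); A = A + A.T
d = A.sum(1); d[d == 0] = 1.0
D_inv_sqrt = np.diag(d ** -0.5)
A_norm = D_inv_sqrt @ A @ D_inv_sqrt   # normalized adjacency
L_norm = np.eye(n) - A_norm            # normalized Laplacian

mu = 3.0
alpha, beta = 1 / (mu + 1), mu / (mu + 1)
x = rng.random(n)

# (1) Personalized PageRank form: beta * (I - alpha * A_norm)^{-1} x
x_ppr = beta * np.linalg.solve(np.eye(n) - alpha * A_norm, x)
# (2) von Neumann kernel form: (I + (1/mu) * L_norm)^{-1} x
x_vnk = np.linalg.solve(np.eye(n) + L_norm / mu, x)

# Both expand to mu * ((mu + 1) I - A_norm)^{-1} x, so they coincide.
assert np.allclose(x_ppr, x_vnk)
```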

Table 1: Statistics of datasets

Based on the graph Fourier transform, a graph convolution layer is defined as $\mathbf{Z}=\sigma\left(\mathbf{HXW}\right)$, where $\mathbf{H}\in\mathbb{C}^{n\times n}$ is a graph filter matrix, $\mathbf{X}\in\mathbb{C}^{n\times c_{\mathrm{in}}}$ is a feature matrix, $\mathbf{W}\in\mathbb{C}^{c_{\mathrm{in}}\times c_{\mathrm{out}}}$ is a learnable weight matrix, and $\sigma(\cdot)$ is an activation function.

There are two approaches to simplify a graph convolution layer into a linear layer form, which we call the graph augmented linear layer. The first one, which we name the pre-computation style, is defined as $\mathbf{Z}=\sigma\left(\bar{\mathbf{X}}\mathbf{W}\right)$, where $\bar{\mathbf{X}}=\mathbf{HX}$ is an $n\times c_{\mathrm{in}}$ matrix obtained by multiplying the graph filter with the feature matrix before model training. The other one, which we call the pre-prediction style, is defined as $\mathbf{Z}=\sigma\left(\mathbf{H}\bar{\mathbf{W}}\right)$, where $\bar{\mathbf{W}}_{i,:}=f_{\theta}(\mathbf{X}_{i,:})$ is an $n\times c_{\mathrm{out}}$ matrix obtained by applying a learnable function to the feature matrix. The pre-computation style graph convolution was first proposed in SGC (Wu et al., 2019), and the pre-prediction style graph convolution was first proposed in PPNP and APPNP (Klicpera et al., 2019a).

Although those two simplified forms of a graph convolution layer are equivalent, the latter requires backward propagation through the graph filter, which leads to high computational and storage costs in the backward propagation stage. In contrast, these costs are eliminated in the pre-computation style because its graph convolution does not require backward propagation. For these reasons, we adopt the pre-computation style graph convolution in our models.
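The two styles reduce to plain matrix algebra. The sketch below (hypothetical random filter and features, assuming NumPy) shows that folding the filter into the features once yields the same output as applying it in every forward pass:

```python
import numpy as np

rng = np.random.default_rng(1)
n, c_in, c_out = 5, 4, 3
H = rng.random((n, n))         # a fixed, hypothetical graph filter matrix
X = rng.random((n, c_in))      # node feature matrix
W = rng.random((c_in, c_out))  # learnable weights

# Pre-computation style: fold the filter into the features once, before training.
X_bar = H @ X                  # computed once; no gradient flows through H @ X
Z_pre = X_bar @ W

# A full graph convolution layer would compute H @ X @ W every forward pass.
Z_full = H @ X @ W
assert np.allclose(Z_pre, Z_full)
```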

We propose an iterative approach to calculate $\bar{\mathbf{X}}=\mathbf{H}_{\mathrm{LR}}\mathbf{X}$ as

\[
\begin{split}
\bar{\mathbf{X}}^{(0)}&=\mathbf{T}^{(0)}=\frac{2}{K+1}\mathbf{X},\\
\mathbf{T}^{(k+1)}&=\frac{K-k-1}{K-k}\mathbf{P}\mathbf{T}^{(k)},\\
\bar{\mathbf{X}}^{(k+1)}&=\bar{\mathbf{X}}^{(k)}+\mathbf{T}^{(k+1)}.
\end{split}
\]

The iteration terminates when $k=K-1$.
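A minimal sketch of this iteration, assuming NumPy and a generic propagation matrix $\mathbf{P}$, is given below; it also checks the telescoped closed form $\mathbf{H}_{\mathrm{LR}}=\frac{2}{K(K+1)}\sum_{k=0}^{K-1}(K-k)\mathbf{P}^{k}$ implied by the recurrence:

```python
import numpy as np

def linear_rank_precompute(P, X, K):
    """Iterative computation of X_bar = H_LR X; P is the propagation matrix
    (e.g. a normalized magnetic adjacency matrix, possibly complex-valued)."""
    T = (2.0 / (K + 1)) * X            # X_bar^(0) = T^(0) = 2/(K+1) X
    X_bar = T.copy()
    for k in range(K - 1):             # terminates when k = K - 1
        T = ((K - k - 1) / (K - k)) * (P @ T)
        X_bar = X_bar + T
    return X_bar

rng = np.random.default_rng(2)
n, K = 4, 5
P = rng.random((n, n)); P /= P.sum(1, keepdims=True)  # hypothetical row-stochastic P
X = rng.random((n, 2))

# The products of the ratios telescope, giving the LinearRank weights (K - k).
H_LR = sum((K - k) * np.linalg.matrix_power(P, k) for k in range(K))
H_LR *= 2.0 / (K * (K + 1))
assert np.allclose(linear_rank_precompute(P, X, K), H_LR @ X)
```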

The prediction of MGC is computed as

\[
\hat{\mathbf{Y}}_{\mathrm{MGC}}=\mathrm{Softmax}\left(\varsigma\left(\bar{\mathbf{X}}\mathbf{W}^{(0)}\right)\mathbf{W}^{(1)}\right),
\]

where $\varsigma(\cdot)$ is a novel complex-valued activation function named the Complex Gated Tanh Unit ($\mathbb{C}\mathrm{GTU}$). It is defined as

\[
\mathbb{C}\mathrm{GTU}(z)=\mathrm{Tanh}\left(\mathfrak{R}(z)\right)\odot\mathrm{Tanh}\left(\mathfrak{I}(z)\right),
\]

where $\mathfrak{R}(\cdot)$ denotes the real part of a complex-valued tensor and $\mathfrak{I}(\cdot)$ the imaginary part. When the input is a complex-valued tensor, $\mathbb{C}\mathrm{GTU}$ turns it into a real-valued tensor. When the input is a real-valued tensor, $\mathbb{C}\mathrm{GTU}$ degenerates to Tanh. As (Luan et al., 2019) mentioned, Tanh greatly helps relieve over-smoothing.
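A literal NumPy sketch of the definition for complex inputs follows (the real-valued degenerate case in the model uses plain Tanh, as stated above):

```python
import numpy as np

def cgtu(z):
    """Complex Gated Tanh Unit: Tanh(Re(z)) elementwise-times Tanh(Im(z))."""
    return np.tanh(np.real(z)) * np.tanh(np.imag(z))

z = np.array([1.0 + 2.0j, -0.5 + 0.5j, 0.3 - 1.0j])
out = cgtu(z)
assert np.isrealobj(out)      # a complex-valued input maps to a real-valued tensor
assert out.shape == z.shape   # applied elementwise
```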

We follow (Trabelsi et al., 2018) for complex-valued weight initialization. Notice that all these settings ensure that MGC degenerates to an RVNN when $q=0$ or $q=\frac{1}{2}$.

Datasets. We use two different types of graph datasets, three citation networks and four webpage networks, for node classification tasks. The detailed statistics of these datasets are shown in Table 1. Notice that the citation networks are homogeneous graphs and the webpage networks are heterogeneous graphs.

Table 3: Node classification accuracy (%). OOM means out of GPU memory.

For citation networks, CoRA, CiteSeer, and PubMed (Sen et al., 2008) are standard semi-supervised benchmark datasets. In this paper, we adopt CoRAR and CiteSeerR (Zou et al., 2019) instead of CoRA and CiteSeer because the data in both CoRA and CiteSeer are not clean: CoRA contains 32 duplicated papers, and CiteSeer contains 161. Both also have an information leakage issue, meaning that features include label contexts of papers; more details can be found in Appendix A of (Zou et al., 2019). In these three citation networks, every node represents a paper and every edge represents a citation from one paper to another. The edge direction is defined from a citing paper to a cited paper. Each node's feature is the bag-of-words representation of the document, and each paper belongs to one of the academic topics. For citation networks, in order to evaluate semi-supervised graph representation learning, we randomly split the nodes of each class into 5%, 10%, and 85% for the training, validation, and test sets, respectively.

For webpage networks, Cornell, Texas, Washington, and Wisconsin (Lu & Getoor, 2003 ) are included. Each node represents a webpage and each edge represents a hyperlink between two webpages. The edge direction is from the original webpage to the referenced webpage. The feature of each node is the bag-of-words representation of the corresponding page. For webpage networks, in order to evaluate supervised graph representation learning, we follow previous work (Pei et al., 2020 ) to randomly split nodes of each class into 60%, 20%, and 20% for training, validation, and test sets, respectively.

Baselines, detailed setup, and hyperparameters. To verify the superiority of our model, we introduce ChebyNet (Defferrard et al., 2016), GCN (Kipf & Welling, 2017), SGC (Wu et al., 2019), PPNP, APPNP (Klicpera et al., 2019a), GDC(PPR)-GCN, GDC(HKPR)-GCN (Klicpera et al., 2019b), S²GC (Zhu & Koniusz, 2021), DiGCN (Tong et al., 2020), and MagNet (Zhang et al., 2021) as baselines. For all these baselines, we use the default settings and parameters described in the corresponding papers.

We train our models using an Adam optimizer with a maximum of 10,000 epochs and early stopping with a patience of 50. Table 2 summarizes the other hyperparameters of MGC on all datasets. All experiments are run on a Linux server equipped with an Intel i7-6700K 4.00 GHz CPU, 64 GB RAM, and an NVIDIA GeForce GTX 1080 Ti GPU.

Table 3 reports the mean node classification accuracy with standard deviation on the test set for each model. On citation networks, MGC is competitive; on PubMed, it achieves the best result among all models. On webpage networks, since the directed graphs are cyclic, we have $q\neq 0$. The results show that MGC performs remarkably better than DiGCN and MagNet on directed heterogeneous graphs.

In order to study which truncated filter is more over-smoothing resistant, we design an ablation experiment with S²GC as the backbone. The results are in Table 4. They indicate that the Markov Diffusion kernel and the LinearRank filter show an insignificant performance difference, and that the latter is more over-smoothing resistant.

Table 4: Node classification accuracy (%) w.r.t. order $K$
Dataset | Filter | $K=2$ | $K=4$ | $K=8$ | $K=16$ | $K=32$ | $K=64$ | $K=128$ | $K=256$

Note that the datasets in the webpage networks can be represented as directed cyclic graphs, thus $q\neq 0$. In order to study how the electric charge parameter affects the performance of MGC, we compare $q=0$ and $q\neq 0$. Recall that when $q=0$, MGC degenerates from a CVNN to an RVNN; thus this ablation also compares a CVNN with an RVNN. As shown in Table 5, when $q\neq 0$, i.e., when the model is a CVNN, MGC achieves better results, which verifies the effectiveness of the magnetic Laplacian on directed cyclic graphs. Figure 2 demonstrates how different values of $q$ affect the frequency response of a GSO. The frequency response of the renormalized magnetic adjacency matrix is described in Appendix A.

Table 5: Node classification accuracy (%) w.r.t. $q$
Dataset | $q=0$ | $q\neq 0$

There exist two definitions of the parallel transporter matrix, differing by a minus sign. In (Fanuel et al., 2017), $\mathbf{T}_{q}(u,v)=\exp\left(i2{\pi}q\left(\mathbf{A}(v,u)-\mathbf{A}(u,v)\right)\right)$; in (Fanuel et al., 2018), $\mathbf{T}_{q}(u,v)=\exp\left(i2{\pi}q\left(\mathbf{A}(u,v)-\mathbf{A}(v,u)\right)\right)$. In this paper, we adopt the latter definition. The normalized magnetic Laplacian built with the former is the complex conjugate (equivalently, the transpose) of the one built with the latter, so both magnetic Laplacians share the same eigenvalues. Therefore, if a GNN utilizes the magnetic Laplacian, adopting either definition has no influence on its performance.
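The relation between the two conventions can be checked numerically. The sketch below builds both (unnormalized) magnetic Laplacians on a hypothetical directed 3-cycle and confirms that they are elementwise conjugates sharing one real spectrum:

```python
import numpy as np

# Hypothetical directed 3-cycle adjacency: 0 -> 1 -> 2 -> 0.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
q = 0.25

T_former = np.exp(1j * 2 * np.pi * q * (A.T - A))   # Fanuel et al. (2017)
T_latter = np.exp(1j * 2 * np.pi * q * (A - A.T))   # Fanuel et al. (2018)

A_s = 0.5 * (A + A.T)                # symmetrized adjacency
D_s = np.diag(A_s.sum(1))
L_former = D_s - A_s * T_former      # magnetic Laplacian, former convention
L_latter = D_s - A_s * T_latter      # magnetic Laplacian, latter convention

# The two conventions are elementwise complex conjugates (both are Hermitian) ...
assert np.allclose(L_former, np.conj(L_latter))
# ... so they share the same real eigenvalue spectrum.
assert np.allclose(np.linalg.eigvalsh(L_former), np.linalg.eigvalsh(L_latter))
```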

We notice that some state-of-the-art GNNs are mini-batch based models. This approach speeds up the training process and allows GNNs to be applied to large-scale data. However, determining the value of the electric charge parameter $q$ in each batch is a challenging issue because the value of $q$ depends on the existence of a directed $m$-cycle. If the mini-batch method is based on graph partition, how the graph is partitioned will directly influence the directed $m$-cycles. If a graph is partitioned inappropriately, each sub-graph may become a directed acyclic graph. In this situation, the value of $q$ for each batch is $0$, which causes the model to lose the advantage of the magnetic Laplacian. In addition, if the majority of sub-graphs are directed cyclic graphs while the others are directed acyclic graphs, how to decide the value of $q$ for each batch is also an issue.
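Whether a given sub-graph forces $q=0$ reduces to a directed-cycle test. A minimal sketch using Kahn's topological-sort algorithm (plain Python, hypothetical edge lists) is shown below:

```python
from collections import deque

def is_dag(n, edges):
    """Kahn's algorithm: a directed graph is acyclic iff a full topological
    order exists, i.e. it contains no directed m-cycle (so q would be 0)."""
    indeg = [0] * n
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(u for u in range(n) if indeg[u] == 0)
    seen = 0
    while queue:
        u = queue.popleft()
        seen += 1
        for v in adj[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return seen == n

assert is_dag(3, [(0, 1), (1, 2)])               # acyclic sub-graph
assert not is_dag(3, [(0, 1), (1, 2), (2, 0)])   # directed 3-cycle
```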

Plenty of deformed Laplacians have been studied in graph theory and graph signal processing, such as Chung's directed Laplacian (Chung, 2005), the $p$-Laplacian (Lindqvist, 2019), the signed Laplacian (Kunegis et al., 2010), and the magnetic Laplacian (Fanuel et al., 2017, 2018). We utilize the property of the magnetic Laplacian for directed graphs and combine it with LinearRank (Baeza-Yates et al., 2006) to propose MGC. We test our model on citation networks (Sen et al., 2008; Zou et al., 2019) and webpage networks (Lu & Getoor, 2003). The experimental results demonstrate that LinearRank is widely applicable to realistic datasets and has great over-smoothing resistance. MGC is not only well-suited for homogeneous graphs, like existing works such as GCN (Kipf & Welling, 2017), SGC (Wu et al., 2019), PPNP, APPNP (Klicpera et al., 2019a), GDC (Klicpera et al., 2019b), and S²GC (Zhu & Koniusz, 2021), but also effective for heterogeneous graphs. In the future, we will tap the potential of other spectral ranking algorithms and deformed Laplacians to further advance graph representation learning.

Acknowledgements

Jie Zhang would like to thank Bruno Messias F. Resende, who offered the code for the visualization of magnetic eigenmaps, and Hao-Sheng Chen, who helped draw the magnetic eigenmaps.

References

Baeza-Yates, R., Boldi, P., and Castillo, C. Generalizing PageRank: Damping functions for link-based ranking algorithms. In Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’06, pp. 308–315. Association for Computing Machinery, 2006.

In Precup, D. and Teh, Y. W. (eds.), Proceedings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pp. 933–941. PMLR, 2017.

Defferrard, M., Bresson, X., and Vandergheynst, P. Convolutional neural networks on graphs with fast localized spectral filtering. In Advances in Neural Information Processing Systems 29, pp. 3837–3845, 2016.

Jeh, G. and Widom, J. Scaling personalized web search. In Proceedings of the 12th International Conference on World Wide Web, WWW ’03, pp. 271–279. Association for Computing Machinery, 2003.

Kandola, J., Shawe-Taylor, J., and Cristianini, N. Learning semantic similarity. In Advances in Neural Information Processing Systems 15, pp. 657–664. MIT Press, 2003.

Kipf, T. N. and Welling, M. Semi-supervised classification with graph convolutional networks. In 5th International Conference on Learning Representations (ICLR), 2017.

Klicpera, J., Weißenberger, S., and Günnemann, S. Diffusion improves graph learning. In Advances in Neural Information Processing Systems 32, pp. 13333–13345. Curran Associates, Inc., 2019.

Kunegis, J., Schmidt, S., Lommatzsch, A., Lerner, J., Luca, E. W. D., and Albayrak, S. Spectral analysis of signed graphs for clustering, prediction and visualization. pp. 559–570. SIAM, 2010.

In Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019.

Luan, S., Zhao, M., Chang, X.-W., and Precup, D. Break the ceiling: Stronger multi-scale deep graph convolutional networks. In Advances in Neural Information Processing Systems 32, pp. 10943–10953. Curran Associates, Inc., 2019.

In Michelangeli, A. and Dell'Antonio, G. (eds.), Advances in Quantum Mechanics: Contemporary Trends and Open Problems, pp. 257–266. Springer International Publishing, 2017.

Matrix methods for Padé approximation: Numerical calculation of poles, zeros and residues. Linear Algebra and its Applications, 548:95–122, 2018.

Sen, P., Namata, G., Bilgic, M., Getoor, L., Galligher, B., and Eliassi-Rad, T. Collective classification in network data. AI Magazine, 29(3):93, 2008.

Shuman, D. I., Narang, S. K., Frossard, P., Ortega, A., and Vandergheynst, P. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.

In Stanković, L. and Sejdić, E. (eds.), Vertex-Frequency Analysis of Graph Signals, pp. 3–108. Springer International Publishing, 2019.

Tong, Z., Liang, Y., Sun, C., Li, X., Rosenblum, D. S., and Lim, A. Digraph inception convolutional networks. In Advances in Neural Information Processing Systems 33, pp. 17907–17918. Curran Associates, Inc., 2020.

Trabelsi, C., Bilaniuk, O., Zhang, Y., Serdyuk, D., Subramanian, S., Santos, J. F., Mehri, S., Rostamzadeh, N., Bengio, Y., and Pal, C. J. Deep complex networks. In International Conference on Learning Representations, 2018.

In Djurić, P. M. and Richard, C. (eds.), Cooperative and Graph Signal Processing, pp. 299–324. Academic Press, 2018.

Wu, F., Souza, A., Zhang, T., Fifty, C., Yu, T., and Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the 36th International Conference on Machine Learning, volume 97 of Proceedings of Machine Learning Research, pp. 6861–6871. PMLR, 2019.
Appendix A The frequency response of the normalized magnetic adjacency matrix and the renormalized magnetic adjacency matrix

Since the symmetric normalized Laplacian and the random walk normalized Laplacian share the same eigenvalues (von Luxburg, 2007), we can derive that the symmetric normalized magnetic Laplacian and the random walk normalized magnetic Laplacian also share the same eigenvalues. We denote the set of eigenvalues of the normalized Laplacian in ascending order as $\lambda$ and the set of eigenvalues of the normalized magnetic Laplacian in ascending order as $\lambda_{q}$. Both $\lambda$ and $\lambda_{q}$ are called the frequency response in graph signal processing. Therefore, the frequency response of the normalized magnetic adjacency matrix can be written as $1-\lambda_{q}$. In (Fanuel et al., 2017), it is proved that $0=\lambda^{(0)}\leq\lambda^{(0)}_{q}$, where $\lambda^{(0)}$ and $\lambda^{(0)}_{q}$ represent the lowest eigenvalues of the normalized Laplacian and the normalized magnetic Laplacian, respectively. Note that when $q=0$, the equality is achieved. We assume that
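The inequality $0=\lambda^{(0)}\leq\lambda^{(0)}_{q}$ can be observed numerically. The sketch below builds the normalized magnetic Laplacian of a hypothetical directed 3-cycle, where $q\neq 0$ frustrates the cycle and lifts the smallest eigenvalue above zero:

```python
import numpy as np

# Hypothetical directed 3-cycle: 0 -> 1 -> 2 -> 0.
A = np.array([[0., 1., 0.],
              [0., 0., 1.],
              [1., 0., 0.]])
A_s = 0.5 * (A + A.T)                       # symmetrized adjacency
D_inv_sqrt = np.diag(A_s.sum(1) ** -0.5)

def normalized_magnetic_laplacian(q):
    """Hermitian normalized magnetic Laplacian for the graph above."""
    T_q = np.exp(1j * 2 * np.pi * q * (A - A.T))
    L_q = np.diag(A_s.sum(1)) - A_s * T_q
    return D_inv_sqrt @ L_q @ D_inv_sqrt

lam0 = np.linalg.eigvalsh(normalized_magnetic_laplacian(0.0))[0]
lam0_q = np.linalg.eigvalsh(normalized_magnetic_laplacian(0.25))[0]
assert np.isclose(lam0, 0.0, atol=1e-8)     # q = 0: smallest eigenvalue is 0
assert lam0_q > 0.0                         # q != 0: it lifts above 0
```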

\[
\mathbf{D}_{s}\approx\bar{d}\,\mathbf{I}_{n},
\]

where $\bar{d}$ represents the average node degree. This yields

\[
\mathbf{A}_{s}\approx\bar{d}\left(\mathbf{I}_{n}-\mathcal{L}\right).
\]

Then, we substitute Equations 15 and 16 into the renormalized adjacency matrix:

\[
\begin{split}
\widetilde{\mathcal{A}}&=\widetilde{\mathbf{D}}^{-\frac{1}{2}}_{s}\widetilde{\mathbf{A}}_{s}\widetilde{\mathbf{D}}^{-\frac{1}{2}}_{s}\\
&=\left(\mathbf{D}_{s}+\mathbf{I}_{n}\right)^{-\frac{1}{2}}\left(\mathbf{A}_{s}+\mathbf{I}_{n}\right)\left(\mathbf{D}_{s}+\mathbf{I}_{n}\right)^{-\frac{1}{2}}\\
&\approx\left(\bar{d}\mathbf{I}_{n}+\mathbf{I}_{n}\right)^{-\frac{1}{2}}\left(\bar{d}\left(\mathbf{I}_{n}-\mathcal{L}\right)+\mathbf{I}_{n}\right)\left(\bar{d}\mathbf{I}_{n}+\mathbf{I}_{n}\right)^{-\frac{1}{2}}\\
&=\mathbf{I}_{n}-\frac{\bar{d}}{\bar{d}+1}\mathcal{L}.
\end{split}
\]

Adopting the same method, we can derive $\widetilde{\mathcal{A}}_{q}\approx\mathbf{I}_{n}-\frac{\bar{d}}{\bar{d}+1}\mathcal{L}_{q}$. Hence, the frequency response of the renormalized adjacency matrix is approximately $1-\frac{\bar{d}}{\bar{d}+1}\lambda$, and that of the renormalized magnetic adjacency matrix is approximately $1-\frac{\bar{d}}{\bar{d}+1}\lambda_{q}$. Likewise, the frequency response of the negative renormalized magnetic adjacency matrix is approximately $\frac{\bar{d}}{\bar{d}+1}\lambda_{q}-1$.
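As a sanity check, on a $d$-regular graph the assumption $\mathbf{D}_{s}=\bar{d}\,\mathbf{I}_{n}$ holds exactly, so the approximation above becomes an equality. The sketch below verifies this on a triangle graph:

```python
import numpy as np

# 2-regular triangle graph: D_s = d_bar * I holds exactly with d_bar = 2.
A_s = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
n, d_bar = 3, 2.0
D_s = np.diag(A_s.sum(1))
D_inv_sqrt = np.diag(A_s.sum(1) ** -0.5)
L = np.eye(n) - D_inv_sqrt @ A_s @ D_inv_sqrt        # normalized Laplacian

# Renormalized adjacency: (D_s + I)^{-1/2} (A_s + I) (D_s + I)^{-1/2}.
D_tilde_inv_sqrt = np.diag(np.diag(D_s + np.eye(n)) ** -0.5)
A_tilde_norm = D_tilde_inv_sqrt @ (A_s + np.eye(n)) @ D_tilde_inv_sqrt

# On a regular graph, it equals I - (d_bar / (d_bar + 1)) * L exactly.
assert np.allclose(A_tilde_norm, np.eye(n) - (d_bar / (d_bar + 1)) * L)
```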