
New Technique Improves Accuracy of Graph Neural Networks

Image: a collection of colorful circles of varying sizes connected by a complex array of lines. Image credit: BoliviaInteligente.

For Immediate Release

Researchers have demonstrated a new training technique that significantly improves the accuracy of graph neural networks (GNNs) – AI systems used in applications from drug discovery to weather forecasting.

GNNs are AI systems designed to perform tasks where the input data is presented in the form of graphs. A graph, in this context, is a data structure in which data points (called nodes) are connected by lines (called edges). The edges indicate some sort of relationship between the nodes. Edges can connect nodes that are similar (a property called homophily) or nodes that are dissimilar (called heterophily).

For example, in a graph of a neural system, there would be edges between nodes representing neurons that enhance each other's activity, but also edges between nodes representing neurons that suppress each other.
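To make the distinction concrete, here is a minimal sketch (not taken from the paper) of a small graph stored as node categories plus an edge list. An edge is homophilic when it joins nodes of the same category and heterophilic otherwise; the node labels and edges below are purely illustrative.

```python
# Four nodes in two illustrative categories, connected by four edges.
node_labels = {0: "A", 1: "A", 2: "B", 3: "B"}
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]

def edge_type(u, v):
    """Classify an edge by whether its endpoints share a category."""
    return "homophilic" if node_labels[u] == node_labels[v] else "heterophilic"

types = {e: edge_type(*e) for e in edges}

# Fraction of homophilic edges -- a graph with a low fraction is
# the "heterophilic graph" case the researchers focus on.
homophily_ratio = sum(t == "homophilic" for t in types.values()) / len(edges)
```

In this toy graph, half the edges are homophilic and half are heterophilic; real benchmark graphs such as the heterophilic ones mentioned later skew heavily toward dissimilar connections.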

Because graphs can be used to represent everything from social networks to molecular structures, GNNs are able to capture complex relationships better than many other types of AI systems.

However, training GNNs poses some significant challenges.

GNNs are trained by feeding graphs into the system so that the GNN learns to identify the relationships in each graph: how and why the various nodes are connected to each other. One approach is called semi-supervised learning, in which some of the nodes in the training graph are labeled in advance, making it easier for the GNN to identify relationships between the nodes. But this approach has a drawback: a GNN trained on graphs containing labeled nodes can struggle to accurately identify relationships in graphs where no nodes are labeled. That's a problem, because none of the nodes will be labeled when you actually want to put a GNN to practical use.

The solution to this problem is to train GNNs on graphs with unlabeled nodes in the first place. This is called self-supervised learning. But this poses its own challenge.
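The labeling distinction above can be sketched in a few lines. This toy example (not the paper's method) shows why partial labels change the training signal: in the semi-supervised setup, only pre-labeled nodes contribute to the loss, while in the self-supervised setup there are no labels at all and the signal must come from the graph itself. The prediction values here are placeholders.

```python
# A partially labeled graph: nodes 1 and 3 have no label,
# as in semi-supervised training.
labels = {0: "A", 1: None, 2: "B", 3: None}

def semi_supervised_loss(predictions, labels):
    """Count mistakes only on nodes that actually have a label;
    unlabeled nodes are ignored entirely."""
    labeled = [n for n, y in labels.items() if y is not None]
    return sum(predictions[n] != labels[n] for n in labeled) / len(labeled)

predictions = {0: "A", 1: "B", 2: "A", 3: "B"}  # a model's guesses
loss = semi_supervised_loss(predictions, labels)
# Node 0 is correct and node 2 is wrong; nodes 1 and 3 never
# enter the loss. With no labels at all, this loss is undefined,
# which is why self-supervised training needs a different signal.
```

In the fully self-supervised setting the `labels` dictionary is empty, so a loss of this form cannot be computed; that is the gap the researchers' framework is designed to fill.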

“If none of the nodes are labeled, the GNN can see that there are edges between nodes but has trouble distinguishing between homophilic edges and heterophilic edges,” says Tianfu Wu, senior author of a paper on the work and an associate professor of electrical and computer engineering at North Carolina State University. “And this problem is especially pronounced in heterophilic graphs, meaning graphs that have more heterophilic relationships than homophilic ones. That’s the problem we’re addressing with this work.”

Specifically, the researchers developed something they call the HarmonyGNN framework. The framework is a detailed training process that improves the ability of GNNs to accurately identify relationships in heterophilic graphs without sacrificing accuracy at identifying relationships in predominantly homophilic graphs.

To see how well the HarmonyGNN framework works, the researchers trained a GNN using the framework and tested it against 11 graphs that are widely used to benchmark GNN performance. The HarmonyGNN-trained system matched state-of-the-art performance on the seven homophilic graphs and established new state-of-the-art accuracy on the four heterophilic graphs, with accuracy improvements ranging from 1.27% to 9.6% across those four graphs.

“This is a significant advance for GNN training,” Wu says. “In addition, the HarmonyGNN framework also improved the computational efficiency of the training.”

The paper, “HarmonyGNNs: Harmonizing Heterophily and Homophily in GNNs Via Self-Supervised Node Encoding,” will be presented at the Fourteenth International Conference on Learning Representations (ICLR2026), being held April 23-27 in Rio de Janeiro, Brazil. First author of the paper is Rui Xue, a Ph.D. student at NC State.

The researchers have released relevant code on GitHub.

This work was done with support from the Army Research Office under grants W911NF1810295 and W911NF2210010, and from the National Science Foundation under grants 1909644, 2024688 and 2013451.

-shipman-

Note to Editors: The study abstract follows.

“HarmonyGNNs: Harmonizing Heterophily and Homophily in GNNs Via Self-Supervised Node Encoding”

Authors: Rui Xue and Tianfu Wu, North Carolina State University

Presented: April 23-27, the Fourteenth International Conference on Learning Representations (ICLR2026), Rio de Janeiro, Brazil

Abstract: Graph Neural Networks (GNNs) have made significant advances in representation learning on various types of graph-structured data. However, GNNs struggle to simultaneously model heterophily and homophily, a challenge that is amplified under self-supervised learning (SSL) where no labels are available to guide the training process. This paper presents HARMONYGNNs, an end-to-end graph SSL framework designed to harmonize heterophily and homophily through two complementary innovative perspectives: (i) Representation Harmonization via Joint Structural Node Encoding. Nodes are embedded into a unified latent space that retains both node specificity and graph structural awareness for harmonizing heterophily and homophily. Node specificity is learned via linear and non-linear node feature projections. Graph structural awareness is learned via a proposed Weighted Graph Convolutional Network (WGCN). A self-attention module enables the model learning-to-adapt to varying levels of patterns. (ii) Objective Harmonization via Predictive Architecture with Node-Difficulty–Aware Masking. A teacher network processes the full graph. A student network receives a partially masked graph. The student is trained end-to-end, while the teacher is an exponential moving average of the student. The proxy task is to train the student to predict the teacher’s embeddings for all nodes (masked and unmasked). To keep the objective informative across the graph, two masking strategies that guide selection toward currently hard nodes while retaining exploration are proposed. Theoretical underpinnings of HARMONYGNNs are also analyzed in detail. Comprehensive evaluations on benchmarks demonstrate that HARMONYGNNs achieves state-of-the-art performance on heterophilic graphs (e.g., +7.1% on Texas, +9.6% on Roman-Empire over the prior art) while matching SOTA on homophilic graphs, and delivering strong computational efficiency.
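The teacher–student predictive setup the abstract describes follows a general pattern used in self-supervised learning: the teacher's parameters are an exponential moving average (EMA) of the student's, and the student is trained to predict the teacher's embeddings. The sketch below illustrates that generic pattern with scalar stand-ins for parameters and embeddings; it is not the paper's architecture, and the decay value and loss are illustrative assumptions.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Teacher <- decay * teacher + (1 - decay) * student, per parameter.
    The teacher is never trained by gradient descent directly."""
    return [decay * t + (1 - decay) * s
            for t, s in zip(teacher_params, student_params)]

def prediction_loss(student_emb, teacher_emb):
    """Mean squared error between the student's predictions and the
    teacher's target embeddings (a common choice for this proxy task)."""
    return sum((s - t) ** 2
               for s, t in zip(student_emb, teacher_emb)) / len(student_emb)

# One toy update step: the teacher drifts slowly toward the student,
# providing a stable prediction target from step to step.
teacher = [1.0, 2.0]
student = [0.0, 0.0]
teacher = ema_update(teacher, student, decay=0.9)
```

Because the teacher changes slowly, its embeddings give the student a stable target even though no node labels exist; the paper's node-difficulty-aware masking then decides which nodes the student must predict from a partially masked graph.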