Tutorial · June 03, 2026 · 5 min read

Neighbor Sampling in TGraphX: Scaling GNNs to Large Graphs

Full-batch GNN training is straightforward — feed the entire graph through the model in one forward pass. It works until the graph stops fitting in memory. For graphs with millions of nodes, you need mini-batch training, and mini-batch training on graphs means sampling.

TGraphX includes three production-scale samplers: NeighborLoader, GraphSAINTLoader, and ClusterLoader. This article walks through each, when to use them, and what trade-offs they make.

Why graph sampling is non-trivial

For images, mini-batching is trivial: pick N images at random. For graphs, picking N nodes at random is not enough — a node without its neighbors cannot be processed by a GNN layer that aggregates from neighbors.
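
To make the problem concrete, here is a minimal pure-Python sketch (no TGraphX involved) that picks nodes uniformly at random from a toy ring graph and counts how many edges survive in the induced subgraph. The helper name induced_edges and the ring graph are illustrative assumptions, not part of any library.

```python
import random


def induced_edges(edges, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Toy ring graph: node i is connected to node (i + 1) % n.
n = 1000
ring = [(i, (i + 1) % n) for i in range(n)]

# Pick 50 nodes uniformly at random, the way you would pick images.
rng = random.Random(0)
picked = rng.sample(range(n), 50)
kept = induced_edges(ring, picked)

print(f"{len(kept)} of {len(ring)} edges survive a random 50-node pick")
```

On a sparse graph, most picked nodes end up with no neighbors in the batch, so a GNN layer has nothing to aggregate from.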

The three standard approaches:

Neighbor sampling. Pick seed nodes, sample their neighborhoods recursively. Each mini-batch is a small subgraph induced by the seeds and their sampled neighbors.
Subgraph sampling. Pick a random subset of nodes/edges that form a connected subgraph. Train on the subgraph as if it were a small full graph.
Cluster sampling. Partition the graph into clusters once. Each mini-batch is one or more clusters.

TGraphX has all three.

NeighborLoader — the default for most cases

python

import torch.nn.functional as F

import tgraphx as tgx
from tgraphx import NeighborLoader

# x, ei, and y are your node features, edge index, and labels; model is your GNN.
g = tgx.Graph(x=x, edge_index=ei, labels=y)
tgx.validate_graph(g, strict=True)

loader = NeighborLoader(
    g,
    num_neighbors=[15, 10],   # 2-hop sample: 15 neighbors per seed, then 10 per first-hop node
    batch_size=128,
    seed=42,
)

for batch in loader:
    # batch.x:           features of all nodes in this sampled subgraph
    # batch.edge_index:  edges of the subgraph
    # batch.seed_y:      labels of just the seed nodes
    out = model(batch.x, batch.edge_index)
    seed_out = batch.seed_logits(out)   # keep predictions for the seed nodes only
    loss = F.cross_entropy(seed_out, batch.seed_y)

Each mini-batch is a subgraph centered on batch_size=128 seed nodes. For each seed, 15 neighbors are sampled at the first hop, and for each of those first-hop nodes, 10 more are sampled at the second hop. The subgraph is small enough to fit in memory but contains enough context for a 2-layer GNN to aggregate meaningfully.
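
The recursive sampling described above can be sketched in pure Python. This is a simplified illustration of the GraphSAGE-style idea, not TGraphX's implementation: the function name sample_neighborhood, the adjacency-dict input, and the choice to expand only newly discovered nodes at each hop are all assumptions made for the sketch.

```python
import random


def sample_neighborhood(adj, seeds, num_neighbors, seed=0):
    """Return the nodes of a sampled multi-hop neighborhood, seeds first."""
    rng = random.Random(seed)
    nodes = list(dict.fromkeys(seeds))  # de-duplicate while keeping order
    seen = set(nodes)
    frontier = list(nodes)
    for fanout in num_neighbors:
        next_frontier = []
        for u in frontier:
            nbrs = adj.get(u, [])
            # Take every neighbor when there are no more than `fanout` of them.
            picked = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for v in picked:
                if v not in seen:
                    seen.add(v)
                    nodes.append(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes


# Tiny example graph as an adjacency dict (hypothetical data).
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1]}
batch_nodes = sample_neighborhood(adj, seeds=[0], num_neighbors=[2, 1], seed=0)
print(batch_nodes)
```

With one seed and fanouts [2, 1], the result can never exceed 1 + 2 + 2*1 = 5 nodes, which is the same bound that governs memory in the real loader.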

batch.seed_logits() extracts predictions for just the seed nodes — the only nodes you actually want to predict for. This is the standard GraphSAGE-style training pattern.

GraphSAINT — sample subgraphs, not neighborhoods

python

import torch.nn.functional as F

from tgraphx import GraphSAINTLoader

loader = GraphSAINTLoader(
    g,
    method="random_walk",      # or "node", "edge"
    batch_size=2000,           # ~2000 nodes per subgraph
    walk_length=2,
    seed=42,
)

for batch in loader:
    # Every node in the sampled subgraph contributes to the loss.
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out, batch.y)

GraphSAINT samples a subgraph and treats it as a complete small graph. The advantage is that aggregation inside the subgraph can go as deep as your model has layers, with no per-hop fanout to configure. The cost is that sampling is not uniform: some nodes and edges land in subgraphs more often than others, which biases training unless it is corrected. The loader applies that bias correction automatically.
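
As a rough illustration of the random-walk variant, the sketch below picks root nodes, walks a fixed number of steps from each, and induces a subgraph on everything visited. It is pure Python and deliberately omits the bias correction; the name random_walk_subgraph and the num_roots parameter are hypothetical and do not mirror GraphSAINTLoader's arguments.

```python
import random


def random_walk_subgraph(adj, num_roots, walk_length, seed=0):
    """Sample a subgraph from short random walks started at random roots."""
    rng = random.Random(seed)
    all_nodes = sorted(adj)
    roots = rng.sample(all_nodes, min(num_roots, len(all_nodes)))
    visited = set(roots)
    for root in roots:
        current = root
        for _ in range(walk_length):
            nbrs = adj.get(current, [])
            if not nbrs:
                break  # dead end: stop this walk early
            current = rng.choice(nbrs)
            visited.add(current)
    nodes = sorted(visited)
    # Induce the subgraph: keep edges with both endpoints visited.
    edges = [(u, v) for u in nodes for v in adj.get(u, []) if v in visited]
    return nodes, edges


# Ring graph with 20 nodes (hypothetical data).
ring_adj = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
sub_nodes, sub_edges = random_walk_subgraph(ring_adj, num_roots=3, walk_length=2, seed=1)
print(len(sub_nodes), "nodes,", len(sub_edges), "directed edges")
```

Because the subgraph is induced on visited nodes, a model of any depth can run on it without further sampling.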

For deep GNNs (4+ layers), GraphSAINT avoids the exponential neighborhood expansion that makes recursive neighbor sampling increasingly expensive.
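
The growth is easy to quantify. The helper below (a hypothetical name, plain arithmetic) computes the worst-case number of nodes one seed can pull into a batch for a given fanout list, assuming no two sampled neighbors coincide.

```python
def max_sampled_nodes_per_seed(num_neighbors):
    """Worst-case subgraph size for one seed: 1 + f1 + f1*f2 + ..."""
    total = 1   # the seed itself
    layer = 1   # number of nodes at the current hop
    for fanout in num_neighbors:
        layer *= fanout
        total += layer
    return total


print(max_sampled_nodes_per_seed([15, 10]))          # 166
print(max_sampled_nodes_per_seed([15, 10, 10, 10]))  # 16666
```

Going from 2 to 4 hops at a fanout of 10 multiplies the worst case by roughly 100, which is why subgraph sampling becomes attractive for deeper models.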

ClusterLoader — for very large static graphs

python

import torch.nn.functional as F

from tgraphx import ClusterLoader, RandomBalancedPartitioner

# Partition once up front; each mini-batch then combines 4 of the 64 clusters.
partitioner = RandomBalancedPartitioner(num_clusters=64, seed=42)
loader = ClusterLoader(g, partitioner=partitioner, batch_size_clusters=4, seed=42)

for batch in loader:
    out = model(batch.x, batch.edge_index)
    loss = F.cross_entropy(out, batch.y)

ClusterLoader partitions the graph once into clusters and trains on cluster batches. This is the Cluster-GCN approach. The partitioning step is one-time and amortizes across many epochs.
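
To show what a cluster batch contains, here is a small pure-Python sketch that merges a few chosen clusters and keeps every edge with both endpoints inside the merged set. The function cluster_batch and the toy data are assumptions for illustration and not the loader's internals.

```python
def cluster_batch(edges, cluster_of, chosen_clusters):
    """Build one mini-batch from the union of the chosen clusters."""
    chosen = set(chosen_clusters)
    nodes = sorted(n for n, c in cluster_of.items() if c in chosen)
    in_batch = set(nodes)
    # Keep edges inside the batch, including edges between two chosen clusters.
    kept = [(u, v) for u, v in edges if u in in_batch and v in in_batch]
    return nodes, kept


# Path graph 0-1-2-3-4-5 split into three clusters of two nodes each.
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
cluster_of = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

nodes, kept = cluster_batch(edges, cluster_of, chosen_clusters=[0, 1])
print(nodes)  # [0, 1, 2, 3]
print(kept)   # [(0, 1), (1, 2), (2, 3)]
```

The edge (1, 2) crosses clusters 0 and 1 and is kept because both clusters are in the batch; the edge (3, 4) is dropped because cluster 2 was not sampled.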

TGraphX includes three partitioners:

RandomBalancedPartitioner — random, balanced sizes. Fast.
BFSPartitioner — BFS from seed nodes. Preserves local structure.
SpectralPartitioner — spectral clustering. Best quality but limited to ≤ 4096 nodes due to O(N³) cost.

For most large graphs, the random partitioner is the right starting point.
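
For intuition about why the random partitioner is cheap, a random balanced partition can be sketched in a few lines: shuffle the node ids and deal them out round-robin. This is an assumption about the general technique, written in pure Python; it is not taken from RandomBalancedPartitioner's source.

```python
import random
from collections import Counter


def random_balanced_partition(num_nodes, num_clusters, seed=0):
    """Assign each node to a cluster so cluster sizes differ by at most one."""
    rng = random.Random(seed)
    order = list(range(num_nodes))
    rng.shuffle(order)
    # Deal the shuffled nodes round-robin across the clusters.
    return {node: i % num_clusters for i, node in enumerate(order)}


assignment = random_balanced_partition(num_nodes=10, num_clusters=3, seed=42)
sizes = Counter(assignment.values())
print(sorted(sizes.values()))  # [3, 3, 4]
```

The cost is one shuffle over the node ids. The trade-off is that a random split ignores graph structure, so more edges cross cluster boundaries than with a BFS or spectral partition.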

Choosing between the three

Approach	When to use	Limitations
NeighborLoader	Standard supervised node classification, 2-3 hop models	Memory grows with `num_neighbors`
GraphSAINT	Deep GNNs (4+ layers), graph-level tasks	Subgraphs may miss some neighborhoods
ClusterLoader	Very large static graphs, many epochs	Quality depends on partitioning

Start with NeighborLoader. Switch to GraphSAINT if your model is deep or if NeighborLoader memory becomes a problem. Use ClusterLoader for the largest graphs.

A real consideration: dense graph builders are O(N²)

If you construct your graph with tgx.knn_graph(x, k=10), be aware that it is O(N²). For N larger than ~5000, it emits a warning and may take a long time. Your options:

Sample nodes first, then build kNN on the sample.
Use a precomputed edge index.
Build the graph in chunks if you have a sharded data layout.

The framework documents this limitation in docs/graph_builders.md.
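
The first option, sampling nodes before building kNN, can be sketched without any library. The brute-force knn_edges below makes the O(N²) cost visible, since every point is compared with every other point. Both function names and the edge direction (neighbor to node) are assumptions for this sketch, not the behavior of tgx.knn_graph.

```python
import random


def knn_edges(points, k):
    """Brute-force kNN: compares every point with every other point."""
    edges = []
    for i, p in enumerate(points):
        dists = [
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points)
            if j != i
        ]
        for _, j in sorted(dists)[:k]:
            edges.append((j, i))  # edge from neighbor j to node i
    return edges


def knn_on_sample(points, k, sample_size, seed=0):
    """Subsample the points first, then build kNN on the sample only."""
    rng = random.Random(seed)
    idx = sorted(rng.sample(range(len(points)), sample_size))
    local = knn_edges([points[i] for i in idx], k)
    # Map positions in the sample back to ids in the full dataset.
    return [(idx[u], idx[v]) for u, v in local]


data_rng = random.Random(0)
points = [(data_rng.random(), data_rng.random()) for _ in range(200)]
sampled_edges = knn_on_sample(points, k=3, sample_size=20, seed=0)
print(len(sampled_edges), "edges built from 20 * 19 distance computations")
```

Building on 20 sampled points costs 380 distance computations instead of the 39,800 needed for all 200.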

Memory tips

For very memory-constrained settings:

Reduce batch_size first. Subgraph size scales linearly with seed count.
Reduce num_neighbors[1] (second-hop). Second-hop sampling is the dominant memory cost.
Store node features in half precision (torch.float16) if the precision loss is tolerable.
Use pin_memory=True and num_workers=4 in the loader for I/O parallelism. This improves throughput rather than reducing memory use.
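
A back-of-the-envelope estimate shows how these knobs interact. The sketch below computes worst-case feature memory for one batch; the feature dimension of 256 is a made-up value for illustration, and real batches are usually smaller because sampled neighborhoods overlap.

```python
def feature_bytes(num_nodes, feature_dim, bytes_per_value):
    """Memory needed to hold the node feature matrix of one batch."""
    return num_nodes * feature_dim * bytes_per_value


def worst_case_nodes(batch_size, num_neighbors):
    """Upper bound on batch size in nodes: batch_size * (1 + f1 + f1*f2 + ...)."""
    per_seed, layer = 1, 1
    for fanout in num_neighbors:
        layer *= fanout
        per_seed += layer
    return batch_size * per_seed


FEATURE_DIM = 256  # hypothetical feature width

baseline = worst_case_nodes(128, [15, 10])       # 128 * 166 nodes
smaller_hop2 = worst_case_nodes(128, [15, 5])    # 128 * 91 nodes

fp32 = feature_bytes(baseline, FEATURE_DIM, 4)
fp16 = feature_bytes(baseline, FEATURE_DIM, 2)
print(f"nodes: {baseline} vs {smaller_hop2} after halving the second-hop fanout")
print(f"float32: {fp32 / 2**20:.2f} MiB, float16: {fp16 / 2**20:.2f} MiB")
```

Halving the second-hop fanout cuts the worst-case node count by almost half, while switching to float16 halves the feature memory exactly.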

Reproducibility for samplers

All three loaders accept a seed argument. With a fixed seed and otherwise-deterministic training, two runs produce identical mini-batches and identical results.

python

import tgraphx as tgx
from tgraphx import NeighborLoader

with tgx.reproducible(seed=42, deterministic=True):
    loader = NeighborLoader(g, num_neighbors=[15, 10], batch_size=128, seed=42)
    for batch in loader:
        ...

The seed=42 passed to NeighborLoader is separate from the reproducibility context's seed. Set both.
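
One plausible reason for keeping the two seeds separate is that a sampler with its own random generator is unaffected by global RNG state. The standard-library sketch below illustrates that pattern in general; how TGraphX wires its generators internally is not stated here, so treat this as an analogy and the name seeded_batches as hypothetical.

```python
import random


def seeded_batches(num_nodes, batch_size, seed):
    """Shuffle node ids with a private RNG and cut them into batches."""
    rng = random.Random(seed)  # private generator, independent of random.seed()
    order = list(range(num_nodes))
    rng.shuffle(order)
    return [order[i:i + batch_size] for i in range(0, num_nodes, batch_size)]


random.seed(1)
first = seeded_batches(10, 4, seed=42)

random.seed(2)  # changing the global seed does not change the batches
second = seeded_batches(10, 4, seed=42)

print(first == second)  # True
```

The flip side is that seeding only the global state would leave such a private generator unseeded, which is why setting both is the safe default.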

On benchmark numbers

There are no large-scale published comparisons of TGraphX samplers against PyG samplers. The implementations follow the same algorithms (GraphSAGE neighbor sampling, GraphSAINT random-walk subgraphs, Cluster-GCN partitioning) so per-epoch behavior should be similar to PyG. Per-iteration speed and memory may differ.

If sampler speed is critical to your project, profile both frameworks on your actual data before drawing conclusions. Do not assume one is faster.

FAQ

Q: What does num_neighbors=[15, 10] mean exactly?
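A: For each of the batch_size seed nodes, sample 15 first-hop neighbors. For each of those 15, sample 10 second-hop neighbors. The subgraph therefore has at most batch_size * (1 + 15 + 15*10) = batch_size * 166 nodes, where the 1 counts the seed itself. With batch_size=128 that is at most 21,248 nodes; the actual count is usually lower because sampled neighborhoods overlap.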

Q: Can I use a single-element list like num_neighbors=[20]?
A: Yes. That gives a 1-hop sample — suitable for a 1-layer GNN.

Q: Does GraphSAINT need a normalization step in the loss?
A: The loader handles the necessary bias correction automatically by default. If you want to disable it, pass normalize=False.

Q: How does ClusterLoader handle edges between clusters?
A: When you sample multiple clusters per batch (batch_size_clusters > 1), cross-cluster edges within the batch are included. Edges to nodes in other un-sampled clusters are not.

Q: What about hetero graphs?
A: Hetero graph sampling is Experimental. The standard NeighborLoader does not handle multiple node types. Use the dedicated HeteroNeighborLoader from the experimental subsystem if you need it.