Generating Graphs with Tensor-Valued Node Features in TGraphX

Synthetic graph generation is essential for graph learning research. You need controlled inputs for ablations, baselines, sanity checks, and stress tests. The classic generators (Erdős–Rényi, Barabási–Albert, Watts–Strogatz, and the Stochastic Block Model) produce graphs with well-understood statistical properties.

TGraphX includes all of these, plus an extension that the classical generators don't natively provide: tensor-valued node features. This article walks through both classical and tensor-extended generation.

The classical generators

TGraphX provides one-call access to the standard set:

python
import tgraphx as tgx

# Erdős–Rényi
g = tgx.generate_graph("er", num_nodes=50, edge_prob=0.1, seed=42)

# Barabási–Albert (preferential attachment)
g = tgx.generate_graph("ba", num_nodes=50, m=2, seed=42)

# Watts–Strogatz (small-world)
g = tgx.generate_graph("ws", num_nodes=50, k=4, beta=0.1, seed=42)

# Stochastic Block Model
g = tgx.generate_graph("sbm", num_blocks=3, block_size=20,
                       p_in=0.3, p_out=0.01, seed=42)

# Regular shapes
g_grid     = tgx.generate_graph("grid", height=10, width=10, seed=42)
g_cycle    = tgx.generate_graph("cycle", num_nodes=20, seed=42)
g_path     = tgx.generate_graph("path", num_nodes=20, seed=42)
g_star     = tgx.generate_graph("star", num_nodes=20, seed=42)
g_complete = tgx.generate_graph("complete", num_nodes=10, seed=42)

Each returns a tgx.Graph object. By default, node features are randomly initialized vectors.
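
To make the first of these generators concrete, here is a minimal standard-library sketch of Erdős–Rényi G(n, p) sampling. It is not TGraphX's implementation: the function name erdos_renyi_edges and the edge-list return type are assumptions made for illustration. Each of the n(n-1)/2 possible undirected edges is kept independently with probability edge_prob, which is where the "well-understood statistical properties" come from.

```python
import itertools
import random


def erdos_renyi_edges(num_nodes, edge_prob, seed=None):
    """Sample an undirected G(n, p) graph as a list of (u, v) pairs with u < v.

    Illustrative sketch only; the name and return type are hypothetical
    and do not describe how tgx.generate_graph works internally.
    """
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return [
        (u, v)
        for u, v in itertools.combinations(range(num_nodes), 2)
        if rng.random() < edge_prob  # keep each candidate edge independently
    ]


edges = erdos_renyi_edges(num_nodes=50, edge_prob=0.1, seed=42)
expected = 0.1 * 50 * 49 / 2  # p * n * (n - 1) / 2 = 122.5 edges on average
print(len(edges), "edges sampled; the expectation is", expected)
```

The sampled edge count fluctuates around the expectation from seed to seed, which is a quick sanity check for any ER generator.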

Adding tensor-valued node features

To use the extension, pass node_shape to generate node features of arbitrary rank:

python
# Each node is a [8] vector (equivalent to the default)
g = tgx.generate_graph("ba", num_nodes=100, m=2, node_shape=(8,), seed=42)
# g.node_features.shape = [100, 8]

# Each node is a [4, 4] matrix
g = tgx.generate_graph("ba", num_nodes=100, m=2, node_shape=(4, 4), seed=42)
# g.node_features.shape = [100, 4, 4]

# Each node is a [3, 8, 8] image-like tensor
g = tgx.generate_graph("ba", num_nodes=100, m=2, node_shape=(3, 8, 8), seed=42)
# g.node_features.shape = [100, 3, 8, 8]

For each generator, the structural part (the edges) is determined by the generator's rules. The features are random tensors of the requested shape. This is enough for testing tensor-aware GNN code on synthetic graphs.
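
The shape convention is simple enough to spell out: the feature tensor is the node count followed by the per-node shape. The sketch below uses only the standard library and a flat list of values; random_node_features is a hypothetical helper, not a TGraphX function, and the Gaussian initialization is an assumption. In PyTorch the same idea would typically be a single torch.randn call.

```python
import random
from math import prod


def random_node_features(num_nodes, node_shape, seed=None):
    """Return (flat_values, full_shape) for features shaped [num_nodes, *node_shape].

    Hypothetical helper for illustration; the standard-normal initialization
    is an assumption, not a statement about TGraphX's defaults.
    """
    rng = random.Random(seed)
    full_shape = (num_nodes, *node_shape)
    flat_values = [rng.gauss(0.0, 1.0) for _ in range(prod(full_shape))]
    return flat_values, full_shape


values, shape = random_node_features(100, (3, 8, 8), seed=42)
per_node = prod(shape[1:])  # 3 * 8 * 8 = 192 values per node
print(shape, per_node, len(values))
```

The memory cost grows with the product of the per-node dimensions, so image-shaped nodes get expensive quickly on large graphs.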

Why this matters

Consider testing a new tensor-aware GNN layer. You want to verify:

  • It accepts the right input shapes
  • It produces output shapes you expect
  • It is invariant or equivariant in the ways you intended
  • It does not crash on edge cases (single node, isolated node, dense graph, sparse graph)

Without tensor-valued generators, you have to construct test graphs manually or use a real dataset. The synthetic generators with node_shape let you write small, deterministic, fast unit tests:

python
import tgraphx as tgx

# Tiny test: 10-node BA graph with image-shaped features
g = tgx.generate_graph("ba", num_nodes=10, m=2, node_shape=(3, 4, 4), seed=42)
tgx.validate_graph(g, strict=True)
tgx.assert_tensor_native(g, min_rank=3)

# Verify your layer accepts this.
# my_tensor_layer and expected_out_channels are placeholders for the
# layer under test and its configured output channel count.
output = my_tensor_layer(g.node_features, g.edge_index)
assert output.shape == (10, expected_out_channels, 4, 4)

The test runs in milliseconds, uses a fixed seed for reproducibility, and exercises the tensor path through your code.
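
The equivariance item in the checklist above is the one most often skipped, so here is a sketch of what such a check can look like. It uses a toy mean-over-in-neighbors layer with scalar features and plain Python lists for brevity; mean_neighbor_layer is a stand-in for the layer under test, not a TGraphX layer. The idea carries over to tensor features unchanged: relabel the nodes, rerun the layer, and confirm the outputs are relabeled the same way.

```python
import random


def mean_neighbor_layer(features, edges):
    """Toy layer: each node's output is the mean of its in-neighbors' features."""
    sums = [0.0] * len(features)
    counts = [0] * len(features)
    for src, dst in edges:
        sums[dst] += features[src]
        counts[dst] += 1
    # Nodes without in-neighbors get 0.0 rather than a division by zero.
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]


rng = random.Random(0)
num_nodes = 6
features = [rng.random() for _ in range(num_nodes)]
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (0, 5)]

perm = list(range(num_nodes))
rng.shuffle(perm)  # perm[i] is the new index of old node i

permuted_features = [0.0] * num_nodes
for old, new in enumerate(perm):
    permuted_features[new] = features[old]
permuted_edges = [(perm[s], perm[d]) for s, d in edges]

out = mean_neighbor_layer(features, edges)
out_permuted = mean_neighbor_layer(permuted_features, permuted_edges)

# Equivariance: relabeling the input nodes relabels the output the same way.
assert all(abs(out_permuted[perm[i]] - out[i]) < 1e-12 for i in range(num_nodes))
```

Node 3 has no incoming edges here, so the same test also covers the isolated-input edge case from the checklist.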

Generation metrics

The framework includes metrics for evaluating generated graphs:

python
import tgraphx as tgx
from tgraphx.generation import (
    validity_score, uniqueness_score,
    novelty_score, diversity_score,
)

graphs = [tgx.generate_graph("ba", num_nodes=20, m=2, seed=i) for i in range(50)]

print(f"Validity:   {validity_score(graphs):.3f}")
print(f"Uniqueness: {uniqueness_score(graphs):.3f}")
print(f"Diversity:  {diversity_score(graphs):.3f}")

These are useful for benchmarking neural generators against classical ones, or for monitoring how diverse a generated batch is.
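
For intuition about what a uniqueness metric measures, the following sketch computes the fraction of distinct labeled edge sets in a batch. This is one plausible definition chosen for illustration, not a description of how uniqueness_score is implemented in TGraphX; the helper names are hypothetical.

```python
def canonical_edge_set(edges):
    """Treat edges as undirected and ignore ordering and duplicates."""
    return frozenset((min(u, v), max(u, v)) for u, v in edges)


def uniqueness(graphs):
    """Fraction of graphs whose labeled edge set is distinct within the batch.

    Hypothetical definition for illustration only.
    """
    if not graphs:
        return 0.0
    distinct = {canonical_edge_set(edges) for edges in graphs}
    return len(distinct) / len(graphs)


batch = [
    [(0, 1), (1, 2)],
    [(2, 1), (1, 0)],  # same graph as the first, written differently
    [(0, 1), (0, 2)],
    [(0, 2), (1, 2)],
]
print(uniqueness(batch))  # 3 distinct edge sets out of 4 graphs -> 0.75
```

Note that this compares labeled graphs. All four graphs in the batch are three-node paths, so an isomorphism-aware metric would report a much lower uniqueness. Which notion is appropriate depends on what the generator is being evaluated for.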

Neural graph generation (Experimental)

TGraphX also ships neural graph generators:

python
from tgraphx import VGAEGraphGenerator, AutoregressiveEdgeGenerator
        

The VGAEGraphGenerator is a variational autoencoder for link prediction over an existing graph. The AutoregressiveEdgeGenerator generates edges sequentially conditioned on prior edges. Both are labeled Experimental and should be treated as research prototypes, not production generators.

For research on neural graph generation specifically, dedicated libraries (GraphRNN, GraphAF, etc.) have more mature implementations. The TGraphX versions are useful for quick ablations and integration with the rest of the framework.

A real example: stress-testing a sampler

python
import tgraphx as tgx
from tgraphx import NeighborLoader

# Stress-test NeighborLoader on a range of graph sizes and structures
for gen in ["er", "ba", "ws"]:
    for n in [100, 1000, 10000]:
        g = tgx.generate_graph(gen, num_nodes=n, seed=42)
        loader = NeighborLoader(g, num_neighbors=[10, 5], batch_size=32)
        for batch in loader:
            pass  # just iterate
        print(f"{gen:5s} N={n:6d}: OK")

This kind of stress test catches regressions in the loader or in graph construction that would be invisible in normal use.

What is NOT a synthetic graph generator's job

  • Generating graphs that match a specific real dataset. Use that dataset directly.
  • Generating realistic node features. Random tensors are not realistic; they are noise. For realistic features, use a learned generator or a real dataset.
  • Replacing benchmark datasets. Synthetic graphs are for testing the framework, not for evaluating model quality.

Reproducibility

All generators accept a seed argument. The same seed produces the same graph:

python
import torch
import tgraphx as tgx

g1 = tgx.generate_graph("ba", num_nodes=100, m=2, seed=42)
g2 = tgx.generate_graph("ba", num_nodes=100, m=2, seed=42)
assert torch.equal(g1.edge_index, g2.edge_index)

This is essential for unit tests and for sharing synthetic benchmarks across machines.
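
One common way to get this behavior is to give each call its own random number generator instead of relying on global state. The sketch below shows the pattern on a simplified Barabási–Albert-style generator written with the standard library. It is an assumption-laden illustration (the function name, the starting configuration, and the edge-list output are all hypothetical) and says nothing about how TGraphX seeds its generators internally.

```python
import random


def preferential_attachment_edges(num_nodes, m, seed):
    """Simplified BA-style sketch: each new node links to m distinct earlier
    nodes, chosen with probability proportional to their current degree."""
    rng = random.Random(seed)  # per-call RNG: output depends only on the seed
    edges = []
    targets = list(range(m))   # the first new node attaches to nodes 0..m-1
    repeated = []              # each node appears once per incident edge
    for new in range(m, num_nodes):
        edges.extend((new, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # degree-proportional sampling without repeats
            chosen.add(rng.choice(repeated))
        targets = sorted(chosen)
    return edges


random.seed(0)
e1 = preferential_attachment_edges(100, 2, seed=42)
random.seed(123)  # changing global state has no effect on the seeded call
e2 = preferential_attachment_edges(100, 2, seed=42)
assert e1 == e2
assert len(e1) == (100 - 2) * 2  # every node after the first m adds m edges
```

Because the generator never touches the global random module state, tests that run in a different order or alongside other seeded code still see the same graph.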

Honest limitations

  • The neural generation module is Experimental and has not been benchmarked against dedicated graph-generation libraries.
  • For very large synthetic graphs (>1M nodes), the generators have not been performance-tuned.
  • The node_shape extension produces random tensor features; structure-correlated features require manual setup.
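
As an example of the manual setup the last point refers to, the sketch below builds features that depend on structure by filling each node's tensor with its normalized degree. It works on a plain edge list and nested Python lists; filled and degree_features are hypothetical helpers, and converting the result to a tensor and attaching it to a graph is left to the caller.

```python
def filled(shape, value):
    """Build a nested list of the given shape with every entry set to value."""
    if not shape:
        return value
    return [filled(shape[1:], value) for _ in range(shape[0])]


def degree_features(num_nodes, edges, node_shape):
    """Per-node features of shape node_shape, constant at degree / max degree.

    Hypothetical helper for illustration; edges are undirected (u, v) pairs.
    """
    degree = [0] * num_nodes
    for u, v in edges:  # undirected: both endpoints gain one
        degree[u] += 1
        degree[v] += 1
    max_degree = max(degree, default=0) or 1  # avoid division by zero
    return [filled(node_shape, d / max_degree) for d in degree]


edges = [(0, 1), (0, 2), (0, 3)]  # star graph with hub 0
features = degree_features(4, edges, node_shape=(2, 2))
print(features[0])  # the hub: a 2x2 block of 1.0
print(features[1])  # a leaf: a 2x2 block of 1/3
```

A constant block per node is the simplest possible correlation. Adding noise on top, or using other structural quantities such as clustering coefficients, follows the same pattern.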

FAQ

Q: Can I add specific node features after generation?
A: Yes. g.node_features = my_features works, as long as the shape matches (num_nodes, *).

Q: How do I generate a graph with specific connectivity properties?
A: For specific properties (e.g., scale-free degree distribution), use the appropriate classical generator (BA for scale-free). For exact custom properties, you'll need to write a custom generator or post-process.

Q: Are the generated graphs guaranteed to be connected?
A: No. ER graphs at low edge probability may have isolated nodes. Use tgx.check_graph_invariants(g, requires_connected=True) to verify.
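
For reference, a connectivity check of this kind boils down to a breadth-first search. The sketch below is a standalone illustration on an edge list; it is not the implementation behind check_graph_invariants, and is_connected is a hypothetical name.

```python
from collections import deque


def is_connected(num_nodes, edges):
    """Return True if the undirected graph is connected (vacuously true if empty)."""
    if num_nodes == 0:
        return True
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    seen = {0}
    queue = deque([0])
    while queue:  # breadth-first search from node 0
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen) == num_nodes


print(is_connected(3, [(0, 1), (1, 2)]))  # path on three nodes
print(is_connected(3, [(0, 1)]))          # node 2 is isolated
```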

Q: Can I generate graphs with edge features?
A: Yes. Pass edge_shape=(...) along with node_shape. See docs/graph_generation.md for details.

Q: What about heterogeneous graphs?
A: Heterogeneous graph generation is part of the experimental HeteroGraph subsystem. The classical generators produce homogeneous graphs only.