Shape-Aware Validation in TGraphX: Catching Bugs Before They Matter

If you have spent a few weeks doing GNN research in PyTorch, you have probably encountered something like this:

```
RuntimeError: shape '[64, 128]' is invalid for input of size 12544
```

The shape mismatch happened twelve layers deep inside a stack of message-passing operations. The traceback is unhelpful. The data preprocessing looked fine. The model looked fine. Something subtle had changed: an edge index dropped a node, a feature tensor lost a dimension, or a batch was assembled wrong.

This article is about a small but useful design choice in TGraphX: shape-aware validation at the data layer, so these bugs surface in milliseconds instead of after an hour of training.

The validation primitives

TGraphX exposes three validation utilities for graph data:

```python
import tgraphx as tgx

# g is a tgx.Graph you have already constructed

# Basic: checks shapes, edge index bounds, label alignment
tgx.validate_graph(g)

# Strict: raises on any anomaly instead of returning a result object
tgx.validate_graph(g, strict=True)

# Specific assertion: node features must be at least rank-3
tgx.assert_tensor_native(g, min_rank=3)

# Broader invariants: also checks edge attribute alignment and metadata
tgx.check_graph_invariants(g)
```

Calling tgx.validate_graph(g, strict=True) once after constructing the graph object turns a class of opaque runtime errors into a clear ValueError at construction time.

What does validate_graph actually check?

The list, paraphrased from the source:

  1. x (node features) must be a tensor.
  2. edge_index must be a [2, E] long tensor (or empty).
  3. Every entry in edge_index must be 0 <= idx < num_nodes.
  4. If labels is present, its leading dimension must equal num_nodes.
  5. If edge_attr is present, its leading dimension must equal the edge count.
  6. If node_mask is present, its leading dimension must equal num_nodes; if edge_mask is present, its leading dimension must equal the edge count.
  7. All tensors must be on the same device.

These are the most common silent failure modes in GNN data assembly.
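
The following plain-Python sketch makes the list concrete by covering the shape-alignment checks (2, 4, 5 and 6). It is not TGraphX's code. find_shape_issues is a hypothetical name, and edge_index is modeled as a pair of Python lists rather than a tensor. The dtype and device checks are left out because they only make sense for real tensors.

```python
def find_shape_issues(num_nodes, edge_index, labels=None, edge_attr=None,
                      node_mask=None, edge_mask=None):
    """Return a list of problems; an empty list means the shapes line up.

    Hypothetical plain-Python sketch, not TGraphX's implementation:
    edge_index is [sources, targets], and the optional arguments are
    per-node or per-edge sequences.
    """
    issues = []

    # Check 2: edge_index must be [2, E], or empty.
    if len(edge_index) == 0:
        edge_index = [[], []]
    if len(edge_index) != 2 or len(edge_index[0]) != len(edge_index[1]):
        issues.append("edge_index must have shape [2, E]")
        return issues  # the edge-aligned checks below need a valid E
    num_edges = len(edge_index[0])

    # Checks 4 and 6: per-node data needs exactly one entry per node.
    for name, value in (("labels", labels), ("node_mask", node_mask)):
        if value is not None and len(value) != num_nodes:
            issues.append(f"{name} has length {len(value)}, expected {num_nodes}")

    # Checks 5 and 6: per-edge data needs exactly one entry per edge.
    for name, value in (("edge_attr", edge_attr), ("edge_mask", edge_mask)):
        if value is not None and len(value) != num_edges:
            issues.append(f"{name} has length {len(value)}, expected {num_edges}")

    return issues


edges = [[0, 1, 2], [1, 2, 0]]
print(find_shape_issues(3, edges, labels=[0, 1, 1], edge_attr=[[0.5], [0.1]]))
# → ['edge_attr has length 2, expected 3']
```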

A bug validate_graph catches

```python
import torch
import tgraphx as tgx

N = 100
x = torch.randn(N, 3, 8, 8)
labels = torch.randint(0, 5, (N,))

# Edge index accidentally includes a node ID >= N
edge_index = torch.tensor([
    [0, 1, 99, 100],   # last index is out of bounds!
    [1, 2, 0,  3],
], dtype=torch.long)

g = tgx.Graph(x=x, edge_index=edge_index, labels=labels)

tgx.validate_graph(g, strict=True)
# → ValueError: edge_index contains node ID 100 but num_nodes is 100
```

Without validation, this would crash much later, during message passing, typically as a device-side CUDA assertion buried in fifteen lines of traceback.
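
It helps to know where a bad edge sits, because then you can trace it back to the preprocessing step that produced it. The sketch below is a plain-Python version of check 3 that returns the offending columns. out_of_bounds_edges is a hypothetical helper, not part of TGraphX. It also flags negative IDs, which plain Python list indexing would otherwise silently wrap around to the end of the list.

```python
def out_of_bounds_edges(edge_index, num_nodes):
    """Return (column, source, target) for every edge with an invalid node ID.

    Hypothetical plain-Python sketch; edge_index is [sources, targets].
    Negative IDs count as invalid, as do IDs >= num_nodes.
    """
    bad = []
    for col, (src, dst) in enumerate(zip(edge_index[0], edge_index[1])):
        if not (0 <= src < num_nodes and 0 <= dst < num_nodes):
            bad.append((col, src, dst))
    return bad


# The edge_index from the example above, with num_nodes = 100.
edge_index = [
    [0, 1, 99, 100],
    [1, 2, 0, 3],
]
print(out_of_bounds_edges(edge_index, num_nodes=100))
# → [(3, 100, 3)]
```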

A bug assert_tensor_native catches

Suppose one stage of your pipeline flattens node features and you forgot to remove that step:

```python
import torch
import tgraphx as tgx

# edge_index and labels as defined in the previous example
x = torch.randn(100, 3, 8, 8)
x = x.view(100, -1)  # accidentally flattened to [100, 192]
g = tgx.Graph(x=x, edge_index=edge_index, labels=labels)

tgx.assert_tensor_native(g, min_rank=3)
# → AssertionError: node features have rank 2, expected at least 3
```

Your model may expect each node's features to keep their tensor structure (here [3, 8, 8]). If preprocessing flattens them, this check catches the regression immediately, wherever the reshape happened in a long preprocessing chain.
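
The rank check itself is simple. The minimal sketch below assumes you pass the feature shape as a tuple (for example tuple(x.shape)). check_min_rank is a hypothetical helper that mirrors the error message above; it is not TGraphX's implementation.

```python
def check_min_rank(shape, min_rank):
    """Raise AssertionError if a node-feature shape has fewer than min_rank dims.

    Hypothetical sketch: shape is a tuple such as (num_nodes, C, H, W).
    """
    rank = len(shape)
    if rank < min_rank:
        raise AssertionError(
            f"node features have rank {rank}, expected at least {min_rank}"
        )


check_min_rank((100, 3, 8, 8), min_rank=3)  # passes: rank 4
try:
    check_min_rank((100, 192), min_rank=3)  # flattened: rank 2
except AssertionError as err:
    print(err)  # → node features have rank 2, expected at least 3
```

The sketch raises AssertionError explicitly rather than using an assert statement. That keeps the check active when Python runs with -O, which strips assert statements.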

When validation is too expensive

Validation is cheap but not free. For a graph with 10 million nodes, scanning the entire edge_index to check bounds takes measurable time, and that cost grows with the number of edges, not just the number of nodes. Two strategies:

  1. Validate once at construction, skip during training. This is the typical pattern.
  2. Use validate_graph(g, strict=False) and check the returned issues programmatically. Useful in batched contexts where you want to count problems instead of raising.

Both are documented in docs/api_stability.md in the TGraphX repository.
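
The strict/non-strict split is a general pattern: one function returns a result object, and a flag decides whether problems raise. The sketch below shows that pattern for a single check. ValidationResult and validate_labels are hypothetical names, and the actual return type of validate_graph(g, strict=False) may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    """Hypothetical result object: a list of issue strings plus an ok flag."""
    issues: list = field(default_factory=list)

    @property
    def ok(self):
        return not self.issues


def validate_labels(num_nodes, labels, strict=False):
    """Sketch of the strict / non-strict pattern for one check.

    strict=False returns a ValidationResult so callers can count or log
    problems; strict=True raises ValueError if any issue was found.
    """
    result = ValidationResult()
    if len(labels) != num_nodes:
        result.issues.append(f"labels has length {len(labels)}, expected {num_nodes}")
    if strict and not result.ok:
        raise ValueError("; ".join(result.issues))
    return result


# Non-strict: tally problems across many graphs instead of stopping at the first.
graphs = [(3, [0, 1, 2]), (4, [0, 1]), (2, [1, 1])]
bad = sum(not validate_labels(n, y).ok for n, y in graphs)
print(bad)  # → 1
```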

Beyond validate_graph: leakage and split policy

For supervised learning, two other utilities catch subtler bugs:

```python
tgx.check_leakage(train_mask, val_mask, test_mask, strict=True)
tgx.validate_split_policy(train_mask, val_mask, test_mask, policy="random")
```

check_leakage catches the classic mistake of having the same node ID in train and test masks. validate_split_policy checks that the split satisfies a declared policy (random, by-node, by-graph, etc.).

These are not necessary for every project, but they are useful in research code, where split logic is fragile and easy to break during refactoring.
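
At its core, a leakage check is an overlap test between per-node masks. The plain-Python sketch below assumes each mask is a list of booleans with one entry per node. find_leaked_nodes is a hypothetical helper that illustrates the idea behind check_leakage, not its implementation.

```python
def find_leaked_nodes(train_mask, val_mask, test_mask):
    """Return {node_id: [split names]} for nodes assigned to more than one split.

    Hypothetical sketch: each mask is a list of booleans, one entry per node.
    """
    if not len(train_mask) == len(val_mask) == len(test_mask):
        raise ValueError("masks must have one entry per node")
    splits = {"train": train_mask, "val": val_mask, "test": test_mask}
    leaked = {}
    for node in range(len(train_mask)):
        members = [name for name, mask in splits.items() if mask[node]]
        if len(members) > 1:
            leaked[node] = members
    return leaked


train = [True, True, False, False, True]
val = [False, False, True, False, False]
test = [False, False, False, True, True]  # node 4 is also in train
print(find_leaked_nodes(train, val, test))
# → {4: ['train', 'test']}
```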

Dashboard-level audits

For multi-run experiments, the framework provides directory-level audit utilities:

```python
print(tgx.audit_run_dir("runs/exp_001"))
# → {"ok": True, "files_present": [...], "missing": [], "warnings": []}

print(tgx.dashboard_audit("runs"))
# → {"ok": True, "run_count": 12, "issues": []}
```

These run after experiments to confirm artifacts are intact and ready for the dashboard or for publication.
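
A post-run audit mostly comes down to checking that the expected files exist and parse. The minimal sketch below uses only the standard library. audit_run and the artifact names in REQUIRED_FILES (config.json, metrics.json) are assumptions, so substitute whatever your runs actually write.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical artifact names: adjust to what your runs actually produce.
REQUIRED_FILES = ("config.json", "metrics.json")


def audit_run(run_dir, required=REQUIRED_FILES):
    """Check which expected artifacts exist and whether the JSON ones parse.

    Hypothetical sketch returning a dict shaped like the audit output above.
    """
    run_dir = Path(run_dir)
    if run_dir.is_dir():
        present = sorted(p.name for p in run_dir.iterdir() if p.is_file())
    else:
        present = []
    missing = [name for name in required if name not in present]
    warnings = []
    for name in required:
        path = run_dir / name
        if name.endswith(".json") and path.is_file():
            try:
                json.loads(path.read_text())
            except json.JSONDecodeError:
                warnings.append(f"{name} is not valid JSON")
    return {"ok": not missing and not warnings, "files_present": present,
            "missing": missing, "warnings": warnings}


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "config.json").write_text('{"lr": 0.01}')
    print(audit_run(tmp))
# → {'ok': False, 'files_present': ['config.json'], 'missing': ['metrics.json'], 'warnings': []}
```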

Practical recommendation

Add these three lines to your data preparation, right after the features, edge index, labels and split masks are built:

```python
g = tgx.Graph(x=x, edge_index=edge_index, labels=labels)
tgx.validate_graph(g, strict=True)
tgx.check_leakage(train_mask, val_mask, test_mask, strict=True)
```

These three lines have saved research time in real TGraphX projects; adopt them as a default.


FAQ

Q: Does validation slow down training?
A: No, as long as you call it once after constructing the graph rather than inside the training loop. TGraphX does not run it automatically, so it adds no per-step cost.

Q: Can I skip validation for very large graphs?
A: Yes. Use strict=False to get a structured result without raising, or skip the call entirely after you trust the data pipeline.

Q: What does check_graph_invariants add over validate_graph?
A: check_graph_invariants also verifies edge attribute alignment, metadata structure, and optional invariants like graph connectedness if requested.
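
For reference, a connectedness invariant can be checked in near-linear time with union-find. This plain-Python sketch treats edges as undirected and assumes every node ID is already in bounds. is_connected is a hypothetical helper, not TGraphX's implementation.

```python
def is_connected(num_nodes, edge_index):
    """Return True if the graph is connected when edges are treated as undirected.

    Hypothetical union-find sketch; edge_index is [sources, targets] and all
    node IDs are assumed to be valid.
    """
    if num_nodes == 0:
        return True
    parent = list(range(num_nodes))

    def find(i):
        # Path halving: point nodes at their grandparent to keep trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    components = num_nodes
    for src, dst in zip(edge_index[0], edge_index[1]):
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two components
            components -= 1
    return components == 1


print(is_connected(4, [[0, 1, 2], [1, 2, 3]]))  # → True
print(is_connected(4, [[0, 2], [1, 3]]))        # → False
```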

Q: Will TGraphX validate automatically when I construct a graph?
A: No, it does not validate by default. You have to call tgx.validate_graph() explicitly. This is intentional: validation has a cost and the framework lets you choose when to pay it.

Q: What about debugging the model itself, not just data?
A: Use tgx.debug_batch(batch) and tgx.batch_summary(batch) for inspecting a mini-batch's structure during training. They print human-readable summaries of node/edge counts and feature shapes.
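
Such a summary mostly aggregates counts and flags inconsistent shapes. The plain-Python sketch below assumes each graph in the batch is described by a (num_nodes, num_edges, feature_shape) tuple. summarize_batch is a hypothetical helper, and its output format is not TGraphX's.

```python
def summarize_batch(graphs):
    """Return a short text summary of a mini-batch.

    Hypothetical sketch: each graph is a (num_nodes, num_edges, feature_shape)
    tuple, where feature_shape excludes the node dimension, e.g. (3, 8, 8).
    """
    total_nodes = sum(n for n, _, _ in graphs)
    total_edges = sum(e for _, e, _ in graphs)
    shapes = sorted({shape for _, _, shape in graphs})
    lines = [
        f"graphs: {len(graphs)}  nodes: {total_nodes}  edges: {total_edges}",
        f"per-node feature shapes: {shapes}",
    ]
    # Mixed shapes in one batch usually mean a preprocessing step diverged.
    if len(shapes) > 1:
        lines.append("WARNING: graphs disagree on per-node feature shape")
    return "\n".join(lines)


batch = [(100, 400, (3, 8, 8)), (80, 310, (3, 8, 8)), (50, 120, (192,))]
print(summarize_batch(batch))
```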