Knowledge Graph Embedding with Tensor Features in TGraphX

Knowledge graph (KG) embedding models map entities and relations into a vector space so that valid (head, relation, tail) triples receive a higher score than invalid ones. The classic family (TransE, DistMult, ComplEx, RotatE, RESCAL, and SimplE) has been around for years, and these models remain the standard baselines on most KG completion benchmarks.

TGraphX includes all six of these models, plus a less common feature: entities can carry tensor-valued features in addition to their learned embedding. This tutorial walks through a complete KG completion workflow and explains where the tensor extension matters.

Setup

```bash
pip install tgraphx
```

Step 1: Construct a knowledge graph

```python
import torch
import tgraphx as tgx

# Triples format: [num_triples, 3], one (head, relation, tail) per row
triples = torch.tensor([
    [0, 0, 1],   # (entity 0, relation 0, entity 1)
    [1, 1, 2],
    [2, 0, 3],
    [0, 1, 3],
], dtype=torch.long)

kg = tgx.KnowledgeGraph.from_triples(triples)
print(kg.num_entities, kg.num_relations)
```

KnowledgeGraph is the KG analog of Graph. You can construct it from (head, relation, tail) triples or from separate head/relation/tail tensors via from_hrt(heads, relations, tails).
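
The two constructors carry the same information. Each column of the [num_triples, 3] tensor is one of the head, relation, or tail index vectors. The following plain-PyTorch sketch makes no TGraphX calls. It shows that split and one way the entity and relation counts could be derived, assuming IDs are contiguous integers starting at 0. Whether from_triples infers the counts exactly this way is an assumption, not something this tutorial states.

```python
import torch

# Same toy triples as in Step 1: one (head, relation, tail) row per fact
triples = torch.tensor([
    [0, 0, 1],
    [1, 1, 2],
    [2, 0, 3],
    [0, 1, 3],
], dtype=torch.long)

# Split the [num_triples, 3] tensor into three 1-D index tensors
heads, relations, tails = triples.unbind(dim=1)

# Assumption: IDs are contiguous and start at 0, so each count is max ID + 1
num_entities = int(torch.cat([heads, tails]).max()) + 1
num_relations = int(relations.max()) + 1

# Stacking the columns back recovers the original triples tensor
rebuilt = torch.stack([heads, relations, tails], dim=1)
print(num_entities, num_relations)  # 4 2
```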

Step 2: Train a TransE model with the one-call API

The simplest workflow uses tgx.kg_completion():

```python
result = tgx.kg_completion(
    triples=triples,
    num_entities=kg.num_entities,
    num_relations=kg.num_relations,
    model="transe",
    embedding_dim=64,
    epochs=20,
    seed=42,
)
print(result.metrics)
```

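This single call handles training, evaluation, and metric computation. The returned metrics include filtered MRR (Mean Reciprocal Rank, the average of 1/rank of the correct entity over all test queries) and Hits@K (the fraction of test queries whose correct entity is ranked in the top K predictions).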
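
Both metrics are simple functions of the rank of the correct entity for each test query. The sketch below computes them in plain PyTorch from a tensor of 1-based ranks. The dictionary keys mirror the ones read in Step 6, but the function itself (ranking_metrics) is a hypothetical helper, not TGraphX's evaluator.

```python
import torch

def ranking_metrics(ranks: torch.Tensor, ks=(1, 3, 10)) -> dict:
    """Compute MRR and Hits@K from 1-based ranks of the correct entity.

    ranks[i] is the position of the true answer for test query i
    (1 = ranked first). Any filtering happens before this step.
    """
    ranks = ranks.float()
    metrics = {"mrr": (1.0 / ranks).mean().item()}
    for k in ks:
        # Fraction of queries whose correct entity lands in the top k
        metrics[f"hits_at_{k}"] = (ranks <= k).float().mean().item()
    return metrics

# Toy example: four test queries whose correct entities ranked 1, 2, 5 and 20
example = ranking_metrics(torch.tensor([1, 2, 5, 20]))
print(example)  # mrr ≈ 0.4375, hits_at_1 = 0.25, hits_at_3 = 0.5, hits_at_10 = 0.75
```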

Step 3: Switching models

Changing the model is a one-word swap:

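```python
# Reuse the Step 2 arguments; only the model name changes
common = dict(triples=triples, num_entities=kg.num_entities,
              num_relations=kg.num_relations, embedding_dim=64, epochs=20, seed=42)
result_distmult = tgx.kg_completion(model="distmult", **common)
result_complex  = tgx.kg_completion(model="complex", **common)
result_rotate   = tgx.kg_completion(model="rotate", **common)
```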

All of these models share the same training loop and evaluation, so choose based on the relation patterns in your KG:

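  • TransE: simple and well understood, but struggles with one-to-many relations and cannot represent symmetric relations.
  • DistMult: can only represent symmetric relations, because its score is unchanged when head and tail are swapped (good for similarity-like links).
  • ComplEx: handles antisymmetric relations (e.g., "parent of", where r(a, b) rules out r(b, a)) as well as inverse pairs such as "parent of" vs "child of".
  • RotatE: handles composition (if r1(a, b) and r2(b, c), then the composed relation r1·r2 holds for (a, c)), along with symmetry, antisymmetry, and inversion.
  • RESCAL: bilinear with a full d×d matrix per relation; expressive, but with far more parameters.
  • SimplE: similar parameter cost to DistMult, but not limited to symmetric relations, since each entity has separate head and tail embeddings and each relation has an inverse.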

For a new KG, TransE is the standard first attempt.
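
To make the relation-pattern claims listed above concrete, the sketch below writes each model's scoring function for a single triple in plain PyTorch (higher score means more plausible). It follows the usual formulations from the original papers. It is an illustration, not TGraphX's internal code. The function names, the L1 distance in TransE and RotatE, and the toy dimension d = 8 are choices made for this example. The sketch also checks two of the listed properties numerically: DistMult's symmetry and RotatE's composition.

```python
import math
import torch

# Plain-PyTorch scoring functions for one triple; higher score = more plausible.
torch.manual_seed(0)
d = 8  # toy embedding dimension

def transe(h, r, t):
    # TransE: t should lie near h + r, so the score is the negative L1 distance
    return -(h + r - t).abs().sum()

def distmult(h, r, t):
    # DistMult: trilinear product, identical if h and t are swapped
    return (h * r * t).sum()

def complex_score(h, r, t):
    # ComplEx: real part of <h, r, conj(t)> with complex embeddings
    return (h * r * t.conj()).sum().real

def rotate(h, theta, t):
    # RotatE: the relation is an element-wise rotation r = exp(i * theta)
    r = torch.polar(torch.ones_like(theta), theta)
    return -(h * r - t).abs().sum()

def rescal(h, M, t):
    # RESCAL: a full d x d matrix M per relation, score = h^T M t
    return h @ M @ t

def simple(h_head, h_tail, r, r_inv, t_head, t_tail):
    # SimplE: separate head/tail embeddings per entity, plus an inverse relation
    return 0.5 * ((h_head * r * t_tail).sum() + (t_head * r_inv * h_tail).sum())

# Random real and complex embeddings for a toy triple
h, r, t = torch.randn(d), torch.randn(d), torch.randn(d)
hc, rc, tc = (torch.randn(d, dtype=torch.cfloat) for _ in range(3))
M = torch.randn(d, d)
e = torch.randn(6, d)  # rows: h_head, h_tail, r, r_inv, t_head, t_tail

scores = {
    "transe": transe(h, r, t),
    "distmult": distmult(h, r, t),
    "complex": complex_score(hc, rc, tc),
    "rescal": rescal(h, M, t),
    "simple": simple(*e),
}

# Symmetry check: DistMult gives the same score for (t, r, h); ComplEx generally does not
distmult_symmetric = torch.allclose(distmult(h, r, t), distmult(t, r, h), atol=1e-5)
complex_symmetric = torch.allclose(complex_score(hc, rc, tc), complex_score(tc, rc, hc), atol=1e-5)

# RotatE composition: rotating by theta1 then theta2 equals rotating by theta1 + theta2
theta1, theta2 = torch.rand(d) * 2 * math.pi, torch.rand(d) * 2 * math.pi
b = hc * torch.polar(torch.ones(d), theta1)  # b = r1 applied to h
c = b * torch.polar(torch.ones(d), theta2)   # c = r2 applied to b
composed_distance = -rotate(hc, theta1 + theta2, c)  # close to 0

print(distmult_symmetric, complex_symmetric, composed_distance.item())
```

Swapping head and tail leaves DistMult's score unchanged, which is why it cannot tell "parent of" from its reverse. ComplEx breaks that symmetry through the complex conjugate on the tail.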

Step 4: The tensor feature extension

So far, every entity is represented only by its learned embedding. Suppose your entities also have intrinsic features — entity 0 is a movie with a poster image, entity 1 is a user with a profile vector. TGraphX lets you attach those:

```python
# Per-entity features as tensors
poster_features = torch.randn(kg.num_entities, 3, 64, 64)  # [N, C, H, W]
profile_features = torch.randn(kg.num_entities, 128)        # vector

# Multimodal entity feature dictionary
entity_features = {
    "image": poster_features,    # rank-4 tensor
    "profile": profile_features, # rank-2 tensor
}

# (Multimodal KG support is a TGraphX-specific extension; see docs/kg_multimodal_tensor_features.md)
```

The framework includes learnable projectors that map each modality into the embedding space before scoring. This is useful when your entities have rich features that are correlated with the relation structure.

For a standard benchmark such as FB15k-237 or WN18RR, you do not need this. For a research KG where entities have side information (movies with posters, papers with abstracts, products with images), it can improve on ID-only embeddings, though the gain should be verified against an ID-only baseline on your own data.
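
The following plain-PyTorch sketch shows one way the learnable projectors described above could work. It is a hypothetical design, not TGraphX's implementation. The following are all assumptions made for illustration:

  • the class name MultimodalEntityEncoder
  • the tiny CNN for the image modality
  • the linear layer for the profile vector
  • the choice to add the projected modalities to the ID embedding before a TransE score

```python
import torch
from torch import nn

class MultimodalEntityEncoder(nn.Module):
    """Hypothetical encoder: learned ID embedding plus projected tensor features."""

    def __init__(self, num_entities, dim, image_channels, profile_dim):
        super().__init__()
        self.id_embedding = nn.Embedding(num_entities, dim)
        # Rank-4 image features [B, C, H, W] -> [B, dim]: small CNN + global average pooling
        self.image_proj = nn.Sequential(
            nn.Conv2d(image_channels, 16, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, dim),
        )
        # Rank-2 profile features [B, profile_dim] -> [B, dim]
        self.profile_proj = nn.Linear(profile_dim, dim)

    def forward(self, entity_ids, features):
        # Start from the ID embedding, then add each projected modality
        z = self.id_embedding(entity_ids)
        z = z + self.image_proj(features["image"][entity_ids])
        z = z + self.profile_proj(features["profile"][entity_ids])
        return z

# Same shapes as the Step 4 feature dictionary, for 4 toy entities
num_entities, num_relations, dim = 4, 2, 64
entity_features = {
    "image": torch.randn(num_entities, 3, 64, 64),
    "profile": torch.randn(num_entities, 128),
}
encoder = MultimodalEntityEncoder(num_entities, dim, image_channels=3, profile_dim=128)
relation_embedding = nn.Embedding(num_relations, dim)

# Score a batch of (head, relation, tail) triples with TransE on the fused vectors
batch = torch.tensor([[0, 0, 1], [1, 1, 2]])
h = encoder(batch[:, 0], entity_features)
r = relation_embedding(batch[:, 1])
t = encoder(batch[:, 2], entity_features)
scores = -(h + r - t).abs().sum(dim=-1)
print(scores.shape)  # torch.Size([2])
```

Summing keeps the ID embedding as a fallback when the features are uninformative. Concatenation followed by a linear layer, or a learned gate, are common alternatives.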

Step 5: Negative sampling

KG embedding models train by contrasting positive triples against negative ones. TGraphX exposes the choice:

```python
from tgraphx import negative_sampling

neg = negative_sampling(triples, num_entities=kg.num_entities, num_negatives=128)
```

The framework includes uniform, Bernoulli, filtered, and typed negative samplers. The default in tgx.kg_completion() is uniform. For more reliable training, pass neg_sampler="filtered", which avoids accidentally sampling true triples as negatives.
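
For intuition, here is a plain-PyTorch sketch of uniform corruption followed by a filtering pass. It is not TGraphX's negative_sampling implementation. The helper names corrupt_uniform and filter_true are hypothetical, and Bernoulli and typed sampling are not shown. This filter also simply drops corrupted triples that turn out to be true rather than resampling them, so it can return fewer negatives than requested.

```python
import torch

def corrupt_uniform(triples, num_entities, num_negatives, generator=None):
    """Uniform negatives: replace the head or the tail with a random entity."""
    # One copy of each positive per requested negative: [B * num_negatives, 3]
    neg = triples.repeat_interleave(num_negatives, dim=0)
    n = neg.size(0)
    random_entities = torch.randint(0, num_entities, (n,), generator=generator)
    corrupt_head = torch.rand(n, generator=generator) < 0.5
    # Column 0 holds heads, column 2 holds tails; relations are left untouched
    neg[corrupt_head, 0] = random_entities[corrupt_head]
    neg[~corrupt_head, 2] = random_entities[~corrupt_head]
    return neg

def filter_true(negatives, known_triples):
    """Filtered sampling: drop corrupted triples that are actually known facts."""
    known = {tuple(row) for row in known_triples.tolist()}
    keep = torch.tensor([tuple(row) not in known for row in negatives.tolist()], dtype=torch.bool)
    return negatives[keep]

triples = torch.tensor([[0, 0, 1], [1, 1, 2], [2, 0, 3], [0, 1, 3]])
gen = torch.Generator().manual_seed(0)
neg = corrupt_uniform(triples, num_entities=4, num_negatives=8, generator=gen)
clean = filter_true(neg, triples)
print(neg.shape, clean.shape)
```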

Step 6: Evaluation with filtered metrics

The reported metrics use filtered ranking by default. When ranking candidates for (head, relation, ?), every other known true tail is removed from the candidate list before the rank of the target tail is computed. This matches standard reporting in KG benchmarks and avoids penalizing the model for ranking other valid answers above the target.
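
The filtering step can be sketched in a few lines of plain PyTorch. The helper filtered_tail_rank below is hypothetical, not TGraphX's evaluator. It takes the model's scores for every candidate tail of one (head, relation, ?) query and masks out the other known true tails. It then counts how many candidates still score higher than the target. Ties are resolved in the target's favor here; evaluators differ on this, and some use the mean of the optimistic and pessimistic ranks.

```python
import torch

def filtered_tail_rank(scores, true_tail, other_true_tails):
    """1-based filtered rank of true_tail for one (head, relation, ?) query.

    scores: [num_entities] plausibility of each candidate tail (higher = better).
    other_true_tails: tails that also form known true triples with this
    (head, relation) pair, e.g. collected from train/valid/test splits.
    """
    scores = scores.clone()
    filt = [e for e in other_true_tails if e != true_tail]
    if filt:
        # Push the other correct answers to the bottom so they cannot outrank the target
        scores[torch.tensor(filt)] = float("-inf")
    # Rank = 1 + number of candidates scoring strictly higher than the target
    return int((scores > scores[true_tail]).sum()) + 1

# Toy query with 5 candidate tails; entity 3 is the target, entity 1 is another true tail
scores = torch.tensor([0.1, 0.9, 0.4, 0.8, 0.2])
raw = filtered_tail_rank(scores, true_tail=3, other_true_tails=[])        # 2
filtered = filtered_tail_rank(scores, true_tail=3, other_true_tails=[1])  # 1
print(raw, filtered)
```

The raw rank of 2 becomes 1 once the other correct answer is filtered out. This is exactly the kind of penalty the filtered protocol avoids.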

```python
print(f"Filtered MRR:     {result.metrics['mrr']:.4f}")
print(f"Filtered Hits@1:  {result.metrics['hits_at_1']:.4f}")
print(f"Filtered Hits@10: {result.metrics['hits_at_10']:.4f}")
```

Step 7: Saving and loading

```python
kg.save("runs/kg_exp/graph.tgx")
kg2 = tgx.KnowledgeGraph.load("runs/kg_exp/graph.tgx")
```

The .tgx format preserves entity features, including rank-4 tensors, which standard KG triple formats cannot store.

Honest limitations

  • The framework's KG implementations are correct and reproduce standard benchmark scores within reasonable tolerance, but PyKEEN has a wider feature set (hyperparameter optimization, more samplers, more evaluation protocols). For competition-grade KG benchmarking, PyKEEN is the more mature choice.
  • The multimodal extension is research-grade. There are no published large-scale studies showing it consistently beats ID-only embeddings on standard benchmarks. It is useful when you have strong reason to believe entity features carry information.
  • Temporal KGs are supported but labeled Experimental.

When TGraphX's KG module is the right fit

  • You want a single PyTorch library for KG, GNN, and other graph workflows.
  • Your entities have rich tensor features you want to include in the embedding.
  • You need reproducible KG experiments with audit artifacts.
  • You are doing graduate-level research and prototype iteration matters more than perfect benchmark scores.

For everything else, PyKEEN remains the standard.


FAQ

Q: What is the default embedding_dim?
A: 64 in the one-call API. For competitive benchmark numbers, 200-500 is typical.

Q: How do I implement a custom scoring function?
A: Subclass tgraphx.kg.KGModel and override the score method. The training loop in tgx.kg_completion() calls your score method during the forward pass.

Q: Are temporal KGs supported?
A: Yes, via tgraphx.kg.TemporalKnowledgeGraph, but this support is Experimental. Test carefully.

Q: How do I extract entity embeddings after training?
A: result.model.entity_embeddings.weight is the entity embedding matrix. result.model.relation_embeddings.weight is the relation matrix.

Q: Can I use TGraphX KG embeddings as features for a downstream GNN?
A: Yes. Extract the embeddings, pass them as node_features to a tgx.Graph, and proceed. This is a common pattern for graph-aware recommendation.