TGraphX Insights Evolutionary Optimization over Graph Structures with TGraphX
← Back to Insights

Evolutionary Optimization over Graph Structures with TGraphX

Target keyword: evolutionary optimization graph neural networks

Evolutionary Optimization over Graph Structures with TGraphX

When the search space is over graph structures — adding edges to maximize a property, finding a sparse subgraph with high connectivity, designing a graph that optimizes multiple competing objectives — gradient-based methods do not apply directly. Evolutionary algorithms are a natural fit: they handle discrete combinatorial spaces, do not require differentiable objectives, and can explore multiple modes.

TGraphX includes an evolutionary optimization subsystem with GA, simulated annealing, NSGA-II, and hill climbing. This article walks through using it, with the caveat that the entire subsystem is Experimental.

What the subsystem includes

Algorithms:

  • GeneticAlgorithmOptimizer — population-based GA with mutation and crossover
  • SimulatedAnnealingOptimizer — single-state SA with temperature schedule
  • NSGAIIOptimizer — multi-objective optimization with Pareto front
  • HillClimbingOptimizer — local search baseline

Fitness functions (built-in):

  • connectivity_fitness — maximize connected component coverage
  • sparsity_fitness — minimize edge count
  • density_fitness — maximize edge density
  • composite_fitness — weighted sum of multiple objectives

Genome operations:

  • GraphGenome — encoding of a graph as a mutable genotype
  • mutate_add_node, mutate_add_edge, mutate_node_feature — mutation operators

A first experiment

The one-call API runs a complete optimization:

python
import tgraphx as tgx
        
        result = tgx.optimize_graph(
            objective="connectivity",
            algorithm="ga",
            num_nodes=30,
            num_generations=100,
            seed=42,
        )
        
        print(f"Best fitness: {result.best_fitness:.4f}")
        print(f"Best genome: {result.best_genome}")
        

This evolves a population of 30-node graphs over 100 generations to maximize connectivity. The result includes the best graph found, its fitness, and the optimization history.

Switching objectives

Each objective has a different shape and produces different optimal graphs:

python
for obj in ["connectivity", "sparsity", "density"]:
            res = tgx.optimize_graph(objective=obj, algorithm="ga", num_nodes=20, seed=42)
            print(f"{obj:15s}: best fitness = {res.best_fitness:.4f}")
        

Connectivity rewards graphs that are connected — penalizes isolated nodes.

Sparsity rewards graphs with few edges.

Density rewards graphs with many edges relative to the maximum possible.

These can conflict. A maximally-dense graph is also maximally-connected but not sparse.

Multi-objective optimization with NSGA-II

For competing objectives, use NSGA-II to find the Pareto front:

python
result = tgx.optimize_graph(
            objective={
                "connectivity": 1.0,    # weight
                "sparsity": -1.0,       # negative — we want few edges
            },
            algorithm="nsga2",
            num_nodes=20,
            num_generations=200,
            seed=42,
        )
        
        # Pareto front: graphs that trade off connectivity and sparsity
        for genome in result.pareto_front[:5]:
            print(f"  conn={genome.connectivity:.3f}  sparse={genome.sparsity:.3f}")
        

NSGA-II returns a Pareto front rather than a single best solution. Each point represents a trade-off: more connectivity at the cost of more edges, or vice versa.

This is the right tool when "best" is not a single objective.

Custom fitness functions

For objectives not built into the framework:

python
from tgraphx.evolutionary import GeneticAlgorithmOptimizer
        
        def my_fitness(genome):
            g = genome.to_graph()
            # Compute whatever you want
            return some_custom_score(g)
        
        opt = GeneticAlgorithmOptimizer(
            fitness_fn=my_fitness,
            num_nodes=30,
            population_size=50,
            mutation_rate=0.1,
        )
        
        best = opt.run(num_generations=100)
        

The optimizer calls my_fitness(genome) for every genome in every generation. For expensive fitness functions, this can be slow.

A real-world pattern: NAS-like architecture search

The framework's evolutionary subsystem is designed for graph structures, but the same pattern applies to neural architecture search where the search space is a computational graph:

python
def gnn_fitness(genome):
            # Construct a GNN architecture from the genome
            model = build_gnn_from_genome(genome)
            # Train briefly and return validation accuracy
            score = quick_train(model, data, epochs=5)
            return score
        

This is conceptually simple but computationally expensive. For practical NAS, weight-sharing methods (DARTS, ProxylessNAS) are usually faster than pure evolutionary search.

Honest limitations

  • The Experimental label is real. Some genome operators have edge cases. Test on small problems first.
  • The hill climbing and SA optimizers are simpler than GA; for hard problems, GA usually wins.
  • Population sizes default to 50; for hard problems consider 100-200.
  • The framework does not handle parallel fitness evaluation by default. For expensive fitness, you will want to parallelize manually.
  • There is no published comparison to dedicated NAS frameworks. Use TGraphX's evolutionary search for exploratory research, not for production NAS.

When this subsystem is the right fit

  • Combinatorial graph design problems with a clear objective.
  • Research where you want multi-objective trade-off curves.
  • Educational use — the algorithms are clean and easy to understand.

When it is not

  • Production-grade NAS for deep models.
  • Large-scale combinatorial optimization (use specialized tools like Gurobi or OR-Tools).
  • Multi-task optimization across many parallel problems.

Reproducibility

All optimizers accept a seed argument. Wrap experiments in the standard reproducibility context:

python
with tgx.reproducible(seed=42, deterministic=True):
            result = tgx.optimize_graph(objective="connectivity", algorithm="ga", ...)
        

Note that evolutionary algorithms have many sources of randomness (initial population, mutation, crossover, selection). Seeded runs will be reproducible; the same population evolves identically.


FAQ

Q: How large can the graphs be?
A: The framework has been tested on graphs up to a few hundred nodes. For thousands of nodes, the genome representation and operators become expensive. Consider problem-specific encodings.

Q: Can I run multiple optimizations in parallel?
A: Yes, with Python's multiprocessing. Each run is independent. Aggregate results yourself.

Q: Does the framework support population diversity tracking?
A: NSGA-II tracks Pareto diversity. The GA reports basic statistics. For detailed diversity analysis, you will need to instrument the fitness function manually.

Q: What about hybrid GA + local search?
A: Not built in. You can wrap any optimizer's output with a hill-climbing phase manually.

Q: Is there a documented example of NAS using this?
A: The examples/evolutionary_graph_optimization_demo.py in the TGraphX repository shows the basic pattern. NAS-specific examples are not shipped.