Evolutionary Optimization over Graph Structures with TGraphX
When the search space is over graph structures — adding edges to maximize a property, finding a sparse subgraph with high connectivity, designing a graph that optimizes multiple competing objectives — gradient-based methods do not apply directly. Evolutionary algorithms are a natural fit: they handle discrete combinatorial spaces, do not require differentiable objectives, and can explore multiple modes.
TGraphX includes an evolutionary optimization subsystem with GA, simulated annealing, NSGA-II, and hill climbing. This article walks through using it, with the caveat that the entire subsystem is Experimental.
What the subsystem includes
Algorithms:
GeneticAlgorithmOptimizer— population-based GA with mutation and crossoverSimulatedAnnealingOptimizer— single-state SA with temperature scheduleNSGAIIOptimizer— multi-objective optimization with Pareto frontHillClimbingOptimizer— local search baseline
Fitness functions (built-in):
connectivity_fitness— maximize connected component coveragesparsity_fitness— minimize edge countdensity_fitness— maximize edge densitycomposite_fitness— weighted sum of multiple objectives
Genome operations:
GraphGenome— encoding of a graph as a mutable genotypemutate_add_node,mutate_add_edge,mutate_node_feature— mutation operators
A first experiment
The one-call API runs a complete optimization:
import tgraphx as tgx
result = tgx.optimize_graph(
objective="connectivity",
algorithm="ga",
num_nodes=30,
num_generations=100,
seed=42,
)
print(f"Best fitness: {result.best_fitness:.4f}")
print(f"Best genome: {result.best_genome}")
This evolves a population of 30-node graphs over 100 generations to maximize connectivity. The result includes the best graph found, its fitness, and the optimization history.
Switching objectives
Each objective has a different shape and produces different optimal graphs:
for obj in ["connectivity", "sparsity", "density"]:
res = tgx.optimize_graph(objective=obj, algorithm="ga", num_nodes=20, seed=42)
print(f"{obj:15s}: best fitness = {res.best_fitness:.4f}")
Connectivity rewards graphs that are connected — penalizes isolated nodes.
Sparsity rewards graphs with few edges.
Density rewards graphs with many edges relative to the maximum possible.
These can conflict. A maximally-dense graph is also maximally-connected but not sparse.
Multi-objective optimization with NSGA-II
For competing objectives, use NSGA-II to find the Pareto front:
result = tgx.optimize_graph(
objective={
"connectivity": 1.0, # weight
"sparsity": -1.0, # negative — we want few edges
},
algorithm="nsga2",
num_nodes=20,
num_generations=200,
seed=42,
)
# Pareto front: graphs that trade off connectivity and sparsity
for genome in result.pareto_front[:5]:
print(f" conn={genome.connectivity:.3f} sparse={genome.sparsity:.3f}")
NSGA-II returns a Pareto front rather than a single best solution. Each point represents a trade-off: more connectivity at the cost of more edges, or vice versa.
This is the right tool when "best" is not a single objective.
Custom fitness functions
For objectives not built into the framework:
from tgraphx.evolutionary import GeneticAlgorithmOptimizer
def my_fitness(genome):
g = genome.to_graph()
# Compute whatever you want
return some_custom_score(g)
opt = GeneticAlgorithmOptimizer(
fitness_fn=my_fitness,
num_nodes=30,
population_size=50,
mutation_rate=0.1,
)
best = opt.run(num_generations=100)
The optimizer calls my_fitness(genome) for every genome in every generation. For expensive fitness functions, this can be slow.
A real-world pattern: NAS-like architecture search
The framework's evolutionary subsystem is designed for graph structures, but the same pattern applies to neural architecture search where the search space is a computational graph:
def gnn_fitness(genome):
# Construct a GNN architecture from the genome
model = build_gnn_from_genome(genome)
# Train briefly and return validation accuracy
score = quick_train(model, data, epochs=5)
return score
This is conceptually simple but computationally expensive. For practical NAS, weight-sharing methods (DARTS, ProxylessNAS) are usually faster than pure evolutionary search.
Honest limitations
- The Experimental label is real. Some genome operators have edge cases. Test on small problems first.
- The hill climbing and SA optimizers are simpler than GA; for hard problems, GA usually wins.
- Population sizes default to 50; for hard problems consider 100-200.
- The framework does not handle parallel fitness evaluation by default. For expensive fitness, you will want to parallelize manually.
- There is no published comparison to dedicated NAS frameworks. Use TGraphX's evolutionary search for exploratory research, not for production NAS.
When this subsystem is the right fit
- Combinatorial graph design problems with a clear objective.
- Research where you want multi-objective trade-off curves.
- Educational use — the algorithms are clean and easy to understand.
When it is not
- Production-grade NAS for deep models.
- Large-scale combinatorial optimization (use specialized tools like Gurobi or OR-Tools).
- Multi-task optimization across many parallel problems.
Reproducibility
All optimizers accept a seed argument. Wrap experiments in the standard reproducibility context:
with tgx.reproducible(seed=42, deterministic=True):
result = tgx.optimize_graph(objective="connectivity", algorithm="ga", ...)
Note that evolutionary algorithms have many sources of randomness (initial population, mutation, crossover, selection). Seeded runs will be reproducible; the same population evolves identically.
FAQ
Q: How large can the graphs be?
A: The framework has been tested on graphs up to a few hundred nodes. For thousands of nodes, the genome representation and operators become expensive. Consider problem-specific encodings.
Q: Can I run multiple optimizations in parallel?
A: Yes, with Python's multiprocessing. Each run is independent. Aggregate results yourself.
Q: Does the framework support population diversity tracking?
A: NSGA-II tracks Pareto diversity. The GA reports basic statistics. For detailed diversity analysis, you will need to instrument the fitness function manually.
Q: What about hybrid GA + local search?
A: Not built in. You can wrap any optimizer's output with a hill-climbing phase manually.
Q: Is there a documented example of NAS using this?
A: The examples/evolutionary_graph_optimization_demo.py in the TGraphX repository shows the basic pattern. NAS-specific examples are not shipped.