# GA Tips & Recipes

This page distills practical advice for using the GA in {mod}`evoproc.ga_scaffold_structured` effectively.

## Scorers: structural vs task-eval

- **Structural hygiene (default):**  
  Uses {mod}`evoproc.scorers`’ adapter over the validator suite. Great for bootstrapping before you wire up execution.
- **Task-eval scoring (recommended for benchmarks):**  
  Pass all three to `ga.run(...)`:
  - `final_answer_schema`
  - `run_steps_fn(proc, question, final_answer_schema, model, print_bool=False) -> state`
  - `eval_fn(state, proc) -> float` (e.g., exact-match or numeric closeness)

```python
best, _ = ga.run(
    task_description=question,
    final_answer_schema=get_schema("gsm"),
    run_steps_fn=run_steps_fn,   # calls your runner
    eval_fn=eval_fn,             # returns [0,1]
)
```

## Sensible hyperparameters (start here)

```python
GAConfig(
  population_size=6,         # 6–12 for quick loops
  elitism=2,                 # keep top 2 unchanged
  crossover_rate=0.7,        # favor crossover
  mutation_rate=0.3,         # still allow edits
  max_generations=3,         # 3–6 to start
  tournament_k=3,
  random_immigrant_rate=0.10,# diversity without chaos
  seed=42,
)
```

- Increase generations first, then population.
- Keep elitism ≥ 1 so progress isn’t lost.
- Immigrants help escape local minima; 5–20% is typical.

## Reproducibility 
- Set seed in both GAConfig and your backend query options.
- Log the chosen model and any temperature/top-p settings.
- Persist runs to JSONL: keep best.proc, fitness, and final state.
```python
import json
with open("runs.jsonl","a") as f:
    f.write(json.dumps({
        "qid": qid,
        "fitness": best.fitness,
        "proc": best.proc,
        "state": final_state,
    }) + "\n")
```

## Efficient task-eval loops
- Use a lightweight run_steps_fn: small schemas, minimal I/O, strict JSON parsing.
- Cache step prompts if you’re experimenting; only the best individual per gen must be executed.
- If your runner is expensive, consider generation-end evaluation only (evaluate after reproduction, not for every candidate mid-gen).

## Operators: practical notes
- Crossover tends to clean up naming and structure—keep it dominant (0.6–0.8).
- Mutation is single, small edits by design (split/insert/rename/verify). If progress stalls, slightly raise mutation_rate.

## Diagnosing stagnation
- Flat fitness across generations → Scorer too coarse. Switch to task-eval or add a shaping term (e.g., small penalty for > N steps).
- Validator ping-pong → Tighten repair prompts; prefer minimal fixes and renumber steps after repair.
- Degenerate loops (same child every time) → add random_immigrant_rate and ensure seed is not globally constant if you want exploration.

## GSM8K recipe (mini)
```python
FINAL_SCHEMA = get_schema("gsm")

def run_steps_fn(proc, q, s, m, print_bool=False):
    return run_steps_stateful_minimal(proc, q, s, m, query_fn=query, print_bool=print_bool)

def eval_fn(state, proc):
    import math, re
    nums = re.findall(r"-?\d+(?:\.\d+)?", state.get("answer",""))
    pred = state.get("answer_numerical") or (float(nums[-1]) if nums else None)
    gold = state.get("_gold_num")
    return 1.0 if (pred is not None and gold is not None and math.isclose(pred, gold, abs_tol=1e-6)) else 0.0

# loop
for ex in examples:  # list of {id, question, answer}
    gold_num = extract_number(ex["answer"])
    best, _ = ga.run(
        task_description=ex["question"],
        final_answer_schema=FINAL_SCHEMA,
        run_steps_fn=run_steps_fn,
        eval_fn=lambda st, pr: eval_fn({**st, "_gold_num": gold_num}, pr),
    )
    state = run_steps_fn(best.proc, ex["question"], FINAL_SCHEMA, ga.model)
```

## Troubleshooting
- ModuleNotFoundError during docs build
Make sure both packages are installed in the docs env and conf.py adds their src/ paths.
- LLM returns prose, not JSON
Lower temperature and include the schema verbatim (format= parameter if your backend supports it).
- “Missing required output …” at runtime
Keep strict_missing=True to catch it early; if the model is flaky, allow one retry with the same prompt/seed.

## When to stop evolving
- Fitness plateaus for ≥2 generations.
- Best procedure stabilizes (diff of steps/variables is small).
- Wall-clock budget reached (log total tokens/calls if possible).