Machine Learning Engineer — Post-Training & RL (Finance)

Hamburg, Germany · Fully Remote or Hamburg · Part-time or Full-time · All levels

About Gradient

Gradient builds AI systems for systematic investing. Our first product is a long/short equity fund powered end-to-end by our models. We move fast, measure reality, and iterate relentlessly.

We’re looking for someone who can turn ambiguous goals into high-signal experiments, build the post-training/RL stack around them, and extract insights that actually change decisions.

About the Role

You’ll work directly with the founders to improve our post-training infrastructure and RL training environment, and to run the experiments that push our models toward real alpha.

This is a high-agency role at the intersection of:

  • Post-training (SFT/RL, reward shaping, data filtering)
  • Scalable experimentation (ablations, evals, reproducibility)
  • And the messy reality of finance data and backtests

What You Will Do

  • Improve infrastructure for post-training
    • Training recipes, data pipelines, experiment tracking
    • Tooling that makes running the “next best experiment” cheap and reliable
  • Build out our environment for RL training
    • Reward signal design, verifiers/constraints, rollout + sampling strategies
    • Evaluation harnesses that capture what “better” means (beyond a single metric)
  • Design experiments and run ablations
    • Propose hypotheses, isolate variables, prioritize by expected insight per compute €
    • Build the “ablation ladder” that quickly explains performance shifts
  • Fine-tune models to find alpha
    • SFT + RL-style training, dataset construction, filtering, curriculum ideas
    • Iterate on training stability, generalization, and robustness
  • Analyze results and distill insights
    • Write short, high-clarity memos: what we tried → what happened → what it means → what we do next

What We Are Looking For

You don’t need a finance background, but you should be curious about markets and excited to learn fast.

Must-Haves

  • Outstanding analytical ability — you can reason from messy evidence to crisp next actions
  • Deep understanding of modern AI training (post-training + RL concepts)
  • Strong intuition for experiment design and compute efficiency
    • You don’t run obviously flawed experiments
    • You know what to measure, what to freeze, and what to change
  • Strong Python skills and comfort with the modern ML stack

Traits We Care About a Lot

  • High agency: you don’t wait for specs; you define the work and ship
  • Taste: you know what “good” looks like in evals, tooling, and results
  • Scientific discipline: you can say “we don’t know yet” and design the test that makes it knowable

Nice to Have

  • Experience with RLHF/GRPO-style methods, reward modeling, preference optimization
  • Distributed training / large-scale rollouts / inference throughput optimization
  • Experience building eval harnesses that guide iteration

What We Offer

  • Freedom: we don’t care where or when you work — as long as you deliver
  • No corporate bullshit: no bureaucracy or time-wasting busywork
  • Leverage: work directly with the founders; your work changes the roadmap immediately
  • A hard, interesting problem: RL + post-training applied to real financial prediction and portfolio outcomes
  • Competitive compensation + meaningful upside (details depend on seniority and fit)

About Gradient Technologies

Gradient Technologies GmbH is a Hamburg-based AI and software company focused on financial analysis. We build AI systems that process large volumes of financial data to generate equity return forecasts and construct market-neutral investment strategies.

Our philosophy: Efficiency over complexity. Do the obvious things well. Follow the gradient.

How to Apply

Send your CV to contact@gradtec.ai. That’s it — no cover letter needed.