About Gradient

Gradient builds AI systems for systematic investing. Our first product is a long/short equity fund powered end-to-end by our models. We move fast, measure reality, and iterate relentlessly.

We’re looking for someone who can turn ambiguous goals into high-signal experiments, build the post-training/RL stack around them, and extract insights that actually change decisions.

About the Role

You’ll work directly with the founders to improve our post-training infrastructure and RL training environment, and to run the experiments that push our models toward real alpha.

This is a high-agency role at the intersection of:

Post-training (SFT/RL, reward shaping, data filtering)
Scalable experimentation (ablations, evals, reproducibility)
The messy reality of finance data and backtests

What You Will Do

Improve infrastructure for post-training
- Training recipes, data pipelines, experiment tracking
- Tooling that makes running the “next best experiment” cheap and reliable
Build out our environment for RL training
- Reward signal design, verifiers/constraints, rollout + sampling strategies
- Evaluation harnesses that capture what “better” means (beyond a single metric)
Design experiments and run ablations
- Propose hypotheses, isolate variables, prioritize by expected insight per euro of compute
- Build the “ablation ladder” that quickly explains performance shifts
Fine-tune models to find alpha
- SFT + RL-style training, dataset construction, filtering, curriculum ideas
- Iterate on training stability, generalization, and robustness
Analyze results and distill insights
- Write short, high-clarity memos: what we tried → what happened → what it means → what we do next

What We Are Looking For

You don’t need a finance background, but you should be curious about markets and excited to learn fast.

Must-Haves

Outstanding analytical ability — you can reason from messy evidence to crisp next actions
Deep understanding of modern AI training (post-training + RL concepts)
Strong intuition for experiment design and compute efficiency
- You don’t run obviously flawed experiments
- You know what to measure, what to freeze, and what to change
Strong Python skills and comfort with the modern ML stack

Traits We Care About a Lot

High agency: you don’t wait for specs; you define the work and ship
Taste: you know what “good” looks like in evals, tooling, and results
Scientific discipline: you can say “we don’t know yet” and design the test that makes it knowable

Nice to Have

Experience with RLHF/GRPO-style methods, reward modeling, preference optimization
Distributed training / large-scale rollouts / inference throughput optimization
Experience building eval harnesses that guide iteration

What We Offer

Freedom: we don’t care where or when you work — as long as you deliver
No corporate bullshit: no bureaucracy, no time-wasting busywork
Leverage: work directly with the founders; your work changes the roadmap immediately
A hard, interesting problem: RL + post-training applied to real financial prediction and portfolio outcomes
Competitive compensation + meaningful upside (details depend on seniority and fit)

About Gradient Technologies

Gradient Technologies GmbH is a Hamburg-based AI and software company focused on financial analysis. We build AI systems that process large volumes of financial data to generate equity return forecasts and construct market-neutral investment strategies.

Our philosophy: Efficiency over complexity. Do the obvious things well. Follow the gradient.

How to Apply

Send your CV to contact@gradtec.ai. That’s it — no cover letter needed.

Machine Learning Engineer — Post-Training & RL (Finance)