- Dataset — the prompts to train on
- Prompt formatting — how to present prompts to the model
- Reward function — how to score the model’s completions
Single-turn vs multi-turn
Single-turn
The simplest type. The model receives a prompt, generates one completion, and gets a reward.
Multi-turn
The model interacts with the environment over multiple rounds. After each model response, the environment provides feedback, and the model responds again until a stop condition is met.
Building a single-turn environment
Create a folder under `src/telescope/environments/` with an `environment.py` file. The folder name becomes the environment name automatically. Extend `SingleTurnEnvironment` and implement `load_dataset()` and `compute_reward()`:
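The original snippet is not preserved here, so the following is a minimal sketch. The `Sample`/`RewardResult` shapes and the `compute_reward` signature are assumptions based on the field descriptions below, and lightweight stand-ins for the telescope types are included so the example is self-contained:

```python
from dataclasses import dataclass, field

# Minimal stand-ins for the real telescope types (assumptions, for illustration).
@dataclass
class Sample:
    prompt: str
    answer: str
    metadata: dict = field(default_factory=dict)

@dataclass
class RewardResult:
    total_reward: float
    sample_metrics: dict = field(default_factory=dict)

class SingleTurnEnvironment:  # stand-in base class
    pass

class ArithmeticEnvironment(SingleTurnEnvironment):
    """Scores each completion by exact match against the ground-truth answer."""

    def load_dataset(self) -> list:
        return [
            Sample(prompt="What is 2 + 2?", answer="4"),
            Sample(prompt="What is 3 * 5?", answer="15"),
        ]

    def compute_reward(self, sample: Sample, completion: str) -> RewardResult:
        # Exact-match reward; real environments can parse/normalize first.
        correct = completion.strip() == sample.answer
        return RewardResult(
            total_reward=1.0 if correct else 0.0,
            sample_metrics={"correctness": 1.0 if correct else 0.0},
        )
```

Check the real base class for the exact method signatures before copying this pattern.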
Key types
Sample — a single training example:
- `prompt` — the question or task text (string or list of chat messages)
- `answer` — ground truth for reward computation
- `metadata` — any extra data needed by your reward function
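For instance, a sample whose prompt is a list of chat messages (field values are hypothetical, and the sample is shown as a plain dict for illustration; the actual `Sample` type may differ):

```python
# A hypothetical training example whose prompt is a chat-message list.
sample = {
    "prompt": [
        {"role": "system", "content": "You are a careful math tutor."},
        {"role": "user", "content": "What is 12 * 7?"},
    ],
    "answer": "84",
    "metadata": {"difficulty": "easy"},  # anything your reward function needs
}
```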
RewardResult — the output of reward computation:
- `total_reward` — the scalar reward used for training
- `sample_metrics` — component breakdown for logging (e.g., `{"format": 0.5, "correctness": 1.0}`)
- `golden_answers` — ground truth answers for display in the UI
- `info_turns` — per-turn text info for display in the UI (e.g., stderr, summaries). Each entry is a dict with `turn_order`, `info_key`, `info_value`, and `info_type`
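As an illustration, a reward result combining a format check and a correctness check might carry the following (values are hypothetical, shown as a plain dict):

```python
# Hypothetical breakdown: total_reward is what training sees; the rest is for the UI.
reward = {
    "total_reward": 0.75,
    "sample_metrics": {"format": 0.5, "correctness": 1.0},
    "golden_answers": ["84"],
    "info_turns": [
        {"turn_order": 0, "info_key": "stderr", "info_value": "", "info_type": "text"},
    ],
}
```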
Prompt formatting
`SingleTurnEnvironment` handles prompt formatting automatically. You can customize the system prompt and instruction prompt; the resulting messages are formatted with `apply_chat_template()` at rollout time.
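A sketch of what that customization might look like. The attribute names are assumptions (check the base class for the exact hooks), and a stand-in base is defined so the snippet runs on its own:

```python
class SingleTurnEnvironment:
    # Stand-in for the real telescope base class.
    system_prompt = "You are a helpful assistant."
    instruction_prompt = ""

class MathEnvironment(SingleTurnEnvironment):
    # Assumed customization points for the system and instruction prompts.
    system_prompt = "You are a concise math assistant."
    instruction_prompt = "Reply with only the final answer."
```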
Building a multi-turn environment
Multi-turn environments extend `MultiTurnEnvironment`. In addition to `load_dataset()` and `compute_reward()`, you implement the interaction loop:
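A minimal self-contained sketch of the loop, using a number-guessing game. Method signatures and the stand-in types are assumptions based on the descriptions below; the real `MultiTurnEnvironment` and `RolloutState` come from telescope:

```python
from dataclasses import dataclass, field

@dataclass
class RolloutState:  # stand-in for the real telescope type
    custom: dict = field(default_factory=dict)
    num_turns: int = 0

class MultiTurnEnvironment:  # stand-in base class
    max_turns = 5

class GuessNumberEnvironment(MultiTurnEnvironment):
    """The model guesses a hidden integer; the environment replies with hints."""

    def env_response(self, messages, state):
        target = state.custom.setdefault("target", 7)
        try:
            guess = int(messages[-1]["content"].strip())
        except ValueError:
            return [{"role": "user", "content": "Please reply with a number."}]
        if guess == target:
            return []  # empty list signals the game is over
        hint = "higher" if guess < target else "lower"
        return [{"role": "user", "content": f"Wrong, go {hint}."}]

    def is_done(self, messages, state):
        # Mirrors the default behavior: stop once max_turns is reached.
        if state.num_turns >= self.max_turns:
            return True, "max_turns"
        return False, None
```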
Key types
RolloutState — tracks the full rollout:
- `sample` — the original `Sample`
- `env_name` — the environment name this rollout belongs to
- `trajectory` — list of `TrajectoryStep` (one per turn, with prompt, completion, token IDs, logprobs)
- `custom` — dict for your per-game state (scores, board state, etc.)
- `num_turns` — how many turns have been completed
- `is_completed` / `stop_reason` — set by the orchestrator when the rollout ends
- `error` — set when the rollout encounters an error
`env_response` — called after each model response:
- Receives the full message history and the current state
- Returns a list of messages to append (typically one `user` message with feedback)
- Return an empty list `[]` to signal the game is over
`is_done` — called after each turn:
- Returns `(True, "reason")` to stop or `(False, None)` to continue
- The default implementation checks `max_turns`
Auto-discovery
Telescope discovers environments automatically. To register a new environment, just create a folder under `src/telescope/environments/`:
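For example, with a hypothetical environment named `my_env`:

```
src/telescope/environments/
└── my_env/
    └── environment.py   # defines your environment subclass
```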
No `__init__.py` or manual registration is needed. You can then reference it in your config:
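For instance, for a hypothetical environment folder named `my_env` (the exact schema beyond the `environments` key is an assumption):

```yaml
environments:
  - name: my_env   # matches the folder name under src/telescope/environments/
```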
Configuration
Environments are configured in your training YAML under the `environments` key:
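A sketch of that section; the environment names and the `weight` field name are assumptions illustrating the weighted multi-environment form described below:

```yaml
environments:
  - name: gsm8k      # hypothetical environment names
    weight: 0.7
  - name: my_env
    weight: 0.3
```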
Multi-environment training
You can train on multiple environments simultaneously by listing each one with a weight.
Next steps
- Metrics — sample metrics, reward tracking, and what gets logged to the UI
- Tool Calling & Agentic Training — building tool-using environments with `ToolEnvironment`
- Data Preparation — dataset formats, prompt templates, and sizing guidelines

