Telescope supports training models that call tools during rollouts via the ToolEnvironment base class, which provides environment-level tool calling.

ToolEnvironment

ToolEnvironment extends MultiTurnEnvironment with built-in tool calling support. You define tools as Python functions, and the class handles schema generation, prompt injection, tool call parsing, execution, and result formatting.

Defining tools

Tools are regular Python functions with type hints and docstrings:
def add(a: float, b: float) -> float:
    """Add two numbers together."""
    return a + b

def subtract(a: float, b: float) -> float:
    """Subtract b from a."""
    return a - b
Pass them to the constructor:
class MyToolEnvironment(ToolEnvironment):
    def __init__(self, **kwargs):
        super().__init__(
            tools=[add, subtract, multiply, divide],
            max_turns=5,
            tool_call_format="xml",
            system_prompt="You are a math assistant. Use tools for calculations.",
            **kwargs,
        )
Telescope automatically converts each function to an OpenAI-compatible tool schema using func_to_tool_schema(). The type hints map to JSON Schema types (float → number, str → string, int → integer, bool → boolean), and the docstring becomes the tool description.
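For the add function above, the generated schema might look like the following sketch (the field layout assumes the standard OpenAI tool-schema convention; the exact output of func_to_tool_schema() may differ):

```python
# Sketch of what func_to_tool_schema(add) might produce — an assumption based
# on the OpenAI tool-schema convention, not verified Telescope output.
add_schema = {
    "type": "function",
    "function": {
        "name": "add",
        "description": "Add two numbers together.",  # from the docstring
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"type": "number"},  # float → number
                "b": {"type": "number"},
            },
            "required": ["a", "b"],
        },
    },
}
```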

How tool calls are processed

Each turn follows this cycle:
Model output → parse_tool_calls() → execute_tool() → format_tool_result() → next turn
  1. Parse — The model’s completion is scanned for tool calls. By default, XML format is used with a JSON object inside the tags:
    <tool_call>
    {"name": "add", "arguments": {"a": 15, "b": 7}}
    </tool_call>
    
  2. Execute — Each parsed ToolCall is executed by calling the corresponding Python function with the parsed arguments.
  3. Format — Results are formatted as tool response messages and appended to the conversation.
  4. Check — is_final_answer() determines if the model is providing a final answer (no tool calls) or wants to continue using tools.
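The parse step above can be sketched with a small regex-based extractor (illustrative only; the real parse_tool_calls() implementation may differ):

```python
import json
import re

def parse_tool_calls(text: str) -> list[dict]:
    """Extract JSON tool-call objects from <tool_call>...</tool_call> tags."""
    calls = []
    for body in re.findall(r"<tool_call>\s*(.*?)\s*</tool_call>", text, re.DOTALL):
        try:
            calls.append(json.loads(body))
        except json.JSONDecodeError:
            pass  # skip malformed tool calls
    return calls

completion = '<tool_call>\n{"name": "add", "arguments": {"a": 15, "b": 7}}\n</tool_call>'
print(parse_tool_calls(completion))
# → [{'name': 'add', 'arguments': {'a': 15, 'b': 7}}]
```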

Building a ToolEnvironment

The minimal subclass needs load_dataset and compute_reward:
from telescope.environments.tool_env import ToolEnvironment
from telescope.environments.base import Sample, RewardResult, RolloutState
from telescope.environments.parsers import extract_xml_tag


class MyToolEnv(ToolEnvironment):
    def __init__(self, **kwargs):
        super().__init__(
            tools=[my_tool_a, my_tool_b],
            max_turns=8,
            system_prompt="Use tools to solve the task.",
            **kwargs,
        )

    def load_dataset(self, num_samples=-1, **kwargs):
        # Load and return list[Sample]
        ...

    def is_final_answer(self, completion, state):
        """Check if the model is giving a final answer instead of a tool call."""
        answer = extract_xml_tag(completion, "answer")
        if answer:
            return True
        return len(self.parse_tool_calls(completion)) == 0

    def compute_reward(self, state, eos_token=""):
        tool_metrics = self.get_tool_metrics(state)
        correct = check_answer(state)  # your verification logic
        return RewardResult(
            total_reward=1.0 if correct else 0.0,
            sample_metrics={**tool_metrics, "correct": float(correct)},
        )

Override points

| Method | Default behavior | When to override |
| --- | --- | --- |
| parse_tool_calls(text) | Parses XML-tagged tool calls | Custom tool call format |
| execute_tool(tool_call) | Calls the matching Python function | Tools need side effects, async I/O, or sandbox execution |
| format_tool_result(result) | Formats as XML tool response | Custom result formatting |
| is_final_answer(completion, state) | True if no tool calls found | Custom completion detection (e.g., <answer> tags) |

Tool metrics

get_tool_metrics(state) returns a dict with usage stats from the trajectory:
{
    "total_tool_calls": 3,
    "tool_success_count": 2,
    "tool_error_count": 1,
    "tool_success_rate": 0.67,
    "unique_tools_used": 2,
    "add_calls": 2,         # per-tool call counts
    "subtract_calls": 1,
}
These are useful both for reward computation (e.g., penalizing excessive tool use) and for monitoring via sample_metrics. See Metrics for details on how sample metrics are tracked and displayed in the UI.
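A reward that penalizes excessive tool use might combine correctness with these metrics as follows (a hypothetical shaping scheme, not part of Telescope; the budget and penalty values are illustrative):

```python
def reward_with_tool_penalty(correct: bool, tool_metrics: dict,
                             budget: int = 5, penalty: float = 0.05) -> float:
    """Base reward for correctness, minus a small penalty for each tool
    call beyond the budget. Floors at 0.0 so the reward stays non-negative."""
    base = 1.0 if correct else 0.0
    excess = max(0, tool_metrics.get("total_tool_calls", 0) - budget)
    return max(0.0, base - penalty * excess)

print(round(reward_with_tool_penalty(True, {"total_tool_calls": 8}), 2))
# → 0.85
```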

Sandbox execution

For environments that need to execute code (not just call Python functions), Telescope provides a pluggable sandbox system.

SandboxConfig

from telescope.environments._sandbox import SandboxConfig, get_provider

config = SandboxConfig(
    image="python:3.11-slim",
    cpu=2,
    memory_mb=4096,
    disk_size_gb=10,
    gpu_count=0,
    timeout_seconds=300,
    environment_vars={"MY_VAR": "value"},
    name="my-sandbox",             # optional identifier
    extra={"template": "custom"},  # provider-specific overrides
)

provider = get_provider("prime")  # or "modal", "daytona", "e2b"
handle = await provider.create(config)
result = await provider.execute(handle, "python -c 'print(1+1)'", timeout=30)
# result.exit_code, result.stdout, result.stderr
await provider.destroy(handle)

Supported providers

Telescope is agnostic to which sandbox provider is used — any provider that implements the SandboxProvider interface (create, execute, upload_bytes, upload_file, destroy) will work. For convenience, the following providers come pre-configured:
| Provider | Description | Credentials |
| --- | --- | --- |
| prime | Prime infrastructure | PRIME_API_KEY env var or prime login |
| modal | Cloud-based sandboxes with fast cold starts | MODAL_TOKEN_ID env var or Modal SDK auth |
| daytona | Self-hosted sandbox environments | DAYTONA_API_KEY or DAYTONA_JWT_TOKEN env var |
| e2b | Cloud sandboxes for prototyping and development | E2B_API_KEY env var |
All providers validate credentials at startup and fail fast if the required SDK package is missing or credentials are invalid.
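To illustrate the interface shape, a hypothetical host-local provider might run commands via a subprocess (this is not one of the bundled providers, and only the create/execute/destroy subset is sketched; upload_bytes/upload_file are omitted for brevity):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class ExecResult:
    exit_code: int
    stdout: str
    stderr: str

class LocalProvider:
    """Hypothetical provider that runs commands on the host machine,
    mirroring the create/execute/destroy shape of the provider interface."""

    async def create(self, config) -> str:
        # A real provider would provision an isolated sandbox here.
        return "local-0"

    async def execute(self, handle: str, command: str, timeout: int = 30) -> ExecResult:
        proc = await asyncio.create_subprocess_shell(
            command,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
        return ExecResult(proc.returncode, out.decode(), err.decode())

    async def destroy(self, handle: str) -> None:
        pass  # nothing to tear down for a host-local "sandbox"

async def main():
    provider = LocalProvider()
    handle = await provider.create(None)
    result = await provider.execute(handle, "echo hello", timeout=10)
    print(result.exit_code, result.stdout.strip())  # → 0 hello
    await provider.destroy(handle)

asyncio.run(main())
```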

Using sandboxes in environments

A typical sandbox environment follows this pattern:
  1. Create sandboxes in create_initial_state() with concurrency control via semaphores
  2. Execute commands in env_response() by parsing tool calls and running them in the sandbox
  3. Clean up in a destroy hook when the rollout completes
Sandbox environments use async I/O throughout. The sandbox provider handles timeout enforcement, error translation, and resource cleanup.

Multi-turn configuration for agentic tasks

Key config parameters for tool-using and agentic environments:
# Scheduling: prioritize earlier turns to reduce head-of-line blocking
vllm_scheduling_policy: "priority"

# Reuse exact token IDs across turns (avoids tokenization mismatches)
interleaved_rollouts: true

# Limit concurrent requests per server (important for multi-turn)
max_concurrent_prompts_per_server: 32

# Maximum turns before stopping
# Set in your environment's __init__ via max_turns parameter
Priority scheduling is important for multi-turn environments: it ensures the model completes earlier turns before starting new ones, preventing scenarios where later turns queue behind a flood of first-turn requests. interleaved_rollouts (enabled by default) reuses token IDs from previous turns exactly, avoiding subtle tokenization differences that could corrupt logprob computation across turns.