layout: default
title: "OpenHands Tutorial - Chapter 1: Getting Started"
nav_order: 1
has_children: false
parent: OpenHands Tutorial

Chapter 1: Getting Started with OpenHands

This chapter of OpenHands Tutorial: Autonomous Software Engineering Workflows builds a mental model of OpenHands first, then moves into concrete implementation details and practical production tradeoffs.

Install OpenHands, understand its architecture, and execute your first autonomous coding task.

Overview

OpenHands is an autonomous AI software engineering agent that carries out multi-step coding tasks with minimal supervision. This chapter covers installation, core concepts, and a first hands-on task.

Installation and Setup

System Requirements

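# Minimum requirements
- Python 3.10+
- 8GB RAM (16GB recommended)
- Linux, macOS, or Windows
- Node.js 18+ (for frontend components)
- Docker (only for the sandboxed Docker runtime used later in this chapter)

# Optional: GPU acceleration (only relevant when serving a model locally)
- CUDA 11.8+ (NVIDIA GPUs)
- ROCm 5.4+ (AMD GPUs)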
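The list above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function name `check_requirements` is an illustration, not part of OpenHands, and it only checks that `node` is on `PATH`, not that it is version 18 or newer.

```python
import shutil
import sys


def check_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    # shutil.which returns None when the executable is not on PATH.
    if which("node") is None:
        problems.append("Node.js was not found on PATH")
    return problems
```

Calling `check_requirements()` with no arguments inspects the current interpreter and `PATH`; the two parameters exist so the check can be exercised with fake values.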

Installing OpenHands

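# Install via pip (the PyPI distribution is published as openhands-ai)
pip install openhands-ai

# Or install from source for the latest features
git clone https://github.com/All-Hands-AI/OpenHands.git
cd OpenHands
pip install -e .

# Install with all optional dependencies, if your release defines an "all" extra
# (quote the brackets so shells such as zsh do not treat them as a glob)
pip install "openhands-ai[all]"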

Setting up API Keys

OpenHands requires API keys for the underlying language models:

# Set environment variables
export OPENAI_API_KEY="your-openai-api-key"
export ANTHROPIC_API_KEY="your-anthropic-api-key"  # Alternative

# For other providers
export TOGETHER_API_KEY="your-together-api-key"
export DEEPINFRA_API_KEY="your-deepinfra-api-key"
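A missing or empty key usually only surfaces on the first model call, so it can help to check the environment up front. This sketch reads the four variable names from the exports above; the helper `configured_providers` and the provider labels are hypothetical.

```python
import os

# Environment variable names taken from the export lines above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "together": "TOGETHER_API_KEY",
    "deepinfra": "DEEPINFRA_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose key variable is set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]
```

An empty return value means none of the keys are set in the current shell.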

Configuration File

Create a configuration file for OpenHands:

# config.toml
[core]
workspace_base = "./workspace"
persist_sandbox = false
run_as_openhands = true
runtime_startup_timeout = 120

[security]
confirmation_mode = false
security_analyzer = ""
allow_file_operations = true

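[models]
# NOTE: these keys repeat the [llm] section below; keep the two in sync,
# or drop this section if your OpenHands version only reads [llm].
embedding_model = "local"  # or "openai"
llm_model = "gpt-4"
api_key = "your-api-key"  # placeholder; never commit a real key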
custom_llm_provider = ""
max_input_tokens = 0
max_output_tokens = 0
input_cost_per_token = 0.0
output_cost_per_token = 0.0
ollama_base_url = ""
drop_params = false

[sandbox]
container_image = "ghcr.io/all-hands-ai/openhands:main"
local_runtime = "eventstream"
runtime_startup_timeout = 120
runtime_startup_env_vars = {}
enable_auto_fix = false
use_host_network = false
runtime_cls = "openhands.runtime.impl.eventstream.EventStreamRuntime"

[llm]
model = "gpt-4"
api_key = "your-api-key"
custom_llm_provider = ""
embedding_model = "local"
embedding_base_url = ""
max_input_tokens = 0
max_output_tokens = 0
input_cost_per_token = 0.0
output_cost_per_token = 0.0
ollama_base_url = ""
drop_params = false
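The configuration above still contains `your-api-key` placeholders, which are easy to forget. As a rough sketch, the following walks a nested dict shaped like the parsed file and reports any `api_key` that still looks like a placeholder. The function name and the `your-` prefix rule are assumptions for illustration; on Python 3.11+ such a dict can be produced with `tomllib.load`.

```python
def find_placeholder_secrets(config, path=()):
    """Yield dotted paths of api_key entries that still hold a 'your-...' placeholder."""
    for key, value in config.items():
        here = path + (key,)
        if isinstance(value, dict):
            # Recurse into nested tables such as [llm] or [models].
            yield from find_placeholder_secrets(value, here)
        elif key == "api_key" and isinstance(value, str) and value.startswith("your-"):
            yield ".".join(here)


# A dict shaped like the [llm] and [models] tables above.
sample = {
    "llm": {"model": "gpt-4", "api_key": "your-api-key"},
    "models": {"llm_model": "gpt-4", "api_key": "sk-real"},
}
print(list(find_placeholder_secrets(sample)))  # → ['llm.api_key']
```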

Core Architecture

OpenHands Components

from openhands import OpenHands, Task
from openhands.runtime import get_runtime
from openhands.agent import Agent

# Main components
agent = Agent()  # The AI agent
runtime = get_runtime()  # Execution environment
task = Task()  # Task specification

# Combined system
openhands = OpenHands(
    agent=agent,
    runtime=runtime
)

Agent Types

OpenHands supports different agent types for various scenarios:

from openhands.agent import CodeActAgent, BrowsingAgent, PlannerAgent

# CodeActAgent: Primary agent for coding tasks
code_agent = CodeActAgent()

# BrowsingAgent: For web-related tasks
browse_agent = BrowsingAgent()

# PlannerAgent: For complex multi-step planning
planner_agent = PlannerAgent()

# Custom agent configuration
custom_agent = CodeActAgent(
    llm_config={
        "model": "gpt-4",
        "api_key": "your-key",
        "temperature": 0.1,  # Lower temperature for coding
        "max_tokens": 4096
    }
)

Runtime Environments

OpenHands provides different execution environments:

from openhands.runtime import LocalRuntime, DockerRuntime, RemoteRuntime

# Local runtime (direct execution)
local_runtime = LocalRuntime()

# Docker runtime (sandboxed execution)
docker_runtime = DockerRuntime(
    container_image="ghcr.io/all-hands-ai/openhands:main",
    workspace_base="/workspace"
)

# Remote runtime (for distributed setups)
remote_runtime = RemoteRuntime(
    host="remote-server",
    port=8080
)

Your First OpenHands Task

Basic Task Execution

from openhands import OpenHands

# Initialize OpenHands
openhands = OpenHands()

# Define a simple task
task = "Create a Python function that calculates the factorial of a number"

# Execute the task
result = openhands.run(
    task=task,
    workspace="./my_first_project"
)

# Check the result (report the actual status instead of assuming success)
print(f"Task status: {result.status}")
print(f"Generated code: {result.code}")
print(f"Execution output: {result.output}")
print(f"Files created: {result.files_created}")

Understanding the Result

# The result object contains detailed information
print(f"Task: {result.task}")
print(f"Status: {result.status}")  # 'completed', 'failed', etc.
print(f"Duration: {result.execution_time} seconds")

# Code generated by the agent
print("Generated code:")
print(result.code)

# Files created/modified
for file_path in result.files_created:
    print(f"Created: {file_path}")

for file_path in result.files_modified:
    print(f"Modified: {file_path}")

# Any errors or issues
if result.errors:
    print("Errors encountered:")
    for error in result.errors:
        print(f"  - {error}")

Interactive Session

from openhands import OpenHands

# Start an interactive session
openhands = OpenHands()

# Begin interactive mode
session = openhands.start_interactive_session(
    workspace="./interactive_workspace"
)

print("OpenHands interactive mode started!")
print("Type 'help' for commands, 'exit' to quit")

while True:
    try:
        user_input = input("> ").strip()
    except (EOFError, KeyboardInterrupt):
        break  # Ctrl-D or Ctrl-C ends the session cleanly

    if not user_input:
        continue  # Ignore empty lines instead of sending them as tasks

    command = user_input.lower()
    if command == 'exit':
        break
    elif command == 'help':
        print("Commands:")
        print("  help - Show this help")
        print("  exit - Quit the session")
        print("  <task> - Execute a coding task")
        continue

    # Anything else is treated as a task description
    try:
        result = session.execute_task(user_input)
        print(f"Result: {result.output}")

        if result.code:
            print(f"Code generated: {result.code}")

    except Exception as e:
        print(f"Error: {e}")

Task Specification

Basic Task Format

# Simple task specification
simple_task = {
    "description": "Create a hello world function in Python",
    "language": "python",
    "requirements": ["Function should return 'Hello, World!'"],
    "constraints": ["Use proper function naming", "Include docstring"]
}

# Execute simple task
result = openhands.run(task=simple_task)
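Since a task specification is a plain dict, a typo in a key or a string where a list is expected may go unnoticed until the agent runs. Here is a small validation sketch for the three keys used above; `validate_task` is a hypothetical helper, and treating `language` as optional is an assumption.

```python
def validate_task(spec):
    """Return a list of problems with a task dict; empty means it looks well-formed."""
    problems = []
    description = spec.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description must be a non-empty string")
    # Both list-valued keys are optional, but must be lists of strings when present.
    for key in ("requirements", "constraints"):
        value = spec.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} must be a list of strings")
    return problems
```

Running it on `simple_task` before calling `run` catches malformed input early and costs nothing.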

Advanced Task Specification

# Complex task with detailed requirements
complex_task = {
    "description": "Build a REST API for a task management system",
    "components": {
        "backend": {
            "framework": "FastAPI",
            "database": "SQLite",
            "models": ["User", "Task", "Category"],
            "endpoints": [
                "POST /users - Create user",
                "GET /users/{id} - Get user",
                "POST /tasks - Create task",
                "GET /tasks - List tasks",
                "PUT /tasks/{id} - Update task",
                "DELETE /tasks/{id} - Delete task"
            ]
        },
        "frontend": {
            "framework": "React",
            "components": ["TaskList", "TaskForm", "UserProfile"],
            "features": ["CRUD operations", "Real-time updates"]
        },
        "testing": {
            "unit_tests": True,
            "integration_tests": True,
            "api_tests": True
        },
        "documentation": {
            "api_docs": True,
            "readme": True,
            "deployment_guide": True
        }
    },
    "requirements": [
        "Use proper error handling",
        "Implement input validation",
        "Add authentication/authorization",
        "Include comprehensive tests",
        "Create deployment configuration"
    ],
    "constraints": [
        "Follow REST API best practices",
        "Use type hints in Python",
        "Implement proper database relationships",
        "Ensure security best practices"
    ]
}

# Execute complex task
result = openhands.run(
    task=complex_task,
    max_execution_time=1800,  # 30 minutes
    save_progress=True
)

Working with Workspaces

Workspace Management

from openhands.workspace import Workspace

# Create a new workspace
workspace = Workspace("./my_project")

# Option A: initialize the workspace from a template
workspace.init_from_template("python-fastapi")

# Option B: start with an empty workspace instead (use one option, not both)
# workspace.create_empty()

# Workspace operations
print(f"Workspace path: {workspace.path}")
print(f"Files: {workspace.list_files()}")

# Create files
workspace.create_file("main.py", "print('Hello, World!')")

# Read files
content = workspace.read_file("main.py")
print(f"File content: {content}")

# Execute commands in workspace
result = workspace.run_command("python main.py")
print(f"Command output: {result.stdout}")
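The same create, read, and run cycle can be reproduced with the standard library, which is useful for seeing what a workspace abstraction has to do underneath. This is a sketch under that assumption, not how OpenHands implements `Workspace`; `run_in_workspace` is a hypothetical name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_workspace(root, filename, source):
    """Write `source` to root/filename, run it with the current interpreter, return stdout."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    script = root / filename
    script.write_text(source, encoding="utf-8")
    completed = subprocess.run(
        [sys.executable, script.name],
        cwd=root,             # run inside the workspace directory
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit code
        timeout=30,
    )
    return completed.stdout


with tempfile.TemporaryDirectory() as tmp:
    output = run_in_workspace(tmp, "main.py", "print('Hello, World!')")
    print(output.strip())  # → Hello, World!
```

Note that this runs code directly on the host with no isolation, which is exactly the gap the sandboxed runtimes are meant to close.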

Persistent Workspaces

# Create persistent workspace for multi-session work
persistent_workspace = Workspace("./persistent_project", persistent=True)

# The workspace will remember state between OpenHands sessions
# Useful for long-running development projects

# Save workspace state
persistent_workspace.save_state()

# Load workspace state in new session
loaded_workspace = Workspace.load_state("./persistent_project")
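One plausible way to persist workspace state is a manifest of file digests written next to the project files. The sketch below illustrates that idea only; the file name `.workspace_state.json` and both function names are hypothetical, and OpenHands may store state differently.

```python
import hashlib
import json
from pathlib import Path

STATE_FILE = ".workspace_state.json"  # hypothetical file name


def _snapshot(root):
    """Map each relative file path under root to the SHA-256 digest of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != STATE_FILE
    }


def save_state(root):
    """Write the current snapshot to STATE_FILE and return it."""
    root = Path(root)
    manifest = _snapshot(root)
    (root / STATE_FILE).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest


def changed_since_save(root):
    """Return paths that are new or whose content differs from the saved manifest."""
    root = Path(root)
    saved = json.loads((root / STATE_FILE).read_text(encoding="utf-8"))
    current = _snapshot(root)
    return sorted(path for path, digest in current.items() if saved.get(path) != digest)
```

A new session can call `changed_since_save` to see what was edited outside the agent before resuming work. Deleted files are not reported by this sketch.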

Error Handling and Debugging

Common Issues and Solutions

try:
    result = openhands.run(task="Create a Python web server")
except Exception as e:
    # run() raised, so no result object exists here; rely on the exception and logs
    print(f"Execution failed: {e}")

    # Check logs
    logs = openhands.get_logs()
    print("Recent logs:")
    for log_entry in logs[-10:]:  # Last 10 entries
        print(f"  {log_entry['timestamp']}: {log_entry['message']}")
else:
    # run() returned; the result may still carry detailed error information
    if getattr(result, 'error_details', None):
        print("Error details:")
        print(result.error_details)

Debugging Mode

# Enable debugging mode for detailed execution tracing
openhands.enable_debug_mode()

# Run task with debugging
result = openhands.run(
    task="Debug this Python function: def add(a, b): return a + b",
    debug=True
)

# Access debug information
debug_info = result.debug_info
print("Execution trace:")
for step in debug_info['trace']:
    print(f"  Step {step['step']}: {step['action']}")
    print(f"    Result: {step['result']}")
    print(f"    Duration: {step['duration']}ms")
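The trace entries read above have four keys: `step`, `action`, `result`, and `duration` in milliseconds. The following sketch shows one way such entries could be collected with a context manager; the `Trace` class is hypothetical and only reuses those key names.

```python
import time
from contextlib import contextmanager


class Trace:
    """Collects one entry per step with the same keys the loop above reads."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.steps = []

    @contextmanager
    def step(self, action):
        entry = {"step": len(self.steps) + 1, "action": action, "result": None}
        started = self._clock()
        try:
            yield entry  # the caller stores its outcome in entry["result"]
        finally:
            # Record the duration even if the step raised.
            entry["duration"] = (self._clock() - started) * 1000  # milliseconds
            self.steps.append(entry)


trace = Trace()
with trace.step("compute sum") as entry:
    entry["result"] = sum(range(10))
print(trace.steps[0]["action"], trace.steps[0]["result"])  # → compute sum 45
```

The clock is injectable so that durations can be made deterministic in tests.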

Security Considerations

Sandboxed Execution

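OpenHands is designed to run agent-generated code inside a sandbox. The example below tightens the Docker sandbox further:

from openhands import OpenHands
from openhands.runtime import DockerRuntime

# Configure sandbox security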
secure_openhands = OpenHands(
    runtime=DockerRuntime(
        container_image="ghcr.io/all-hands-ai/openhands:main",
        security_opts=[
            "--cap-drop=ALL",  # Drop all capabilities
            "--network=none",  # No network access
            "--read-only",     # Read-only root filesystem
            "--tmpfs=/tmp"     # Writable temp directory
        ]
    )
)

# Execute task in secure sandbox
result = secure_openhands.run(
    task="Create a file processing script",
    security_level="high"
)
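The four entries in `security_opts` are standard `docker run` options, so their effect can be tried outside OpenHands. This sketch only builds the argument list and does not start a container; `hardened_docker_command`, the `/workspace` mount point, and the example host path are assumptions.

```python
import shlex

# Flags copied from the security_opts list above; all are standard `docker run` options.
HARDENING_FLAGS = ["--cap-drop=ALL", "--network=none", "--read-only", "--tmpfs=/tmp"]


def hardened_docker_command(image, command, workspace=None):
    """Build (but do not run) a `docker run` argv with the hardening flags applied."""
    argv = ["docker", "run", "--rm", *HARDENING_FLAGS]
    if workspace is not None:
        # The root filesystem is read-only, so mount the workspace as the writable area.
        argv += ["--volume", f"{workspace}:/workspace", "--workdir", "/workspace"]
    return argv + [image, *command]


argv = hardened_docker_command(
    "ghcr.io/all-hands-ai/openhands:main", ["python", "main.py"], workspace="/srv/project"
)
print(shlex.join(argv))
```

With `--network=none` the container cannot reach a hosted LLM API, so this level of isolation suits the code execution step rather than the agent process itself.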

Permission Management

# Configure execution permissions
permissions = {
    "file_operations": {
        "read": True,
        "write": True,
        "delete": False,  # Disable file deletion
        "execute": True
    },
    "network_access": {
        "outbound": False,  # No internet access
        "localhost": True   # Allow localhost connections
    },
    "command_execution": {
        "allowed_commands": ["python", "pip", "npm", "node"],
        "blocked_commands": ["rm", "sudo", "curl", "wget"]
    }
}

# Apply permissions
openhands.set_permissions(permissions)
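To make the `command_execution` policy concrete, here is a minimal sketch of how an allow list and block list might be applied to a command line. `is_command_allowed` is a hypothetical helper that inspects only the first word, so it cannot stop `python -c "..."` from doing what `rm` would do. It complements the sandbox and does not replace it.

```python
import os
import shlex


def is_command_allowed(command_line, policy):
    """Check the first word of a shell command against allow and block lists."""
    try:
        words = shlex.split(command_line)
    except ValueError:
        return False  # unbalanced quotes: refuse rather than guess
    if not words:
        return False
    # Compare by program name so /usr/bin/curl is treated like curl.
    program = os.path.basename(words[0])
    if program in policy["blocked_commands"]:
        return False
    # Default deny: anything not explicitly allowed is rejected.
    return program in policy["allowed_commands"]
```

The default-deny rule means the block list is redundant here, but keeping it documents intent and guards against a command being added to both lists.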

Performance Optimization

Resource Configuration

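from openhands import OpenHands
from openhands.agent import CodeActAgent
from openhands.runtime import DockerRuntime

# Configure resources and model settings for performance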
high_perf_openhands = OpenHands(
    runtime=DockerRuntime(
        container_image="ghcr.io/all-hands-ai/openhands:main",
        resources={
            "cpu": "2.0",           # 2 CPU cores
            "memory": "4g",         # 4GB RAM
            "gpu": "1",             # 1 GPU (if available)
            "storage": "10g"        # 10GB storage
        }
    ),
    agent=CodeActAgent(
        llm_config={
            "model": "gpt-4",
            "temperature": 0.1,     # Lower temperature for consistency
            "max_tokens": 2048,     # Reasonable token limit
            "cache_enabled": True   # Enable response caching
        }
    )
)

Caching and Reuse

# Enable caching for repeated operations
openhands.enable_caching(
    cache_dir="./openhands_cache",
    max_cache_size="10g",
    ttl_seconds=3600  # 1 hour cache TTL
)

# Cache will automatically store and reuse:
# - Model responses for similar queries
# - Compiled code artifacts
# - Dependency installations
# - Test execution results
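The `ttl_seconds` setting implies that cached entries expire after a fixed time. The sketch below shows that mechanism in memory, keyed by a hash of the exact prompt text. `TTLCache` is hypothetical, and exact-match keys are a simplification of the "similar queries" behavior described above.

```python
import hashlib
import time


class TTLCache:
    """In-memory cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds=3600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    @staticmethod
    def key_for(prompt):
        """Derive a stable key from the exact prompt text."""
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def get_or_compute(self, prompt, compute):
        """Return a fresh cached value, or call compute(prompt) and store the result."""
        key = self.key_for(prompt)
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute(prompt)
        self._entries[key] = (now + self._ttl, value)
        return value
```

A monotonic clock is used so that system time adjustments do not extend or shorten an entry's lifetime.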

Summary

In this chapter, we've covered:

  • Installation and Setup - Getting OpenHands running with proper configuration
  • Core Architecture - Understanding agents, runtimes, and task execution
  • Basic Task Execution - Running your first autonomous coding tasks
  • Workspace Management - Working with project directories and files
  • Security and Permissions - Safe execution in sandboxed environments
  • Performance Optimization - Configuring for optimal resource usage

OpenHands can handle complex, multi-step coding tasks autonomously, within the security and resource limits you configure.

Key Takeaways

  1. Autonomous Execution: OpenHands can complete entire development workflows independently
  2. Secure Sandboxing: Code execution happens in isolated, secure environments
  3. Flexible Configuration: Adaptable to different project requirements and constraints
  4. Comprehensive Results: Detailed output including code, execution results, and metadata
  5. Production Ready: Configurable security, performance, and resource management

Next, we'll explore basic operations - file manipulation, command execution, and environment management.


Ready for the next chapter? Chapter 2: Basic Operations

Generated for Awesome Code Docs

Depth Expansion Playbook

How These Components Connect

flowchart TD
    A[Docker Container] --> B[OpenHands Runtime]
    B --> C[Agent Controller]
    C --> D[LLM Provider]
    C --> E[Sandbox Executor]
    E --> F[File Operations]
    E --> G[Shell Commands]
    E --> H[Browser Automation]
    D --> I[Plan and Actions]
    I --> E
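The cycle in the diagram, where the controller asks the LLM provider for an action and the sandbox executor runs it, can be sketched as a short loop. Everything here is a stub for illustration: `run_agent`, the action dict shape, and the `finish` action type are assumptions, not the OpenHands controller API.

```python
def run_agent(task, llm, executor, max_steps=10):
    """Alternate between asking the LLM for an action and executing it in the sandbox."""
    history = [("task", task)]
    for _ in range(max_steps):
        action = llm(history)            # the provider proposes the next action
        if action["type"] == "finish":
            return history
        observation = executor(action)   # the sandbox executor runs it
        history.append((action["type"], observation))
    raise RuntimeError(f"no finish action after {max_steps} steps")


def scripted_llm(history):
    """Stub provider: run one shell command, then finish."""
    if len(history) == 1:
        return {"type": "shell", "command": "echo hi"}
    return {"type": "finish"}


def fake_executor(action):
    """Stub executor that pretends to run the command."""
    return f"ran: {action['command']}"


print(run_agent("say hi", scripted_llm, fake_executor))
# → [('task', 'say hi'), ('shell', 'ran: echo hi')]
```

The `max_steps` bound matters in practice: without it, a model that never emits a finish action would loop indefinitely.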