Code Execution in Kestra’s AI Agents Powered by Judge0

Anna Geller
Product Lead @ Kestra

AI agents enable new workflow orchestration patterns. With the release of Kestra 1.0, teams can now run autonomous AI tasks that combine large language models (LLMs), memory, and external tools to dynamically decide which steps to take to accomplish a given goal. Among the tools available to these agents is code execution, integrated through Judge0.

The code execution tool lets AI Agents run LLM-generated code to perform mathematical calculations or analyze data in a safe orchestration environment.

Why Code Execution Matters for AI Agents

LLMs excel at pattern recognition, explanation, and producing step-by-step solutions, but they are not built for executing precise deterministic calculations. An AI agent without tools may hallucinate numbers, miscount items in a log, or generate invalid cryptographic hashes.

In Kestra, adding the code execution tool gives agents a way to offload precise calculations and data analysis to real code instead of generated guesses.

By combining LLM reasoning with Judge0 execution, Kestra agents can return deterministic results.

Minimal Example

Here’s a minimal example where an AI Agent uses Judge0 to compute the square root of a number:

id: calculator_agent
namespace: company.ai
 
inputs:
  - id: nr
    type: INT
    defaults: 1764
 
tasks:
  - id: agent
    type: io.kestra.plugin.ai.agent.AIAgent
    provider:
      # LLM provider configuration (Google Gemini in this example)
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
    prompt: What is the square root of {{ inputs.nr }}?
    tools:
      # Judge0 code execution, authenticated via a RapidAPI key
      - type: io.kestra.plugin.ai.tool.CodeExecution
        apiKey: "{{ kv('RAPID_API_KEY') }}"

Without the CodeExecution tool, modern LLMs might still return the correct number. But for more complex use cases, such as a mortgage amortization calculation, relying on the LLM alone can produce invalid output. Judge0 ensures the math is executed correctly.
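
As a rough sketch of what such a case could look like, the flow above can be extended with mortgage inputs and a prompt that pushes the agent to run code for the amortization math. The inputs, prompt, and systemMessage below are illustrative additions, not taken from the original example:

id: mortgage_agent
namespace: company.ai
 
inputs:
  - id: principal
    type: FLOAT
    defaults: 350000
  - id: annual_rate
    type: FLOAT
    defaults: 5.5
  - id: years
    type: INT
    defaults: 30
 
tasks:
  - id: agent
    type: io.kestra.plugin.ai.agent.AIAgent
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
    systemMessage: Always call the CodeExecution tool for any calculation.
    prompt: |
      Compute the monthly payment and total interest for a {{ inputs.years }}-year
      mortgage of {{ inputs.principal }} at {{ inputs.annual_rate }}% annual interest.
    tools:
      - type: io.kestra.plugin.ai.tool.CodeExecution
        apiKey: "{{ kv('RAPID_API_KEY') }}"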

Expanding Beyond Math

Through CodeExecution, Kestra agents can handle tasks where LLMs alone often fail.

Take cryptography as an example: LLMs can describe how hashing works but cannot reliably produce the correct digest. Here's a flow that compares an LLM-only attempt with a Judge0-powered execution:

id: sha256_comparison
namespace: company.ai
 
inputs:
  - id: text
    type: STRING
    defaults: LLMs cannot reliably compute SHA-256
 
tasks:
  - id: llm_hash
    type: io.kestra.plugin.ai.agent.AIAgent
    description: ❌ LLM-only attempt (likely to hallucinate a fake hash)
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
    systemMessage: |
      Compute the SHA-256 hash.
      Return only {"input": "<string>", "sha256": "<hash>"}.
    prompt: Compute the SHA-256 hash of "{{ inputs.text }}"
 
  - id: judge0_hash
    type: io.kestra.plugin.ai.agent.AIAgent
    description: ✅ Judge0-powered hash using Node's crypto (reliable)
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
    systemMessage: Always call the CodeExecution tool.
    prompt: |
      Compute the SHA-256 hash of "{{ inputs.text }}" by running code
      using Node's `crypto` library.
      Return only {"input": "<string>", "sha256": "<hex>"}.
    tools:
      - type: io.kestra.plugin.ai.tool.CodeExecution
        apiKey: "{{ kv('RAPID_API_KEY') }}"

Running this flow shows the difference immediately: the LLM-only task is likely to return a fabricated hash, while the Judge0-powered task computes the real SHA-256 digest.

This illustrates why code execution is an essential tool for Kestra’s agents.

Declarative Orchestration with AI

AI Agents in Kestra are declarative: you describe what you want, and the agent figures out how to achieve it.

Code execution with Judge0 plays an important role in this setup: it ensures correctness, determinism, and reproducibility in AI workflows.

Together, declarative agents and deterministic code execution enable workflows that adapt in real time while remaining safe and observable.
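
As a minimal sketch of that combination, an agent's result can feed ordinary downstream tasks like any other Kestra output. Note that the textOutput property used below is an assumption about how the AIAgent task exposes its answer; check the plugin documentation for the exact output name:

id: agent_with_downstream
namespace: company.ai
 
tasks:
  - id: agent
    type: io.kestra.plugin.ai.agent.AIAgent
    provider:
      type: io.kestra.plugin.ai.provider.GoogleGemini
      apiKey: "{{ kv('GEMINI_API_KEY') }}"
      modelName: gemini-2.5-flash
    prompt: Compute the failure rate given 7 failed runs out of 240 executions.
    tools:
      - type: io.kestra.plugin.ai.tool.CodeExecution
        apiKey: "{{ kv('RAPID_API_KEY') }}"
 
  - id: report
    type: io.kestra.plugin.core.log.Log
    # Assumes the agent's answer is exposed as `textOutput`
    message: "{{ outputs.agent.textOutput }}"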

Get Started

The integration of Kestra AI Agents with Judge0 reflects a broader shift: orchestration systems are becoming a fusion of reasoning (LLMs) and execution (tools). As teams build more adaptive workflows, having access to both becomes critical.

👉 Try Kestra 1.0 with AI Agents: https://kestra.io/1-0

👉 Explore Judge0: https://judge0.com