AI Study Events
🌟


Weekly live on YouTube about AI for absolute beginners

Previously delivered at: ThriveUp, UBC, SFU

AI For Absolute Beginner - Weekly Live:

Build an AI Image Agent - Workshop


Introduction

In this tutorial, we'll explore how to build an AI image editing agent using function calling. We'll use a practical example of converting a horse image into a cow to demonstrate the implementation. By the end of this tutorial, you'll understand:

  • How to structure an AI agent with custom tools
  • How to implement function calling for image editing tasks
  • How to integrate with external image editing APIs

Architecture Overview

Our image editing system consists of two primary tools:

  • Mask Generation Tool: Identifies and creates masks for areas that need editing
  • Image Editor Tool: Performs the actual image modifications using the masks and instructions
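Before wiring these into an agent, the intended data flow can be sketched as a plain function chain. This is a toy sketch: every name below is an illustrative stand-in, not part of a real library.

```python
# Toy sketch of the two-tool pipeline. All names here are
# illustrative stand-ins, not from a real library.

def generate_mask(image_path: str, target_object: str) -> str:
    """Stand-in: a real version would run a segmentation model
    and write a mask PNG marking `target_object`."""
    return "mask.png"

def edit_image(original_image_path: str, mask_image_path: str,
               description: str) -> str:
    """Stand-in: a real version would call an image-editing API
    with the image, mask, and prompt."""
    return "edited_image.png"

# The agent chains the tools: mask first, then edit.
mask_path = generate_mask("original_image.png", "horse")
result = edit_image("original_image.png", mask_path,
                    "Replace the horse with a dairy cow standing on grass")
```

The rest of the tutorial fills in how the agent decides when to call each tool, and what the real implementations look like.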

API Call:

Example: ChatGPT's image editing works the same way: the model invokes image-editing tools in response to a natural-language request.

Tools:

  • Mask generation
  • Image editing

System Flow

[Diagram: user prompt → mask generation tool → image editor tool → edited image]

Implementation Guide

1. Setting Up the Agent Structure

First, let's create our AI agent with the necessary tools:

class ImageAgent:
    def __init__(self):
        # Pseudocode: register the two tools
        self.tools = [edit_image, mask_generation]

    def run(self, query: str):
        # Pseudocode: initialize the agent
        image_agent = AI_Agent(
            model='AI_Model',
            instructions='You are an image editing assistant.',
            tools=self.tools,
            name='image_editing_agent',
        )

        # Reasoning loop
        for _ in range(max_turns):
            ...

if __name__ == "__main__":
    agent = ImageAgent()
    query = "Based on the 'original_image.png', replace the horse with a dairy cow standing on the grass."
    result = agent.run(query)
    print(f"Response from LLM: {result}")
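The reasoning loop elided above ("...") can be made concrete with a runnable sketch. Here the model is replaced by a scripted stand-in (`fake_model`) so the loop runs without any API key; the tool stubs, message format, and `fake_model` itself are all assumptions for illustration, not a real SDK.

```python
# Minimal sketch of the agent's reasoning loop, with a scripted
# stand-in for the LLM so it runs offline. Not a real SDK.
import json

def generate_mask(image_path, target_object):
    return "mask.png"  # stand-in for real mask generation

def edit_image(original_image_path, mask_image_path, description):
    return "edited_image.png"  # stand-in for the real API call

TOOLS = {"generate_mask": generate_mask, "edit_image": edit_image}

def fake_model(history):
    """Scripted stand-in for the LLM: first request a mask, then an edit."""
    tool_turns = [m for m in history if m["role"] == "tool"]
    if len(tool_turns) == 0:
        return {"tool": "generate_mask",
                "args": {"image_path": "original_image.png",
                         "target_object": "horse"}}
    if len(tool_turns) == 1:
        return {"tool": "edit_image",
                "args": {"original_image_path": "original_image.png",
                         "mask_image_path": "mask.png",
                         "description": "Replace the horse with a dairy cow"}}
    return {"final": "Done: edited_image.png"}

def run(query, max_turns=5):
    history = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        step = fake_model(history)
        if "final" in step:  # the model has finished reasoning
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])  # execute the tool call
        history.append({"role": "tool", "content": json.dumps(result)})
    return "max turns reached"

print(run("Replace the horse with a dairy cow"))  # Done: edited_image.png
```

The key idea is the loop shape: call the model, execute any tool it requests, feed the result back, and stop when the model returns a final answer or `max_turns` is reached.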

2. Function Calling Implementation

Function calling enables our AI agent to transform natural language requests into structured operations that our system can process. Our implementation uses two primary tools:

  • Tool A: Mask Generation - Creates masks for areas requiring edits
  • Tool B: Image Editor - Handles the actual image modifications

Let's focus on the Image Editor tool, which requires three key components:

  • The original image to be modified
  • A mask image defining the editable areas
  • Specific editing instructions

Here's how the process works:


Given the prompt:

`Based on the 'original_image.png', replace the horse with a dairy cow standing on the grass.`

the AI agent first calls the mask generation tool, which produces a mask file (mask.png). It then structures the request into a standardized format for the image editor tool.

Here is the JSON function-calling definition of the image editor tool:

{
   "type":"function",
   "function":{
      "name":"edit_image",
      "description":"This function takes an original image, a mask image, and a description to edit\nthe original image based on the mask and the provided description.",
      "parameters":{
         "title":"edit_image",
         "type":"object",
         "properties":{
            "original_image_path":{
               "title":"Original Image Path",
               "type":"string",
               "description":"The file path of the original image to be edited."
            },
            "mask_image_path":{
               "title":"Mask Image Path",
               "type":"string",
               "description":"The file path of the mask image, where transparent areas\nindicate regions to be edited."
            },
            "description":{
               "title":"Description",
               "type":"string",
               "description":"A text description of the desired edit to be applied to the image."
            }
         },
         "required":["original_image_path","mask_image_path","description"]
      }
   }
}
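A lightweight way to see the schema in action is to parse the arguments the model returns and check them against the schema's property names. This is a toy validation, not a full JSON Schema validator, and the trimmed schema below keeps only the fields the check uses.

```python
import json

# The function-calling schema from above, trimmed to the fields we check.
EDIT_IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "edit_image",
        "parameters": {
            "type": "object",
            "properties": {
                "original_image_path": {"type": "string"},
                "mask_image_path": {"type": "string"},
                "description": {"type": "string"},
            },
        },
    },
}

def validate_tool_args(raw_json: str) -> dict:
    """Parse model-produced arguments and check them against the schema."""
    args = json.loads(raw_json)
    expected = set(EDIT_IMAGE_TOOL["function"]["parameters"]["properties"])
    missing = expected - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return args

args = validate_tool_args(
    '{"original_image_path": "original_image.png", '
    '"mask_image_path": "mask.png", '
    '"description": "Replace the horse with a dairy cow standing on grass"}'
)
```

In a production agent, this kind of check guards against the model omitting a required argument before the API call is made.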

This translates the unstructured request into structured data that we can use to make the API call:

{
   "original_image_path": "original_image.png",
   "mask_image_path": "mask.png",
   "description": "Replace the horse with a dairy cow standing on grass"
}

Finally, this structured data is used to make the API call for image editing:

response = client.images.edit(
    ...,  # additional parameters elided
    image=open(original_image_path, "rb"),
    mask=open(mask_image_path, "rb"),
    prompt=description
)

3. Implementing the Tools

Mask Generation Tool

This tool analyzes the original image and creates a mask for the areas to be edited:

def generate_mask(image_path, target_object):
    # Mask generation logic
    ...
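As a toy illustration of what the mask represents: a real implementation would use a segmentation model to find the target object, but the grid below stands in for per-pixel alpha values, where 0 (transparent) marks editable regions and 255 (opaque) marks regions to keep. The function name and box representation are assumptions for illustration.

```python
def generate_mask_grid(width, height, box):
    """Build a toy alpha grid: 0 (transparent = editable) inside `box`,
    255 (opaque = keep) elsewhere. `box` is (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [
        [0 if (left <= x < right and top <= y < bottom) else 255
         for x in range(width)]
        for y in range(height)
    ]

# Mark a region (roughly where the horse is) as editable.
mask = generate_mask_grid(8, 6, box=(2, 1, 6, 5))
editable = sum(row.count(0) for row in mask)  # 4 x 4 region -> 16 pixels
```

A real mask file would be this idea at full image resolution, saved as a PNG with an alpha channel.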

Image Editor Tool

The image editor tool handles the actual image modification using external APIs:

def edit_image(original_image_path, mask_image_path, description):
    """
    Edit an image using external API integration.
    
    Args:
        original_image_path (str): Path to the original image
        mask_image_path (str): Path to the mask image
        description (str): Editing instructions
        
    Returns:
        str: Path to the edited image
    """
    response = client.images.edit(
        image=open(original_image_path, "rb"),
        mask=open(mask_image_path, "rb"),
        prompt=description,
        size="1024x1024"
    )
    return save_edited_image(response)
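The helper `save_edited_image` is referenced above but not defined. Here is a minimal sketch assuming the API returns base64-encoded image data; the response field names (`data[0].b64_json`) are an assumption and vary by provider and request options.

```python
import base64
from types import SimpleNamespace

def save_edited_image(response, out_path="edited_image.png"):
    """Decode base64 image data from the API response and write it to
    disk. Assumes the response exposes `data[0].b64_json` (an assumed
    field layout; check your provider's response format)."""
    image_bytes = base64.b64decode(response.data[0].b64_json)
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path

# Quick self-check with a stubbed response object (no API call).
_fake = SimpleNamespace(
    data=[SimpleNamespace(b64_json=base64.b64encode(b"fake-png").decode())]
)
saved = save_edited_image(_fake, out_path="demo_edited.png")
```

Some providers return a URL instead of inline base64, in which case this helper would download the file rather than decode it.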

4. Complete Workflow

Let's walk through a complete example:

  1. User Input: "Change the horse in input.png to a cow"
  2. The agent processes the request and generates a mask
  3. The image editor applies the transformation using the mask and description

# Example usage
query = "Change the horse to a cow"
result = agent.run(query)
# Result: edited_image.png with the horse replaced by a cow

Repo:

github.com/yefan/image-agent-workshop

Receive More Content Like This

Sign up for the newsletter to receive weekly career tips to thrive up in tech 🧡