AI Study Events
🌟


Weekly live on YouTube about AI for absolute beginners

Previously delivered at: ThriveUp, UBC, SFU

AI For Absolute Beginner - Weekly Live:

Build an AI Image Agent - Workshop


Introduction

In this tutorial, we'll explore how to build an AI image editing agent using function calling. We'll use a practical example of converting a horse image into a cow to demonstrate the implementation. By the end of this tutorial, you'll understand:

  • How to structure an AI agent with custom tools
  • How to implement function calling for image editing tasks
  • How to integrate with external image editing APIs

Architecture Overview

Our image editing system consists of two primary tools:

  • Mask Generation Tool: Identifies and creates masks for areas that need editing
  • Image Editor Tool: Performs the actual image modifications using the masks and instructions
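Before wiring these into an agent, the intended data flow can be sketched as a plain function chain. This is a toy sketch: every name below is an illustrative stand-in, not part of a real library.

```python
# Toy sketch of the two-tool pipeline. All names here are
# illustrative stand-ins, not from a real library.

def generate_mask(image_path: str, target_object: str) -> str:
    """Stand-in: a real version would run a segmentation model
    and write a mask PNG marking `target_object`."""
    return "mask.png"

def edit_image(original_image_path: str, mask_image_path: str,
               description: str) -> str:
    """Stand-in: a real version would call an image-editing API
    with the image, mask, and prompt."""
    return "edited_image.png"

# The agent chains the tools: mask first, then edit.
mask_path = generate_mask("original_image.png", "horse")
result = edit_image("original_image.png", mask_path,
                    "Replace the horse with a dairy cow standing on grass")
```

The rest of the tutorial fills in how the agent decides when to call each tool, and what the real implementations look like.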

API Call:

Example: ChatGPT's image editing works the same way: the model invokes image-editing tools in response to a natural-language request.

Tools:

  • Mask generation
  • Image editing

System Flow

[Diagram: user prompt → mask generation tool → image editor tool → edited image]

Implementation Guide

1. Setting Up the Agent Structure

First, let's create our AI agent with the necessary tools:

class ImageAgent:
    def __init__(self):
        # Pseudocode: register the two tools
        self.tools = [edit_image, mask_generation]

    def run(self, query: str):
        # Pseudocode: initialize the agent
        image_agent = AI_Agent(
            model='AI_Model',
            instructions='You are an image editing assistant.',
            tools=self.tools,
            name='image_editing_agent',
        )

        # Reasoning loop
        for _ in range(max_turns):
            ...

if __name__ == "__main__":
    agent = ImageAgent()
    query = "Based on the 'original_image.png', replace the horse with a dairy cow standing on the grass."
    result = agent.run(query)
    print(f"Response from LLM: {result}")
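The reasoning loop elided above ("...") can be made concrete with a runnable sketch. Here the model is replaced by a scripted stand-in (`fake_model`) so the loop runs without any API key; the tool stubs, message format, and `fake_model` itself are all assumptions for illustration, not a real SDK.

```python
# Minimal sketch of the agent's reasoning loop, with a scripted
# stand-in for the LLM so it runs offline. Not a real SDK.
import json

def generate_mask(image_path, target_object):
    return "mask.png"  # stand-in for real mask generation

def edit_image(original_image_path, mask_image_path, description):
    return "edited_image.png"  # stand-in for the real API call

TOOLS = {"generate_mask": generate_mask, "edit_image": edit_image}

def fake_model(history):
    """Scripted stand-in for the LLM: first request a mask, then an edit."""
    tool_turns = [m for m in history if m["role"] == "tool"]
    if len(tool_turns) == 0:
        return {"tool": "generate_mask",
                "args": {"image_path": "original_image.png",
                         "target_object": "horse"}}
    if len(tool_turns) == 1:
        return {"tool": "edit_image",
                "args": {"original_image_path": "original_image.png",
                         "mask_image_path": "mask.png",
                         "description": "Replace the horse with a dairy cow"}}
    return {"final": "Done: edited_image.png"}

def run(query, max_turns=5):
    history = [{"role": "user", "content": query}]
    for _ in range(max_turns):
        step = fake_model(history)
        if "final" in step:  # the model has finished reasoning
            return step["final"]
        result = TOOLS[step["tool"]](**step["args"])  # execute the tool call
        history.append({"role": "tool", "content": json.dumps(result)})
    return "max turns reached"

print(run("Replace the horse with a dairy cow"))  # Done: edited_image.png
```

The key idea is the loop shape: call the model, execute any tool it requests, feed the result back, and stop when the model returns a final answer or `max_turns` is reached.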

2. Function Calling Implementation

Function calling enables our AI agent to transform natural language requests into structured operations that our system can process. Our implementation uses two primary tools:

  • Tool A: Mask Generation - Creates masks for areas requiring edits
  • Tool B: Image Editor - Handles the actual image modifications

Let's focus on the Image Editor tool, which requires three key components:

  • The original image to be modified
  • A mask image defining the editable areas
  • Specific editing instructions

Here's how the process works:


Given the prompt:

`Based on the 'original_image.png', replace the horse with a dairy cow standing on the grass.`

the AI agent first calls the mask generation tool, which produces a mask file (mask.png). It then structures the request into a standardized format for the image editor tool.

Here is the JSON function-calling definition of the image editor tool:

{
   "type":"function",
   "function":{
      "name":"edit_image",
      "description":"This function takes an original image, a mask image, and a description to edit\nthe original image based on the mask and the provided description.",
      "parameters":{
         "title":"edit_image",
         "type":"object",
         "properties":{
            "original_image_path":{
               "title":"Original Image Path",
               "type":"string",
               "description":"The file path of the original image to be edited."
            },
            "mask_image_path":{
               "title":"Mask Image Path",
               "type":"string",
               "description":"The file path of the mask image, where transparent areas\nindicate regions to be edited."
            },
            "description":{
               "title":"Description",
               "type":"string",
               "description":"A text description of the desired edit to be applied to the image."
            }
         },
         "required":["original_image_path","mask_image_path","description"]
      }
   }
}
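A lightweight way to see the schema in action is to parse the arguments the model returns and check them against the schema's property names. This is a toy validation, not a full JSON Schema validator, and the trimmed schema below keeps only the fields the check uses.

```python
import json

# The function-calling schema from above, trimmed to the fields we check.
EDIT_IMAGE_TOOL = {
    "type": "function",
    "function": {
        "name": "edit_image",
        "parameters": {
            "type": "object",
            "properties": {
                "original_image_path": {"type": "string"},
                "mask_image_path": {"type": "string"},
                "description": {"type": "string"},
            },
        },
    },
}

def validate_tool_args(raw_json: str) -> dict:
    """Parse model-produced arguments and check them against the schema."""
    args = json.loads(raw_json)
    expected = set(EDIT_IMAGE_TOOL["function"]["parameters"]["properties"])
    missing = expected - set(args)
    if missing:
        raise ValueError(f"missing arguments: {sorted(missing)}")
    return args

args = validate_tool_args(
    '{"original_image_path": "original_image.png", '
    '"mask_image_path": "mask.png", '
    '"description": "Replace the horse with a dairy cow standing on grass"}'
)
```

In a production agent, this kind of check guards against the model omitting a required argument before the API call is made.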

This translates the unstructured request into structured data that we can use to make the API call:

{
   "original_image_path": "original_image.png",
   "mask_image_path": "mask.png",
   "description": "Replace the horse with a dairy cow standing on grass"
}

Finally, this structured data is used to make the API call for image editing:

response = client.images.edit(
    ...,  # additional parameters elided
    image=open(original_image_path, "rb"),
    mask=open(mask_image_path, "rb"),
    prompt=description
)

3. Implementing the Tools

Mask Generation Tool

This tool analyzes the original image and creates a mask for the areas to be edited:

def generate_mask(image_path, target_object):
    # Mask generation logic
    ...
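As a toy illustration of what the mask represents: a real implementation would use a segmentation model to find the target object, but the grid below stands in for per-pixel alpha values, where 0 (transparent) marks editable regions and 255 (opaque) marks regions to keep. The function name and box representation are assumptions for illustration.

```python
def generate_mask_grid(width, height, box):
    """Build a toy alpha grid: 0 (transparent = editable) inside `box`,
    255 (opaque = keep) elsewhere. `box` is (left, top, right, bottom)."""
    left, top, right, bottom = box
    return [
        [0 if (left <= x < right and top <= y < bottom) else 255
         for x in range(width)]
        for y in range(height)
    ]

# Mark a region (roughly where the horse is) as editable.
mask = generate_mask_grid(8, 6, box=(2, 1, 6, 5))
editable = sum(row.count(0) for row in mask)  # 4 x 4 region -> 16 pixels
```

A real mask file would be this idea at full image resolution, saved as a PNG with an alpha channel.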

Image Editor Tool

The image editor tool handles the actual image modification using external APIs:

def edit_image(original_image_path, mask_image_path, description):
    """
    Edit an image using external API integration.
    
    Args:
        original_image_path (str): Path to the original image
        mask_image_path (str): Path to the mask image
        description (str): Editing instructions
        
    Returns:
        str: Path to the edited image
    """
    response = client.images.edit(
        image=open(original_image_path, "rb"),
        mask=open(mask_image_path, "rb"),
        prompt=description,
        size="1024x1024"
    )
    return save_edited_image(response)
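The helper `save_edited_image` is referenced above but not defined. Here is a minimal sketch assuming the API returns base64-encoded image data; the response field names (`data[0].b64_json`) are an assumption and vary by provider and request options.

```python
import base64
from types import SimpleNamespace

def save_edited_image(response, out_path="edited_image.png"):
    """Decode base64 image data from the API response and write it to
    disk. Assumes the response exposes `data[0].b64_json` (an assumed
    field layout; check your provider's response format)."""
    image_bytes = base64.b64decode(response.data[0].b64_json)
    with open(out_path, "wb") as f:
        f.write(image_bytes)
    return out_path

# Quick self-check with a stubbed response object (no API call).
_fake = SimpleNamespace(
    data=[SimpleNamespace(b64_json=base64.b64encode(b"fake-png").decode())]
)
saved = save_edited_image(_fake, out_path="demo_edited.png")
```

Some providers return a URL instead of inline base64, in which case this helper would download the file rather than decode it.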

4. Complete Workflow

Let's walk through a complete example:

  1. User Input: "Change the horse in input.png to a cow"
  2. The agent processes the request and generates a mask
  3. The image editor applies the transformation using the mask and description

# Example usage
query = "Change the horse to a cow"
result = agent.run(query)
# Result: edited_image.png with the horse replaced by a cow

Repo:

github.com/yefan/image-agent-workshop

Receive More Content Like This

Sign up for the newsletter to receive weekly career tips to thrive up in tech 🧡