Weekly Live on Youtube about AI for Absolute Beginner
AI For Absolute Beginner - Weekly Live:
Build an AI Image Agent - Workshop
Introduction
In this tutorial, we'll explore how to build an AI image editing agent using function calling. We'll use a practical example of converting a horse image into a cow to demonstrate the implementation. By the end of this tutorial, you'll understand:
- How to structure an AI agent with custom tools
- How to implement function calling for image editing tasks
- How to integrate with external image editing APIs
Architecture Overview
Our image editing system consists of two primary tools:
- Mask Generation Tool: Identifies and creates masks for areas that need editing
- Image Editor Tool: Performs the actual image modifications using the masks and instructions
API Call:
Example: ChatGPT Image Editing:
Tools:
- generate mask
- image edit tools
System Flow
Implementation Guide
1. Setting Up the Agent Structure
First, let's create our AI agent with the necessary tools:
class image_agent:
def __init__(self):
# pseudocode
tools.add(edit_image)
tools.add(mask_generation)
def run(self, query: str):
# Pseudocode
# Initialize the agent
Image_Agent = AI_Agent(
model= 'AI_Model',
instructions='You are an image edit assistant.'
tools=self.tools,
name='image_editing_agent',
)
# reasoning loop
for _ in range(max_turns):
...
if __name__ == "__main__":
agent = image_agent()
query = "Based on the 'original_image.png', replace the horse with a dairy cow standing on the grass."
result = agent.run(query)
print(f"Response from LLM: {result}")
2. Function Calling Implementation
Function calling enables our AI agent to transform natural language requests into structured operations that our system can process. Our implementation uses two primary tools:
Tool A: Mask Generation
- Creates masks for areas requiring editsTool B: Image Editor
- Handles the actual image modifications
Let's focus on the Image Editor tool
, which requires 3 key components:
- The original image to be modified
- A mask image defining the editable areas
- Specific editing instructions
Here's how the process works:
With Prompt:
`Based on the 'original_image.png',
replace the horse with a dairy cow
standing on the grass.`
AI Agent will first use
mask generation tool
it will generate a mask file with mask.png
then structures this request into a standardized format for the image editor tool
Here is JSON version of function calling of image editor tool
:
{
"type":"function",
"function":{
"parameters":{
"title":"edit_image",
"type":"object",
"properties":{
"original_image_path":{
"title":"Original Image Path",
"description":"The file path of the original image to be edited."
},
"mask_image_path":{
"title":"Mask Image Path",
"description":"The file path of the mask image, where transparent areas\nindicate regions to be edited."
},
"description":{
"title":"Description",
"description":"A text description of the desired edit to be applied to the image."
}
}
},
"name":"edit_image",
"description":"This function takes an original image, a mask image, and a description to edit\nthe original image based on the mask and the provided description."
}
}
This will translate unstructured data to structured data so that we can use it to make API call:
{
"original_image_path": "original_image.png",
"mask_image_path": "mask.png",
"description": "Replace the horse with a dairy cow standing on grass"
}
Finally, this structured data is used to make the API call for image editing:
response = client.images.edit(
...
image=open(original_image_path, "rb"),
mask=open(mask_image_path, "rb"),
prompt=description
)
3. Implementing the Tools
Mask Generation Tool
This tool analyzes the original image and creates a mask for the areas to be edited:
def generate_mask(image_path, target_object):
# Mask generation logic
...
Image Editor Tool
The image editor tool handles the actual image modification using external APIs:
def edit_image(original_image_path, mask_image_path, description):
"""
Edit an image using external API integration.
Args:
original_image_path (str): Path to the original image
mask_image_path (str): Path to the mask image
description (str): Editing instructions
Returns:
str: Path to the edited image
"""
response = client.images.edit(
image=open(original_image_path, "rb"),
mask=open(mask_image_path, "rb"),
prompt=description,
size="1024x1024"
)
return save_edited_image(response)
4. Complete Workflow
Let's walk through a complete example:
- User Input: "Change the horse in input.png to a cow"
- Agent processes the request and generates a mask
- Image editor applies the transformation using the mask and description
# Example usage
query = "Change the horse to a cow"
result = agent.run(query)
# Result: edited_image.png with horse replaced by cow
Repo:
github.com/yefan/image-agent-workshop
Receive More Content Like This
Sign up newsletter to receive weekly career tips to thrive up in tech ๐งก