Data Storage and Management

Effective data storage and management are crucial to ensuring that Cleo agents can retain state across sessions, execute tasks with proper context, and scale efficiently. Cleo supports a range of storage options, from simple file-based systems to distributed databases, so you can choose the backend that fits your agents' needs.

This guide covers data storage approaches for Cleo, including configuration for local files, cloud-based storage, and best practices for scalability and security.

Data Storage Options

Cleo supports multiple data storage methods. The right choice depends on the complexity of the application, the volume of data, and how long the data must persist.

1. Local File Storage

For simple or small-scale applications, local file storage can be an effective way to store agent states, task logs, and results. Cleo can serialize data to common formats such as JSON or YAML, or store it in a single-file SQLite database.

Example: Saving Data as JSON

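```python
import json
import os

def save_agent_state(agent_id, state):
    # Serialize the state dict to agent_<id>_state.json, overwriting any previous save
    with open(f"agent_{agent_id}_state.json", "w", encoding="utf-8") as f:
        json.dump(state, f)

def load_agent_state(agent_id):
    # Return an empty state for agents that have never been saved
    path = f"agent_{agent_id}_state.json"
    if not os.path.exists(path):
        return {}
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```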

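In this example, each agent's state is saved to and loaded from its own JSON file. This approach is simple, but every save rewrites the whole file and nothing prevents two processes from writing to it at the same time, so it becomes inefficient and error-prone as the data grows.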
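The SQLite option mentioned above needs no separate server, because Python ships with the sqlite3 module. The following is a minimal sketch, not a Cleo API: the table name agent_state, the default path cleo_state.db, and the three function names are assumptions chosen for illustration. Each save replaces one row inside a transaction instead of rewriting a whole file, and the state itself is still stored as JSON text.

```python
import json
import sqlite3

def open_state_db(path="cleo_state.db"):
    # Hypothetical helper: open (or create) the database and ensure the table exists
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS agent_state ("
        "agent_id TEXT PRIMARY KEY, state TEXT NOT NULL)"
    )
    return conn

def save_agent_state_sqlite(conn, agent_id, state):
    # INSERT OR REPLACE overwrites the row for an existing agent_id
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "INSERT OR REPLACE INTO agent_state (agent_id, state) VALUES (?, ?)",
            (str(agent_id), json.dumps(state)),
        )

def load_agent_state_sqlite(conn, agent_id):
    # Returns None when no state has been stored for this agent
    row = conn.execute(
        "SELECT state FROM agent_state WHERE agent_id = ?", (str(agent_id),)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

For a quick check, open_state_db(":memory:") creates a throwaway in-memory database; a second save_agent_state_sqlite call for the same agent replaces the earlier row.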

2. Database Storage

For more complex applications that require scalability and better data management, using a database system (SQL or NoSQL) is recommended. Cleo can integrate with various databases such as PostgreSQL, MySQL, or MongoDB.

Example: Storing Data in MongoDB

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
db = client.cleo_db
agents_collection = db.agents
# A unique index prevents duplicate documents for the same agent
agents_collection.create_index("agent_id", unique=True)

def save_agent_to_db(agent_id, state):
    # Insert the agent's document if it does not exist, otherwise update it
    agents_collection.update_one(
        {"agent_id": agent_id},
        {"$set": {"agent_id": agent_id, "state": state}},
        upsert=True,
    )

def get_agent_from_db(agent_id):
    agent_data = agents_collection.find_one({"agent_id": agent_id})
    return agent_data["state"] if agent_data else None
```

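This example saves and retrieves agent state in a MongoDB collection. With upsert=True, update_one inserts a new document when no document matches the agent_id and updates the existing one otherwise. Compared with file-based storage, a database scales more easily and lets you query across agents instead of reading whole files.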

3. Cloud-Based Storage

For enterprise-grade applications or distributed systems, cloud storage solutions such as Amazon S3, Google Cloud Storage, or Azure Blob Storage can provide the scalability and reliability required to manage large volumes of data.

Example: Using AWS S3 for Data Storage

```python
import json

import boto3

s3 = boto3.client("s3")
BUCKET_NAME = "cleo-agent-data"

def _state_key(agent_id):
    # One JSON object per agent in the bucket
    return f"agent_{agent_id}_state.json"

def upload_agent_state_to_s3(agent_id, state):
    s3.put_object(
        Bucket=BUCKET_NAME,
        Key=_state_key(agent_id),
        Body=json.dumps(state).encode("utf-8"),
        ContentType="application/json",
    )

def download_agent_state_from_s3(agent_id):
    response = s3.get_object(Bucket=BUCKET_NAME, Key=_state_key(agent_id))
    return json.loads(response["Body"].read().decode("utf-8"))
```

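This example stores each agent's state as a separate JSON object in the cleo-agent-data bucket. boto3 looks up AWS credentials through its standard chain (environment variables, the shared credentials file, or an attached IAM role), so the code itself contains no secrets.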

Managing Agent States

Cleo agents rely on the ability to maintain state across different tasks and interactions. Proper management of agent states is key to ensuring that agents behave consistently and can resume operations even after reboots or restarts.

1. State Persistence

Agent state can include various types of data, such as:

Task completion status
Configuration settings
Historical data
Current context (e.g., environment variables, session data)
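One way to keep these categories together is a small typed container that round-trips through JSON, so it can be saved with any of the storage backends above. This is a sketch under assumptions: the AgentState class and its field names are hypothetical, and each field is a plain dict or list so that json.dumps can serialize it.

```python
import json
from dataclasses import asdict, dataclass, field

@dataclass
class AgentState:
    # Hypothetical schema grouping the kinds of state listed above
    task_status: dict = field(default_factory=dict)  # task name -> status
    config: dict = field(default_factory=dict)       # configuration settings
    history: list = field(default_factory=list)      # historical data
    context: dict = field(default_factory=dict)      # e.g. session data

    def to_json(self):
        # asdict converts the dataclass (and nested containers) to plain dicts/lists
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text):
        # Unknown keys raise TypeError, which surfaces schema drift early
        return cls(**json.loads(text))
```

A dataclass makes the expected shape of the state explicit; loading a document with unexpected keys fails loudly instead of silently carrying stale fields forward.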

You can manage this state in Cleo by storing it in databases or files, and retrieving it whenever the agent needs to resume. The following example shows how to persist an agent's last completed task:

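```python
import time

def save_last_task(agent_id, task_name):
    # Record the most recently completed task and when the state was updated
    state = load_agent_state(agent_id)
    state["last_task"] = task_name
    state["last_updated"] = time.time()
    save_agent_state(agent_id, state)

def get_last_task(agent_id):
    # Returns None if no task has been recorded yet
    state = load_agent_state(agent_id)
    return state.get("last_task")
```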

Because the last completed task is written to persistent storage, the agent can look it up after a restart or interruption and continue from where it left off.
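After a restart, the persisted last task only helps if the agent can work out what is still pending. A minimal sketch, assuming the agent runs a fixed, ordered list of uniquely named tasks; remaining_tasks is a hypothetical helper, not part of Cleo. If the recorded task is not in the current list (for example because the plan changed), the sketch conservatively reruns everything, which is only safe if the tasks are idempotent.

```python
def remaining_tasks(tasks, last_task):
    """Return the tasks that still need to run after last_task.

    tasks is the agent's ordered task list; last_task is the persisted
    name of the last completed task (None if nothing has completed yet).
    """
    if last_task is None:
        return list(tasks)
    try:
        index = tasks.index(last_task)
    except ValueError:
        # The recorded task is not in the current plan; start over
        return list(tasks)
    return list(tasks[index + 1:])
```

For example, remaining_tasks(["fetch", "parse", "report"], "parse") returns ["report"]; in an agent, the second argument would come from get_last_task(agent_id).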

2. Data Cleanup

To maintain optimal performance and prevent excessive storage use, periodic data cleanup is essential. This could involve:

Removing outdated or irrelevant agent data.
Archiving old task logs.
Setting up expiration times for temporary data (e.g., using TTL with Redis).
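Redis attaches an expiry to a key when it is written; with the redis-py client, for example, r.set(key, value, ex=3600) stores a value that expires after an hour. The sketch below imitates that behavior in memory so the idea can be tried without a Redis server. TTLStore is a hypothetical class, not a Redis or Cleo API, and unlike Redis, which also removes expired keys in the background, it only evicts an expired key when that key is read.

```python
import time

class TTLStore:
    """Hypothetical in-memory store whose entries expire after a TTL (in seconds)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable clock, handy for tests
        self._data = {}      # key -> (value, expiry time)

    def set(self, key, value, ttl):
        # Store the value together with the time at which it expires
        self._data[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            # Expired: evict lazily on read and report it as missing
            del self._data[key]
            return None
        return value
```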

For example, you can implement a cleanup routine in your application to purge data older than a certain threshold:

```python
import time

def cleanup_old_data(agent_id, threshold):
    # threshold is in seconds; a missing last_updated counts as stale
    state = load_agent_state(agent_id)
    last_updated = state.get("last_updated", 0)

    if time.time() - last_updated > threshold:
        # Drop outdated data (here, the "old_data" field) and persist the result
        state.pop("old_data", None)
        save_agent_state(agent_id, state)
```

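In this example, threshold is measured in seconds: if the state was last updated more than threshold seconds ago, the outdated data is cleared and the state is saved. A state with no last_updated value is treated as stale, because the default of 0 makes it appear very old.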
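The routine above decides based on a single last_updated value for the whole state. When the state holds a log with many entries, it is often more useful to expire entries individually. A sketch, assuming each log entry is a dict with a hypothetical timestamp field in seconds since the epoch; prune_task_log is a hypothetical helper. It returns the expired entries instead of discarding them, so they can be archived rather than deleted.

```python
import time

def prune_task_log(entries, max_age_seconds, now=None):
    """Split task-log entries into (kept, expired) by age.

    Each entry is assumed to be a dict with a "timestamp" key holding
    seconds since the epoch, as returned by time.time().
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_seconds
    kept = [e for e in entries if e["timestamp"] >= cutoff]
    expired = [e for e in entries if e["timestamp"] < cutoff]
    return kept, expired
```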

Backup and Recovery

For mission-critical systems, data backup and recovery are essential to avoid data loss. Cleo supports integration with cloud backup solutions and version control systems to ensure data is safe.

1. Backup to Cloud Storage

You can schedule automatic backups of your agent data to cloud storage services, such as AWS S3, Google Cloud Storage, or Azure Blob Storage, ensuring your data is regularly backed up and easy to recover.

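```python
import os
import shutil
import time

def backup_agent_data(agent_id, backup_dir="/backup"):
    # Copy the agent's state file to a timestamped file in backup_dir (created if missing)
    os.makedirs(backup_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%dT%H%M%S")
    shutil.copy2(f"agent_{agent_id}_state.json",
                 os.path.join(backup_dir, f"agent_{agent_id}_state_{stamp}.json"))
```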

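This example copies the state file into a local /backup directory with a timestamp in the file name. Run it on a schedule (for example with cron) and upload the copies to cloud storage, for instance with the S3 functions shown earlier, to keep backups off the machine.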
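Recovery is the other half of a backup procedure. A minimal sketch, assuming backups follow the agent_<id>_state_<timestamp>.json naming used above and live in a single directory; restore_latest_backup is a hypothetical helper. It picks the backup with the newest file modification time rather than parsing the timestamp out of the file name, so it does not depend on the exact timestamp format.

```python
import glob
import os
import shutil

def restore_latest_backup(agent_id, backup_dir="/backup", dest_dir="."):
    """Copy the newest backup of an agent's state back into place.

    Returns the path of the restored backup, or None if no backup exists.
    """
    pattern = os.path.join(backup_dir, f"agent_{agent_id}_state_*.json")
    candidates = glob.glob(pattern)
    if not candidates:
        return None
    # Newest modification time wins
    latest = max(candidates, key=os.path.getmtime)
    shutil.copy2(latest, os.path.join(dest_dir, f"agent_{agent_id}_state.json"))
    return latest
```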

2. Version Control

For more complex data models or configurations, consider using a version control system (e.g., Git) to track changes in agent state or configuration. This allows you to roll back to previous versions if necessary.

Security and Privacy Considerations

When handling sensitive data, it’s important to follow best practices for security and privacy. Some strategies include:

Encryption: Encrypt sensitive data both in transit (e.g., using HTTPS) and at rest (e.g., using AES encryption).
Access Control: Implement role-based access control (RBAC) to restrict access to sensitive data.
Data Anonymization: For privacy-sensitive data, anonymize or hash personally identifiable information (PII) before storing it.
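On the anonymization point, a plain unsalted hash of low-entropy PII such as an email address can often be reversed by hashing likely values and comparing the results. A keyed hash (HMAC) resists this as long as the key stays secret. The sketch below uses Python's standard hmac and hashlib modules; pseudonymize_pii is a hypothetical helper, and loading the key from a secrets manager rather than from source code is assumed. Strictly speaking this is pseudonymization: the same input always maps to the same token, so records stay linkable, and anyone holding the key can check whether a given value is present.

```python
import hashlib
import hmac

def pseudonymize_pii(value, key):
    """Return a stable, non-reversible token for a PII value.

    The same value and key always produce the same token, so records can
    still be joined on the token without storing the raw value.
    """
    # Assumes case-insensitive values such as email addresses
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
```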

Summary

Data storage and management are fundamental to Cleo’s ability to operate effectively and scale. By using file-based systems for small-scale applications or databases and cloud solutions for large-scale systems, Cleo can store and retrieve agent states efficiently. Additionally, proper data cleanup, backup, and security measures ensure data integrity and privacy.

