Griptape v0.23: Unveiling Enhanced Configuration and Image Capabilities
Last week we released Griptape v0.23, bringing a series of substantial changes to the framework. In this update, we have focused on enhancing configuration management and introducing new image processing capabilities. Let’s delve into the highlights.
Breaking Changes
Indexes
In order to maintain consistency across Drivers, the create_index method has been removed from the following:
MarqoVectorStoreDriver
OpenSearchVectorStoreDriver
PineconeVectorStoreDriver
RedisVectorStoreDriver
We have provided recommendations on how these indexes can be created in our docs.
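For example, with the Pinecone Driver you can create the index ahead of time with the Pinecone SDK itself. The sketch below assumes the pinecone-client 2.x API, and the index name, dimension, and environment are placeholders; check our docs for the recommended setup for each Driver.
[code]
import os

import pinecone

# Placeholder environment, index name, and dimension; adjust to your Pinecone project.
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
pinecone.create_index("griptape-index", dimension=1536, metric="cosine")
[/code]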
Additionally, we have made the index_field a required field in MongoDbAtlasVectorStoreDriver, aligning the interface more closely with similar Drivers.
Image Loading
The interface of ImageLoader().load() has been updated to take source: bytes instead of the previous path: str | Path. This change allows for in-memory image processing, which can be quite useful when chaining multiple image Tasks in a Pipeline or Workflow.
If you still need to load an image from a file, you can open the file in binary mode and pass its bytes to load():
[code]
from griptape.loaders import ImageLoader

loader = ImageLoader()
path = "assets/mountain.png"

with open(path, "rb") as file:
    artifact = loader.load(file.read())

print(artifact)
[/code]
Configuration Enhancements
v0.23 introduces a simplified Structure configuration interface. These substantial changes to how configurations are propagated to lower-level components mitigate errors from hidden defaults, allow for premade configurations, and enable overrides of advanced Griptape features. This is done through a new abstraction, StructureConfig, which provides a dedicated place for introducing new functionality without cluttering the Structure class with one-off fields.
In previous versions of Griptape, overriding the prompt_driver field in a Structure was done like this:
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
agent = Agent(prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"))
[/code]
The new syntax for making the same change is:
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo")
        )
    )
)
[/code]
Simplified Structures
Now that we have a singular place for Drivers, Tasks like the new JsonExtractionTask can look to their Structure's config without the need for the user to initialize a JsonExtractionEngine.
[code]
from griptape.structures import Agent
from griptape.tasks import JsonExtractionTask
from griptape.config import OpenAiStructureConfig
from schema import Schema

json_data = """
Alice (Age 28) lives in New York.
Bob (Age 35) lives in California.
"""

user_schema = Schema(
    {"users": [{"name": str, "age": int, "location": str}]}
).json_schema("UserSchema")

agent = Agent(
    config=OpenAiStructureConfig(),
    tasks=[
        JsonExtractionTask(
            args={"template_schema": user_schema},
        )
    ],
)

agent.run(json_data)
[/code]
Hidden Defaults
A common challenge with Griptape’s previous architecture was that certain Driver defaults were not immediately obvious to users, leading them to unhelpful error messages. The introduction of StructureConfig alleviates this challenge by providing “Dummy Drivers” that will raise helpful error messages only when a user attempts to use a feature that requires a real Driver to be set.
For instance, if you wanted to update the prompt_driver field to a non-OpenAI Driver, code like the following was very likely to fail with the error shown below:
[code]
import os

from griptape.structures import Agent
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.drivers import AnthropicPromptDriver

agent = Agent(
    prompt_driver=AnthropicPromptDriver(
        api_key=os.environ["ANTHROPIC_API_KEY"], model="claude-2.1"
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
[code]
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
[/code]
Why would users need an OpenAI key when using an Anthropic Driver? This was happening because, while users overrode the prompt_driver field, they did not initialize the lesser-known embedding_driver, which defaulted to creating an OpenAI client. That client would look for an OpenAI API key in the environment and fail when none was found.
When using a StructureConfig, all global_drivers are initialized to a “Dummy Driver” by default. If a user does not override a Dummy Driver but uses some functionality that requires a real Driver to be set, they will be presented with a directed error message rather than a lower-level third-party SDK error.
[code]
import os

from griptape.structures import Agent
from griptape.drivers import AnthropicPromptDriver
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=AnthropicPromptDriver(
                model="claude-2.1", api_key=os.environ["ANTHROPIC_API_KEY"]
            )
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
[code]
DummyException: You have attempted to use a DummyEmbeddingDriver's try_embed_chunk method. This likely originated from using a `StructureConfig` without providing a Driver required for this feature.
[/code]
Adding an embedding_driver solves the issue:
[code]
import os

from griptape.structures import Agent
from griptape.drivers import AnthropicPromptDriver, OpenAiEmbeddingDriver
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=AnthropicPromptDriver(
                model="claude-2.1", api_key=os.environ["ANTHROPIC_API_KEY"]
            ),
            embedding_driver=OpenAiEmbeddingDriver(),
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
Premade Configs
The newly introduced StructureConfig allows for easy creation of pre-built configurations. We’ve launched with OpenAiStructureConfig and AmazonBedrockStructureConfig since both platforms provide functionality that aligns with all the Drivers offered in Griptape.
We will continue to evaluate other platforms and build more pre-built configurations, but we encourage users to create their own configurations to suit their needs. Check out the implementation of OpenAiStructureConfig for an example of how you can create your own.
By default, Structures use OpenAiStructureConfig, but we can easily change to Amazon Bedrock:
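A minimal example might look like this (assuming AWS credentials for the Bedrock Drivers are available in the environment):
[code]
from griptape.structures import Agent
from griptape.config import AmazonBedrockStructureConfig

# Swap the default OpenAI-backed config for the Amazon Bedrock premade config.
agent = Agent(config=AmazonBedrockStructureConfig())

agent.run("Hello!")
[/code]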
You can even mix and match configurations:
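For instance, you could start from a premade config and swap in a single Driver from another provider. This is only a sketch: it assumes the config's fields are ordinary attrs attributes that can be reassigned before the config is handed to the Structure.
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
from griptape.config import AmazonBedrockStructureConfig

# Start from the Bedrock premade config, then override just the Prompt Driver.
config = AmazonBedrockStructureConfig()
config.global_drivers.prompt_driver = OpenAiChatPromptDriver(model="gpt-4")

agent = Agent(config=config)
[/code]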
Or load a configuration from an external config file:
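One way to do this is to keep your model choices in a small JSON file and build a StructureConfig from it. In the sketch below, the config.json file name and its keys are hypothetical.
[code]
import json

from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver, OpenAiEmbeddingDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

# Hypothetical file, e.g. {"prompt_model": "gpt-4", "embedding_model": "text-embedding-ada-002"}
with open("config.json") as file:
    settings = json.load(file)

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model=settings["prompt_model"]),
            embedding_driver=OpenAiEmbeddingDriver(model=settings["embedding_model"]),
        )
    )
)
[/code]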
Easier Overrides
Finally, StructureConfig allows for easier overrides of some of the more advanced features in the framework. For instance, if users wanted to change which Embedding Driver was used during Task Memory, it previously required overriding the entire Task Memory object, which was no easy feat. Now, users have much more granular control over the overrides:
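For example, to point Task Memory at a different Embedding Driver, you can override just the global embedding_driver and leave the rest of the Task Memory machinery alone. This sketch reuses the global_drivers pattern from above; the AmazonBedrockTitanEmbeddingDriver name and its no-argument construction (which assumes AWS credentials in the environment) are our assumptions.
[code]
from griptape.structures import Agent
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.drivers import OpenAiChatPromptDriver, AmazonBedrockTitanEmbeddingDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model="gpt-4"),
            # Only the Embedding Driver used by Task Memory changes; the Task
            # Memory object itself does not need to be rebuilt.
            embedding_driver=AmazonBedrockTitanEmbeddingDriver(),
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]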
Future of Config
Hopefully, these examples have sparked your imagination about what you can accomplish with the new configuration features! We have kept the prompt_driver, embedding_driver, and stream fields on the Structure for now, though they have been given a deprecation warning and will be removed in a future release.
Image Capabilities
The Drivers Have Eyes
This release also adds the ability to use OpenAI’s Vision API with the OpenAiVisionImageQueryDriver. You can integrate this Driver into your Structures via the new ImageQueryTask, ImageQueryClient, and ImageQueryEngine.
Here’s how you can use the ImageQueryEngine to describe the contents of the mountain image we loaded earlier:
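Something along these lines should work; note that the image_query_driver parameter name, the engine's run() signature, and the gpt-4-vision-preview model name are assumptions here, so check the docs for the exact interface.
[code]
from griptape.drivers import OpenAiVisionImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader

# Assumed model name for OpenAI's Vision API at the time of this release.
driver = OpenAiVisionImageQueryDriver(model="gpt-4-vision-preview")
engine = ImageQueryEngine(image_query_driver=driver)

with open("assets/mountain.png", "rb") as file:
    image_artifact = ImageLoader().load(file.read())

result = engine.run("Describe the weather in this image.", [image_artifact])
print(result.value)
[/code]
The engine responds with a description along these lines: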
The weather in the image depicts a serene and beautiful sunset in a mountainous region. The sun is low in the sky, casting warm hues across the clouds and the landscape. A thick blanket of clouds or fog is nestled in the valleys between the mountains, creating a dramatic and breathtaking effect. The peaks of the mountains are sharp and clear against the sky, suggesting the air is crisp and cool, likely indicative of a high altitude environment. It appears to be a calm and tranquil scene without any noticeable wind or storm activity.
Stay tuned for more Image Query Drivers!
Expanded Support for DALL-E 2 and Leonardo
With this update, we’ve expanded the functionality of both the OpenAI DALL-E 2 and Leonardo Image Generation Drivers to support additional image generation modes.
Your Griptape projects can now use Leonardo-hosted models to generate images from text prompts in a PromptImageGenerationEngine or to generate image variations with a VariationImageGenerationEngine. The DALL-E 2 Image Generation Driver now supports generating images from text prompts, generating image variations, and editing images using the InpaintingImageGenerationEngine.
This example uses Leonardo to create a new image from our familiar mountain scene:
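A sketch of what that could look like follows. The Leonardo Driver's api_key and model arguments, the image_generation_driver parameter name, and the engine's run() signature are assumptions, and the model ID is a placeholder you would replace with a real Leonardo model ID.
[code]
import os

from griptape.drivers import LeonardoImageGenerationDriver
from griptape.engines import VariationImageGenerationEngine
from griptape.loaders import ImageLoader

# Placeholder credentials and model ID for a Leonardo-hosted model.
driver = LeonardoImageGenerationDriver(
    api_key=os.environ["LEONARDO_API_KEY"],
    model="<your-leonardo-model-id>",
)
engine = VariationImageGenerationEngine(image_generation_driver=driver)

with open("assets/mountain.png", "rb") as file:
    image_artifact = ImageLoader().load(file.read())

variation = engine.run(
    prompts=["a vintage film photograph"],
    image=image_artifact,
)
[/code]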
Here’s the result, the input image imbued with the characteristics of a vintage film photograph:
Image Tools and Task Memory
Image generation tools now have access to Task Memory, allowing Agents to naturally chain together Image Tools with other Tools. Previously, Tools that depended on Image Artifacts generated or retrieved in previous actions required writing to and reading from disk. This was an awkward intermediate step that resulted in needlessly complex chains-of-thought and worse performance.
With this update, the VariationImageGenerationClient, InpaintingImageGenerationClient, and OutpaintingImageGenerationClient tools provide activities to both read images from disk and access Image Artifacts already present in Task Memory.
In this example, we’ll ask an Agent to generate an image of a dog, then create a pixel art variation of it:
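A sketch of the setup is below. The OpenAiImageGenerationDriver and PromptImageGenerationClient names, along with the engine and off_prompt parameters, are assumptions based on the naming of the other image Tools mentioned above.
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine, VariationImageGenerationEngine
from griptape.tools import PromptImageGenerationClient, VariationImageGenerationClient

# DALL-E 2 backs both text-to-image generation and image variations here.
driver = OpenAiImageGenerationDriver(model="dall-e-2")

agent = Agent(
    tools=[
        PromptImageGenerationClient(
            engine=PromptImageGenerationEngine(image_generation_driver=driver),
            off_prompt=True,
        ),
        VariationImageGenerationClient(
            engine=VariationImageGenerationEngine(image_generation_driver=driver),
            off_prompt=True,
        ),
    ],
)

agent.run("Generate an image of a dog, then create a pixel art variation of it.")
[/code]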
If we take a look at the Agent’s actions, we can observe that Image Artifacts are stored in and retrieved directly from Task Memory.
And, of course, here are the two images we generated:
Wrapping Up
We hope you enjoy the new features in Griptape v0.23! As always, we are excited to hear your feedback and see what you build with the framework. If you have any questions or need help getting started, please don’t hesitate to reach out to us on Discord.