Griptape v0.23: Unveiling Enhanced Configuration and Image Capabilities
Last week we released Griptape v0.23, bringing a series of substantial changes to the framework. In this update, we have focused on enhancing configuration management and introducing new image processing capabilities. Let’s delve into the highlights.
Breaking Changes
Indexes
In order to maintain consistency across Drivers, the create_index method has been removed from the following:
MarqoVectorStoreDriver
OpenSearchVectorStoreDriver
PineconeVectorStoreDriver
RedisVectorStoreDriver
We have provided recommendations on how these indexes can be created in our docs.
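For example, with the Pinecone Driver you can create the index ahead of time with the Pinecone SDK itself. The sketch below assumes the pinecone-client 2.x API, and the index name, dimension, and environment are placeholders; check our docs for the recommended setup for each Driver.
[code]
import os

import pinecone

# Placeholder environment, index name, and dimension; adjust to your Pinecone project.
pinecone.init(api_key=os.environ["PINECONE_API_KEY"], environment="us-west1-gcp")
pinecone.create_index("griptape-index", dimension=1536, metric="cosine")
[/code]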
Additionally, we have made the index_field a required field in MongoDbAtlasVectorStoreDriver, aligning the interface more closely with similar Drivers.
Image Loading
The interface of ImageLoader().load() has been updated to take source: bytes instead of the previous path: str | Path. This change allows for in-memory image processing, which can be quite useful when chaining multiple image Tasks in a Pipeline or Workflow.
If you still need to load an image from a file, you can open the file in binary mode and pass its bytes to load():
[code]
from griptape.loaders import ImageLoader

loader = ImageLoader()
path = "assets/mountain.png"

with open(path, "rb") as file:
    artifact = loader.load(file.read())

print(artifact)
[/code]
Configuration Enhancements
v0.23 introduces a simplified Structure configuration interface. These substantial changes to how configurations are propagated to lower-level components mitigate errors from hidden defaults, allow for premade configurations, and enable overrides of advanced Griptape features. This is done through a new abstraction, StructureConfig, which provides a dedicated place for introducing new functionality without cluttering the Structure class with one-off fields.
In previous versions of Griptape, overriding the prompt_driver field in a Structure was done like this:
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
agent = Agent(prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo"))
[/code]
The new syntax for making the same change is:
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model="gpt-3.5-turbo")
        )
    )
)
[/code]
Simplified Structures
Now that we have a singular place for Drivers, Tasks like the new JsonExtractionTask can look to their Structure's config without the need for the user to initialize a JsonExtractionEngine.
[code]
from griptape.structures import Agent
from griptape.tasks import JsonExtractionTask
from griptape.config import OpenAiStructureConfig
from schema import Schema

json_data = """
Alice (Age 28) lives in New York.
Bob (Age 35) lives in California.
"""

user_schema = Schema(
    {"users": [{"name": str, "age": int, "location": str}]}
).json_schema("UserSchema")

agent = Agent(
    config=OpenAiStructureConfig(),
    tasks=[
        JsonExtractionTask(
            args={"template_schema": user_schema},
        )
    ],
)

agent.run(json_data)
[/code]
Hidden Defaults
A common challenge with Griptape’s previous architecture was that certain Driver defaults were not immediately obvious to users, leading them to unhelpful error messages. The introduction of StructureConfig alleviates this challenge by providing “Dummy Drivers” that will raise helpful error messages only when a user attempts to use a feature that requires a real Driver to be set.
For instance, if you wanted to update the prompt_driver field to a non-OpenAI Driver, code like the following was very likely to fail with the error shown below:
[code]
import os

from griptape.structures import Agent
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.drivers import AnthropicPromptDriver

agent = Agent(
    prompt_driver=AnthropicPromptDriver(
        api_key=os.environ["ANTHROPIC_API_KEY"], model="claude-2.1"
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
[code]
openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable
[/code]
Why would users need an OpenAI key when using an Anthropic Driver? This was happening because, while users overrode the prompt_driver field, they did not initialize the lesser-known embedding_driver, which defaulted to creating an OpenAI client. That client would look for an OpenAI API key in the environment and fail when none was found.
When using a StructureConfig, all global_drivers are initialized to a “Dummy Driver” by default. If a user does not override a Dummy Driver but uses some functionality that requires a real Driver to be set, they will be presented with a directed error message rather than a lower-level third-party SDK error.
[code]
import os

from griptape.structures import Agent
from griptape.drivers import AnthropicPromptDriver
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=AnthropicPromptDriver(
                model="claude-2.1", api_key=os.environ["ANTHROPIC_API_KEY"]
            )
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
[code]
DummyException: You have attempted to use a DummyEmbeddingDriver's try_embed_chunk method. This likely originated from using a `StructureConfig` without providing a Driver required for this feature.
[/code]
Adding an embedding_driver solves the issue:
[code]
import os

from griptape.structures import Agent
from griptape.drivers import AnthropicPromptDriver, OpenAiEmbeddingDriver
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=AnthropicPromptDriver(
                model="claude-2.1", api_key=os.environ["ANTHROPIC_API_KEY"]
            ),
            embedding_driver=OpenAiEmbeddingDriver(),
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]
Premade Configs
The newly introduced StructureConfig allows for easy creation of pre-built configurations. We’ve launched with OpenAiStructureConfig and AmazonBedrockStructureConfig since both platforms provide functionality that aligns with all the Drivers offered in Griptape.
We will continue to evaluate other platforms and build more pre-built configurations, but we encourage users to create their own configurations to suit their needs. Check out the implementation of OpenAiStructureConfig for an example of how you can create your own.
By default, Structures use OpenAiStructureConfig, but we can easily change to Amazon Bedrock:
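A minimal example might look like this (assuming AWS credentials for the Bedrock Drivers are available in the environment):
[code]
from griptape.structures import Agent
from griptape.config import AmazonBedrockStructureConfig

# Swap the default OpenAI-backed config for the Amazon Bedrock premade config.
agent = Agent(config=AmazonBedrockStructureConfig())

agent.run("Hello!")
[/code]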
You can even mix and match configurations:
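For instance, you could start from a premade config and swap in a single Driver from another provider. This is only a sketch: it assumes the config's fields are ordinary attrs attributes that can be reassigned before the config is handed to the Structure.
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver
from griptape.config import AmazonBedrockStructureConfig

# Start from the Bedrock premade config, then override just the Prompt Driver.
config = AmazonBedrockStructureConfig()
config.global_drivers.prompt_driver = OpenAiChatPromptDriver(model="gpt-4")

agent = Agent(config=config)
[/code]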
Or load a configuration from an external config file:
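One way to do this is to keep your model choices in a small JSON file and build a StructureConfig from it. In the sketch below, the config.json file name and its keys are hypothetical.
[code]
import json

from griptape.structures import Agent
from griptape.drivers import OpenAiChatPromptDriver, OpenAiEmbeddingDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

# Hypothetical file, e.g. {"prompt_model": "gpt-4", "embedding_model": "text-embedding-ada-002"}
with open("config.json") as file:
    settings = json.load(file)

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model=settings["prompt_model"]),
            embedding_driver=OpenAiEmbeddingDriver(model=settings["embedding_model"]),
        )
    )
)
[/code]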
Easier Overrides
Finally, StructureConfig allows for easier overrides of some of the more advanced features in the framework. For instance, if users wanted to change which Embedding Driver was used during Task Memory, it previously required overriding the entire Task Memory object, which was no easy feat. Now, users have much more granular control over the overrides:
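For example, to point Task Memory at a different Embedding Driver, you can override just the global embedding_driver and leave the rest of the Task Memory machinery alone. This sketch reuses the global_drivers pattern from above; the AmazonBedrockTitanEmbeddingDriver name and its no-argument construction (which assumes AWS credentials in the environment) are our assumptions.
[code]
from griptape.structures import Agent
from griptape.tools import WebScraper, TaskMemoryClient
from griptape.drivers import OpenAiChatPromptDriver, AmazonBedrockTitanEmbeddingDriver
from griptape.config import StructureConfig, StructureGlobalDriversConfig

agent = Agent(
    config=StructureConfig(
        global_drivers=StructureGlobalDriversConfig(
            prompt_driver=OpenAiChatPromptDriver(model="gpt-4"),
            # Only the Embedding Driver used by Task Memory changes; the Task
            # Memory object itself does not need to be rebuilt.
            embedding_driver=AmazonBedrockTitanEmbeddingDriver(),
        )
    ),
    tools=[WebScraper(off_prompt=True), TaskMemoryClient(off_prompt=False)],
)

agent.run("Tell me more about https://griptape.ai")
[/code]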
Future of Config
Hopefully, these examples have sparked your imagination about what you can accomplish with the new configuration features! We have kept the prompt_driver, embedding_driver, and stream fields on the Structure for now, though they have been given a deprecation warning and will be removed in a future release.
Image Capabilities
The Drivers Have Eyes
This release also adds the ability to use OpenAI’s Vision API with the OpenAiVisionImageQueryDriver. You can integrate this Driver into your Structures via the new ImageQueryTask, ImageQueryClient, and ImageQueryEngine.
Here’s how you can use the ImageQueryEngine to describe the contents of the mountain image we loaded earlier:
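Something along these lines should work; note that the image_query_driver parameter name, the engine's run() signature, and the gpt-4-vision-preview model name are assumptions here, so check the docs for the exact interface.
[code]
from griptape.drivers import OpenAiVisionImageQueryDriver
from griptape.engines import ImageQueryEngine
from griptape.loaders import ImageLoader

# Assumed model name for OpenAI's Vision API at the time of this release.
driver = OpenAiVisionImageQueryDriver(model="gpt-4-vision-preview")
engine = ImageQueryEngine(image_query_driver=driver)

with open("assets/mountain.png", "rb") as file:
    image_artifact = ImageLoader().load(file.read())

result = engine.run("Describe the weather in this image.", [image_artifact])
print(result.value)
[/code]
The engine responds with a description along these lines: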
The weather in the image depicts a serene and beautiful sunset in a mountainous region. The sun is low in the sky, casting warm hues across the clouds and the landscape. A thick blanket of clouds or fog is nestled in the valleys between the mountains, creating a dramatic and breathtaking effect. The peaks of the mountains are sharp and clear against the sky, suggesting the air is crisp and cool, likely indicative of a high altitude environment. It appears to be a calm and tranquil scene without any noticeable wind or storm activity.
Stay tuned for more Image Query Drivers!
Expanded Support for DALL-E 2 and Leonardo
With this update, we’ve expanded the functionality of both the OpenAI DALL-E 2 and Leonardo Image Generation Drivers to support additional image generation modes.
Your Griptape projects can now use Leonardo-hosted models to generate images from text prompts in a PromptImageGenerationEngine or to generate image variations with a VariationImageGenerationEngine. The DALL-E 2 Image Generation Driver now supports generating images from text prompts, generating image variations, and editing images using the InpaintingImageGenerationEngine.
This example uses Leonardo to create a new image from our familiar mountain scene:
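A sketch of what that could look like follows. The Leonardo Driver's api_key and model arguments, the image_generation_driver parameter name, and the engine's run() signature are assumptions, and the model ID is a placeholder you would replace with a real Leonardo model ID.
[code]
import os

from griptape.drivers import LeonardoImageGenerationDriver
from griptape.engines import VariationImageGenerationEngine
from griptape.loaders import ImageLoader

# Placeholder credentials and model ID for a Leonardo-hosted model.
driver = LeonardoImageGenerationDriver(
    api_key=os.environ["LEONARDO_API_KEY"],
    model="<your-leonardo-model-id>",
)
engine = VariationImageGenerationEngine(image_generation_driver=driver)

with open("assets/mountain.png", "rb") as file:
    image_artifact = ImageLoader().load(file.read())

variation = engine.run(
    prompts=["a vintage film photograph"],
    image=image_artifact,
)
[/code]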
Here’s the result, the input image imbued with the characteristics of a vintage film photograph:
Image Tools and Task Memory
Image generation tools now have access to Task Memory, allowing Agents to naturally chain together Image Tools with other Tools. Previously, Tools that depended on Image Artifacts generated or retrieved in previous actions required writing to and reading from disk. This was an awkward intermediate step that resulted in needlessly complex chains-of-thought and worse performance.
With this update, the VariationImageGenerationClient, InpaintingImageGenerationClient, and OutpaintingImageGenerationClient tools provide activities to both read images from disk and access Image Artifacts already present in Task Memory.
In this example, we’ll ask an Agent to generate an image of a dog, then create a pixel art variation of it:
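A sketch of the setup is below. The OpenAiImageGenerationDriver and PromptImageGenerationClient names, along with the engine and off_prompt parameters, are assumptions based on the naming of the other image Tools mentioned above.
[code]
from griptape.structures import Agent
from griptape.drivers import OpenAiImageGenerationDriver
from griptape.engines import PromptImageGenerationEngine, VariationImageGenerationEngine
from griptape.tools import PromptImageGenerationClient, VariationImageGenerationClient

# DALL-E 2 backs both text-to-image generation and image variations here.
driver = OpenAiImageGenerationDriver(model="dall-e-2")

agent = Agent(
    tools=[
        PromptImageGenerationClient(
            engine=PromptImageGenerationEngine(image_generation_driver=driver),
            off_prompt=True,
        ),
        VariationImageGenerationClient(
            engine=VariationImageGenerationEngine(image_generation_driver=driver),
            off_prompt=True,
        ),
    ],
)

agent.run("Generate an image of a dog, then create a pixel art variation of it.")
[/code]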
If we take a look at the Agent’s actions, we can observe that Image Artifacts are stored in and retrieved directly from Task Memory.
And, of course, here are the two images we generated:
Wrapping Up
We hope you enjoy the new features in Griptape v0.23! As always, we are excited to hear your feedback and see what you build with the framework. If you have any questions or need help getting started, please don’t hesitate to reach out to us on Discord.