Griptape Cloud Image Generation & Improved Structured Output with Griptape Framework 1.6

Griptape Framework 1.6 is now available. Continuing the theme of enhancing the framework’s multimodal support, the 1.6 release adds image generation with Griptape Cloud through a new GriptapeCloudImageGenerationDriver. This release also improves structured output with a new output schema validation subtask, and adds an event_types parameter to the Stream utility so you can choose which event types are streamed. Let’s jump in and take a quick look at these new features.

Image Generation with Griptape Cloud

With the addition of the GriptapeCloudImageGenerationDriver, developers can now generate images with Griptape Framework using only a Griptape Cloud API key. You do not need to provide API credentials for any other model provider. In this first release we support the dall-e-3 image generation model.

Let’s try this out with the code sample below:

import os
from io import BytesIO

from PIL import Image  # Pillow, used here only to display the result

from griptape.drivers.image_generation.griptape_cloud import (
    GriptapeCloudImageGenerationDriver,
)

# Only a Griptape Cloud API key is needed; no other provider credentials.
driver = GriptapeCloudImageGenerationDriver(
    api_key=os.environ["GT_CLOUD_API_KEY"],
    model="dall-e-3",
)

# The prompt is passed as a list of strings; the call returns an image object.
image = driver.run_text_to_image(
    [
        """create an image of two billboards in a bustling downtown scene with skyscrapers.
        There are two billboards. one has the exact phrase "Griptape Cloud" and the other has the exact phrase "Image Generation" and uses black and beige, photo"""
    ]
)

# The image bytes are stored in the value attribute.
Image.open(BytesIO(image.value)).show()

You can see in the sample that to generate an image I create a driver instance, providing my Griptape Cloud API key and specifying the model that I want to use. I then call the run_text_to_image method on the driver, providing my prompt as an argument. This returns an image object with the image stored in the value attribute. Lastly, I open the image with PIL's Image.open() and display it with its show() method.

I like the results that I got here. The text is clear and legible in the image. 

If you’d like to try this out, you can sign up for a Griptape Cloud account here. Once you’ve done that, create an API key by selecting the API Keys option in the left navigation menu and then the Create API Key button in the top right of the UI. Save the key in your development environment as the GT_CLOUD_API_KEY environment variable, which the sample reads, and you’re ready to start experimenting with image generation on Griptape Cloud.
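If the key is missing from the environment, os.environ["GT_CLOUD_API_KEY"] fails with a bare KeyError. Here is a minimal sketch of a friendlier check; load_api_key is a hypothetical helper of my own, not part of Griptape Framework.

```python
import os


def load_api_key(env_var: str = "GT_CLOUD_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not part of Griptape Framework.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(
            f"{env_var} is not set; export it before creating the driver."
        )
    return api_key
```

You could then pass load_api_key() as the api_key argument when creating the driver.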

Structured Output Validation

This release brings further improvements to the structured output feature that we introduced in version 1.2. Structured outputs will now be validated with a new OutputSchemaValidationSubtask (what a mouthful). This subtask validates a model's output against the provided output schema, re-prompting the LLM in case of validation errors. The subtask is retried until the output matches the schema, respecting the max_subtasks parameter that is set on the subtask, which avoids endless retries and limits costs.

This is illustrated in the following example output. Here I have created an email summarizing application, similar to the example that I used in the recent post about the Griptape Zapier Integration. I used the same rules as in that example, and provided the plain text email that I want to summarize and extract key information from in the prompt. I also defined the following structured output schema and passed it to my PromptTask using the output_schema keyword argument.

from pydantic import BaseModel


class DateItem(BaseModel):
    explanation: str
    date: str


class Sender(BaseModel):
    name: str
    email: str


class Output(BaseModel):
    Sender: Sender
    KeyDates: list[DateItem]
    Summary: str
    NeedToTakeAction: bool
    NeedToRespond: bool

On my test run, I got the following output:

In the log output from this run you can see that validation failed on the first attempt because the case of the attribute names was incorrect (highlighted in the yellow dotted box). The subtask is then rerun with an updated prompt. The second execution matches the output schema correctly, with the output attribute names capitalized as NeedToTakeAction and NeedToRespond, as specified in the BaseModel. The Output object is then generated.
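To make the validate-and-re-prompt idea concrete, here is a rough standard-library sketch of the general pattern. It is not Griptape's implementation: REQUIRED_FIELDS, validate, run_with_validation, fake_model and max_attempts are all names I made up for illustration, and the fake model simply returns the wrong key casing until it sees the validation feedback.

```python
import json
from typing import Callable, Optional

# Hypothetical schema: required key names (case-sensitive) and their types.
REQUIRED_FIELDS = {"Summary": str, "NeedToTakeAction": bool, "NeedToRespond": bool}


def validate(raw: str) -> tuple[Optional[dict], list[str]]:
    """Parse the model output and collect any schema errors."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["expected a JSON object"]
    errors = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            errors.append(f"missing field: {name}")
        elif not isinstance(data[name], expected):
            errors.append(f"wrong type for {name}")
    return (data if not errors else None), errors


def run_with_validation(
    model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, re-prompting with the errors until the output is valid."""
    current_prompt = prompt
    for _ in range(max_attempts):
        data, errors = validate(model(current_prompt))
        if data is not None:
            return data
        feedback = "; ".join(errors)
        current_prompt = f"{prompt}\n\nYour last answer was rejected: {feedback}"
    # Bounded retries: give up rather than loop (and spend) forever.
    raise ValueError(f"no valid output after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    """Stand-in for an LLM: wrong key casing first, fixed after feedback."""
    answer = {"Summary": "Invoice due Friday"}
    if "rejected" in prompt:
        answer.update({"NeedToTakeAction": True, "NeedToRespond": False})
    else:
        answer.update({"needToTakeAction": True, "needToRespond": False})
    return json.dumps(answer)


result = run_with_validation(fake_model, "Summarize this email as JSON.")
print(result["NeedToTakeAction"])
```

The first call fails validation because of the lowercase key names, and the second succeeds once the errors are included in the prompt, which mirrors the behavior I saw in my test run.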

Event type selection with Stream Utility

The last new feature in this release is the addition of an event_types parameter to the Stream utility. Setting this parameter to a list of event types lets you control which event types the Stream utility returns, making it simple to filter events from Structures, for example, if you only want to receive events of type TextChunkEvent.
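The filtering idea itself is simple, and can be sketched without the framework. In the sketch below the event classes and the stream function are hypothetical stand-ins that I defined for illustration; they are not Griptape's own classes and this is not how the Stream utility is implemented.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class BaseEvent:
    """Hypothetical base class for everything a structure emits."""


@dataclass
class TextChunkEvent(BaseEvent):
    """Stand-in for a streamed chunk of text."""

    token: str


@dataclass
class FinishTaskEvent(BaseEvent):
    """Stand-in for a task-finished notification."""

    task_id: str


def stream(
    events: Iterable[BaseEvent], event_types: list[type[BaseEvent]]
) -> Iterator[BaseEvent]:
    """Yield only the events that are instances of a requested type."""
    wanted = tuple(event_types)
    for event in events:
        if isinstance(event, wanted):
            yield event


emitted = [TextChunkEvent("Hel"), FinishTaskEvent("task-1"), TextChunkEvent("lo")]

# Keep only the text chunks and join them back into the streamed text.
text = "".join(e.token for e in stream(emitted, event_types=[TextChunkEvent]))
print(text)
```

Because the check uses isinstance, asking for a base type also returns its subclasses, which is one reasonable design choice for this kind of filter.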

How to get started with Griptape Framework

Griptape Framework 1.6 is available now. As usual, you can download it with uv, poetry, pip or another Python package manager of your choice. We would love to hear your feedback on Griptape Framework and your ideas and suggestions for future enhancements. If you have feedback, or need help getting started, please head over to the Griptape Discord, where you will find our team ready to listen and help.