Breaking Free from the Chat Box

    Challenges of chat interfaces and AI tooling.

    Published on January 1, 2024

    Chat interfaces are now a staple of the software world, and I helped pave that road during my six years at Drift. We campaigned hard to prove chat wasn’t just for customer service or casual conversation, championing chat over forms with our “no forms” campaign. Fast forward to now, and chat is more versatile than ever, breathing life into all kinds of software and user experiences.

    Enter ChatGPT, Midjourney, and a slew of other chat-first AI tools. They’ve elevated chat from a simple communication tool to the control center for powerful AI capabilities. But it’s come at a cost: while chat is a solid starting point, it’s also a creative bottleneck. It’s not that it’s the wrong tool for the job; it’s that our reliance on chat stifles our imagination. We’re so accustomed to interacting with these advanced AI systems through chat that it’s hard to envision other, potentially richer, ways to engage.

    Chat's Limitations in UX and Technical Scope

    While chat’s ubiquity makes it an easy go-to, its simplicity is also its downfall.

    We’re so used to chat that it’s like we’re wearing blinders, focusing only on a narrow pathway of interaction. This leads to a one-size-fits-all approach, where unique opportunities for richer, more engaging user experiences are overlooked. Chat has its own language, a set of rules we’ve internalized so deeply that we forget other languages exist—languages that could offer users far more nuanced interactions.

    The Time-Complexity Dilemma

    Chat’s format struggles with complex, structured data—think of it as the linear time complexity of UX, good for quick operations but lacking when you need to scale the conversation. While dropping in graphs or videos is possible, a series of message boxes doesn’t always cut it. Waiting for a response from GPT-4 could take 30 seconds or more, and in UX, as in algorithms, efficiency matters.

    When you’re working with tools like OpenAI’s Chat Completions API, time can stretch out. A complex prompt may take tens of seconds to produce a response. Streaming text or markdown eases the wait, a crucial UX improvement. The limitation is that this real-time benefit is hard to combine with structured JSON or richer data types, because a JSON document usually can’t be parsed until the whole thing has arrived.

    The streaming approach, while efficient, doesn’t easily support the simultaneous delivery of such data. And there’s the rub: we’ve solved one UX problem but inadvertently narrowed our options for richer, multi-layered interactions.
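
    To make that tension concrete, here is a tiny sketch. The chunks are made up for illustration (they stand in for whatever a streaming API would deliver), and `tryParse` is just a helper name I picked. Plain text could be rendered after every chunk, but `JSON.parse` has nothing to offer until the very last one lands.

```typescript
// Hypothetical chunks, roughly as a streaming response might deliver them.
const jsonChunks = ['{"summary": "Chat is ', 'a fine start", ', '"points": 42}'];

// Returns the parsed value, or undefined if the text is not valid JSON yet.
function tryParse(text: string): unknown {
  try {
    return JSON.parse(text);
  } catch (_err) {
    return undefined;
  }
}

let buffer = "";
const parsedAfterEachChunk: boolean[] = [];

for (const chunk of jsonChunks) {
  buffer += chunk;
  // As raw text, `buffer` is already displayable here.
  // As JSON, it only becomes usable once the closing brace arrives.
  parsedAfterEachChunk.push(tryParse(buffer) !== undefined);
}

console.log(parsedAfterEachChunk);
```

    Only the final entry is `true`: the user waits for the full payload before any structured field can be shown.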

    Time isn’t just money; it’s also user engagement.

    The Cycle of Constraints and Creative Exploration

    The limitations of existing AI tooling are not mere inconveniences; they set boundaries on what’s possible and, more importantly, what can be easily imagined. While workarounds are possible, they can deter creators from truly exploring the full potential of these technologies.

    In constraining our tools, we may unintentionally be constraining our creative capabilities as well.

    I ran into these challenges firsthand while trying to build what I envisioned using just the OpenAI SDK or LangChain. The need for a tool that could efficiently handle complex, structured data and offer more than just string responses was apparent. It wasn't just about managing prompts and state; it was about envisioning an agent instance with a solid identity and a dependable response model, accessible on every request.

    Driven by these needs, I set out to create a suite of utilities. It began with a lightweight wrapper around the OpenAI SDK that allows the definition of 'agents' and a 'schema agent.' The latter simplifies interactions by accepting a Zod schema, generating the JSON Schema from it, and handling function calls. This was not just about streaming text; it was about streaming objects and arrays, and accessing those data structures as soon as the stream began.
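
    For a rough idea of what a 'schema agent' has to assemble, here is a sketch of the request body. It is not the wrapper's actual code: the JSON Schema is hand-written rather than generated from Zod, the names `debateSchema`, `buildStructuredRequest`, and `structured_agent_response` are made up for illustration, and it assumes the `tools` and `tool_choice` parameters of OpenAI's Chat Completions API.

```typescript
// Hand-written JSON Schema, standing in for what a Zod-to-JSON-Schema step would emit.
const debateSchema = {
  type: "object",
  properties: {
    finalSummary: { type: "string" },
    pointsForPython: { type: "number", minimum: 0, maximum: 100 }
  },
  required: ["finalSummary", "pointsForPython"]
} as const;

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Hypothetical helper: builds a request that forces the model to "call" a single
// function whose arguments are the structured response we want back.
function buildStructuredRequest(messages: ChatMessage[]) {
  const name = "structured_agent_response";
  return {
    model: "gpt-4-1106-preview",
    stream: true,
    messages,
    tools: [
      {
        type: "function" as const,
        function: {
          name,
          description: "Return the structured response",
          parameters: debateSchema
        }
      }
    ],
    // Pin the model to that one function so the output always follows the schema.
    tool_choice: { type: "function" as const, function: { name } }
  };
}

const request = buildStructuredRequest([
  { role: "user", content: "Python or JavaScript?" }
]);

console.log(JSON.stringify(request, null, 2));
```

    With `stream: true`, the function arguments come back as fragments of a JSON string, which is why streaming objects and arrays needs more than a plain `JSON.parse`.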

    Furthermore, to handle the real-time data interaction, I built a library capable of parsing streaming JSON, populating a pre-stubbed data structure as the data arrives. This approach allowed for immediate use of the streamed data, enabling more dynamic and responsive user experiences.
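
    The sketch below illustrates the general idea rather than how the library works internally; it is a deliberately naive stand-in. It closes whatever strings, arrays, and objects are still open, backs off a character at a time until the result parses, and lays the parsed fields over a stub. The `Dashboard` type, the helper names, and the chunks are all invented for the example.

```typescript
type Dashboard = { title: string; tags: string[]; score: number };

// Pre-stubbed structure: every field has a default the UI can render right away.
const stub: Dashboard = { title: "", tags: [], score: 0 };

// Close any string, array, or object left open at the end of `prefix`.
function closePartialJson(prefix: string): string {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < prefix.length; i++) {
    const ch = prefix[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") closers.push("}");
    else if (ch === "[") closers.push("]");
    else if (ch === "}" || ch === "]") closers.pop();
  }
  return prefix + (inString ? '"' : "") + closers.reverse().join("");
}

// Parse the longest prefix of `text` that can be closed into valid JSON.
function parsePartialJson(text: string): unknown {
  for (let end = text.length; end > 0; end--) {
    try {
      return JSON.parse(closePartialJson(text.slice(0, end)));
    } catch (_err) {
      // Not parseable yet: drop one more trailing character and retry.
    }
  }
  return undefined;
}

// Made-up chunks that split keys and values at awkward places.
const chunks = ['{"title": "Break', 'ing Free", "tags": ["ux", "a', 'i"], "sco', 're": 7}'];

let received = "";
const snapshots: Dashboard[] = [];
for (const chunk of chunks) {
  received += chunk;
  const partial = parsePartialJson(received) as Partial<Dashboard> | undefined;
  // Fields that have not arrived yet keep their stub defaults.
  snapshots.push({ ...stub, ...partial });
}

console.log(snapshots);
```

    Every snapshot is a complete `Dashboard`, so the UI never has to guard against missing fields. The trade-offs of this naive version are real, though: the backtracking is quadratic in the worst case, and a number can show up truncated (12 before 123) until its last digit arrives.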

    Finally, I tied these components together with a set of React hooks designed for managing the streaming connection, making requests to endpoints, and utilizing the schema-stream for instant data availability.
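
    One small detail that any code managing a streaming connection has to get right is decoding: the network delivers bytes, and a multi-byte character can be split across two chunks. The sketch below is not taken from the hooks themselves; it only illustrates the standard `TextDecoder` streaming mode, using hand-picked example bytes.

```typescript
// Bytes for {"a":"é"}, split so the two-byte "é" (0xC3 0xA9) straddles the chunk boundary.
const byteChunks = [
  new Uint8Array([0x7b, 0x22, 0x61, 0x22, 0x3a, 0x22, 0xc3]),
  new Uint8Array([0xa9, 0x22, 0x7d])
];

// Naive: decode each chunk on its own, so the split character is corrupted.
const naiveText = byteChunks
  .map(chunk => new TextDecoder().decode(chunk))
  .join("");

// Streaming: one decoder for the whole connection. With { stream: true } it holds
// back an incomplete byte sequence until the next chunk completes it.
const decoder = new TextDecoder();
let streamedText = "";
for (const chunk of byteChunks) {
  streamedText += decoder.decode(chunk, { stream: true });
}
streamedText += decoder.decode(); // flush anything still buffered

console.log({ naiveText, streamedText });
```

    The naive version ends up with replacement characters where the "é" should be, while the streaming decoder yields text that parses cleanly.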


    Usage with my current toolkit

    Below is an example of usage in a Next.js application, from defining an agent and setting up a route to consuming it all on the client. There is a working demo you can play with here: Demo - JSON Stream dashboard

    Packages on npm: zod-stream, stream-hooks

    Built using: schema-stream

    docs for all


    Basic streaming example with zod-stream and stream-hooks

    Defining a schema (response model)

    
    // ./schema.ts (defined in a separate file because we import it
    // into both a client component and a server context)
    
    import { z } from "zod"
    
    export const coreAgentSchema = z.object({
      // Each list is an array of { name, description } entries.
      listOfReasonsWhyJavascriptIsBetterThenPython: z
        .array(
          z.object({
            name: z.string(),
            description: z.string()
          })
        )
        .min(10),
      listOfReasonsWhyPythonIsBetterThenJavascript: z
        .array(
          z.object({
            name: z.string(),
            description: z.string()
          })
        )
        .min(1),
      finalSummary: z.string(),
      // Scores are bounded to the 0-100 range.
      pointsForPython: z.number()
        .min(0)
        .max(100),
      pointsForJavascript: z.number()
        .min(0)
        .max(100)
    })
    

    Defining an agent

    import OpenAI from "openai"
    import { createSchemaAgent } from "@hackdance/agents"
    import { coreAgentSchema } from "./schema"
    
    // OpenAI SDK client (assumed setup); by default it reads
    // OPENAI_API_KEY from the environment.
    const oai = new OpenAI()
    
    export const primaryIdentity = `
      You are an AI agent tasked with debating the merits of Python and JavaScript.
    `
    
    export const coreAgent = createSchemaAgent({
      client: oai,
      config: {
        messages: [
          {
            role: "system",
            content: primaryIdentity
          }
        ],
        model: "gpt-4-1106-preview",
        temperature: 0.5,
        max_tokens: 500
      },
      // The schema every response from this agent must follow.
      response_model: {
        schema: coreAgentSchema,
        name: "structured agent response"
      }
    })
    

    Setting a route to create the completion and stream

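    import { coreAgent } from "@/agents/example"
    
    export async function POST(request: Request): Promise<Response> {
      const {
        messages
      } = await request.json()
    
      try {
        // Start the completion and get back a stream of the structured response.
        const stream = await coreAgent.completionStream({
          messages
        })
    
        return new Response(stream)
      } catch (error) {
        // Report the failure to the client instead of leaving the request hanging.
        console.error(error)
        return new Response("Failed to create completion stream", { status: 500 })
      }
    }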
    
    

    Using the hooks to consume the stream and start rendering content ASAP

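    "use client"
    
    import { useState } from "react"
    import { useJsonStream } from "stream-hooks"
    import { coreAgentSchema } from "@/ai/agents/example/schema"
    
    // Example prompt; in a real app this would come from user input.
    const prompt = "Which is better, Python or JavaScript?"
    
    export function StreamTest() {
      const [result, setResult] = useState({})
    
      // onReceive fires with the partially populated object as data streams in.
      const { startStream, stopStream, loading } = useJsonStream({
        schema: coreAgentSchema,
        onReceive: data => {
          setResult(data)
        }
      })
    
      const go = async () => {
        try {
          await startStream({
            url: "/api/ai/chat",
            body: {
              messages: [{
                role: "user",
                content: prompt
              }]
            }
          })
        } catch (e) {
          // Close the connection if the request fails.
          stopStream()
        }
      }
    
      return (
        <div>
          <div>
            {JSON.stringify(result)}
          </div>
          <button onClick={go} disabled={loading}>Go</button>
        </div>
      )
    }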