The Semantics of Intuition and Communication
Designing systems that naturally understand and meet our expectations.
Published on April 13, 2024

> "The last ever dolphin message was misinterpreted as a surprisingly sophisticated attempt to do a double-backwards-somersault through a hoop whilst whistling the 'Star Spangled Banner,' but in fact the message was this: 'So long and thanks for all the fish.'" ― Douglas Adams, The Hitchhiker's Guide to the Galaxy

Adams captures one of the biggest challenges in both human-to-human and human-to-machine interaction: the potential for misinterpretation. In the context of building AI interfaces, it is a reminder of the complexity involved not just in interpreting data, but in grasping the nuanced layers of language, semantics, and human intent.

We should aim to bridge the semantic gap between human intuition and machine processing: designing our systems not only to analyze and generate insights, but also to contextualize and understand them in ways that resonate with human semantics and our subjective experience. We should acknowledge the inherent challenges in communication and interpretation, striving to create systems that can more accurately 'understand' and respond to the nuanced and often ambiguous nature of human thought and language. By deepening our focus on semantic understanding, we can facilitate more effective communication between humans and machines, and potentially improve human-to-human interactions through machine mediation.

To emulate human intuition more closely, our systems must navigate semantics, where meaning is not only often implicit or open to interpretation, but is also shaped by individual experience and perspective. Overcoming these challenges requires thinking beyond our current methods of data processing. We aspire to a level of semantic analysis and interaction that brings us closer to bridging the communication gap, strengthening the connection between human cognitive processes and the systems we design for ourselves.
Embracing the Ingenuity of Fools
Designing intuitive systems by embracing human complexity and 'foolishness'
Published on April 12, 2024

> "A common mistake that people make when trying to design something completely foolproof is to underestimate the ingenuity of complete fools." - Douglas Adams, Mostly Harmless

In the quest to create systems that understand or simulate human intuition, we have to acknowledge the complexity and unpredictability of human nature. We are all fools—limited by our perspectives and subject to misinterpretations and oversights. This realization is not a pessimistic resignation but a guiding principle. Acknowledging our 'foolishness' is the first step in designing systems that are not just technically sophisticated but also fundamentally aligned with human cognition and interaction. While striving to build systems that enhance decision-making and intuition, we have to design with humility, accepting the limitations and embracing the unpredictability inherent in human-like intuition.

Incorporating this understanding means designing systems that are adaptable and resilient, capable of evolving with the fluid dynamics of human thought processes: reflective, learning entities that recognize and adapt to the 'foolishness' they seek to understand and replicate, and that come close to mirroring the nuanced and often paradoxical nature of human intuition and decision-making. In the realm of creating intuitive systems, we are navigating an uncertain and complex world, where the goal is not to eliminate foolishness but to understand and engage with it constructively.
Building Instructor js
Building the new js port of the popular python lib
Published on January 4, 2024

Recently, I stumbled upon a tweet from the [creator of Instructor](https://twitter.com/jxnlco), a Python library that has a great community. They were on the hunt for someone to craft the JavaScript version. The mission and vision align with what I have been working towards on my own, so I reached out and started building.

I built most of Instructor on top of tools I have been working on this year: enabling partial JSON streaming and managing structured output with Zod. Instructor has a nice clean API and an existing community that I am excited to start working with. Instructor is similar to what I did with "schema agents" in my agents package, but focused on structured extraction: "Structured extraction in Typescript, powered by llms, designed for simplicity, transparency, and control."

The Instructor instance is a proxy directly to the OpenAI SDK. It only patches the chat completion call with a few new options: a response_model (a Zod schema) on each call, and a "mode" on instance initialization, which determines whether to coerce the response to JSON via a prompt, a function call, or a function call via tools. The simplicity and focus on staying close to the SDK makes it approachable and clear. The project fits well within the other work I have been doing, so I am excited to keep contributing and work it into my stack.
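To make "partial JSON streaming" concrete: the trick is that an in-flight JSON fragment can be completed and parsed at every chunk boundary, so the UI always has a usable snapshot. Below is a toy sketch of that idea only — it is not the actual implementation used by Instructor or schema-stream, and a real parser also handles dangling keys, numbers, and literals.

```typescript
// Toy illustration of partial JSON streaming, NOT the real schema-stream
// implementation: close any unterminated string, array, and object so an
// in-flight fragment can be parsed after every chunk.
function completePartialJson(fragment: string): unknown {
  const closers: string[] = []
  let inString = false
  let escaped = false

  for (const ch of fragment) {
    if (escaped) {
      escaped = false
    } else if (ch === "\\") {
      escaped = true
    } else if (ch === '"') {
      inString = !inString
    } else if (!inString) {
      if (ch === "{") closers.push("}")
      else if (ch === "[") closers.push("]")
      else if (ch === "}" || ch === "]") closers.pop()
    }
  }

  let completed = fragment
  if (inString) completed += '"' // close an unterminated string
  completed += closers.reverse().join("") // close open arrays/objects

  return JSON.parse(completed)
}

// A fragment cut off mid-string still yields a usable snapshot:
completePartialJson('{"users": [{"name": "Jo')
```

A fragment like `{"users": [{"name": "Jo` parses to a stubbed object with whatever data has arrived so far, which is exactly what lets a table or dashboard render while the model is still generating.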
I was able to use a lot of pre-existing tools I wrote in past work for the base Instructor instance, and was able to enable a powerful streaming mode using [schema-stream](https://www.npmjs.com/package/schema-stream).

The GitHub repo: [instructor-js](https://github.com/jxnl/instructor-js)

### Basic streaming example with instructor-js

<Video playbackId="s1lYluB22pk6yp9OJ00SVDdCW8Vrf1bxrm7v00HLg2aAM" />

---

Define a Zod schema:

```tsx
import { z } from "zod"

export const ExtractionValuesSchema = z.object({
  users: z
    .array(
      z.object({
        name: z.string(),
        handle: z.string(),
        twitter: z.string()
      })
    )
    .min(5),
  date: z.string(),
  location: z.string(),
  budget: z.number(),
  deadline: z.string().min(1)
})

export type Extraction = Partial<z.infer<typeof ExtractionValuesSchema>>
```

Make a completion call:

```tsx
import Instructor from "instructor"
import OpenAI from "openai"

import { ExtractionValuesSchema, Extraction } from "./schema"

const textBlock = `
In our recent online meeting, participants from various backgrounds joined to discuss the upcoming tech conference. The names and contact details of the participants were as follows:

- Name: John Doe, Email: johndoe@email.com, Twitter: @TechGuru44
- Name: Jane Smith, Email: janesmith@email.com, Twitter: @DigitalDiva88
- Name: Alex Johnson, Email: alexj@email.com, Twitter: @CodeMaster2023
- Name: Emily Clark, Email: emilyc@email.com, Twitter: @InnovateQueen
- Name: Ron Stewart, Email: ronstewart@email.com, Twitter: @RoboticsRon5
- Name: Sarah Lee, Email: sarahlee@email.com, Twitter: @AI_Aficionado
- Name: Mike Brown, Email: mikeb@email.com, Twitter: @FutureTechLeader
- Name: Lisa Green, Email: lisag@email.com, Twitter: @CyberSavvy101
- Name: David Wilson, Email: davidw@email.com, Twitter: @GadgetGeek77
- Name: Daniel Kim, Email: danielk@email.com, Twitter: @DataDrivenDude

During the meeting, we agreed on several key points. The conference will be held on March 15th, 2024, at the Grand Tech Arena located at 4521 Innovation Drive. Dr. Emily Johnson, a renowned AI researcher, will be our keynote speaker. The budget for the event is set at $50,000, covering venue costs, speaker fees, and promotional activities. Each participant is expected to contribute an article to the conference blog by February 20th. A follow-up meeting is scheduled for January 25th at 3 PM GMT to finalize the agenda and confirm the list of speakers.
`

const oai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? undefined,
  organization: process.env.OPENAI_ORG_ID ?? undefined
})

const client = Instructor({
  client: oai,
  mode: "TOOLS"
})

const extractionStream = await client.chat.completions.create({
  messages: [{ role: "user", content: textBlock }],
  model: "gpt-4",
  response_model: ExtractionValuesSchema,
  max_retries: 3,
  stream: true
})

let extraction: Extraction = {}

for await (const result of extractionStream) {
  try {
    extraction = result
    console.clear()
    console.table(extraction)
  } catch (e) {
    console.log(e)
    break
  }
}

console.clear()
console.log("completed extraction:")
console.table(extraction)
```

Return a completion stream from an API route:

```tsx
import Instructor from "instructor"
import OpenAI from "openai"

import { ExtractionValuesSchema } from "./schema"

// The edge runtime provides ReadableStream and TextEncoder as globals,
// so nothing needs to be imported from "stream".
function asyncGeneratorToReadableStream(
  generator: AsyncGenerator<unknown>
): ReadableStream {
  const encoder = new TextEncoder()

  return new ReadableStream({
    async start(controller) {
      for await (const parsedData of generator) {
        controller.enqueue(encoder.encode(JSON.stringify(parsedData)))
      }
      controller.close()
    },
    cancel() {
      generator.return?.(undefined)
    }
  })
}

export const runtime = "edge"

export async function POST(request: Request): Promise<Response> {
  const { messages, prompt } = await request.json()

  const oai = new OpenAI({
    apiKey: process.env.OPENAI_API_KEY ?? undefined,
    organization: process.env.OPENAI_ORG_ID ?? undefined
  })

  const client = Instructor({
    client: oai,
    mode: "TOOLS"
  })

  const extractionStream = await client.chat.completions.create({
    messages: [...messages, { role: "user", content: prompt }],
    model: "gpt-4",
    response_model: ExtractionValuesSchema,
    max_retries: 3,
    stream: true
  })

  return new Response(asyncGeneratorToReadableStream(extractionStream))
}
```
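On the client, the route's output can be consumed with a plain fetch and a stream reader. The sketch below is a simplified assumption: it treats every chunk arriving from the reader as exactly one JSON snapshot, which can break when the network splits a chunk, so a production client should buffer and re-parse (the endpoint name is hypothetical).

```typescript
// Hedged sketch of consuming a snapshot-per-chunk JSON stream without any
// helper library. Each chunk the route enqueues is a JSON snapshot of the
// partially-completed extraction, so the client can re-render as each one
// arrives. Assumes chunks are never split in transit (not guaranteed!).
async function* readJsonSnapshots(
  stream: ReadableStream<Uint8Array>
): AsyncGenerator<unknown> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    yield JSON.parse(decoder.decode(value))
  }
}

// Usage in the browser (hypothetical endpoint):
// const res = await fetch("/api/extract", { method: "POST", body: "..." })
// for await (const snapshot of readJsonSnapshots(res.body!)) {
//   render(snapshot) // re-render with each partial extraction
// }
```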
Breaking Free from the Chat Box
Challenges of chat interfaces and AI tooling.
Published on January 1, 2024

Chat interfaces are now a staple in the software world—I helped pave that road during my 6 years at Drift. We campaigned hard to prove chat wasn't just for customer service or casual conversations, championing chat over forms with our "no forms" campaign. Fast forward to now, and chat's more versatile than ever, breathing life into various software and user experiences. Enter ChatGPT, Midjourney, and a slew of other chat-first AI tools. They've elevated chat from a simple communication tool to the control center for powerful AI capabilities.

But it's come at a cost: while chat is a solid starting point, it's also a creative bottleneck. It's not that it's the wrong tool for the job—it's more that our reliance on chat stifles our imagination. We're so accustomed to interacting with these advanced AI systems through chat that it's hard to envision other, potentially richer, ways to engage.

### Chat's Limitations in UX and Technical Scope

While chat's ubiquitous presence makes it an easy go-to, its simplicity is also its downfall. We're so used to chat that it's like we're wearing blinders, focusing only on a narrow pathway of interaction. This leads to a one-size-fits-all approach, where unique opportunities for richer, more engaging user experiences are overlooked. Chat has its own language, a set of rules we've internalized so deeply that we forget other languages exist—languages that could offer users far more nuanced interactions.

### The Time-Complexity Dilemma

Chat's format struggles with complex, structured data—think of it as the linear time complexity of UX, good for quick operations but lacking when you need to scale the conversation. While dropping in graphs or videos is possible, a series of message boxes doesn't always cut it.
Waiting for a response from GPT-4 can take 30 seconds or more, and in UX, as in algorithms, efficiency matters. When you're working with tools like OpenAI's Chat API, time can stretch out: a complex prompt may require tens of seconds for a response. Streaming text or markdown alleviates the wait time, a crucial UX improvement. However, the limitation lies in merging this real-time benefit with structured JSON or richer data types. The streaming approach, while efficient, doesn't easily support the simultaneous delivery of such data. And there's the rub: we've solved one UX problem but inadvertently narrowed our options for richer, multi-layered interactions. Time isn't just money; it's also user engagement.

### The Cycle of Constraints and Creative Exploration

The limitations of existing AI tooling are not mere inconveniences; they set boundaries on what's possible and, more importantly, what can be easily imagined. While workarounds exist, they can deter creators from truly exploring the full potential of these technologies. In constraining our tools, we may unintentionally be constraining our creative capabilities as well.

I started to encounter firsthand the challenges in crafting what I envisioned using just the OAI SDK or LangChain. The need for a tool that could efficiently handle complex, structured data and offer more than just string responses was apparent. It wasn't just about managing prompts and state; it was about envisioning an agent instance with a solid identity and a dependable response model, accessible on every request.

Driven by these needs, I embarked on creating a suite of utilities. This began with a lightweight wrapper around the OAI SDK, allowing the definition of 'agents' and a 'schema agent.' The latter simplified interactions by integrating a Zod schema, managing JSON Schema creation, and handling function calls.
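To make the 'schema agent' idea concrete, here is roughly what such a wrapper derives from a Zod schema: an OpenAI tool definition whose `parameters` field is the schema's JSON Schema equivalent. The object below is hand-written for clarity (the field names are borrowed from the debate example later in this post); in practice a converter such as zod-to-json-schema generates it, and this is a sketch of the shape, not the wrapper's actual internals.

```typescript
// Roughly what a "schema agent" produces from a Zod schema: an OpenAI
// tool/function definition whose `parameters` field is the JSON Schema
// equivalent. Hand-written here for illustration; in practice a converter
// generates it, and the same Zod schema then validates the model's
// function-call arguments on the way back.
const structuredResponseTool = {
  type: "function" as const,
  function: {
    name: "structured_agent_response",
    description: "Return the agent's answer as structured data",
    parameters: {
      type: "object",
      properties: {
        finalSummary: { type: "string" },
        pointsForPython: { type: "number", minimum: 0, maximum: 100 },
        pointsForJavascript: { type: "number", minimum: 0, maximum: 100 }
      },
      required: ["finalSummary", "pointsForPython", "pointsForJavascript"]
    }
  }
}
```

Because the model is forced to "call" this function, its output arrives as arguments matching the schema instead of free-form text, which is what makes the response model dependable on every request.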
This development was not just about streaming text; it was about streaming objects and arrays, accessing these data structures as soon as the stream began. To handle the real-time data interaction, I built a library capable of parsing streaming JSON, populating a pre-stubbed data structure as the data arrives. This approach allows for immediate use of the streamed data, enabling more dynamic and responsive user experiences. Finally, I tied these components together with a set of React hooks designed for managing the streaming connection, making requests to endpoints, and using schema-stream for instant data availability.

---

## Usage with my current toolkit

Below is an example usage in a Next.js application: from defining an agent, to setting up a route, to consuming it all on the client.

A working demo you can play with here: [Demo - JSON Stream dashboard](https://dashboard-demo.novy.work/)

Packages on npm: [zod-stream](https://www.npmjs.com/package/zod-stream) [stream-hooks](https://www.npmjs.com/package/stream-hooks) built using: [schema-stream](https://www.npmjs.com/package/schema-stream) [docs for all](https://island.novy.work)

<br />

### Basic streaming example with zod-stream and stream-hooks

Defining a schema (response model):

```tsx
// ./schema.ts (defined in a separate file since we import it
// into both a client component and a server context)
import z from "zod"

export const coreAgentSchema = z.object({
  listOfReasonsWhyJavascriptIsBetterThenPython: z
    .array(
      z.object({
        name: z.string(),
        description: z.string()
      })
    )
    .min(10),
  listOfReasonsWhyPythonIsBetterThenJavascript: z
    .array(
      z.object({
        name: z.string(),
        description: z.string()
      })
    )
    .min(1),
  finalSummary: z.string(),
  pointsForPython: z.number().min(0).max(100),
  pointsForJavascript: z.number().min(0).max(100)
})
```

Defining an agent:

```tsx
import { createSchemaAgent } from "@hackdance/agents"
import OpenAI from "openai"

import { coreAgentSchema } from "./schema"

const oai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY ?? undefined
})

export const identityPrompt = `
  You are an ai agent tasked with debating the merits of Python and Javascript.
`

export const coreAgent = createSchemaAgent({
  client: oai,
  config: {
    messages: [
      {
        role: "system",
        content: identityPrompt
      }
    ],
    model: "gpt-4-1106-preview",
    temperature: 0.5,
    max_tokens: 500
  },
  response_model: {
    schema: coreAgentSchema,
    name: "structured agent response"
  }
})
```

Setting up a route to create the completion and stream:

```tsx
import { coreAgent } from "@/agents/example"

export async function POST(request: Request): Promise<Response> {
  const { messages } = await request.json()

  try {
    const stream = await coreAgent.completionStream({ messages })

    return new Response(stream)
  } catch (e) {
    console.error(e)

    return new Response("Failed to create completion stream", { status: 500 })
  }
}
```

Using the hooks to consume the stream and start rendering content ASAP:

```tsx
import { useState } from "react"
import { useJsonStream } from "stream-hooks"

import { coreAgentSchema } from "@/agents/example/schema"

const prompt = "Which is better: Python or Javascript?"

export function StreamTest() {
  const [result, setResult] = useState({})

  const { startStream, stopStream, loading } = useJsonStream({
    schema: coreAgentSchema,
    onReceive: data => {
      setResult(data)
    }
  })

  const go = async () => {
    try {
      await startStream({
        url: "/api/ai/chat",
        body: {
          messages: [{ role: "user", content: prompt }]
        }
      })
    } catch (e) {
      stopStream()
    }
  }

  return (
    <div>
      <div>{JSON.stringify(result)}</div>
      <button onClick={go} disabled={loading}>
        Go
      </button>
    </div>
  )
}
```

<br />

---

<br />