MAGIC - A Framework for LLM-Powered Automation

    Creating autonomous systems powered by large language models

    Published on April 15, 2024

    Make it more agentic

    GitHub

    MAGIC (Machine-Assisted Generative Intelligence and Coordination) is a framework for creating autonomous systems powered by Large Language Models (LLMs). It provides a structured environment where agents, powered by LLMs, perform tasks through defined actions, which can be executed directly or delegated to specialized agents, enhancing the system's flexibility and capability.

    The MAGIC framework aims to simplify the process of building and managing complex, LLM-powered automation systems by providing a clear, well-defined structure for agent creation, task delegation, and action execution -- and a long overly-verbose name to avoid any miscommunication.

    (Code at the bottom)

    Core Components

    • Agent Identity and Parameters: Each agent within the MAGIC system has a defined identity and set of parameters guiding its operations, interactions, and decision-making processes.
    • Agent Schema: This defines the expected structured output from an agent, ensuring consistency and reliability in the agent's responses and actions.
    • Action Definitions: Actions represent tasks or operations an agent can perform. Each action has defined input parameters, a handler function, and context describing its purpose and usage conditions.
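
    To make these components concrete, here is a minimal sketch of the shapes involved, using plain TypeScript types for brevity (the implementation later in this post uses zod schemas; the names here are illustrative only):

```typescript
// Illustrative shapes for the core components; all names here are hypothetical.

// Agent identity: the "who" of an agent, plus the prompt guiding its behavior
type AgentIdentity = {
  name: string
  systemPrompt: string
}

// Action definition: input parameters, a handler function, and context
// describing the action's purpose for the LLM
type ActionDefinition<Params, Result> = {
  description: string // what the action does and when to use it
  sideEffect: boolean // mutates the outside world vs. returns data to the agent
  handler: (params: Params) => Promise<Result>
}

// Example: a read-only action with a single input parameter
const echoAction: ActionDefinition<{ text: string }, string> = {
  description: "Return the provided text unchanged",
  sideEffect: false,
  handler: async ({ text }) => text
}
```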

    Workflow

    1. Context and Resource Gathering and Resolution: Given the input parameters, the system gathers all relevant context and external resources it needs, then resolves which subset of that material to provide to the orchestration agent based on the system's configuration (max token count, etc.).

    2. Task Reception and Analysis: The orchestration agent analyzes incoming context and input parameters to understand their requirements and context.

    3. Delegate Selection and Task Execution: The orchestration agent selects appropriate delegate agents or action handlers for task execution, which then process and perform the tasks, leveraging their specialized capabilities.

    4. Aggregation and Response: Post-execution, the orchestration agent synthesizes the results, preparing responses or actions in alignment with overarching objectives.

    5. Feedback Loop: Continuously reviews outcomes to refine task distribution and execution, enhancing the system’s efficiency and adaptability.

    This structured workflow, built around the core components of MAGIC, enables the creation of flexible, adaptable, and efficient LLM-powered automation systems that can handle a wide range of tasks and scenarios.
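
    As a rough sketch, steps 1 through 5 could be wired together like this; every function passed in here is a hypothetical stub standing in for the real context store, agent call, and outcome tracking:

```typescript
// A hypothetical sketch of the MAGIC workflow loop; all dependencies are stubs.

type AgentResult = { content: string; actionOutput?: string }

async function runMagicFlow(
  prompt: string,
  deps: {
    gatherContext: (prompt: string) => Promise<string[]> // step 1: gather & resolve context
    callAgent: (messages: string[]) => Promise<AgentResult> // steps 2-3: analyze & execute
    recordOutcome: (result: AgentResult) => Promise<void> // step 5: feedback loop
  }
): Promise<AgentResult> {
  const context = await deps.gatherContext(prompt)
  let result = await deps.callAgent([...context, prompt])

  // Step 4: if an action produced output, feed it back to the agent
  // so it can synthesize a final response
  if (result.actionOutput) {
    result = await deps.callAgent([...context, prompt, result.actionOutput])
  }

  await deps.recordOutcome(result)
  return result
}
```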

    Action Execution and Delegation

    Actions in MAGIC are defined by their ability to cause change or fetch information as needed. Actions can be classified based on whether they have side effects or contribute to ongoing context awareness and decision-making processes.

    • Direct Action Execution: Agents execute actions based on the LLM’s insights and predefined parameters, directly affecting the system or environment.

    • Delegated Action Execution: Complex tasks are delegated to specialized agents, known as delegates, which are designed to handle specific operations efficiently. These delegates can be other LLM-powered agents or specialized automation systems, allowing for polymorphic task execution.

    • Side Effect and Context-Aware Actions: Actions are categorized based on their nature—producing direct side effects or aiding in ongoing decision-making. This distinction aids in the management of action flows and their impact on the system.

    Using orchestrated delegation

    In orchestrated delegation, a primary agent acts as a coordinator, directing tasks to delegate agents based on their specialized capabilities and the task's requirements.

    • Orchestration Agent: Manages the distribution of tasks, ensuring they are executed efficiently and effectively by the most suitable delegate.

    • Delegate Agents: Perform the actions they are assigned, utilizing their specialized skills and knowledge to achieve the best outcome.
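
    A minimal sketch of this pattern, with a hypothetical delegate registry (in practice each delegate would wrap its own LLM-powered agent or automation, and the orchestration agent would select one via the LLM rather than a keyword match):

```typescript
// Hypothetical delegates keyed by capability; run() stands in for a real agent call.
type Delegate = {
  capability: string
  run: (task: string) => Promise<string>
}

const delegates: Delegate[] = [
  { capability: "calendar", run: async task => `scheduled: ${task}` },
  { capability: "search", run: async task => `results for: ${task}` }
]

// The orchestration agent routes a task to the most suitable delegate.
async function orchestrate(capability: string, task: string): Promise<string> {
  const delegate = delegates.find(d => d.capability === capability)
  if (!delegate) throw new Error(`no delegate for capability: ${capability}`)
  return delegate.run(task)
}
```

    Calling orchestrate("calendar", "meeting with John") would resolve to the calendar delegate, mirroring the task-management example later in this post.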

    But is it Agentic?

    In the ever-evolving quest to anthropomorphize our algorithms, we've recently landed on ‘agentic’ to describe all things AI + automation (except for sometimes) - but what is an agent? What does it mean to be agentic? Is it just a bit of MAGIC?

    Every person I speak to has a different definition, and it’s hard to take anyone seriously when they attempt to describe something simple with something vague.

    In this example and anywhere else in this code, agents are defined as a preconfigured instance of an LLM call, with an “identity” prompt and some optional set of static or dynamic messages pre- or post-pended to every request.

    Implementing a system that uses the ideas described in MAGIC is relatively simple - depending on the use case, it can scale up to handle an enormous amount of complexity - but the concepts remain the same at every new layer of added functionality.

    You will notice in the examples that I do not rely on LLM function calling for the execution of actions - the reason is twofold:

    1. I usually want to keep the option to swap out a provider or model that may not support the same function-calling ability.

    2. Anecdotal, but I have used this approach as well as function calling and multi-function calling to achieve the same results, and I have observed that in most cases I am better able to prompt my way into having the agent adhere to my rules around when and what to call when I define actions this way. (Behind the scenes, to get the structured response we do use function calling when available - the action and actionParams are just params for the singular function that is requesting the structured response.)

    It works for me, but do what makes you happy.

    A basic implementation

    Agent creation utility

    Using Instructor (for structured output and partial JSON streaming) and OpenAI, we define the scope of what an agent is. The oai import is a pre-configured OpenAI client, and the client variable is an instance of the Instructor client, which will be used to create agent instances.

    import { oai } from "@/lib/oai"
    import Instructor from "@instructor-ai/instructor"
    import OpenAI from "openai"
    import { z } from "zod"
    
    export type CreateAgentParams = {
      config: Partial<OpenAI.ChatCompletionCreateParams> & {
        model: OpenAI.ChatCompletionCreateParams["model"]
        messages: OpenAI.ChatCompletionMessageParam[]
      }
      response_model: {
        schema: z.AnyZodObject
        name: string
      }
    }
    
    export type AgentInstance = ReturnType<typeof createAgent>
    export type ConfigOverride = Partial<OpenAI.ChatCompletionCreateParams>
    
    const client = Instructor({
      client: oai,
      mode: "TOOLS"
    })
    
    /**
     * Create a pre-configured "agent" that can be used to generate completions
     * Messages that are passed at initialization will be pre-pended to all completions
     * all other configuration can be overridden in the completion call.
     *
     * @param {CreateAgentParams} params
     *
     * @returns {AgentInstance}
     */
    export function createAgent<S extends z.AnyZodObject>({
      config,
      response_model
    }: {
      config: Partial<OpenAI.ChatCompletionCreateParams> & {
        model: OpenAI.ChatCompletionCreateParams["model"]
        messages: OpenAI.ChatCompletionMessageParam[]
      }
      response_model: {
        schema: S
        name: string
      }
    }) {
      const defaultAgentParams = {
        temperature: 0.7,
        top_p: 1,
        frequency_penalty: 0,
        presence_penalty: 0,
        n: 1,
        ...config
      }
    
      return {
        /**
         * Generate a single stream completion
         * @param {ConfigOverride}
         *
         * @returns {Promise<AsyncGenerator<z.infer<typeof response_model.schema>>> }
         */
        completionStream: async (configOverride: ConfigOverride) => {
          const messages = [
            ...(defaultAgentParams.messages ?? []),
            ...(configOverride?.messages ?? [])
          ] as OpenAI.ChatCompletionMessageParam[]
    
          const extractionStream = await client.chat.completions.create({
            ...defaultAgentParams,
            ...configOverride,
            response_model,
            stream: true,
            messages
          })
    
          return extractionStream
        },
    completion: async (configOverride: ConfigOverride) => {
      const messages = [
        ...(defaultAgentParams.messages ?? []),
        ...(configOverride?.messages ?? [])
      ] as OpenAI.ChatCompletionMessageParam[]

      const extraction = await client.chat.completions.create({
        ...defaultAgentParams,
        ...configOverride,
        response_model,
        stream: false,
        messages
      })

      return extraction
    }
      }
    }
    

    Creating an agent

    Here we define our core orchestration agent for the system, as well as the actions available to that agent. The z import from the zod library is used to define the schema for the agent's actions and responses, ensuring type safety and validation.

    import { createAgent } from "../"
    import z from "zod"
    
    const coreAgentActions = {
    	UPDATE_USER_DATA: "UPDATE_USER_DATA",
    	GET_THINGS_FROM_API: "GET_THINGS_FROM_API"
    } as const
    
    export const updateUserParams = z.object({
    	action: z.literal(coreAgentActions.UPDATE_USER_DATA),
    	data: z.record(z.string(), z.any()).describe("user properties to update or add")
    })
    
    
    export const getThingsParams = z.object({
    	action: z.literal(coreAgentActions.GET_THINGS_FROM_API),
    	query: z.string().describe("the query to use when fetching things from the api")
    })
    
    export const actionParams = z.discriminatedUnion("action", [
      updateUserParams,
      getThingsParams
    ])
    
    const coreAgentSchema = z.object({
    	content: z.string().describe("the response to the user"),
    	action: z.enum(Object.values(coreAgentActions) as [string, ...string[]]).optional(),
    	actionParams: actionParams.optional()
    })
    
    export const actionDefinitions = {
    	[coreAgentActions.UPDATE_USER_DATA]:{
    		handler: async function({ data }: z.infer<typeof updateUserParams>) {
    			// db is an assumed pre-configured database client
    			await db.user.upsert(data)
    			return
    		},
    		description: "Persist any new information about the user to the database.",
        sideEffect: true,
    		example: `
    			[user]: oh my email is dimitri@sick.email;
    			//assistant response: { content: "great, thank you", action: UPDATE_USER_DATA, actionParams: { email: "dimitri@sick.email"}}
    		`
    	},
    	[coreAgentActions.GET_THINGS_FROM_API]:{
    		handler: async function({ query }: z.infer<typeof getThingsParams>) {
    			const response = await fetch(`https://things.com/api?q=${encodeURIComponent(query)}`)
    			return await response.json()
    		},
        sideEffect: false,
    		description: "fetch things from the api given an explicit user request or when it is relevant to do so",
    		example: `
    			[user]: what kind of shiny things do you sell?
    			// assistant response: {
    				content: "One sec, let me find you some good ones.",
    				action: "GET_THINGS_FROM_API",
    				actionParams: { query: "shiny" }
    			}
    			// ...system calls action handler...
    			// action handler output: [{ url: "s.co/123", title: "so shiny thing" }]
    			// assistant called again with action handler output
    			// assistant response: {
    				content: "Found a really great shiny thing for you! It's the 'so shiny thing'.",
    				things: [{ url, title }]
    			}
    		`
    	}
    }
    
    
    export const primaryIdentity = `
    You are a world-class AI assistant agent, tasked with responding to user queries and delegating complex tasks to other agents. You will not only be the direct point of contact with the end user but will also be responsible for deciding when to call the provided actions - these actions can be other agents and/or pure functions to execute. In some cases the actions will be defined in a way that requires they return their output back to you, in these cases you will use that provided output to best respond to the user - in other cases the actions will be marked as side-effects and you will not receive a response, only provide that action with the context it requires.
    
    	Those actions are:
    ${Object.entries(actionDefinitions)
    	.map(([name, { description, example }]) => `${name}: ${description}\n${example}`)
    	.join("\n\n")}
    `
    
    export const coreAgent = createAgent({
      config: {
        model: "gpt-4-turbo",
        max_tokens: 650,
        temperature: 0.1,
        messages: [
          {
            role: "system",
            content: primaryIdentity
          }
        ]
      },
      response_model: {
        schema: coreAgentSchema,
        name: "core agent response"
      }
    })
    

    Simplified example of the agents in action

    type MagicFlowInputParams = {
      prompt: string;
      conversationId: string;
    }
    
    async function getContextMessages({ prompt, conversationId }: MagicFlowInputParams) {
      const conversationMessages = await db.messages.get({ where: { conversationId }})
      const ragResults = await vectordb.query(prompt)

      // The resolveContextToUse function (not shown) determines which context messages to use
      // based on the conversation history and the results of the vector database query.
      return resolveContextToUse({ conversationMessages, ragResults })
    }
    
    async function coreAgentCall({ messages, isFollowUp = false }) {
      // A stream isn't strictly necessary, but it lets you optimistically react to the
      // response while the agent is generating content and reduce the final execution
      // time by actively reacting to the state of the stream.
      const completionStream = await coreAgent.completionStream({ messages })

      let final = {}

      // The for await...of loop processes the completionStream asynchronously,
      // enabling optimistic updates to the client while the agent is generating content.
      for await (const partial of completionStream) {
        // send to a websocket or pubsub channel or something
        publishToClientStream(partial.content)

        final = partial
      }

      return final
    }
    
    
    async function handleActions({ action, actionParams }) {
      if (!action) return
      const { handler, sideEffect } = actionDefinitions[action]

      const result = await handler(actionParams)

      // The sideEffect property of an action definition determines whether the action has a
      // direct impact on the system or environment (e.g., updating a database), or whether
      // it simply returns a result to be used by the agent.
      if (sideEffect) {
        return
      }

      return `The result of the ${action} call is: ${JSON.stringify(result)}`
    }
    
    
    
    const messages = await getContextMessages(inputParams)
    const agentResponse = await coreAgentCall({ messages })
    const actionResult = await handleActions(agentResponse)
    
    if (actionResult) {
      await coreAgentCall({
        isFollowUp: true,
        messages: [
          ...messages,
          { role: "assistant", content: agentResponse.content },
          { role: "system", content: actionResult }
        ]
      })
    }
    

    The MAGIC framework provides a powerful and flexible foundation for building LLM-powered automation systems. By defining clear roles and responsibilities for agents, actions, and delegation, MAGIC enables you to create complex, adaptable systems that can handle a wide range of tasks and scenarios.

    However, it's important to note that MAGIC is not a silver bullet solution. The effectiveness of a MAGIC-based system will depend heavily on the quality of the LLMs used, the design of the agent identities and action definitions, and the overall architecture of the system.

    Example Use Case: Automated Task Management

    In a scenario like automated task management, the MAGIC system could use a core agent to assess the tasks at hand and delegate specific actions to other agents designed to handle those tasks. For example, the core agent might delegate a task like "schedule a meeting with John" to a calendar management agent, which would then handle the specifics of finding an available time slot and sending out the meeting invitation.

    Example Use Case: Conversational Agent

    In a conversational agent scenario, MAGIC would manage dialogue flow, content generation, and context retention, dynamically adjusting responses and actions based on the conversation's evolution and external data inputs. For instance, if a user asks about a specific product, the core agent could delegate the task of retrieving product information to a specialized product catalog agent, which would then return the relevant details to be incorporated into the core agent's response.

    Agentically Automating the Future

    With a clear, well-defined, and structured approach to agent creation, task delegation, and action execution, MAGIC demonstrates the power of simple, thoughtful design and implementation in systems that leverage the capabilities of LLMs to solve real-world problems.

    So, as we continue to build agentic systems and ponder the meaning of agency in the context of AI, let’s remember that the success of our "agentic" systems will be measured not by the cleverness of their names, but by their ability to make a difference in the lives of the people they serve. So let’s build with that in mind, and let the results speak for themselves.

    If you're interested in experimenting with the MAGIC framework or have ideas for how it could be improved, I encourage you to check out the code and share your thoughts.


    At Novy, we are building LLM-powered solutions for clients across a wide range of industries. Our team is constantly learning, experimenting with new techniques, and exploring innovative ideas. MAGIC is based on a high-level architecture we have employed successfully in many projects.

    In the future, we may open-source more of the components we use to build our systems, allowing the community to benefit from our experience and contribute to the development of advanced LLM-powered automation solutions.

    If you're interested in discussing how these systems can evolve, tools you wish existed, or exploring what's possible with LLM-powered automation, please reach out to @dimitrikennedy - I’m always excited to connect and share ideas.