Currently, the Copilot approach rules the artificial intelligence (AI) world, and in that world, Microsoft created Semantic Kernel as a framework for building its own Copilots. Now you can use Semantic Kernel too. Semantic Kernel (SK) is an open-source AI framework created by Microsoft for .NET, Python, and Java developers working with Large Language Models (LLMs). Its purpose is threefold: to abstract away the underlying LLMs, APIs, and tooling; to handle more complex implementations in a generic way; and to make it easy to integrate your own content. Semantic Kernel is very similar to LangChain, a popular open-source LLM framework for Python and JavaScript/TypeScript, although it's a little less mature and takes a slightly different approach.
This article is not a hands-on guide to programming with Semantic Kernel. Instead, its purpose is to help you wrap your head around SK: what it is and what it isn't. Where it shines and where it doesn't. When to use it to your advantage and when to stay away. I'll tackle programming with SK in another article to follow, hoping that the preview code will be more fully baked by then so the article isn't out of date before it's printed. If you want to get your hands dirty now, by all means, go to the GitHub repo and download the samples and Jupyter notebooks, start a project, get the SK NuGet package and dig in. I hope this article helps you understand SK and provides a roadmap for its use.
A very good question you're probably asking yourself right now is, “Why do I need a framework for working with LLMs?” The short answer is that you don't. Like most frameworks, SK has a learning curve and comes with some overhead. On top of that, it's still early days for AI programming and the landscape is like the Wild West. We're all still figuring things out. For basic AI functionality, to get an understanding of the fundamentals of AI programming, or to fine-tune for perfection, it's perfectly legitimate to roll your own code and program against the LLMs and APIs directly. In fact, I've found that the SK code evolves so quickly (it's still in preview as of this writing) that I'm encountering breaking changes in the SDK every few weeks or so, while my hand-written AI code is humming along nicely.
Currently, the copilot approach rules the AI world.
Why consider Semantic Kernel now? For one thing, it's not going to be in preview forever. It has some pretty nifty features that make short work of some tedious coding, and compared to hand coding, it makes it much easier to swap out supporting services, models, and even tools later. Currently, most of my code uses the GPT-35-Turbo and GPT-4 models running in Azure and accessed through the Azure OpenAI Service. It's simple, stable, easy to stand up and tear down, and it's enterprise grade. As new models become available, I can evaluate them and use them if I choose.
With SK, I can easily switch from Azure OpenAI to using OpenAI directly, or choose to use SK's relatively recent integration with Hugging Face, just by changing configuration. Hugging Face is a community platform for hosting and downloading open-source models and datasets. The LLM landscape is changing rapidly and new models become available constantly. It's not only possible but likely that you'll come across new models or tools that are better at the task at hand, cheaper than the ones you're currently using, or both. Re-coding a hand-coded app for a new model or platform could take anywhere from minutes to weeks, but reconfiguring SK is relatively easy. That's the first big reason to consider SK: abstraction.
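To make that concrete, here's roughly what standing up a kernel looks like in the preview builds I've been using. Treat the exact method names as provisional (they've been shifting from release to release), and note that the deployment name, endpoint, and key values below are placeholders you'd replace with your own:

```csharp
using Microsoft.SemanticKernel;

// Stand up a kernel against an Azure OpenAI deployment (values are placeholders).
var kernel = Kernel.Builder
    .WithAzureChatCompletionService(
        "gpt-35-turbo",                              // deployment name
        "https://my-resource.openai.azure.com/",     // endpoint
        Environment.GetEnvironmentVariable("AZURE_OPENAI_KEY"))
    .Build();

// Switching to OpenAI directly is a one-line configuration change, e.g.:
// .WithOpenAIChatCompletionService("gpt-4", Environment.GetEnvironmentVariable("OPENAI_KEY"))
```

The rest of your code talks to the kernel, not to a specific service, which is exactly what makes the swap cheap.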
Another area where SK leverages abstraction is prompts. Prompt engineering is an entire field dedicated to creating the most effective prompts (strings) to feed to an LLM in order to get the best response for a given task. Although what constitutes the “best” prompt can vary from model to model, there are effective techniques that generally make some prompts work better than others across the board. Furthermore, there are techniques to coax the models into giving more usable responses. For example, you can word your prompt to encourage the model to provide the response in a specific JSON format that can easily be consumed by code.
Developing good prompts can be a lot of work, so it makes sense to build a library of prompt templates for specific purposes and to describe what each prompt does so it can be easily discovered and reused. You can then pass parameters into these templates to generate standardized prompts. SK calls these pre-developed prompt templates semantic functions. And, of course, you can still create prompts on the fly in code; these are called inline semantic functions.
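Here's a rough sketch of an inline semantic function using the preview API at the time of this writing. It assumes a kernel configured as shown earlier; the prompt, the JSON shape, and the feedback string are just illustrations:

```csharp
// An inline semantic function: a templated prompt created in code.
string prompt = @"Summarize the following customer feedback in three bullet points.
Respond only with JSON in this exact shape: { ""bullets"": [""..."", ""..."", ""...""] }

Feedback: {{$input}}";

var summarize = kernel.CreateSemanticFunction(prompt, maxTokens: 256, temperature: 0.2);

// Run it: the user's text flows in as {{$input}}, and a string comes back out.
var result = await kernel.RunAsync("The checkout page keeps timing out on mobile.", summarize);
Console.WriteLine(result.Result);
```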
In addition to custom prompt templates (semantic functions), Semantic Kernel also provides a nice framework for creating and cataloging custom code that you write, called native functions. Imagine a native function that runs a query against a SQL database. That's not something a templated prompt can do, but your native function can, because you can write whatever code you want. These custom code creations are limited only by your imagination. Just like semantic functions, native functions can accept string parameters as inputs and should return string outputs, suitable as input for another semantic or native function. When it comes to completing a task, SK doesn't differentiate between semantic and native functions: SK functions are black boxes.
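A native function sketch might look something like the following (again, preview-era API). I've stubbed the lookup with a dictionary; in real code, this is where you'd query your CRM or SQL database:

```csharp
using System.Collections.Generic;
using Microsoft.SemanticKernel.SkillDefinition;

public class CustomerSkill
{
    // Stand-in data; substitute your own data access code here.
    private static readonly Dictionary<string, string> Addresses = new()
    {
        ["Contoso Ltd."] = "1 Contoso Way, Redmond, WA"
    };

    [SKFunction("Looks up a customer's mailing address given the customer's name")]
    public string GetCustomerAddress(string input)
    {
        // String in, string out, so the result can feed any other semantic or native function.
        return Addresses.TryGetValue(input.Trim(), out var address)
            ? address
            : $"No address on file for '{input}'";
    }
}
```

The description in the attribute matters: it's what SK (and, later, the planner) uses to discover what this function is good for.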
Groups of similar functions are called skills in SK. For example, if you have a group of functions that interact with a proprietary database, or a group that parses text, or a group that tells jokes, you can create a skill made up of these functions. Skills mainly exist to help you organize large numbers of functions and can be handy because you can load all of the functions in a skill in a single operation.
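With the CustomerSkill class above, that single operation looks like this in the preview API, and each imported function is then addressable by name:

```csharp
// Import every function in the skill with one call...
var customer = kernel.ImportSkill(new CustomerSkill(), "CustomerSkill");

// ...then address individual functions by name, alone or as part of a pipeline.
var address = await kernel.RunAsync("Contoso Ltd.", customer["GetCustomerAddress"]);
Console.WriteLine(address.Result);
```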
And you don't have to build all of the functions yourself. Third parties are creating functions that give you access to their systems and features and are putting them in catalogs with descriptions for you to use. For example, there are functions for Microsoft 365, Microsoft Graph, and Bing from Microsoft, and there are functions for Google and Slack. These third-party functions are often made available to you via a REST endpoint that has a manifest and are referred to as plug-ins. Technically, plug-ins are just skills (groups of functions), so you can take a group of native and/or semantic functions that you write and expose them as a plug-in that others can use by creating a JSON manifest for them that follows the OpenAI plug-in specification and exposing a REST endpoint. I expect literally thousands of third-party plug-ins to be published by the time SK hits version 1.0.
That leads us to another abstraction. You now have skills, plug-ins, and individual semantic and native functions at your disposal, and they all share a common shape: they accept string parameters as inputs and return string outputs. That means you can use these generic building blocks to create a series of steps that fulfill complex tasks. Even the simplest interactions with LLMs often involve multiple steps.
Consider the Retrieval Augmented Generation (RAG) pattern, commonly used to “ground” a model's responses. The first step in the RAG pattern is to use an LLM to determine the intent of the user. For example, a user of an AI chatbot on a retail site might have a question about product features, pricing, or billing. Based on what the user typed, a prompt might be sent to an LLM to determine which of these areas the user is interested in. Based on the user's intent, the next step might be to query product, pricing, or billing information from an internal system. This is the retrieval part of RAG. The next step is to combine the user's original question from the first step with the results retrieved in the second step and submit both to an LLM. This is the augmented part of RAG: you're augmenting the user's question with data that you retrieved in order to “ground” the response. The LLM then generates an informed, or grounded, response using the retrieved information.
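Spelled out as explicit SK steps, that flow looks roughly like the sketch below. It assumes a kernel configured as shown earlier and follows the preview API at the time of this writing; the prompts, the sample question, and the stubbed retrieval are purely illustrative:

```csharp
using Microsoft.SemanticKernel.Orchestration;

string userQuestion = "What does the Pro plan cost per month?";

// Step 1: use the LLM to classify what the user is after.
var intentFunction = kernel.CreateSemanticFunction(
    "Classify this question as exactly one word: product, pricing, or billing.\nQuestion: {{$input}}");
var intentContext = await kernel.RunAsync(userQuestion, intentFunction);
string intent = intentContext.Result.Trim().ToLowerInvariant();

// Step 2 (retrieval): pull relevant data from an internal system (stubbed here).
string facts = intent switch
{
    "pricing" => "Standard plan: $10/month. Pro plan: $25/month.",
    "billing" => "Invoices go out on the 1st; payment is due within 30 days.",
    _         => "The Pro plan adds priority support and single sign-on."
};

// Step 3 (augmented generation): combine the original question with the
// retrieved facts and let the LLM produce a grounded answer.
var answerFunction = kernel.CreateSemanticFunction(
    "Using only these facts: {{$facts}}\nAnswer the question: {{$input}}");
var variables = new ContextVariables(userQuestion);
variables.Set("facts", facts);
var grounded = await kernel.RunAsync(variables, answerFunction);
Console.WriteLine(grounded.Result);
```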
I mentioned that SK's purpose is threefold. In addition to providing abstraction, the second big feature of both Semantic Kernel and LangChain is the ability to handle complex tasks on the fly. LLMs are good at generating text, but they can't do everything a user might ask of them and, as a developer, you can't anticipate everything a user might ask your application to do. You can build your own workflow (chains or pipelines of functions) by hand, as in the RAG example above, but what happens when users want something you didn't anticipate?
Semantic Kernel's purpose is threefold: To abstract away the underlying LLMs, APIs and tooling, to handle more complex implementations in a generic way, and to make it easy to integrate your own content.
You can ask SK to build the workflow for you! You do this by asking an LLM to break the complex task down into individual steps that can each be accomplished by one of the functions available to it. SK uses a built-in function, called a planner, that reads the descriptions of each function and skill you've loaded and chooses a function to accomplish each step in the task. Finally, you have SK execute the steps in the workflow, passing data and context from step to step. LangChain calls this a chain and SK calls it a pipeline, but both work in a similar fashion.
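In the preview builds I've been working with, that looks roughly like the following. SequentialPlanner is one of the planners SK ships; the goal string and the CustomerSkill class from earlier are just examples:

```csharp
using Microsoft.SemanticKernel.Planning;

// Make your functions discoverable, then hand the planner a goal in plain English.
kernel.ImportSkill(new CustomerSkill(), "CustomerSkill");

var planner = new SequentialPlanner(kernel);
var plan = await planner.CreatePlanAsync(
    "Draft a short welcome email for Contoso Ltd. and include their mailing address.");

// The plan is itself a function: SK executes the chosen steps in order,
// passing output from one step to the next.
var result = await kernel.RunAsync(plan);
Console.WriteLine(result.Result);
```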
The ability to break a task down and then orchestrate a multi-step solution allows your solutions to handle diverse, complex, ad hoc scenarios. Because the custom prompts and custom code you write can be consumed by the planner in a generic way, it's easy to build a set of features specific to your environment, connect to internal systems, and integrate those custom steps into workflows. For example, imagine you discover that it would be helpful to look up a customer's address in a proprietary database as part of certain workflows. You simply code the lookup and provide a description of what your code does, and the planner can use it. As the planner generates the steps required to complete the overall task, if part of that task involves looking up a customer's address, it can decide to use your code to complete that step in the workflow.
Plug-ins can also be used as steps in the pipeline. For example, you can develop a sophisticated system that's triggered by an event, such as a new customer being entered into a CRM system. An SK pipeline might integrate with several internal and external systems via functions and plug-ins and generate valuable output. A plug-in can then kick off a workflow that sends that output via your email system to an associate to prep them for a follow-up call, sends a welcome letter to the new client, and alerts a manager that this has all happened.
Semantic Kernel also contains a lot of useful tools. Aside from abstracting the underlying services and models and letting you create complex workflows (building up libraries of native and semantic functions, grouping them into skills, exposing them as plug-ins, and creating and executing ad hoc pipelines), SK can help you with many of the mundane tasks you'll need to handle when coding against LLMs. For example, models don't actually ingest text; they work with something called tokens, which are bits of words. Converting text into tokens is called tokenization and is done by specialized tokenizers matched to each model, something SK can help you with. SK can also help you keep track of how many tokens you can feed a model at a time and split up large text into smaller batches of tokens in an effective way.
Although that may sound simple, deciding where to split text isn't always easy. Maybe you can split it at paragraph breaks, maybe at sentence breaks. And it's good to have some overlap between batches so that context isn't lost at the boundaries. This is tedious, error-prone code that's been written a thousand times. Utilities such as the tokenizers and text splitters available in SK can make these tasks relatively painless. Of course, there are many, many open-source tokenizers and text splitters available and you can always roll your own, but one-stop shopping is pretty nice and, because they're handled in a generic way, it will be easier to swap them out down the line if and when you need to.
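For example, SK's TextChunker (in the preview builds I've used; signatures may shift between releases) splits text first into lines and then into paragraph-sized chunks capped at a rough token budget. The file name and the token limits below are arbitrary placeholders:

```csharp
using Microsoft.SemanticKernel.Text;

// Split a long document into lines, then into chunks capped at a token budget.
string longDocumentText = File.ReadAllText("hr-handbook.txt");

var lines = TextChunker.SplitPlainTextLines(longDocumentText, 40);    // ~40 tokens per line
var chunks = TextChunker.SplitPlainTextParagraphs(lines, 500);        // ~500 tokens per chunk

Console.WriteLine($"Split into {chunks.Count} chunks.");
```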
And that third purpose of SK? Integrating your own content. Imagine your HR department has a large pile of PDF documents and you think your employees might want to ask your in-house HR Copilot questions about the material. This is a great place to use the RAG pattern to inject some knowledge into the prompt to ground the generated answers. But you can't inject them all into the prompt because there's way too much information. How will you query a bunch of PDF documents? You'll need to search them in a sophisticated, natural language way, but there's no LLM trained on your HR data and you probably don't want to go through the time and expense of training one yourself just for this.
This is where embeddings come into play. Embeddings are vector representations of text, similar to the representations LLMs work with internally, and they can be stored in vector databases and searched in a sophisticated, natural-language way. The vector database could be in-memory if there isn't that much data to ingest (SK includes an in-memory vector store), hosted in a cloud, or hosted locally. Once you've chosen a vector database implementation, you can use SK-provided sample code to ingest all of those PDFs into it. The sample extracts just the text from the PDFs, splits it into chunks, runs each chunk through an embedding model, and stores the resulting vectors (arrays of floating-point numbers) in the database. When you detect that the user has asked an HR question, you can use their natural-language question to query the database. The question is converted into a vector the same way, and the database uses similarity algorithms to find the stored chunks that are mathematically closest to the question. You can then grab the top X results and feed them into your prompt. The RAG pattern strikes again. SK calls this type of retrieved content memories and, although not all memories require a vector database, those seem to be the most useful type. If you don't want to do all of that yourself, you might consider Azure Cognitive Search, which is basically a vector database on steroids that you can interact with via REST calls and that includes utilities, like one that ingests a bunch of PDFs.
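If you do roll it yourself with SK's memories, here's a rough sketch using the built-in volatile (in-memory) store, based on the preview API at the time of this writing. The embedding deployment, endpoint, key variable, collection name, and sample text are all placeholders:

```csharp
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Memory;

// A kernel with an embedding model and SK's built-in in-memory vector store.
var kernel = Kernel.Builder
    .WithAzureTextEmbeddingGenerationService(
        "text-embedding-ada-002", "https://my-resource.openai.azure.com/", azureKey)
    .WithMemoryStorage(new VolatileMemoryStore())
    .Build();

// Ingest: save each chunk of extracted PDF text as a memory.
await kernel.Memory.SaveInformationAsync(
    collection: "hr-policies",
    text: "Full-time employees accrue 1.5 vacation days per month of service.",
    id: "vacation-policy-001");

// Query: find the stored chunks closest to the user's question, then feed the
// best hits into your prompt (the RAG pattern again).
await foreach (var hit in kernel.Memory.SearchAsync("hr-policies",
    "How many vacation days do I get?", limit: 3))
{
    Console.WriteLine($"{hit.Relevance:0.00}  {hit.Metadata.Text}");
}
```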
Conclusion
Semantic Kernel can be useful to abstract away low-level AI code, create libraries of useful skills that can be discovered and used in a generic fashion, build and execute complex workflows, integrate with local systems and content, and help developers with mundane but important tasks, like text splitting and tokenization. SK is similar to LangChain but was created by Microsoft and makes LangChain-like features available to .NET developers. Although it may be overkill for simple AI tasks, it allows rapid development of more complex scenarios. Semantic Kernel is still in preview as of this writing, and it's evolving very, very rapidly. Not every project will benefit from SK, but it's a truly powerful SDK and worthy of your curiosity.