A small language model (SLM) is a machine learning model typically derived from a large language model (LLM) but of greatly reduced size. An SLM retains much of the functionality of the LLM it is built from, but with far less complexity and far lower demand for computing resources.
How Are SLMs Used?
Generally, SLMs can do most of the things LLMs do. They can provide conversational responses to text, draw on a training data set to answer queries, generate images, or even analyze visual (computer vision) and audio inputs.
Small language models are still an emerging technology, but they show great promise for tightly focused AI use cases. For example, an SLM might be an excellent tool for building an internal documentation chatbot trained to point employees to an organization's resources when they ask common questions or use certain keywords.
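As a minimal sketch of what that could look like (not a production recipe), the snippet below asks a small instruction-tuned model to answer from a supplied documentation string. It assumes the Hugging Face transformers library and the microsoft/Phi-3-mini-4k-instruct checkpoint; the FAQ text and the question are hypothetical placeholders, and a real deployment would retrieve relevant documents rather than inline them.

```python
# A minimal sketch of an internal-documentation chatbot built on an SLM.
# Assumes the Hugging Face transformers library and the
# "microsoft/Phi-3-mini-4k-instruct" checkpoint; the FAQ text and the
# question are hypothetical placeholders.
from transformers import pipeline

FAQ = (
    "Expense reports are filed in the Finance portal under Reimbursements. "
    "The VPN setup guide lives on the IT wiki under Remote Access."
)

chatbot = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

question = "Where do I file an expense report?"
prompt = (
    "Answer the employee's question using only this documentation:\n"
    f"{FAQ}\n\nQuestion: {question}\nAnswer:"
)

# Generate a short answer grounded in the supplied documentation.
print(chatbot(prompt, max_new_tokens=60)[0]["generated_text"])
```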
While an SLM may not be able to draw upon the vast training data sets of an LLM, a properly tuned SLM can still retain much of the natural, conversational experience of such a model, just with a much narrower data set (and, in many cases, marginally reduced accuracy). In a computer vision scenario, you might train an SLM to identify only objects of a particular type (just fruit, for example), rather than labeling every known object from a massive training data set (foods, animals, vehicles, people, plants, signs, products).
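One hedged way to illustrate that narrowing, without training a model from scratch, is to constrain the label set of an off-the-shelf zero-shot image classifier. The sketch below uses the transformers zero-shot-image-classification pipeline with a CLIP checkpoint; the checkpoint choice, the label list, and the image path are all illustrative assumptions, not a prescribed approach.

```python
# Sketch: constrain a zero-shot image classifier to a narrow label set
# (fruit only) instead of labeling every known object category.
# The CLIP checkpoint, the label list, and the image path are all
# illustrative assumptions.
from transformers import pipeline

classifier = pipeline(
    "zero-shot-image-classification",
    model="openai/clip-vit-base-patch32",
)

fruit_labels = ["apple", "banana", "orange", "pear", "grape"]
results = classifier("shelf_photo.jpg", candidate_labels=fruit_labels)

# Print each candidate fruit label with its confidence score.
for result in results:
    print(f"{result['label']}: {result['score']:.2f}")
```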
Just How “Small” is a Small Language Model?
Small language models vary greatly in size. Language models are typically measured by the number of parameters inside the model, as these parameters govern a model's size, its inherent complexity, and thus its computing demand.
Cutting-edge LLMs like OpenAI's GPT-4 and GPT-4o are estimated to have in excess of 1 trillion parameters (OpenAI does not publish official parameter counts for its models). Microsoft's latest SLM, Phi-3, comes in sizes ranging from 3.8 billion to 14 billion parameters. That makes Phi-3 between 0.38% and 1.4% the estimated size of GPT-4o. Some very small language models have parameter counts measured in the tens of millions!
The size of a language model matters because these models run in memory on a computer system; what counts is not physical disk space so much as the dedicated memory required to run the model. A model like GPT-4o requires a large cluster of dedicated data center AI servers, running expensive specialty hardware from a vendor like NVIDIA, just to run at all; it's estimated that OpenAI's model needs many hundreds of gigabytes of available memory. There is no realistic way to run such a model even on a very powerful desktop computer. A small language model, by comparison, might require just a few gigabytes of memory (RAM), meaning that even a high-end smartphone could run such a model, provided it has dedicated AI coprocessing hardware (an NPU) to run it at a reasonable speed.
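The back-of-the-envelope math behind those figures is simple: a model's weight footprint is roughly its parameter count multiplied by the bytes used to store each parameter. The sketch below works through that arithmetic for the parameter counts mentioned above; the storage precisions (16-bit and 4-bit) are common choices rather than measurements of any specific deployment, and real systems need additional working memory on top of the weights.

```python
# Back-of-the-envelope weight footprint: parameters x bytes per parameter.
# Precisions shown are common storage formats (fp16 = 2 bytes, int4 = 0.5
# bytes); real deployments need extra working memory beyond the weights.
def footprint_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

models = [("Phi-3-mini", 3.8e9), ("Phi-3-medium", 14e9), ("~1T-param LLM", 1e12)]
for name, params in models:
    for fmt, size in [("fp16", 2), ("int4", 0.5)]:
        print(f"{name} @ {fmt}: ~{footprint_gb(params, size):,.1f} GB")
```

That works out to roughly 7.6 GB for Phi-3-mini at fp16 (under 2 GB at int4), consistent with the few-gigabytes figure above, versus roughly 2,000 GB for a trillion-parameter model at fp16.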
LLM vs SLM: What’s the Difference?
Think of an SLM as the portable camping stove to an LLM's fully equipped commercial kitchen. An SLM is a flexible but complexity-constrained tool: you could make a mean stew, but probably not bake a wedding cake. The trade-off is obvious, though. An SLM can be taken almost anywhere; quite literally, it can go in your pocket (on a smartphone). An LLM, by contrast, can cook up essentially anything, but it requires a large stationary footprint and a huge upfront investment, and it comes with very high operating costs. LLMs also don't make economic sense unless deployed at massive scale, while SLMs can be feasible even at the individual device level.
One of the key differentiators for SLM use cases, compared with LLMs, is the ability to run on-device. Laptops and even many smartphones can effectively run an SLM, whereas LLMs require server-grade or data center hardware to be leveraged effectively. SLMs could enable AI features for consumers and businesses without tapping cloud infrastructure at all, a potentially huge cost savings for the use cases that fit within an SLM's scope.
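As a rough illustration of what on-device inference looks like in practice, the snippet below runs a quantized model entirely locally using the llama-cpp-python bindings. The GGUF file path is an assumption (any small quantized model downloaded to the device would work similarly), and the prompt is a placeholder.

```python
# Sketch: fully local, on-device inference with a quantized SLM via the
# llama-cpp-python bindings. The GGUF file path is an assumption; any
# small quantized model stored on the device would work similarly.
from llama_cpp import Llama

llm = Llama(model_path="phi-3-mini-q4.gguf", n_ctx=2048)

# No network calls: the prompt is processed entirely on local hardware.
output = llm("Summarize our PTO policy in one sentence.", max_tokens=64)
print(output["choices"][0]["text"])
```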
The difference in running costs cannot be overstated, either. LLMs require vast amounts of electricity just for the training process; OpenAI reportedly spent over $100 million training GPT-4. It is generally assumed that most LLMs currently run as deeply unprofitable services because of their immense resource consumption relative to the business value they provide. (Over time, these economics should improve, though.)
SLMs, by comparison, use a tiny fraction of those resources and can spread that consumption across the entire user base. An SLM deployed on a smartphone has an effective “operating cost” of whatever the customer spends charging the phone, plus the occasional cost of training and delivering updates to the model. Even training an SLM is far cheaper: what takes days or weeks for an LLM takes minutes or hours for an SLM, and on far less expensive hardware.
How Will SLMs Be Used in the Future?
The future of SLMs seems likely to manifest in end-device use cases: on laptops, smartphones, desktop computers, and perhaps even kiosks or other embedded systems. Imagine a check-in kiosk at a doctor's office that can use a camera to read your insurance or ID card, ask about the reason for your visit via voice input, and answer questions about the facility (where's the bathroom, how long is the wait typically, what's my doctor's name). Or think about shopping at a big box store, walking up to an automated stock-checking robot, asking it where the coconut milk is, and instantly getting a reply with in-store directions shown on a display. In an enterprise setting, an SLM could be connected to a corporate knowledge base and organizational chart, connecting the dots between projects and stakeholders that would otherwise require tedious outreach and repetitive rounds of asking and answering questions. Such an SLM could run directly inside the corporate chat service on your smartphone.
Whatever the future holds, both SLMs and LLMs will likely be part of it: each is distinctly suited to certain use cases, though the two will overlap from time to time.