Clearly explained guide for running quantized open-source LLM applications on CPUs using Llama 2, C Transformers, GGML, and LangChain — Third-party commercial large language model (LLM) providers like OpenAI’s GPT4 have democratized LLM use via simple API calls. However, teams may still require self-managed or private deployment for model inference within enterprise perimeters due to various reasons around data privacy and compliance.