📄️ Deploy a Ray cluster
If you need distributed inference capability, install Ray by following the steps below.
📄️ Distributed Inference with Ray & FastChat
Use a Ray cluster for distributed inference.
📄️ Run Inference using Ray Serve
We'll introduce how to use a Ray cluster to serve LLMs with multiple instances and provide a highly available inference service. Two methods for running distributed inference are covered; this page describes the first.