📄️ Deploy a Ray cluster
If you need distributed inference capability, install Ray by following the steps below.
📄️ Distributed Inference with Ray & FastChat
Use a Ray cluster for distributed inference.
📄️ Run Inference using Ray Serve
We'll introduce how to use a Ray cluster to serve LLMs with multiple instances and provide a highly available inference service. Two methods for running distributed inference are covered; this page describes the first.