Roadmap

v0.3.0 - 2024 Apr. (ongoing)

  • Share GPTs with other users to build an ecosystem

  • Support a reranking chain to optimize RAG (see the sketch after this list)

  • Support fine-grained updates to knowledge base data

  • Extract images and tables from PDFs to enhance data processing

  • Support multimodal models in deployment and inference

  • Support chatting with documents within a conversation

  • Support Gemma and Qwen-VL models

  • Upgrade fastchat to the latest version

  • Chat with images using multimodal models

  • Integrate GPU management, scheduling, and resource monitoring for containerized environments

  • Integrate an API gateway to govern model service APIs (monitoring, analysis, and security measures), building it out into an AI gateway
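
To make the reranking item concrete, here is a minimal retrieve-then-rerank sketch using a cross-encoder from sentence-transformers. The model choice, the `rerank` helper, and the commented-out `vector_store` call are illustrative assumptions, not this project's API.

```python
# Illustrative retrieve-then-rerank step for RAG (a sketch, not this project's API).
from sentence_transformers import CrossEncoder

def rerank(query: str, candidates: list[str], top_k: int = 3) -> list[str]:
    """Re-score retriever candidates with a cross-encoder and keep the best few."""
    reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # example model
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

# Usage: fetch a generous top-20 from the vector store, then pass only the
# reranked top-3 into the LLM prompt (vector_store is a hypothetical stand-in):
# context = rerank(user_query, vector_store.search(user_query, k=20))
```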

v0.5.0 - 2024 May

  • Playground for datasets, knowledge bases, model services, etc., based on Streamlit

  • Visualization of various data types, based on Streamlit

  • Data Processing - Introduce text annotation (automated + manual) to improve data quality and thereby assist fine-tuning

  • Data Security - Support data anonymization (e.g., masking sensitive information like ID numbers, phone numbers, and bank account numbers; see the masking sketch after this list)

  • Enhanced Data Integration - Broaden integration with various data sources (databases, APIs, etc.) and support data synchronization strategies such as automatic synchronization

  • Support manual evaluation to ensure quality control before deploying to production. Additionally, incorporate manual feedback into the monitoring system

  • Enable user feedback on the question-answering system to facilitate optimization of LLM applications (data processing, prompt optimization, etc.)

  • Support low-resource fine-tuning of large models, including RLHF (Reinforcement Learning from Human Feedback) and SFT (Supervised Fine-Tuning), with parameter-efficient techniques such as Adapter, P-tuning, and LoRA. This improves model quality while reducing the performance demands of model serving (e.g., inference cost and latency issues caused by long prompts or slow inference)

  • Model compression techniques

  • Conduct testing and evaluation of model services and embeddings (QA evaluation, metric collection)

  • Implement "scale to zero" capability (integrating with Arbiter) for cold start scenarios, enabling models and applications to evolve towards a Serverless architecture

  • Support orchestration of additional node types such as Agent, Cache, etc.

  • Add more best practices for prompt engineering (see the prompt sketch after this list):

    • Few-shot learning techniques
    • Chain-of-Thought (CoT) approach
    • Mind-mapping techniques
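
To ground the data anonymization item, here is a small masking pass. The regex patterns are rough assumptions for common phone, ID, and bank account formats, not this project's actual rules.

```python
# Illustrative anonymization pass; patterns are assumptions, not shipped rules.
import re

PATTERNS = {
    "phone": re.compile(r"\b\+?\d[\d -]{8,14}\d\b"),
    "id_number": re.compile(r"\b(?:\d{15}|\d{17}[\dXx])\b"),  # e.g., resident ID formats
    "bank_account": re.compile(r"\b\d{12,19}\b"),
}

def anonymize(text: str) -> str:
    """Mask sensitive spans before text enters a dataset or knowledge base."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label.upper()}]", text)
    return text

print(anonymize("Call 138 1234 5678 about account 6222021234567890123."))
# -> Call [PHONE] about account [BANK_ACCOUNT].
```

And for the prompt engineering best practices, a minimal few-shot prompt with a Chain-of-Thought cue; the wording and worked example are illustrative, not shipped templates.

```python
# Illustrative few-shot + Chain-of-Thought prompt template (not a shipped template).
FEW_SHOT_COT_TEMPLATE = """\
Answer the question. Reason step by step before giving the final answer.

Q: A warehouse has 12 boxes with 8 items each. 20 items are removed. How many remain?
A: 12 * 8 = 96 items in total. 96 - 20 = 76. Final answer: 76

Q: {question}
A: Let's think step by step."""

def build_prompt(question: str) -> str:
    return FEW_SHOT_COT_TEMPLATE.format(question=question)

print(build_prompt("A tank holds 50 liters and leaks 3 liters per hour. How much is left after 6 hours?"))
```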

v1.0 - 2024 Jun.

  • Automatically construct prompt templates based on data annotations
  • Enhance LLMOps monitoring, covering the pipeline from dataset and feature data through to model inference, with call-chain tracing based on langchain-go
  • Implement a pipeline from data source -> dataset -> data processing -> data versioning -> knowledge base -> model service
  • Strengthen the Python SDK to handle basic capabilities such as dataset manipulation, data processing, and vectorization, so that these operations can be performed in a notebook environment (a hypothetical sketch follows this list).
    • Refer to Databricks to enhance the developer experience
  • Implement gray release (canary rollout) for LLM applications based on the AI gateway
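
The pipeline and Python SDK items above could read roughly as follows in a notebook. This is a hypothetical sketch: the `llmops_sdk` module and every class and method name in it are invented placeholders for a planned API, not an existing one.

```python
# Hypothetical notebook flow for the planned Python SDK; every name below
# (llmops_sdk, Client, datasets, knowledge_bases, model_services, ...) is an
# invented placeholder, not an existing API.
from llmops_sdk import Client  # placeholder import

client = Client(endpoint="http://localhost:8080")  # placeholder endpoint

# data source -> dataset -> data processing -> data versioning
ds = client.datasets.create_from_source("s3://bucket/raw-docs", name="support-docs")
cleaned = ds.process(steps=["dedupe", "strip_html", "anonymize"])
v1 = cleaned.commit(tag="v1")  # versioned snapshot of the processed dataset

# knowledge base -> model service
kb = client.knowledge_bases.build(dataset=v1, embedder="bge-large-zh")  # example embedder
svc = client.model_services.deploy(model="qwen-7b-chat", knowledge_base=kb)
print(svc.chat("How do I reset my password?"))
```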

v1.x and beyond

  • Improve user experience and system efficiency