Pre-Installation and Deployment Preparations

System Requirements

Hardware Requirements

  • CPU: Intel/AMD 32 cores or higher
  • Memory: 256GB RAM or higher
  • Storage: At least 200GB available disk space
  • GPU: CUDA-capable, with more than 40GB of GPU memory. Ampere architecture or newer is recommended (e.g., A100, RTX 4090).
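
A quick sanity check of the host against these requirements can be done with standard Linux commands (a minimal sketch; adjust the df path to the filesystem where the AI package will be stored):

    # CPU core count (expect >= 32)
    nproc

    # Total RAM (expect >= 256GB)
    free -h | awk '/^Mem:/ {print $2}'

    # Free disk space on the target filesystem (expect >= 200GB)
    df -h /

    # GPU model and memory (requires the NVIDIA driver)
    nvidia-smi --query-gpu=name,memory.total --format=csv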

Software Requirements

  • Operating System: Ubuntu 22.04
  • CUDA: ≥11.8 recommended (compatible with Ampere architecture)
  • NVIDIA Driver: ≥535.x
  • Docker: ≥20.10 (the NVIDIA Container Toolkit must also be installed)
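
The operating-system requirement can be confirmed up front (the driver, CUDA, Docker, and toolkit versions are covered by the Pre-Installation Check below):

    # Print the Ubuntu release description
    lsb_release -d
    # Expected output (example): Description:  Ubuntu 22.04.4 LTS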

AI Package

Please contact the sales team to obtain the AI Docker image. Save the image file xiangyu-ai.tar to the server.
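
If the image file is first downloaded to a workstation, it can be copied to the deployment server with scp (the host and destination path below are placeholders; adapt them to your environment):

    # Copy the image archive to the server (placeholder host and path)
    scp xiangyu-ai.tar user@your-server:/opt/xiangyu/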

Installation Steps

Pre-Installation Check:

  • NVIDIA Driver & CUDA: Run the command nvidia-smi to verify that the Driver Version and CUDA Version meet the requirements specified in Software Requirements. For example:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40                     On  |   00000000:08:00.0 Off |                    0 |
| N/A   73C    P0            107W /  300W |    1603MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L40                     On  |   00000000:09:00.0 Off |                    0 |
| N/A   47C    P0             84W /  300W |   10817MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            7072      C   text-embeddings-router                 1594MiB |
|    1   N/A  N/A          309241      C   ...rs/cuda_v12_avx/ollama_llama_server 10808MiB |
+-----------------------------------------------------------------------------------------+
  • Docker Environment Check: Run the command docker --version to verify that the Docker version meets the requirement specified in Software Requirements. For example:
Docker version 27.4.1, build b9d17ea
  • NVIDIA Container Toolkit Check: Run the command nvidia-container-runtime --version to verify that the NVIDIA Container Toolkit is installed (a container-level smoke test is sketched after this list). For example:
NVIDIA Container Runtime version 1.17.4
commit: 9b69590c7428470a72f2ae05f826412976af1395
spec: 1.2.0

runc version 1.2.2
commit: v1.2.2-0-g7cb3632
spec: 1.2.0
go: go1.22.9
libseccomp: 2.5.3
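
As a final end-to-end check that containers can reach the GPUs, the standard smoke test from the NVIDIA Container Toolkit documentation can be run; it should print the same GPU table as on the host:

    # The toolkit injects nvidia-smi into the container at runtime;
    # a failure here means Docker is not wired to the NVIDIA runtime correctly.
    docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi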

AI Package Installation

  • Download the Xiangyu AI image file to your local machine.

  • Run the following command to load the image into the local Docker library.

    docker load -i xiangyu-ai.tar

  • Run the following command to start the LLM service.

    docker run --runtime nvidia --gpus all -p 8000:8000 --ipc=host xiangyu-ai:0.7.2 --model /models --served-model-name Qwen2.5-Coder-14B-Instruct

    When the following information appears on the screen, the model has been loaded successfully; a command-line verification is also sketched after this list.

    [Figure: RunVLLM]
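
Once the container is up, the deployment can be verified from the command line (assuming the server is reachable on localhost:8000; the /v1/models route is part of the OpenAI-compatible API the service exposes):

    # Confirm the image was loaded into the local Docker library
    docker images | grep xiangyu-ai

    # List served models; Qwen2.5-Coder-14B-Instruct should appear in the response
    curl http://localhost:8000/v1/models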

Configure Copilot Extension

  • Open the XiangYu AI Code Assistant interface in the host computer software.

    [Figure: openExtension]

  • Click the Select model icon shown in the figure below to configure the AI model access information, then select Add Chat model.

    [Figure: configureExtension]

  • In the pop-up window, select the appropriate model.

    [Figure: configureExtension]

  • For users outside of Mainland China, it is recommended to select Anthropic or OpenAI. Choose a suitable model from the list and enter the API key you have obtained.

    [Figure: configureExtension]

  • For users in Mainland China, it is recommended to select DeepSeek. Choose a suitable model from the list (e.g., deepseek-coder) and enter the API key you have obtained.

    [Figure: configureExtension]

  • To add a locally deployed model, click the config file link below to open the config.yaml file.

    [Figure: configureExtension]

  • Configure the AI model access information as shown below. The OpenAI/Qwen2.5-Coder-14B-Instruct entry corresponds to the deployment example in the AI Package Installation section above. If deploying models locally via Ollama, refer to the - name: qwen2.5-coder 3b entry. Fill in the appropriate parameters (model name, apiBase, etc.) based on your deployment.

    name: Local Assistant
    version: 1.0.0
    schema: v1
    models:
      - name: OpenAI/Qwen2.5-Coder-14B-Instruct
        provider: openai
        model: Qwen2.5-Coder-14B-Instruct
        apiBase: http://x.x.x.x:8000/v1
        apiKey: test
        roles:
          - chat
          - edit
          - apply
          - autocomplete
      - name: qwen2.5-coder 3b
        provider: ollama
        model: qwen2.5-coder:3b
        apiBase: http://x.x.x.x:11434
        roles:
          - chat
          - edit
          - apply
          - autocomplete
        defaultCompletionOptions:
          contextLength: 32768
          maxTokens: 20480
    context:
      - provider: code
      - provider: docs
      - provider: diff
      - provider: terminal
      - provider: problems
      - provider: folder
      - provider: codebase

  • In the XIANGYU - AI chat window, ask a few questions to confirm the model responds correctly. The configured endpoints can also be exercised directly from the command line, as sketched below.
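
A quick command-line exercise of the endpoints from the config.yaml example above (x.x.x.x, the ports, and the apiKey value test mirror that example and must be replaced with your actual deployment values):

    # OpenAI-compatible chat endpoint for the locally served Qwen model
    curl http://x.x.x.x:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer test" \
      -d '{"model": "Qwen2.5-Coder-14B-Instruct",
           "messages": [{"role": "user", "content": "Write a hello world in Python."}]}'

    # Ollama endpoint: list locally available models
    curl http://x.x.x.x:11434/api/tags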