The GTX 1650 has only 4 GB of VRAM, which significantly restricts the size and speed of the large language models (LLMs) you can run locally.
Small models in the 1–3B parameter range (e.g., TinyLlama 1.1B, Phi-2 2.7B) run with 4-bit/8-bit quantization, but context length and speed are limited.
7B parameter models (e.g., Llama 2 7B, Mistral 7B) are generally not feasible to run fully on a 4GB VRAM GPU: even at 4-bit quantization the weights alone occupy roughly 3.5–4 GB, leaving no headroom for the KV cache and activations, so layers must be offloaded to the CPU and inference becomes slow.
Larger models (13B, 70B, etc.) are not usable on this hardware at all; a 13B model needs roughly 6.5 GB for 4-bit weights alone, already above total VRAM.
For practical LLM/AI workloads, a GPU with 12GB+ VRAM is recommended.
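The sizing claims above follow from simple arithmetic: quantized weight size is parameter count times bits per weight, plus working memory for the KV cache and CUDA overhead. A rough sketch (the 1 GB overhead allowance is an assumption for illustration, not a measured figure):

```python
def estimate_vram_gb(params_billions: float, bits: int, overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate: quantized weights plus a fixed allowance
    for KV cache, activations, and CUDA context overhead."""
    weights_gb = params_billions * bits / 8  # 1B params at 8-bit ~ 1 GB
    return weights_gb + overhead_gb

for name, params in [("TinyLlama 1.1B", 1.1), ("Phi-2 2.7B", 2.7), ("Mistral 7B", 7.0)]:
    need = estimate_vram_gb(params, bits=4)
    verdict = "fits" if need <= 4.0 else "does not fit"
    print(f"{name}: ~{need:.1f} GB at 4-bit -> {verdict} in 4 GB")
```

By this estimate a 7B model needs about 4.5 GB even at 4-bit, which is why it spills out of a 4 GB card, while 1–3B models fit with room to spare.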
GPU Sharing: The GPU is currently not shared between VMs (consumer cards like the GTX 1650 lack vGPU support), but it can be shared between containers via the NVIDIA container runtime (e.g., Docker with the NVIDIA Container Toolkit).
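As a sketch of container-level sharing, assuming the NVIDIA Container Toolkit is installed on the host (Docker 19.03+ provides the `--gpus` flag; the image names here are illustrative):

```shell
# Verify the NVIDIA runtime is wired up: the container should list the GTX 1650.
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

# Two containers can use the same physical GPU concurrently; the driver
# time-slices compute between them, but VRAM is NOT partitioned or isolated,
# so both workloads draw from the same 4 GB pool.
docker run -d --gpus all --name llm-a ollama/ollama
docker run -d --gpus all --name llm-b ollama/ollama
```

Note that with only 4 GB of VRAM, running two model-serving containers at once will almost certainly exhaust memory; sharing is practical mainly for alternating light workloads.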