Triton Inference Server Feature Requests
We want to be able to load more models into Triton Inference Server (TRTIS) GPU memory, but we are concerned about the inability to anticipate GPU memory consumption and fragmentation. Support for CUDA Unified Virtual Memory (UVM) would solve this.