LiteLVLM

Efficient Large Vision-Language Model for Pixel Grounding

Upload an Image

Drag and drop images here or click to browse. Supported formats: JPG, JPEG, PNG (max 5 MB).

Select a sample to fill in the image and text instruction.

Segmentation output will be shown here...

Visualize LiteLVLM's token pruning process.