
LiteLVLM
Efficient Large Vision-Language Model for Pixel Grounding
1. Upload an Image
Retain Visual Tokens: 576 / 576
2. Text Instruction

Examples
Select a sample to fill in the image and text instruction.
3. Output Result
Segmentation output will be shown here...
Token Pruning Animation
Visualize LiteLVLM's token pruning process.