Plant Taxonomy Meets Plant Counting

A Fine-Grained, Taxonomic Dataset for
Counting Hundreds of Plant Species

Jinyu Xu, Tianqi Hu, Xiaonan Hu, Letian Zhou,
Songliang Cao, Meng Zhang, Hao Lu*
Huazhong University of Science and Technology, China *Corresponding Author
CVPR 2026 Oral

Abstract

Visually cataloging and quantifying the natural world requires pushing the boundaries of both detailed visual classification and counting at scale. Despite significant progress, particularly in crowd and traffic analysis, fine-grained, taxonomy-aware plant counting remains underexplored in vision. In contrast to crowds, plants exhibit nonrigid morphologies and appearance variations across growth stages and environments.

To fill this gap, we present TPC-268, the first plant counting benchmark incorporating plant taxonomy. Our dataset couples instance-level point annotations with Linnaean labels (kingdom to species) and organ categories, enabling hierarchical reasoning and species-aware evaluation. The dataset features 10,000 images with 678,050 point annotations, includes 268 countable plant categories over 242 plant species in Plantae and Fungi, and spans observation scales from canopy-level remote sensing imagery to tissue-level microscopy.

We follow the problem setting of class-agnostic counting (CAC), provide taxonomy-consistent, scale-aware data splits, and benchmark state-of-the-art regression- and detection-based CAC approaches. By capturing the biodiversity, hierarchical structure, and multi-scale nature of botanical and mycological taxa, TPC-268 provides a biologically grounded testbed to advance fine-grained class-agnostic counting.
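To make the annotation design concrete, a per-image record in such a dataset could pair instance-level points with an organ category and Linnaean ranks. The field names and values below are hypothetical illustrations, not the released annotation schema:

```python
# Hypothetical annotation record illustrating the labels TPC-268 pairs with
# each image: instance-level points, an organ category, and Linnaean ranks.
annotation = {
    "image": "example.jpg",                       # placeholder file name
    "points": [(412.0, 188.5), (430.2, 201.0)],   # one (x, y) per instance
    "organ": "fruit",                             # countable organ category
    "taxonomy": {                                 # kingdom -> species
        "kingdom": "Plantae",
        "species": "Malus domestica",
    },
}

# The ground-truth count is simply the number of annotated points.
count = len(annotation["points"])
```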

Experimental Results

1. Performance on TPC-268

Regression-based paradigms consistently outperform detection-based methods, as explicit object localization is severely hindered by the compact spatial arrangement and structural entanglement of plant organs. LOCA achieves the best test performance by effectively integrating local structural cues. In contrast, models relying primarily on global self-attention (e.g., CACViT, TasselNetV4) show strong validation results but exhibit significant generalization gaps on unseen test scenes, indicating a tendency to overfit validation distributions.

Table 2a: 3-Shot Setting

| Method | Venue | Backbone | Val MAE ↓ | Val RMSE ↓ | Val R² ↑ | Test MAE ↓ | Test RMSE ↓ | Test R² ↑ |
|---|---|---|---|---|---|---|---|---|
| FamNet | CVPR'21 | R50 | 28.87 | 52.51 | 0.58 | 30.43 | 65.62 | 0.62 |
| BMNet+ | CVPR'22 | R50 | 29.33 | 77.78 | 0.47 | 27.78 | 57.25 | 0.50 |
| C-DETR | ECCV'22 | R50 | 22.66 | 77.51 | 0.75 | 22.68 | 57.97 | 0.74 |
| SPDCNet | BMVC'22 | R18 | 25.66 | 72.49 | 0.52 | 23.70 | 47.53 | 0.64 |
| CountTR | BMVC'22 | Hybrid | 20.21 | 55.82 | 0.73 | 25.19 | 49.94 | 0.62 |
| SAFECount | WACV'23 | R18 | 22.57 | 63.65 | 0.64 | 25.70 | 52.30 | 0.58 |
| LOCA | ICCV'23 | R50 | 17.26 | 53.19 | 0.75 | 17.51 | 38.37 | 0.78 |
| DAVE | CVPR'24 | R50 | 16.47 | 52.87 | 0.76 | 17.61 | 40.06 | 0.75 |
| CACViT | AAAI'24 | ViT-B | 16.63 | 42.49 | 0.82 | 22.04 | 41.79 | 0.73 |
| CountGD | NeurIPS'24 | Swin-B | 18.32 | 54.55 | 0.74 | 19.52 | 50.51 | 0.61 |
| TasselNetV4 | ISPRS'26 | ViT-B | 13.20 | 43.93 | 0.83 | 22.95 | 51.36 | 0.60 |

Table 2b: 1-Shot Setting

| Method | Venue | Backbone | Val MAE ↓ | Val RMSE ↓ | Val R² ↑ | Test MAE ↓ | Test RMSE ↓ | Test R² ↑ |
|---|---|---|---|---|---|---|---|---|
| FamNet | CVPR'21 | R50 | 33.11±0.68 | 68.95±4.15 | 0.58±0.05 | 33.63±1.13 | 62.07±2.94 | 0.41±0.05 |
| BMNet+ | CVPR'22 | R50 | 29.33±0.13 | 77.50±0.31 | 0.48±0.05 | 27.84±0.10 | 56.98±0.12 | 0.50±0.01 |
| CountTR | BMVC'22 | Hybrid | 20.16±0.05 | 55.15±0.82 | 0.73±0.01 | 25.19±0.14 | 50.23±0.24 | 0.62±0.00 |
| LOCA | ICCV'23 | R50 | 17.19±0.31 | 48.14±2.19 | 0.80±0.02 | 21.47±0.29 | 42.36±0.72 | 0.73±0.01 |
| DAVE | CVPR'24 | R50 | 16.06±0.60 | 48.35±1.19 | 0.80±0.01 | 19.47±0.44 | 42.54±0.35 | 0.72±0.00 |
| CACViT | AAAI'24 | ViT-B | 17.96±0.16 | 43.38±0.47 | 0.83±0.00 | 22.06±0.11 | 42.97±0.81 | 0.71±0.01 |
| TasselNetV4 | ISPRS'26 | ViT-B | 13.49±0.02 | 41.30±0.46 | 0.85±0.00 | 22.20±0.11 | 48.70±0.26 | 0.67±0.00 |
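The MAE, RMSE, and R² columns above follow standard counting-evaluation definitions and can be computed from per-image predicted and ground-truth counts. A minimal NumPy sketch (the benchmark's official evaluation script may differ in detail):

```python
import numpy as np

def counting_metrics(pred, gt):
    """MAE, RMSE, and R^2 between predicted and ground-truth counts."""
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    err = pred - gt
    mae = np.mean(np.abs(err))              # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))       # root mean squared error
    ss_res = np.sum(err ** 2)               # residual sum of squares
    ss_tot = np.sum((gt - gt.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot              # coefficient of determination
    return mae, rmse, r2
```

Lower MAE/RMSE and higher R² are better, matching the ↓/↑ arrows in the table headers.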

2. Cross-Dataset Transfer Analysis

Generic models trained on FSC-147 suffer severe performance degradation on TPC-268 (MAE increases up to 225%). Conversely, models trained on plant data transfer more robustly to generic scenes, indicating that plant counting presents a more challenging representation problem due to morphological complexity.

| Method | FSC-147 → TPC-268 MAE | Δ vs Same-Domain | TPC-268 → FSC-147 MAE | Δ vs Same-Domain |
|---|---|---|---|---|
| CountTR | 38.62 | +225% | 26.53 | +5% |
| CACViT | 26.73 | +147% | 17.88 | -19% |
| LOCA | 24.70 | +130% | 15.16 | -13% |
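The Δ columns report the percentage change of cross-domain MAE relative to each model's same-domain MAE. A one-line sketch of that computation (the numbers in the usage comment are illustrative, not taken from the table):

```python
def relative_change(cross_mae, same_mae):
    """Percentage change of cross-domain MAE relative to same-domain MAE.

    Positive values mean degradation under domain shift; negative values
    mean the model transfers better than its same-domain baseline.
    """
    return 100.0 * (cross_mae - same_mae) / same_mae

# Illustrative usage: a model whose MAE rises from 10.0 to 30.0
# under domain shift degrades by +200%.
delta = relative_change(30.0, 10.0)
```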

3. Zero-Shot and Foundation Models

Current zero-shot methods (GroundingREC) and vision-language backbones (BioCLIP2) underperform relative to visual-exemplar methods. The low-resolution feature maps of ViT architectures, without specific adapter designs, are suboptimal for dense prediction, suggesting that explicit modeling of visual similarity remains more effective than text-only or off-the-shelf foundation features.

| Method / Paradigm | Test MAE ↓ | Test R² ↑ |
|---|---|---|
| LOCA (3-Shot Visual) | 17.51 | 0.78 |
| GroundingREC (Zero-Shot Text) | 24.14 | 0.53 |
| LOCA + BioCLIP2 Backbone | 34.75 | 0.29 |

4. Taxonomic Knowledge as Inductive Bias

Incorporating Linnaean taxonomy as textual prompts yields consistent error reduction (e.g., MAE drops from 19.52 to 16.90). This confirms that structured biological knowledge provides a practical and effective inductive bias for fine-grained counting tasks.

| Target Specification | MAE ↓ | RMSE ↓ | R² ↑ |
|---|---|---|---|
| 3 visual exemplars | 19.52 | 50.51 | 0.61 |
| + species name | 17.53 | 44.80 | 0.69 |
| + full taxonomy | 16.90 | 43.32 | 0.71 |
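One plausible way to serialize Linnaean ranks into a textual prompt for the "+ full taxonomy" setting is sketched below. The exact prompt template used in the paper is not specified here, so the `taxonomy_prompt` helper and its rank ordering are assumptions:

```python
def taxonomy_prompt(ranks):
    """Join available Linnaean ranks (kingdom -> species) into a text prompt.

    `ranks` maps rank names to labels; missing ranks are simply skipped,
    so the same helper covers both "+ species name" and "+ full taxonomy".
    """
    order = ["kingdom", "phylum", "class", "order",
             "family", "genus", "species"]
    parts = [f"{r}: {ranks[r]}" for r in order if r in ranks]
    return "; ".join(parts)
```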

t-SNE Feature Space

t-SNE visualization of the feature space, highlighting that visual features alone struggle to cluster deep biological taxa.

TPC-268 Showcase

Broad Coverage

TPC-268 diversity across scales. Multi-scale morphologies from microscopic tissues to canopy-level remote sensing.

Qualitative Predictions

Qualitative results on TPC-268. Predicted counting results from representative methods across diverse scenarios.

BibTeX

@article{xu2026plant,
  title={Plant Taxonomy Meets Plant Counting: A Fine-Grained, Taxonomic Dataset for Counting Hundreds of Plant Species},
  author={Xu, Jinyu and Hu, Tianqi and Hu, Xiaonan and Zhou, Letian and Cao, Songliang and Zhang, Meng and Lu, Hao},
  journal={arXiv preprint arXiv:2603.21229},
  year={2026}
}