XTab: Cross-table Pretraining for Tabular Transformers
Summary
XTab uses the FT-Transformer architecture to learn dataset-specific embeddings together with a shared backbone in a federated-style setting. Pre-training the transformer across tables yields better performance than training it from scratch, but the proposed method is still beaten by CatBoost.
Approach
Figure: Overview of the XTab framework
Main Idea
- Train a shared backbone (a transformer stack) that can operate on the token embeddings produced for any table.
- Learn a new set of embeddings (a featurizer) for each table; see the sketch below.
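A minimal PyTorch sketch of this split, assuming a standard transformer encoder; the class names (SharedBackbone, PerTableModel) and hyperparameters are illustrative, not the official XTab code:

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Transformer stack shared across all tables."""
    def __init__(self, d_token=192, n_layers=3, n_heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=d_token, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens):              # tokens: (batch, n_tokens, d_token)
        return self.encoder(tokens)

class PerTableModel(nn.Module):
    """Dataset-specific featurizer and head wrapped around the shared backbone."""
    def __init__(self, featurizer, backbone, n_classes, d_token=192):
        super().__init__()
        self.featurizer = featurizer        # re-initialized for every table
        self.backbone = backbone            # shared across tables
        self.cls = nn.Parameter(torch.randn(1, 1, d_token))
        self.head = nn.Linear(d_token, n_classes)  # per-table prediction head

    def forward(self, x_num, x_cat):
        tokens = self.featurizer(x_num, x_cat)                # (B, F, d)
        cls = self.cls.expand(tokens.size(0), -1, -1)
        out = self.backbone(torch.cat([cls, tokens], dim=1))  # (B, F + 1, d)
        return self.head(out[:, 0])                           # predict from [CLS]
```

Only the backbone weights are carried over between tables; the featurizer and head are freshly initialized whenever a new table is seen.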
Target Task
- Main: Classification
- Auxiliary (pre-training objectives): reconstruction, contrastive, and supervised classification (see the sketch below).
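To make the cross-table setup concrete, here is a hedged sketch of a single pre-training step using the supervised objective; the table-sampling scheme, dictionary layout, and names are assumptions, and the reconstruction or contrastive objectives would swap in a different head and loss:

```python
import random
import torch.nn.functional as F

def pretraining_step(tables, optimizer):
    """tables: list of dicts, each with a per-table 'model' (featurizer + head
    around the shared backbone, e.g. PerTableModel above) and an 'iterator'
    yielding (x_num, x_cat, y) batches for that table."""
    table = random.choice(tables)            # sample one table per step
    x_num, x_cat, y = next(table["iterator"])
    logits = table["model"](x_num, x_cat)    # featurizer -> shared backbone -> head
    loss = F.cross_entropy(logits, y)        # supervised auxiliary objective
    optimizer.zero_grad()
    loss.backward()                          # updates both the shared backbone
    optimizer.step()                         # and the table-specific modules
    return loss.item()
```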
Input Transformation
- Categorical: mapped to a token via an embedding lookup table (equivalent to multiplying a one-hot encoding by an embedding matrix).
- Continuous: a single learned embedding per feature, scaled by the feature's value; see the tokenizer sketch below.
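A hedged sketch of an FT-Transformer-style feature tokenizer covering both cases; the class name, shapes, and initialization are assumptions, not the official implementation:

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    """Turns raw table columns into per-feature tokens of size d_token."""
    def __init__(self, n_num_features, cat_cardinalities, d_token=192):
        super().__init__()
        # one learned d_token vector (plus bias) per continuous feature
        self.num_weight = nn.Parameter(torch.randn(n_num_features, d_token))
        self.num_bias = nn.Parameter(torch.zeros(n_num_features, d_token))
        # one embedding lookup table per categorical feature
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(card, d_token) for card in cat_cardinalities])

    def forward(self, x_num, x_cat):
        # x_num: (B, n_num) floats; x_cat: (B, n_cat) integer category codes
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeddings)], dim=1)
        return torch.cat([num_tokens, cat_tokens], dim=1)  # (B, n_features, d_token)
```

The resulting (batch, n_features, d_token) tokens are what the shared backbone consumes.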
Findings
Experiments
Setting
- 3 types of backbones (FT-Transformer, Fastformer, Saint-v)
- Datasets from the OpenML AutoML Benchmark
- 52 datasets for pre-training and 52 for fine-tuning/inference.
- Baselines:
- Tree-based: Random Forest, XGBoost, LightGBM, CatBoost
- NN: AutoGluon
- Transformer: FT-Transformer
- Cross-table pre-training shows improved performance compared to single-table training.
- The ablation compares the same FT-Transformer architecture with and without cross-table pre-training.
- Light fine-tuning (3 epochs) is actually better than heavy fine-tuning (unlimited epochs with early stopping).
- GBDTs are still quite strong, but XTab is the best among the deep learning methods.