XTab: Cross-table Pretraining for Tabular Transformers

Summary

XTab uses the FT-Transformer architecture to learn dataset-specific embeddings together with a shared transformer backbone, trained across tables in a federated setting. Pre-training the transformer this way outperforms training it from scratch on each table; however, the proposed method is still beaten by CatBoost.

Approach

Framework overview

Figure: Overview of the XTab framework.

Main Idea

  • Train a shared backbone (transformer stack) that can handle arbitrary embedding representations of tables.
  • Learn a new set of embeddings (featurizer) for each table (see the sketch after this list).
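
A minimal sketch of this composition, assuming PyTorch; the class names (`SharedBackbone`, `XTabStyleModel`) and dimensions are illustrative placeholders, not taken from the paper's code:

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Transformer stack shared across all pre-training tables."""
    def __init__(self, d_model=192, n_heads=8, n_layers=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):            # tokens: (batch, n_tokens, d_model)
        return self.encoder(tokens)

class XTabStyleModel(nn.Module):
    """One featurizer and head per dataset, one backbone shared by all of them."""
    def __init__(self, featurizers: dict, heads: dict, backbone: SharedBackbone):
        super().__init__()
        self.featurizers = nn.ModuleDict(featurizers)  # dataset_id -> embedding module
        self.heads = nn.ModuleDict(heads)              # dataset_id -> projection head
        self.backbone = backbone                       # shared weights

    def forward(self, x, dataset_id: str):
        tokens = self.featurizers[dataset_id](x)       # dataset-specific embeddings
        hidden = self.backbone(tokens)                 # shared transformer stack
        # predict from the first ([CLS]-style) token, assumed to be prepended by the featurizer
        return self.heads[dataset_id](hidden[:, 0])
```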

Target Task

  • Main (fine-tuning) task: classification
  • Auxiliary pre-training objectives: reconstruction, contrastive, and supervised (classification); a sketch of the reconstruction objective follows this list
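
A hedged sketch of what a reconstruction-style pre-training objective could look like, assuming PyTorch; the permutation-style corruption, the 15% corruption rate, and the assumption of a per-feature reconstruction head are illustrative choices, not details confirmed by the paper:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(model, x_num, dataset_id, corrupt_prob=0.15):
    """x_num: (batch, n_features) numerical inputs from a single table.

    `model(x, dataset_id)` is assumed to return per-feature reconstructions
    of shape (batch, n_features)."""
    # choose which cells to corrupt
    mask = torch.rand_like(x_num) < corrupt_prob
    # permutation corruption: a masked cell takes the same column's value from another row
    shuffled = x_num[torch.randperm(x_num.size(0))]
    x_corrupt = torch.where(mask, shuffled, x_num)

    recon = model(x_corrupt, dataset_id)
    # penalize reconstruction error only on the corrupted cells
    return F.mse_loss(recon[mask], x_num[mask])
```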

Input Transformation

  • Categorical: per-column embedding lookup table (equivalent to one-hot encoding followed by a linear projection).
  • Continuous: a single learnable embedding per column, scaled by the feature value (see the sketch below).
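
A minimal sketch of this FT-Transformer-style feature tokenizer, assuming PyTorch; column counts and the embedding dimension are placeholders:

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    def __init__(self, cat_cardinalities, n_num_features, d_model=192):
        super().__init__()
        # categorical: one embedding table per column (lookup by category index)
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities]
        )
        # continuous: one learnable vector (and bias) per column, scaled by the feature value
        self.num_weight = nn.Parameter(torch.randn(n_num_features, d_model))
        self.num_bias = nn.Parameter(torch.zeros(n_num_features, d_model))

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_num) floats
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeddings)], dim=1
        )
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        return torch.cat([cat_tokens, num_tokens], dim=1)  # (batch, n_features, d_model)
```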

Findings

Experiments

Setting

  • 3 types of backbones (FT-Transformer, Fastformer, Saint-v)
  • Datasets from the OpenML AutoML Benchmark
    • 52 datasets for pre-training and 52 for fine-tuning/inference.
  • Baselines:
    • Tree-based: Random Forest, XGBoost, LightGBM, CatBoost
    • NN: AutoGluon
    • Transformer: FT-Transformer

Results

  • Cross-table pre-training improves performance compared to training the same model on a single table from scratch.
    • The ablation uses the same FT-Transformer architecture with and without cross-table pre-training.
  • Light fine-tuning (3 epochs) actually outperforms heavy fine-tuning (unlimited epochs with early stopping).
  • GBDTs are still quite strong (CatBoost in particular), but XTab appears to be the best of the deep-learning group.

Resources