XTab: Cross-table Pretraining for Tabular Transformers

Summary

XTab uses the FT-Transformer architecture to learn dataset-specific embeddings together with a shared transformer backbone, trained across tables in a federated setting. Pre-training the transformer this way outperforms training it from scratch on each table; however, the proposed method is still beaten by CatBoost.

Approach

Framework overview

Figure: Overview of the XTab framework.

Main Idea

  • Train a shared backbone (transformer stack) that can handle arbitrary embedding representations of tables.
  • Learn a new set of embeddings (featurizer) for each table (see the sketch after this list).
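
A minimal sketch of this composition, assuming PyTorch; the class names (`SharedBackbone`, `XTabStyleModel`) and dimensions are illustrative placeholders, not taken from the paper's code:

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Transformer stack shared across all pre-training tables."""
    def __init__(self, d_model=192, n_heads=8, n_layers=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, tokens):            # tokens: (batch, n_tokens, d_model)
        return self.encoder(tokens)

class XTabStyleModel(nn.Module):
    """One featurizer and head per dataset, one backbone shared by all of them."""
    def __init__(self, featurizers: dict, heads: dict, backbone: SharedBackbone):
        super().__init__()
        self.featurizers = nn.ModuleDict(featurizers)  # dataset_id -> embedding module
        self.heads = nn.ModuleDict(heads)              # dataset_id -> projection head
        self.backbone = backbone                       # shared weights

    def forward(self, x, dataset_id: str):
        tokens = self.featurizers[dataset_id](x)       # dataset-specific embeddings
        hidden = self.backbone(tokens)                 # shared transformer stack
        # predict from the first ([CLS]-style) token, assumed to be prepended by the featurizer
        return self.heads[dataset_id](hidden[:, 0])
```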

Target Task

  • Main (fine-tuning) task: classification
  • Auxiliary pre-training objectives: reconstruction, contrastive, and supervised (classification); a sketch of the reconstruction objective follows this list
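
A hedged sketch of what a reconstruction-style pre-training objective could look like, assuming PyTorch; the permutation-style corruption, the 15% corruption rate, and the assumption of a per-feature reconstruction head are illustrative choices, not details confirmed by the paper:

```python
import torch
import torch.nn.functional as F

def reconstruction_loss(model, x_num, dataset_id, corrupt_prob=0.15):
    """x_num: (batch, n_features) numerical inputs from a single table.

    `model(x, dataset_id)` is assumed to return per-feature reconstructions
    of shape (batch, n_features)."""
    # choose which cells to corrupt
    mask = torch.rand_like(x_num) < corrupt_prob
    # permutation corruption: a masked cell takes the same column's value from another row
    shuffled = x_num[torch.randperm(x_num.size(0))]
    x_corrupt = torch.where(mask, shuffled, x_num)

    recon = model(x_corrupt, dataset_id)
    # penalize reconstruction error only on the corrupted cells
    return F.mse_loss(recon[mask], x_num[mask])
```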

Input Transformation

  • Categorical: per-column embedding lookup table (equivalent to one-hot encoding followed by a linear projection).
  • Continuous: a single learnable embedding per column, scaled by the feature value (see the sketch below).
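
A minimal sketch of this FT-Transformer-style feature tokenizer, assuming PyTorch; column counts and the embedding dimension are placeholders:

```python
import torch
import torch.nn as nn

class FeatureTokenizer(nn.Module):
    def __init__(self, cat_cardinalities, n_num_features, d_model=192):
        super().__init__()
        # categorical: one embedding table per column (lookup by category index)
        self.cat_embeddings = nn.ModuleList(
            [nn.Embedding(card, d_model) for card in cat_cardinalities]
        )
        # continuous: one learnable vector (and bias) per column, scaled by the feature value
        self.num_weight = nn.Parameter(torch.randn(n_num_features, d_model))
        self.num_bias = nn.Parameter(torch.zeros(n_num_features, d_model))

    def forward(self, x_cat, x_num):
        # x_cat: (batch, n_cat) integer codes; x_num: (batch, n_num) floats
        cat_tokens = torch.stack(
            [emb(x_cat[:, i]) for i, emb in enumerate(self.cat_embeddings)], dim=1
        )
        num_tokens = x_num.unsqueeze(-1) * self.num_weight + self.num_bias
        return torch.cat([cat_tokens, num_tokens], dim=1)  # (batch, n_features, d_model)
```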

Findings

Experiments

Setting

  • 3 types of backbones (FT-Transformer, Fastformer, Saint-v)
  • Datasets from the OpenML AutoML Benchmark
    • 52 datasets for pre-training and 52 for fine-tuning/inference.
  • Baselines:
    • Tree-based: Random Forest, XGBoost, LightGBM, CatBoost
    • NN: AutoGluon
    • Transformer: FT-Transformer

Results

  • Cross-table pre-training improves performance compared to training the same model on a single table from scratch.
    • The ablation uses the same FT-Transformer architecture with and without cross-table pre-training.
  • Light fine-tuning (3 epochs) actually outperforms heavy fine-tuning (unlimited epochs with early stopping).
  • GBDTs are still quite strong (CatBoost in particular), but XTab appears to be the best of the deep-learning group.

Resources