Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data
Summary
- The authors propose a new architecture, CALinear, which constructs a meta-function space as a learnable combination of multiple basis linear functions.
Approach
- Instead of learning a single linear layer after MHSA, learn a set of basis linear layers together with a coefficient generator that mixes them.
- The authors claim this makes the method more expressive and more efficient.
Architecture overview of XTFormer
Vanilla attention applies a single linear layer after multi-head self-attention:

$$\mathbf{h} = \mathrm{Linear}(\mathrm{MHSA}(\mathbf{x}))$$

The authors propose changing $\mathrm{Linear}$ to $\mathrm{CALinear}$, where:

$$\mathrm{CALinear}(\mathbf{x}) = \sum_{i=1}^{k} \alpha_i \, \mathrm{Linear}_i(\mathbf{x}), \qquad \boldsymbol{\alpha} = \psi(\mathbf{c})$$

where $\mathbf{c}$ is a learnable context vector for each feature and $\psi$ is the calibration module that maps the context to the mixing coefficients $\boldsymbol{\alpha}$.
In the proposed architecture, only $\mathbf{c}$ and the embedding layer are trained from scratch for a new dataset. $\psi$ (along with the basis linear layers) is trained over multiple datasets during pre-training but is frozen during fine-tuning.
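To make the mechanism concrete, here is a minimal PyTorch sketch of how the basis linear layers, the calibration module $\psi$, and the context vector $\mathbf{c}$ could fit together. The shapes, the MLP form of $\psi$, and the softmax normalization are my assumptions for illustration, not the paper's exact implementation, and a single context vector per layer stands in for the per-feature contexts.

```python
import torch
import torch.nn as nn

class CALinear(nn.Module):
    """Calibrated linear layer: a weighted mixture of k basis linear functions.

    Mixing coefficients alpha are generated from a learnable, task-specific
    context vector c by a shared calibration module psi. Shapes, the MLP form
    of psi, and the softmax normalization are illustrative assumptions.
    """

    def __init__(self, d_in: int, d_out: int, n_basis: int = 4, d_ctx: int = 16):
        super().__init__()
        # Basis linear functions: pre-trained across many tables.
        self.basis = nn.ModuleList(nn.Linear(d_in, d_out) for _ in range(n_basis))
        # Calibration module psi: pre-trained, shared across tasks.
        self.psi = nn.Sequential(
            nn.Linear(d_ctx, d_ctx), nn.ReLU(), nn.Linear(d_ctx, n_basis)
        )
        # Context vector c: the task-specific part, learned from scratch per
        # dataset. (The paper uses one context per feature; a single vector
        # keeps the sketch short.)
        self.ctx = nn.Parameter(torch.randn(d_ctx))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        alpha = torch.softmax(self.psi(self.ctx), dim=-1)      # (n_basis,)
        outs = torch.stack([f(x) for f in self.basis], dim=0)  # (n_basis, ..., d_out)
        return torch.einsum("k,k...->...", alpha, outs)
```

Replacing each post-MHSA linear layer in a transformer block with this module would give something like the XTFormer block described above.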
Training for downstream task
Two stages:
- Task Calibration: learn task(dataset)-specific modules from scratch (embedding, output classifier, feature context $\mathbf{c}$).
- Refinement: fine-tune all parameters on the downstream task.
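A rough sketch of how the two stages could be wired up via parameter freezing, assuming the `CALinear` module above. The attribute names (`model.embedding`, `model.head`, `model.calinear_layers`), learning rates, and step counts are hypothetical placeholders, not taken from the paper.

```python
import itertools
import torch

def run_steps(model, loader, opt, steps):
    # Minimal training loop; the loss choice is a placeholder.
    loss_fn = torch.nn.CrossEntropyLoss()
    for step, (x, y) in enumerate(itertools.cycle(loader)):
        if step >= steps:
            break
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()

def task_calibration(model, loader, steps=1000):
    # Stage 1: freeze the pre-trained backbone (basis layers, psi, MHSA) ...
    for p in model.parameters():
        p.requires_grad = False
    # ... and train only the task-specific modules from scratch:
    # embedding, output classifier, and the feature context vectors c.
    fresh = list(itertools.chain(
        model.embedding.parameters(),
        model.head.parameters(),
        (layer.ctx for layer in model.calinear_layers),
    ))
    for p in fresh:
        p.requires_grad = True
    run_steps(model, loader, torch.optim.AdamW(fresh, lr=1e-3), steps)

def refinement(model, loader, steps=1000):
    # Stage 2: fine-tune all parameters on the downstream task.
    for p in model.parameters():
        p.requires_grad = True
    run_steps(model, loader, torch.optim.AdamW(model.parameters(), lr=1e-4), steps)
```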
Findings
- XGB and CatBoost outperform most DL baselines.
- But XTFormer (with CALinear) beats them fairly consistently.
- Using 4-6 basis functions significantly outperforms using 1-2.
- But there is no significant difference between 4 and 6.
- Using $\boldsymbol{\alpha} = \psi(\mathbf{c})$ yields better performance than learning $\boldsymbol{\alpha}$ directly (see the sketch after this list).
- More task calibration yields better performance in full-data settings, but in limited-data settings it actually lowers performance.
- Refinement seems to help in all cases.
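To spell out the $\psi(\mathbf{c})$-vs-direct-$\boldsymbol{\alpha}$ ablation, the two parametrizations differ only in how the mixing coefficients are produced. A schematic of the difference (class names are illustrative, not from the paper):

```python
import torch
import torch.nn as nn

# Proposed: alpha generated from a task context c by the shared,
# pre-trained calibration module psi.
class GeneratedAlpha(nn.Module):
    def __init__(self, psi: nn.Module, d_ctx: int):
        super().__init__()
        self.psi = psi                               # shared, pre-trained
        self.ctx = nn.Parameter(torch.randn(d_ctx))  # task-specific

    def forward(self) -> torch.Tensor:
        return self.psi(self.ctx)

# Ablation: alpha learned directly as free parameters, with no
# pre-trained structure constraining it.
class DirectAlpha(nn.Module):
    def __init__(self, n_basis: int):
        super().__init__()
        self.alpha = nn.Parameter(torch.randn(n_basis))

    def forward(self) -> torch.Tensor:
        return self.alpha
```

One plausible reading of the gap is that the frozen $\psi$ regularizes new tasks toward coefficient patterns that worked across the pre-training tables.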