PTab: Using the Pre-trained Language Model for Modeling Tabular Data
Summary
This paper proposes PTab, a method that fine-tunes a pre-trained language model into a tabular-data classifier. The authors leverage the masked language modeling paradigm to learn representations of tabular data.
Approach

(Figure: overview of PTab)
1. Modality transformation
- Convert the tabular data into text, e.g. `age:18 sex:male job:student ...` (see the first sketch after this list).
2. Masked fine-tuning
- Turn the textualized tabular data into a sequence of tokens.
- Fine-tune the model to predict masked tokens, as in BERT (second sketch below).
IMPORTANT: The whole model is fine-tuned, not just the token embeddings!
3. Task fine-tuning
- Take the best model from masked fine-tuning.
- Use the embedding of `[CLS]` as input to a classifier (third sketch below).
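
To make step 1 concrete, here is a minimal sketch of the textualization, assuming the input is a pandas DataFrame; the `row_to_text` helper is hypothetical, not the paper's code.

```python
import pandas as pd

def row_to_text(row: pd.Series) -> str:
    # Serialize one row as space-separated "column:value" tokens,
    # matching the "age:18 sex:male job:student" example above.
    return " ".join(f"{col}:{row[col]}" for col in row.index)

df = pd.DataFrame({"age": [18], "sex": ["male"], "job": ["student"]})
print(row_to_text(df.iloc[0]))  # age:18 sex:male job:student
```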
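
For step 2, a hedged sketch of masked fine-tuning with Hugging Face `transformers`: the `bert-base-uncased` checkpoint, the 15% masking ratio, and the training settings are illustrative assumptions, not necessarily the paper's configuration.

```python
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Textualized rows produced by step 1 (toy data for illustration).
texts = ["age:18 sex:male job:student", "age:45 sex:female job:engineer"]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")  # all weights are trainable

dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=64),
    remove_columns=["text"],
)
# Randomly mask 15% of tokens; the model learns to reconstruct them.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ptab-mft", num_train_epochs=3),
    train_dataset=dataset,
    data_collator=collator,
)
trainer.train()
trainer.save_model("ptab-mft")  # reused by the task fine-tuning sketch below
```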
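
For step 3, a minimal sketch of the task head: encode a textualized row with the masked-fine-tuned encoder and feed the `[CLS]` embedding into a linear classifier. `PTabClassifier` and the `ptab-mft` path (saved above) are assumptions for illustration, and only the forward pass is shown; in practice the whole model would be fine-tuned end-to-end on the downstream labels.

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class PTabClassifier(nn.Module):
    def __init__(self, encoder_path: str = "ptab-mft", num_classes: int = 2):
        super().__init__()
        # Load the encoder produced by masked fine-tuning (the MLM head is dropped).
        self.encoder = AutoModel.from_pretrained(encoder_path)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_embedding = out.last_hidden_state[:, 0]  # hidden state of the [CLS] token
        return self.head(cls_embedding)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # same tokenizer as step 2
batch = tokenizer(["age:18 sex:male job:student"], return_tensors="pt")
logits = PTabClassifier()(batch["input_ids"], batch["attention_mask"])
```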
Experiments
- Evaluated on 8 binary classification datasets.
Findings
- Outperforms both the (non-GBDT) SotA and GBDT baselines.
- Ablating masked fine-tuning (MFT) shows that the model indeed picks up on semantically useful features when MFT is applied.