Summary

A summary of TableGPT2, a fine-tuned Qwen2-based model for native tabular data processing.

Approach

Semantic Table Encoder

Similar to vision-language models, the authors use a Q-Former-style adapter framework [1] to align representations of table columns with the language model.

The tabular data is encoded directly into tokens: embeddings from the encoder (stacks of column- and row-wise attention blocks with no positional embeddings) are passed through an adapter that combines them into a single representation of k embeddings.
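
The column- and row-wise attention can be sketched with a single-head, projection-free block. This is a simplification and an assumption about the structure, not the paper's implementation: the real encoder uses learned projections and multiple stacked blocks.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention; note there are no positional
    # embeddings anywhere, so the block is permutation-equivariant.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def table_encoder_block(cells):
    """One encoder block: attend within rows, then within columns.

    cells: (n_rows, n_cols, d) array of cell embeddings.
    """
    # Row-wise attention: each cell attends to the cells in its own row
    # (rows act as the batch dimension).
    row_out = attention(cells, cells, cells)
    # Column-wise attention: transpose so columns become the batch axis
    # and cells attend within their column.
    col_in = row_out.swapaxes(0, 1)            # (n_cols, n_rows, d)
    return attention(col_in, col_in, col_in).swapaxes(0, 1)

table = np.random.randn(4, 3, 8)   # 4 rows, 3 columns, d = 8
out = table_encoder_block(table)
print(out.shape)                   # (4, 3, 8)
```

Because no positional embeddings are used, permuting the input rows simply permutes the output rows, which matches the intuition that row order in a table carries no meaning.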

Then the table is serialized into a hybrid form of both text and embeddings:

```
table tab_name, columns=[tab_name.col_name(<col_emb>|dtype|if_primary_key)|[values]]
```

where `<col_emb>` is replaced with the k embeddings from the adapter.
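
As an illustration, a hypothetical helper that builds the text skeleton of this serialization, leaving a literal `<col_emb>` placeholder per column where the adapter's k embeddings would be spliced into the token stream. The field names and value formatting here are assumptions, not taken from the paper:

```python
def serialize_table(tab_name, columns):
    """Build the hybrid-serialization text skeleton for one table.

    columns: list of dicts with keys name, dtype, primary_key, values.
    Each "<col_emb>" placeholder marks where the k adapter embeddings
    would be inserted into the token sequence.
    """
    parts = []
    for col in columns:
        head = f"{tab_name}.{col['name']}(<col_emb>|{col['dtype']}|{col['primary_key']})"
        parts.append(f"{head}|{col['values']}")
    return f"table {tab_name}, columns=[{', '.join(parts)}]"

s = serialize_table("users", [
    {"name": "id", "dtype": "int", "primary_key": True, "values": [1, 2, 3]},
    {"name": "name", "dtype": "str", "primary_key": False, "values": ["ann", "bo", "cy"]},
])
print(s)
```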

Encoder Pre-Training

The encoder is pre-trained with a contrastive loss on the column embeddings. A table $T_i$ is split into two snapshots $S_i$ and $S_i'$, and the encoder tries to pull together matching columns from $S_i$ and $S_i'$ while pushing apart the rest:

$$\mathcal{L}(\tau, P) = -\frac{1}{|P|} \sum_{\mathbf{e}\in P} \log \frac{\exp(\mathbf{e}^\top\mathbf{e}_{+}/\tau)}{\sum_{\mathbf{e}' \in P\setminus\{\mathbf{e}\}} \exp(\mathbf{e}^{\top}\mathbf{e}'/\tau)}$$

where $\tau$ is the temperature, $\mathbf{e}_{+}$ is the positive pair (the embedding of the same column in the other snapshot), and $P$ is the pool of column embeddings from both $S_i$ and $S_i'$.

Encoder/Adapter Joint Training

To align the encoder's embeddings to the language model, both the encoder and the adapter are trained jointly on the following tasks:

Synthetically created tasks:

  • column prediction
  • cell prediction

Existing datasets (FetaQA, WikiTableQuestion, ToTTo) are modified to create more tasks:

  • question generation
  • table titling
  • row summarization
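
To make the synthetic tasks concrete, here is one plausible way to manufacture a cell-prediction pair from a raw table. The masking scheme and prompt wording are assumptions for illustration, not the paper's recipe:

```python
import random

def make_cell_prediction_example(table, rng):
    """Mask one cell of a table and ask the model to recover it.

    table: dict mapping column name -> list of values.
    Returns a {"prompt", "answer"} supervised pair.
    """
    col = rng.choice(sorted(table))            # pick a column deterministically given rng
    row = rng.randrange(len(table[col]))       # pick a row
    answer = table[col][row]
    masked = {c: list(v) for c, v in table.items()}
    masked[col][row] = "[MASK]"                # hide the target cell
    prompt = (f"Given the table {masked}, what value belongs at "
              f"row {row} of column '{col}'?")
    return {"prompt": prompt, "answer": str(answer)}

rng = random.Random(0)
table = {"city": ["Oslo", "Lima"], "pop_m": [0.7, 10.9]}
ex = make_cell_prediction_example(table, rng)
print(ex["prompt"])
```

Column prediction can be built the same way by masking an entire column instead of a single cell.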

Data Collection

The authors collect two types of data for training TableGPT2.

  1. Text data for continuous pre-training of the Qwen2 base model.
  2. Tabular data for supervised fine-tuning of the table encoder/adapter and language model.
| Type | Description |
| --- | --- |
| Database | Multi-table setting; large and numerical |
| Web Page | Simple, with contextual text data |
| Excel | Structured data from government, finance, etc. |
| Academic Task | Data used in research, often suitable for TableQA or NL2SQL |
| Special Format | Specific formats, such as invoices, bills, etc. |
| Pre-test Task | Forecasting and prediction projects, etc. |

Types of tables gathered

The authors then use larger models (GPT-4o, LLaMA, etc.) to generate a set of queries of broadly two types:

  • Single-turn
  • Multi-turn, where the user may ask to improve the previous response or ask additional questions.

Once the queries are generated and vetted by human annotators, the larger models are again used to generate the answers to these queries. The authors employ a strategy they call synthesize and refine to generate higher-quality responses using these larger LLMs.
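
The paper names the strategy but the mechanics sketched below are a guess: one plausible reading of synthesize and refine is a draft-critique-regenerate loop, with `call_llm` standing in for any chat-completion call:

```python
def synthesize_and_refine(call_llm, query, table_text, n_rounds=1):
    """Hypothetical synthesize-and-refine loop (an assumption, not the
    paper's algorithm): draft an answer, ask for a critique, and
    regenerate with the critique in context until the critique is OK."""
    answer = call_llm(f"Table:\n{table_text}\n\nQuestion: {query}\nAnswer:")
    for _ in range(n_rounds):
        critique = call_llm(
            f"Question: {query}\nDraft answer: {answer}\n"
            "List any factual or formatting problems, or say OK."
        )
        if critique.strip() == "OK":
            break
        answer = call_llm(
            f"Table:\n{table_text}\n\nQuestion: {query}\n"
            f"Previous draft: {answer}\nCritique: {critique}\n"
            "Write an improved answer:"
        )
    return answer
```

The critique pass is what lets a large model catch its own single-shot errors before the response is kept as training data.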

Agent Framework

A core pipeline diagram of an agent framework using TableGPT2.

The authors propose an agent framework built around TableGPT2. The tabular agent can generate code to parse and retrieve tables in order to accomplish its task.
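
A toy version of that loop, with the model stubbed out: the agent asks for code against an in-memory `table`, executes it in a scratch namespace, and captures the printed output. This is a sketch of the general pattern, not the paper's pipeline, and real deployments would sandbox the execution step:

```python
import io
import contextlib

def run_agent_step(generate_code, table, task):
    """Generate code for a task and execute it against `table`.

    generate_code: callable mapping a task description to Python source.
    The code sees the table as a variable named `table`; anything it
    prints is captured and returned as the agent's observation.
    """
    code = generate_code(task)
    ns = {"table": table}
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, ns)      # real systems sandbox this step
    return buf.getvalue().strip()

# Stubbed "model": returns canned code for a toy aggregation task.
fake_model = lambda task: "print(sum(row['sales'] for row in table))"
table = [{"sales": 3}, {"sales": 4}]
print(run_agent_step(fake_model, table, "total sales"))  # → 7
```

Feeding the captured output back into the model as an observation is what turns this single step into a multi-turn agent loop.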

Benchmarking

Existing Benchmarks

  • Table Understanding
    • TURL: table interpretation (e.g., column type annotation, relation extraction, entity linking) and table augmentation (e.g., row population)
  • Table Question Answering (TableQA)
  • Table Fact Verification
  • Table-to-Text Generation (Table2Text)
  • Natural Language to SQL (NL2SQL)
  • Holistic
    • TableBench for assessing reasoning capabilities over tables.

RealTabBench

The authors argue that most existing benchmarks are too simple and fail to capture real-life scenarios. To remedy this, they propose a new tabular benchmark for language models with the following assets:

  • 360 real-life tables from business intelligence scenarios.
  • 6,000 queries.

Human reviewers and LLM evaluators are combined to produce the final metric.

Findings

  • TableGPT2 outperforms other models on RealTabBench.
  • But it also retains most of its general-task (non-table related) capabilities.

The authors exclude TableLlama [2] from the comparison because it loses general-task capabilities.

Resources

Footnotes

  1. Li, Junnan, Dongxu Li, Silvio Savarese, and Steven Hoi. "Blip-2: Bootstrapping language-image pre-training with frozen image encoders and large language models." In International conference on machine learning, pp. 19730-19742. PMLR, 2023.

  2. Zhang, Tianshu, Xiang Yue, Yifei Li, and Huan Sun. "TableLlama: Towards open large generalist models for tables." arXiv preprint arXiv:2311.09206 (2023).