Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models

Source: arXiv
Original Author: Yuliang Liu et al.

Researchers have introduced Next Concept Prediction (NCP), a novel pretraining objective for language models, implemented in their model ConceptLM. Instead of predicting one token at a time, NCP predicts discrete concepts that span multiple tokens, making the pretraining task more demanding. ConceptLM, trained at scales from 70M to 1.5B parameters on extensive datasets, outperforms traditionally trained models on 13 benchmarks. NCP also improves continual pretraining, pointing to its potential for building more capable language models.

Next Concept Prediction Enhances Language Model Performance

A new generative pretraining paradigm, Next Concept Prediction (NCP), has been introduced to strengthen the capabilities of language models. The resulting model, ConceptLM, uses Vector Quantization to build a discrete concept vocabulary and combines NCP with standard Next Token Prediction (NTP), so that predicted concepts inform token generation. ConceptLM was trained from scratch at sizes ranging from 70 million to 1.5 billion parameters on up to 300 billion training tokens.
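As a rough sketch of the idea (not the authors' code), the PyTorch snippet below shows how hidden states could be vector-quantized into discrete concept ids via a learned codebook, and how an NCP loss could be added to the usual NTP loss. The class and function names, the codebook size, and the 0.5 loss weighting are assumptions for illustration only.

```python
# Minimal sketch of a vector-quantized concept vocabulary and a combined
# NCP + NTP training loss. All names, the codebook size, and the loss
# weighting are illustrative assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConceptQuantizer(nn.Module):
    """Maps continuous hidden states to discrete concept ids via a codebook."""

    def __init__(self, hidden_dim: int, num_concepts: int = 8192):
        super().__init__()
        self.codebook = nn.Embedding(num_concepts, hidden_dim)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq, hidden_dim) -> nearest codebook entry per position
        flat = hidden.reshape(-1, hidden.size(-1))        # (batch*seq, hidden_dim)
        dists = torch.cdist(flat, self.codebook.weight)   # distance to each code
        return dists.argmin(dim=-1).reshape(hidden.shape[:-1])  # (batch, seq) ids

def combined_loss(token_logits: torch.Tensor,
                  concept_logits: torch.Tensor,
                  token_targets: torch.Tensor,
                  concept_targets: torch.Tensor,
                  ncp_weight: float = 0.5) -> torch.Tensor:
    """Standard NTP cross-entropy plus a weighted NCP cross-entropy term."""
    ntp = F.cross_entropy(token_logits.flatten(0, 1), token_targets.flatten())
    ncp = F.cross_entropy(concept_logits.flatten(0, 1), concept_targets.flatten())
    return ntp + ncp_weight * ncp
```

In this sketch, concept_targets would come from running the quantizer over representations of upcoming token spans, so the model must predict the next concept in addition to the next token; how concepts are actually defined and supervised follows the paper's Vector Quantization construction.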

Performance Gains Across Benchmarks

Across 13 evaluation benchmarks, ConceptLM consistently outperforms token-level baselines trained with NTP alone, suggesting that the harder pretraining task of predicting concepts rather than individual tokens meaningfully strengthens model capabilities. Continual pretraining experiments on an 8-billion-parameter Llama model further show that NCP can improve models originally trained with NTP.

Related Topics:

Next Concept Prediction, ConceptLM, Vector Quantization, pretraining objective, language models

📰 Original Source: https://arxiv.org/abs/2602.08984v1

All rights and credit belong to the original publisher.
