[LLM] Base Model, Instruct Model, and Chat Template


윰갱 2025. 3. 27. 14:35

# Base Model, Instruct Model

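Base Model: a model that has only gone through pretraining, with the plain objective of predicting the next token

Instruct Model: a model that has additionally been fine-tuned to carry out tasks with a specific purpose, such as following instructions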

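In model listings the two are usually told apart by name: a name with nothing appended is the base model, while an instruct model has something extra such as Instruct, it, or chat appended (for example, meta-llama/Llama-3.2-1B versus meta-llama/Llama-3.2-1B-Instruct).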
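The naming convention can be turned into a rough check. The sketch below is only a heuristic: the helper name `looks_like_instruct` and the suffix list are assumptions made for illustration, and the model card remains the authoritative source on how a checkpoint was trained.

```python
import re

# Suffixes that commonly mark instruction-tuned checkpoints.
# This list is an illustrative assumption, not an official convention.
INSTRUCT_SUFFIXES = ("instruct", "it", "chat")


def looks_like_instruct(model_id: str) -> bool:
    """Guess from the name alone whether a checkpoint is instruction-tuned."""
    # Keep only the model name, dropping the organization prefix.
    name = model_id.split("/")[-1].lower()
    # Split on common separators so that "it" only matches a whole part.
    parts = re.split(r"[-_.]", name)
    return any(part in INSTRUCT_SUFFIXES for part in parts)


print(looks_like_instruct("meta-llama/Llama-3.2-1B"))           # False
print(looks_like_instruct("meta-llama/Llama-3.2-1B-Instruct"))  # True
```

Matching whole name parts rather than substrings keeps a short suffix like "it" from firing on unrelated words.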

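Loosely speaking, it is like the difference between GPT and ChatGPT.

ChatGPT is closely tied to InstructGPT (OpenAI describes it as a sibling model), and that model was trained separately to generate appropriate responses to user input.
Fundamentally, of course, both kinds of model still predict the next token.
However, a Base Model that has only been pretrained does not care whether the input is a question or not; it simply continues the text with plausible-sounding sentences. An Instruct Model, by contrast, has been additionally trained to explicitly produce an answer to the question.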

# Chat Template

We mostly use LLMs to get answers to questions or to have some task carried out.

So is simply using an Instruct Model enough?

Largely yes, but even with an Instruct Model you still need to pass it a correctly formatted prompt.

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

device = "cuda" if torch.cuda.is_available() else "cpu"
# model_id = "meta-llama/Llama-3.2-1B"  # base model
model_id = "meta-llama/Llama-3.2-1B-Instruct"  # instruct model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

# Pass the raw question as-is, without applying any chat template.
raw_input = "What is Large Language Model?"
encoded_input = tokenizer(raw_input, return_tensors="pt").to(device)

outputs = model.generate(**encoded_input, max_new_tokens=20)
print(tokenizer.decode(outputs[0]))

# Output:
# <|begin_of_text|>What is Large Language Model? (with examples and applications)
# A large language model is a type of artificial intelligence (AI) model

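At first glance the output looks like an answer to the question, but something is off.

The model first appends the phrase (with examples and applications) on its own, and only then generates an answer to that extended text.

The reason becomes clear once you look at how an LLM actually receives its input.

The input a user hands to an LLM does not go in exactly as written.

If we call the sentence the user typed the Query, various elements are attached before and after it, and the combined text is what gets fed to the LLM.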

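A prompt fed to an LLM is made up of several parts:
- System prompt
- Instructions
- User input

{system} You are a helpful AI assistant. {/system}
{instruction} Always answer questions in Korean. {/instruction}
{user} What is a Large Language Model? {/user}
{assistant}

Because the system prompt spells out the guidelines the LLM should follow when generating a response, the model often understands what you mean even when the query is sloppily worded.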

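What we type into an LLM is only the part between {user} and {/user},
but the full prompt that a chat-tuned LLM actually receives during training looks like the one above.
At inference time, therefore, the input must be passed in the same format.

This format is called a Chat Template.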
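To make the idea concrete, here is a minimal sketch that renders a list of messages into the illustrative `{role} ... {/role}` format shown above. The function name `render_illustrative_prompt` is made up for this example, and the tag syntax is the toy notation from this post, not the format of any real model.

```python
def render_illustrative_prompt(messages: list[dict[str, str]]) -> str:
    """Wrap each message in {role} ... {/role} tags and open the assistant turn."""
    lines = [
        f"{{{m['role']}}} {m['content']} {{/{m['role']}}}"
        for m in messages
    ]
    # Leave the assistant tag open so the model continues from here.
    lines.append("{assistant}")
    return "\n".join(lines)


messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "instruction", "content": "Always answer questions in Korean."},
    {"role": "user", "content": "What is a Large Language Model?"},
]
print(render_illustrative_prompt(messages))
```

The user only ever supplies the last message; everything else is added around it before the text reaches the model.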

Meta provides detailed, version-by-version guides to the Prompt Template of its Llama models.

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 23 July 2024

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

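<|begin_of_text|>: marks the start of the prompt

<|start_header_id|>, <|end_header_id|>: enclose a role name (system, user, assistant) to mark whose turn it is

<|eot_id|>: short for end of turn; marks that the current speaker's turn is over

In the template above, the system prompt gives the LLM a brief instruction and then signals that the system turn has ended; next, the user asks what the capital of France is, and the end of the user's turn is marked.

The prompt ends with an opened assistant header, so an LLM that receives it will answer with the capital of France and then end the assistant's turn.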
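As a rough illustration of how these special tokens fit together, the sketch below assembles a Llama 3-style prompt string by hand. The function name `build_llama3_prompt` is hypothetical, and the output only mirrors the structure of the template shown above; the template shipped with a given tokenizer may add extra content (such as the date lines in the system turn), so treat this as a reading aid rather than a replacement for it.

```python
BEGIN_OF_TEXT = "<|begin_of_text|>"
START_HEADER = "<|start_header_id|>"
END_HEADER = "<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def build_llama3_prompt(messages, add_generation_prompt=True):
    """Assemble a Llama 3-style prompt string from role/content messages."""
    prompt = BEGIN_OF_TEXT
    for message in messages:
        # Header naming the speaker, followed by a blank line.
        prompt += f"{START_HEADER}{message['role']}{END_HEADER}\n\n"
        # The message body, closed by the end-of-turn token.
        prompt += f"{message['content']}{END_OF_TURN}"
    if add_generation_prompt:
        # Open the assistant turn so the model knows it should answer next.
        prompt += f"{START_HEADER}assistant{END_HEADER}\n\n"
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "What is the capital of France?"},
]
print(build_llama3_prompt(messages))
```

If a string built this way is tokenized manually, the tokenizer would typically need `add_special_tokens=False`, since the string already starts with `<|begin_of_text|>`.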

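Formatting prompts to match such a template by hand every time is tedious.

The transformers library can build the Chat Template for each model automatically, through the tokenizer's apply_chat_template method.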

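messages = [{"role": "user", "content": "What is Large Language Model?"}]

# Render the messages with the model's chat template and tokenize them.
# add_generation_prompt=True appends the assistant header so the model answers.
# return_dict=True returns input_ids together with attention_mask.
encoded_prompt = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(device)

outputs = model.generate(**encoded_prompt, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))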

"""
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 01 Jan 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

What is Large Language Model?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

A Large Language Model (LLM) is a type of artificial intelligence (AI) model that is designed to process and understand human language. It is a type of neural network that uses deep learning techniques to analyze and generate text.

A Large Language Model
"""

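The response was cut off because max_new_tokens is small,

but we can now see that the model generates only the answer, without tacking anything onto the question.

For reference, when the model finishes a response, it generates <|eot_id|> at the end.

Even if the token budget (max_new_tokens) has not been used up, once the model judges that it has answered the question sufficiently, it emits this special end-of-turn token last and generation stops.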
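The end-of-turn token also gives a simple way to tell a finished answer from a truncated one. The sketch below works on the decoded string only; the helper name `extract_assistant_reply` is an assumption for illustration, and it presumes the Llama 3-style markers described in this post.

```python
ASSISTANT_HEADER = "<|start_header_id|>assistant<|end_header_id|>"
END_OF_TURN = "<|eot_id|>"


def extract_assistant_reply(decoded: str) -> tuple[str, bool]:
    """Return (reply_text, finished) from a decoded Llama 3-style generation."""
    # Everything after the last assistant header is the model's reply.
    _, _, reply = decoded.rpartition(ASSISTANT_HEADER)
    # The reply is complete only if the model emitted the end-of-turn token.
    finished = END_OF_TURN in reply
    if finished:
        reply = reply.split(END_OF_TURN, 1)[0]
    return reply.strip(), finished


truncated = ASSISTANT_HEADER + "\n\nA Large Language Model (LLM) is"
complete = ASSISTANT_HEADER + "\n\nParis.<|eot_id|>"
print(extract_assistant_reply(truncated))  # ('A Large Language Model (LLM) is', False)
print(extract_assistant_reply(complete))   # ('Paris.', True)
```

When `finished` is False, the generation most likely stopped because it ran out of its token budget rather than because the model chose to end its turn.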

When using an Instruct model, the input should always be formatted according to its Chat Template.

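What goes into an LLM is not just the query the user typed directly; other parts, such as the system prompt, are included as well.

Example:
*It is easy to think only of the case where the bare user query is given*
User: Which country is Paris the capital of?

*but the actual LLM input is richer:*
<|system|> You are a kind and logical AI. <|end|>
<|user|> Which country is Paris the capital of? <|end|>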
