♫
HomeBlogProjectsPhotography

Blog

Some random thoughts.

Jun 12, 2026
LLMReinforcement LearningGRPOReasoning

GRPO Algorithm

Group Relative Policy Optimization (GRPO) is an efficient RL method used in DeepSeek-R1 to boost reasoning, using group-normalized rewards without a separate critic model.

© 2026 Li Yunhe. All rights reserved.