AI SECURITY · NETWORK SECURITY ENGINEER · APPLIED AI
Pawan Bishwokarma
I write about AI concepts, cybersecurity, network defense, and the projects I build while learning how intelligent systems can be used, secured, and understood.
A college-level breakdown of how DeepSeek-R1 used reinforcement learning with GRPO (Group Relative Policy Optimization) to incentivize reasoning, why the approach mattered, and how later work on hierarchical reasoning helps explain what may be happening inside RL-trained reasoning models.