<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Sudarsh</title><description>A quiet research notebook on AI systems, inference optimization, reasoning architectures, and multimodal AI.</description><link>https://sudarsh.dev/</link><item><title>Optimizing a Layer Normalization Kernel with CUDA: a Worklog</title><link>https://sudarsh.dev/blog/cuda-layernorm-worklog/</link><guid isPermaLink="true">https://sudarsh.dev/blog/cuda-layernorm-worklog/</guid><description>An iterative guide to writing and optimizing a CUDA layer normalization kernel — from a naive single-thread implementation to vectorized loads — benchmarked against PyTorch.</description><pubDate>Mon, 17 Feb 2025 00:00:00 GMT</pubDate></item></channel></rss>