ContentPosts from @purushothamb..
Link
@faun shared a link, 1 month ago

LLM Evaluation: Practical Tips at Booking.com

Booking.com built Judge-LLM, a framework where strong LLMs evaluate other models against a carefully curated golden dataset. Clear metric definitions, rigorous annotation, and iterative prompt engineering make evaluations more scalable and consistent than relying solely on humans. **The takeaway**:..

Link
@faun shared a link, 1 month ago

Guardians of the Agents 

A new static verification framework wants to make runtime safeguards look lazy. It slaps **mathematical safety proofs** onto LLM-generated workflows *before* they run—no more crossing fingers at execution time. The setup decouples **code from data**, then runs checks with tools like **CodeQL** and ..

Link
@faun shared a link, 1 month ago

GitHub Copilot on autopilot as community complaints persist

GitHub's biggest debates right now? Whether to shut down AI-generated "noise" fromCopilot—stuff like auto-written issues and code reviews. No clear answers from GitHub yet. Frustration is piling up. Some devs are ditching the platform altogether, shifting their projects toCodebergor spinning upself-..

GitHub Copilot on autopilot as community complaints persist
Link
@faun shared a link, 1 month ago

Building Agents for Small Language Models: A Deep Dive into Lightweight AI

Agent engineering with **small language models (SLMs)**—anywhere from 270M to 32B parameters—calls for a different playbook. Think tight prompts, offloaded logic, clean I/O, and systems that don’t fall apart when things go sideways. The newer stack—**GGUF** + **llama.cpp**—lets these agents run loc..

Link
@faun shared a link, 1 month ago

Vibe coding has turned senior devs into ‘AI babysitters,’ but they say it’s worth it

Fastly says95% of developersspend extra time fixing AI-written code. Senior engineers take the brunt. That overhead has even spawned a new gig: “vibe code cleanup specialist.” (Yes, seriously.) As teams lean harder on AI tools, reliability and security start to slide—unless someone steps in. The re..

Vibe coding has turned senior devs into ‘AI babysitters,’ but they say it’s worth it
Link
@faun shared a link, 1 month ago

Introducing the MCP Registry

The new **Model Context Protocol (MCP) Registry** just dropped in preview. It’s a public, centralized hub for finding and sharing MCP servers—think phonebook, but for AI context APIs. It handles public and private subregistries, publishes OpenAPI specs so tooling can play nice, and bakes in communit..

Link
@faun shared a link, 1 month ago

Writing an operating system kernel from scratch

A barebonestime-sharing OS kernel, written inZig, running onRISC-V. It leans onOpenSBIfor console I/O and timer interrupts. Threads? Statically allocated, each running inuser mode (U-mode). The kernel stays insupervisor mode (S-mode), where it catchessystem callsandcontext switchesvia timer ticks. ..

Writing an operating system kernel from scratch
Link
@faun shared a link, 1 month ago

Magical systems thinking

AI now writes over **25% of Google’s** and as much as **90% of Anthropic’s** code. That’s not a trend—it’s a regime change. Still, the mess in large public systems reminds us: clever analysis isn’t enough. Complex systems don’t behave; they misbehave. When the machines are churning out code, the ..

Magical systems thinking
Link
@faun shared a link, 1 month ago

PostgreSQL maintenance without superuser

PostgreSQL’s moving in on superusers. As of recent releases—starting way back in v9.6 and maturing through PostgreSQL 18 (coming 2025)—there are now **15+ built-in admin roles**. No need to hand out superuser just to get things done. These roles cover the ops spectrum: monitoring, backups, fil..

PostgreSQL maintenance without superuser
Link
@faun shared a link, 1 month ago

Scaling Prometheus: Managing 80M Metrics Smoothly

Flipkart ditched its creakyStatsD + InfluxDBstack for afederated Prometheussetup—built to handle 80M+ time-series metrics without choking. The move leaned intopull-based collection,PromQL's firepower, andhierarchical federationfor smarter aggregation and long-haul queries. Why it matters:Prometheus..

Scaling Prometheus: Managing 80M Metrics Smoothly